Exploring Capturable Everyday
Memory for Autobiographical
Authentication
Sauvik Das, Eiji Hayashi, Jason Hong
1
{sauvikda, ehayashi, jasonh}@cs.cmu.edu
Mobile Authentication Today
2
Mobile Authentication Is…
3
• Underutilized
– Up to 50% do not use
authentication.
• Outdated
– Ported from the
desktop world.
http://confidenttechnologies.com/files/Infographic%20small%20image.png
4
Smartphones know a lot about us.
5
We carry them everywhere.
6
They are our communication hubs.
7
They know where we are.
8
They know where we’re going.
9
We use them to browse the web.
10
…and to take photos.
11
And the apps! Oh, the apps we use!
Key Observation
12
(Venn diagram: Capturable Everyday Memory is the intersection of Smartphone Logs and Human Memory.)
Key Observation
• Capturable Everyday Memory can be used as
the basis for a series of autobiographical
challenge-response questions.
• This is what we call Autobiographical
Authentication.
13
14
What did you eat for lunch
yesterday?
15
What application did you use at
1pm?
16
Who did you call yesterday at
4pm?
CONTRIBUTIONS
17
A Model of Capturable Everyday Memory
• How well can users answer questions about
capturable everyday memory?
• What factors affect their performance?
18
A Framework for Autobiographical
Authentication
• How can we move from raw, noisy question-
answer responses to an authentication
decision?
• How do we handle inaccuracies/lapses in
human memory?
19
METHODOLOGY
20
Approach
• Ran 3 studies to get a handle on both the questions that could potentially be asked and the questions that can actually be asked with current sensors and logs.
• 2 Mturk studies based on self-report data.
• 1 Field study with ground truth data.
21
FIELD STUDY
22
myAuth
• Indexed “knowledge” on the phone:
– Sensor readings
– System-level content-providers
• Capable of asking 13 questions.
23
24
QType Description
FBApp What application did you use on <time>?
FBLoc Where were you on <time>?
FBOCall Who did you call on <time>?
FBInCall Who called you on <time>?
FBOSMS Who did you SMS message on <time>?
FBInSMS Who SMS messaged you on <time>?
FBIntSrc What did you search the internet for on <time>?
FBIntVis What website did you visit on <time>?
NAOSMS Name someone you SMS messaged in the last 24 hours.
NAInSMS Name someone who SMS messaged you in the last 24 hours.
NAOCall Name someone you called in the last 24 hours.
NAInCall Name someone who called you in the last 24 hours.
NAApp Name an application you used in the last 24 hours.
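For concreteness, here is one way these 13 templates might be represented in code. This is a minimal sketch: the QType codes and question wording come from the table above, but the Python representation, the {time} placeholder, and the render_question helper are illustrative assumptions, not the authors' implementation.

```python
from datetime import datetime

# The 13 question templates, keyed by QType code (wording from the table above).
# Fact-based (FB*) templates carry a <time> tag, written here as {time};
# name-any (NA*) templates do not.
QUESTION_TEMPLATES = {
    "FBApp":    "What application did you use on {time}?",
    "FBLoc":    "Where were you on {time}?",
    "FBOCall":  "Who did you call on {time}?",
    "FBInCall": "Who called you on {time}?",
    "FBOSMS":   "Who did you SMS message on {time}?",
    "FBInSMS":  "Who SMS messaged you on {time}?",
    "FBIntSrc": "What did you search the internet for on {time}?",
    "FBIntVis": "What website did you visit on {time}?",
    "NAOSMS":   "Name someone you SMS messaged in the last 24 hours.",
    "NAInSMS":  "Name someone who SMS messaged you in the last 24 hours.",
    "NAOCall":  "Name someone you called in the last 24 hours.",
    "NAInCall": "Name someone who called you in the last 24 hours.",
    "NAApp":    "Name an application you used in the last 24 hours.",
}

def render_question(qtype, when=None):
    """Fill in the time tag for fact-based questions; name-any questions pass through."""
    template = QUESTION_TEMPLATES[qtype]
    if qtype.startswith("FB"):
        return template.format(time=when.strftime("%A, %B %d at %I:%M %p"))
    return template

# render_question("FBOSMS", datetime(2013, 8, 21, 13, 34))
# -> 'Who did you SMS message on Wednesday, August 21 at 01:34 PM?'
```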
25
QType Description
FBApp What application did you use on <time>?
FBLoc Where were you on <time>?
FBOCall Who did you call on <time>?
FBInCall Who called you on <time>?
FBOSMS Who did you SMS message on <time>?
FBInSMS Who SMS messaged you on <time>?
FBIntSrc What did you search the internet for on <time>?
FBIntVis What website did you visit on <time>?
NAOSMS Name someone you SMS messaged in the last 24 hours.
NAInSMS Name someone who SMS messaged you in the last 24 hours.
NAOCall Name someone you called in the last 24 hours.
NAInCall Name someone who called you in the last 24 hours.
NAApp Name an application you used in the last 24 hours.
26
QType Description
FBApp What application did you use on <time>?
FBLoc Where were you on <time>?
FBOCall Who did you call on <time>?
FBInCall Who called you on <time>?
FBOSMS Who did you SMS message on <time>?
FBInSMS Who SMS messaged you on <time>?
FBIntSrc What did you search the internet for on <time>?
FBIntVis What website did you visit on <time>?
NAOSMS Name someone you SMS messaged in the last 24 hours.
NAInSMS Name someone who SMS messaged you in the last 24 hours.
NAOCall Name someone you called in the last 24 hours.
NAInCall Name someone who called you in the last 24 hours.
NAApp Name an application you used in the last 24 hours.
27
Recognition vs. Recall: fact-based questions were posed either as multiple-choice (recognition) or free-response (recall) questions.
Study Design
• Answer 5 questions a day for 14 days.
• Incentive to answer questions correctly.
• Could skip questions if they chose.
28
Descriptive Stats
• 24 users
– Average age: 25 (s.d. 6.25, range 18-43)
– 14 male (58%)
• 2167 question-answer responses collected
29
FIELD STUDY RESULTS
30
1381 questions answered
correctly (64%)
+
168 near misses (8%)
31
Recognition > Recall
68% of recognition questions were answered correctly vs. 62% of recall questions (p = 0.008).
32
Performance Stable Over Time
No difference between
first and last 20% of
responses (64.1% vs.
64.8%, p = 0.73).
33
Question Type Matters
34
Time Bucketing Does Not Help
No difference between
“Fact-Based” and
“Name-Any” questions
(64% vs. 63%, p=0.5).
35
AUTOBIOGRAPHICAL
AUTHENTICATION
36
Users only get 64% of answers correct.
But, performance is stable and errors
are systematic.
37
Given the systematic and stable nature of user
errors, we can make an authentication decision
given both correct and incorrect answers.
38
Confidence Estimator
39
C(u | seq, û) = P(u | seq, û) * S(seq | u)
where u is the user model, seq is the observed sequence of question-answer responses, and û is the adversary model.
• C(u | seq, û): confidence that the attempting authenticator is the user. Range 0 – P(u).
• P(u | seq, û): probability that the observed sequence comes from the user, given an adversary model.
• S(seq | u): value, from 0 to 1, indicating how well the observed sequence of responses matches what we expect from the user.
AutoAuth takes in a sequence of
autobiographical question-answer responses
and an adversary model, and outputs a
confidence score from 0-P(u).
40
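To make the estimator concrete, here is a minimal, self-contained sketch of how the pieces fit together. Only the formulas come from the deck (the Bayesian term from slide 67, the bit-string similarity from slide 70, and their product); the dictionary-based user and adversary models, the function names, the 0.95 prior (the notes mention confidence plotted on a 0-95 scale), and the toy numbers are assumptions for illustration.

```python
# Minimal sketch of the AutoAuth confidence estimator (illustrative; not the authors' code).
# A "model" maps a question type to the probability of answering that type correctly.
# seq is a list of (qtype, answered_correctly) pairs observed in one authentication session.

def seq_likelihood(model, seq):
    """P(seq | model): probability of the observed correct/incorrect pattern under a model."""
    p = 1.0
    for qtype, correct in seq:
        p_correct = model[qtype]
        p *= p_correct if correct else (1.0 - p_correct)
    return p

def bit_string_similarity(user_model, seq):
    """S(seq | u) = (n - |E(seq|u) - correct(seq)|) / n (slide 70), reading |.| element-wise."""
    n = len(seq)
    diff = sum(abs(user_model[q] - (1.0 if c else 0.0)) for q, c in seq)
    return (n - diff) / n

def confidence(user_model, adversary_model, seq, prior_u=0.95):
    """C(u | seq, û) = P(u | seq, û) * S(seq | u); ranges from 0 to prior_u."""
    p_user = seq_likelihood(user_model, seq)
    p_adv = seq_likelihood(adversary_model, seq)
    posterior = (p_user * prior_u) / (p_user + p_adv)   # slide 67
    return posterior * bit_string_similarity(user_model, seq)

# Toy example: a user who is good at call questions but bad at app questions.
user = {"FBInCall": 0.8, "FBApp": 0.3}
naive_adversary = {"FBInCall": 0.1, "FBApp": 0.1}   # 1/10 guess chance (slide 44)
observed = [("FBInCall", True), ("FBApp", False)]   # matches the user's usual pattern
print(confidence(user, naive_adversary, observed))  # ~0.61
```

Plugging in per-question-type accuracies estimated from a user's training data, together with one of the adversary models described in the evaluation, yields the confidence scores discussed next.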
EVALUATION
41
Evaluation Plan
• Use field study data to run simulations.
– What confidence scores would users get versus
impersonators?
• Simulate 5 probable adversaries, weak and
strong.
42
5 Adversaries Simulated (weakest to strongest)
Naïve Adversary
Observing Adversary
Always Correct Adversary
Empirical Observing Adversary
Empirical Knows Correct Adversary
43
Naïve Adversary
• Simulates a complete stranger who steals the phone.
• Has a 1/10 chance of guessing the correct answer (a random guess on a recognition question).
44
Always Correct Adversary
• Simulates a stalker, or software that has compromised the knowledge base.
• Always answers correctly.
45
Empirical Knows Correct Adversary
• Simulates an adversary with all correct answers
and an understanding of how an average user
answers questions.
• Purposely gets certain questions wrong to
best simulate the “average” user.
46
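One lightweight way to express these simulated adversaries is as per-question-type probabilities of answering correctly, the same shape as the user model in the earlier sketch. The naïve 1/10 guess rate and the ACA's perfect knowledge come from the slides; the EKCA rates below are rounded from the per-question-type accuracies reported in the field-study results (about 80% for call/SMS questions, roughly 30% and 20% for app and web-search questions) and only stand in for the empirical distribution the EKCA would actually estimate.

```python
# Per-question-type probability of a correct answer, for a few illustrative question types.
QTYPES = ["FBInCall", "FBOSMS", "FBApp", "FBIntSrc"]

# Naive Adversary: a stranger guessing; 1/10 chance on any question (slide 44).
naive = {q: 0.10 for q in QTYPES}

# Always Correct Adversary (ACA): a stalker, or malware that has compromised the
# knowledge base; it knows every answer (slide 45).
always_correct = {q: 1.00 for q in QTYPES}

# Empirical Knows Correct Adversary (EKCA): knows every answer, but deliberately misses
# the question types an "average" user tends to miss (slide 46). The rates below are
# rounded from the field-study accuracies reported earlier in the deck and stand in for
# the empirical distribution the EKCA would estimate from a separate set of users.
average_user_accuracy = {"FBInCall": 0.8, "FBOSMS": 0.8, "FBApp": 0.3, "FBIntSrc": 0.2}
ekca = dict(average_user_accuracy)

# Any of these can be passed as adversary_model to the confidence() sketch shown earlier.
```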
USER PERFORMANCE
Evaluation Results
47
(Slides 48-50: plots of the mean confidence score actual users obtain as they answer more questions, when modeled against the Naïve, Always Correct, and Empirical Knows Correct adversaries.)
ADVERSARY PERFORMANCE
Evaluation Results
51
Against Naïve
52
Against ACA
53
Against EKCA
54
Evaluation Take-Aways
• Users always get relatively high confidence
scores.
• We can easily defend against simple
adversaries.
• Advanced adversaries do better but can also
be detected when modeled against.
55
CONCLUSION
56
Summary
• People answered 64% of autobiographical
questions correctly, on average.
• Their errors can be signals, too!
• Autobiographical Authentication is promising
and robust against some tough adversaries.
57
Limitations
• Autobiographical Authentication is slow (22
seconds on average per question).
• Requires constant device usage to replenish
the knowledge base.
• Remains unclear how users will react to this
sort of authentication in practice.
58
Questions?
59
Practical Use Cases
• Password Reset.
• Scalable, Dynamic Authentication.
• Tiered Authentication.
60
Why might we want this?
• Scalable by context.
• Authentication is dynamic.
– Shoulder-surfing, social engineering, brute-force
attacks are all much harder to execute.
61
Usability Sanity Check
62
DERIVING THE AUTOAUTH
EQUATION
63
Systematic Response Error Model
Given an observed sequence of
responses, seq, and a user, u, we want
something like:
P(u | seq)
64
65
Using Bayes' Law
P(u | seq) = P(seq | u) P(u) / P(seq)
• P(u): prior probability that it is the user.
• P(seq | u): probability that we would observe this sequence of responses from the user.
• P(seq): overall probability that we would observe this sequence of responses.
66
Adopting an Adversary
P(seq) is hard to compute perfectly: it requires knowledge of all possible
impersonators. But we can break it into two components.
P(seq) = P(seq | u) + P(seq | û)
where û is the adversary model.
Modified Bayesian Equation
67
P(u | seq, û) = P(seq | u) P(u) / (P(seq | u) + P(seq | û))
Problem: To Get a High Score…
68
P(u | seq, û) = P(seq | u) P(u) / (P(seq | u) + P(seq | û))
…an impersonator just needs to minimize the P(seq | û) term in the denominator.
Unlike Adversary Model Attack
A clever impersonator can get a high score
simply by being unlike the adversary
model, even if s/he is not like the user.
69
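A quick numeric illustration of this attack, using the modified Bayesian equation from slide 67 (the probabilities below are made up for illustration):

```python
def posterior(p_seq_user, p_seq_adv, prior_u=0.95):
    """P(u | seq, û) = P(seq|u) P(u) / (P(seq|u) + P(seq|û)), per slide 67."""
    return (p_seq_user * prior_u) / (p_seq_user + p_seq_adv)

# Suppose the user gets this question type right 70% of the time, and the adversary
# model is the Always Correct Adversary. An impersonator who deliberately answers
# incorrectly makes P(seq|û) = 0, so the score jumps straight to the prior...
print(posterior(p_seq_user=1 - 0.7, p_seq_adv=0.0))   # 0.95, even though answering
                                                       # incorrectly is unlike the user too.
```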
Add Bit-String Similarity
70
S(seq | u) = (n - |E(seq | u) - correct(seq)|) / n
• S(seq | u): bit-string similarity.
• n: number of responses.
• E(seq | u): expected answer correctness from the user.
• correct(seq): actual answer correctness from the authenticator.
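Continuing the numbers from the attack sketch above, S(seq | u) is what penalizes that behavior: an authenticator whose correctness pattern does not match the user's expected pattern has its score scaled down. This reads |·| as a summed element-wise difference, which is one reasonable interpretation of the slide's formula:

```python
# Expected correctness from the user for two such questions: E(seq|u) = [0.7, 0.7].
# The "unlike the adversary" attacker answered both incorrectly: correct(seq) = [0, 0].
n = 2
diff = abs(0.7 - 0) + abs(0.7 - 0)   # |E(seq|u) - correct(seq)| = 1.4
similarity = (n - diff) / n          # S(seq|u) = 0.3
print(similarity * 0.95)             # the 0.95 posterior from the attack above drops to ~0.29
```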
Final Equation
71
C(u | seq, û) = P(u | seq, û) * S(seq | u)
• C(u | seq, û): confidence that the attempting authenticator is the user. Range 0 – 100.
• P(u | seq, û): probability that the observed sequence comes from the user, given an adversary model.
• S(seq | u): bit-string similarity between the correctness of the observed sequence and the expected correctness of the observed sequence.
Modeling Capturable Everyday
Memory
• Used a Mixed-Effects Logistic Regression
• Users as a random effect
– Each user had his/her own baseline likelihood of
getting a question correct (intercept).
72
Fixed Effect Description
Age Integer age.
Gender Male or Female.
Time to Answer Number of seconds it took the user to answer the question.
Time since Correct Answer Number of hours since the correct answer event occurred.
Day of Study The day number of the study (0-13).
Correct Answer Entropy The Shannon entropy of correct answers for this question type.
Answer Uniqueness Inverse of the percentage of times that the correct answer to
this question was the correct answer to this type of question.
Confidence Self-reported confidence in the answer (1-5).
Ease of Remember Answer Self-reported ease of remembering the answer (1-5).
Difficulty of Others Guessing Self-reported perceived difficulty of others guessing the answer
(1-5).
Answer Type Recognition or Recall.
Question Type The question type.
73
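As a sketch of how this model might be reproduced: the two less self-explanatory fixed effects can be computed directly from the question logs, and a per-user random-intercept logistic regression can be fit with, for example, statsmodels (the deck does not say which software the authors used; R's lme4::glmer would be another common choice). The dataframe columns, file name, and formula below are assumptions.

```python
import math
from collections import Counter

import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

def correct_answer_entropy(correct_answers):
    """Shannon entropy of the correct answers observed for one question type."""
    counts = Counter(correct_answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def answer_uniqueness(correct_answers, this_answer):
    """Inverse of the fraction of times this answer was the correct answer for its type."""
    frac = correct_answers.count(this_answer) / len(correct_answers)
    return 1.0 / frac

# responses.csv is hypothetical: one row per question-answer response with a 0/1
# 'correct' outcome, a 'user' id, and the fixed effects as columns.
df = pd.read_csv("responses.csv")
model = BinomialBayesMixedGLM.from_formula(
    "correct ~ age + time_to_answer + confidence + answer_type + qtype",
    vc_formulas={"user": "0 + C(user)"},   # per-user random intercept
    data=df,
)
result = model.fit_vb()                    # variational Bayes fit
print(result.summary())
```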
Model Coefficients
74
Unique Answers Hard to Remember
75
Systematic Response Error Model
• A simplified system:
– Two question types
– Training data
– An observed authentication attempt
76
Training Data Probability Distribution
QType  Probability Correct
QT1    0.7
QT2    0.4

Observed Question-Answer Response Sequence
#  QType  Correct?
1  QT1    Yes
2  QT2    No
Systematic Response Error Model
• We can calculate the probability that we observe
a sequence of responses given the user:
77
Training Data Probability Distribution
QType  Probability Correct
QT1    0.7
QT2    0.4

Observed Question-Answer Response Sequence
#  QType  Correct?
1  QT1    Yes
2  QT2    No

P(seq | u) = P(correct(QT1) | u) * P(incorrect(QT2) | u) = 0.7 * (1 - 0.4) = 0.42
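The same worked example as a few lines of code, directly mirroring the slide's arithmetic:

```python
# Training-data probability of a correct answer, per question type (slide 76).
p_correct = {"QT1": 0.7, "QT2": 0.4}

# Observed sequence: QT1 answered correctly, QT2 answered incorrectly.
seq = [("QT1", True), ("QT2", False)]

p_seq_given_user = 1.0
for qtype, correct in seq:
    p_seq_given_user *= p_correct[qtype] if correct else (1.0 - p_correct[qtype])

print(p_seq_given_user)   # 0.7 * (1 - 0.4) = 0.42
```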
78
Adopting an Adversary
We can simplify the calculation of
P(seq) by adopting a specific
adversary model, û.
Location Question With Map
79
Editor's Notes

  1. Today, I want to talk about a new form of authentication for smartphones: Autobiographical Authentication. Autobiographical Authentication is rooted in the observation that we can use shared information between user and smartphone as the basis for identifying users. But, you might wonder, why do this at all?
  2. Welllll, no. Presently, mobile authentication looks a lot like this: PINs, passwords, or graphical passwords, all forms of authentication that have worked well in, and been ported from, the desktop and laptop worlds.
  3. But, while smartphones are increasingly used for security-sensitive tasks, as many as 50% of users do not use these forms of authentication on their phones. One problem stifling adoption might be that authentication today is generally a non-mobile experience. Perhaps we can do better by using the extra bits of information provided by smartphone sensors. But how?
  4. Well, smartphones capture incredibly rich information about our day-to-day lives. Frankly, they know a lot about us. Think about it:
  5. We carry them everywhere, and use them in almost all situations.
  6. They are our primary means of remote communication with friends, loved ones, work colleagues and even strangers.
  7. They are equipped with rich sensors that allow them to know where we are at any given time,
  8. and where we’re going.
  9. We also use them extensively, and sometimes exclusively, to browse the web.
  10. And to take photos of people browsing the web.
  11. And the apps. Between ToDo Lists, Games, Maps, and Mail, there’s little about our lives that isn’t encoded somewhere in our phones.
  12. So, yeah. Smartphones know a lot about us. In turn, human memory might accurately recall some subset of these rich smartphone logs. This intersection is what I refer to as “capturable everyday memory.”
  13. In this project, I wanted to explore to what extent capturable everyday memory can be used as the basis for authentication. How? By asking users a series of questions about their day-to-day experiences, such as:
  14. “what did you eat for lunch yesterday?”,
  15. “What application did you use at 1pm?”,
  16. “Who did you call yesterday at 4pm?” This is what I call Autobiographical Authentication. But that’s a mouthful, so I’m going to abbreviate it to AutoAuth for this presentation.
  17. So, in pursuit of this grand vision, I offer two contributions through the present work.
  18. First, I offer a model of capturable everyday memory. I answer how well users can answer questions about capturable everyday memory and what factors affect its retention. There is a plethora of past work on everyday autobiographical memory, and I use this to inform the factors I consider in my model (for example, age, gender, and answer uniqueness). But, as far as I know, no work before mine has tried to model the intersection of smartphone logs and everyday memory.
  19. Second, I offer a framework for AutoAuth that allows for natural user error. In other words, I answer the question: how do we move from raw, noisy question/answer responses to an authentication decision? Human memory is not perfect. For AutoAuth to work, then, inaccuracies should be expected and handled gracefully. While prior work has noted the promise of using dynamic autobiographical questions for authentication, no work to my knowledge has actually developed an end-to-end framework for dealing with error and making authentication decisions.
  20. So that’s neat, but what did I actually do?
  21. Our approach was two-pronged. To get a broad understanding of the questions that could potentially be asked, I ran two open-ended Mechanical Turk studies based on self-report data. Next, to get a more detailed, rigorous handle on the questions that can actually be asked based on present sensors and system logs, I deployed an Android app that actually indexed ground truth smartphone logs and generated questions based on them. For today’s presentation, I will focus only on the results of the field study. Please refer to the paper for the Mturk results.
  22. So let’s talk a little more about the field study itself.
  23. The app I built, myAuth, indexed just about all universally accessible information on Android phones, including pertinent sensor readings (such as location sensor readings), app usage information, users’ call and SMS logs, and web browsing and search history. From this information, myAuth was able to ask 13 different questions.
  24. These are the 13 questions, along with a corresponding shortened code for easy reference. These questions were selected because they were considered memorable in the Mturk self-report studies.
  25. You might notice that the first 8 questions have a dynamically generated time tag. These questions are what I call specific fact-based questions, or just fact-based questions for short. The abbreviations for these questions are prefixed with an FB. These questions were generated with a specific answer in mind. For example, “Who did you SMS message on Wednesday, August 21st at 1:34pm?” There is only one correct answer to this question.
  26. On the other hand, the last five questions have no such time tag. These questions are what I call “name-any” questions, and have abbreviations prefixed with an NA. For these questions, any corresponding event that occurred within the past 24 hours was considered a correct answer. For example, consider the question “Name someone you SMS messaged in the past 24 hours.”
  27. Fact-based questions could also be recognition or recall questions, or multiple-choice vs. free-response questions. For recall questions, users had to enter an open-ended response to answer the question. For recognition questions, users had to select the correct answer out of 10 answers, including some near-miss distractors. Near-miss distractors were answers that were almost right, for example, the name of someone you text messaged at 1:20 instead of 1:34.
  28. Moving on to the study design, participants had to answer at least 5 questions a day for 14 days and were given monetary incentives to answer questions correctly. They were also able to skip questions they found uncomfortable answering.
  29. I recruited 30 users from CBDR, and 24 of them actually followed through with the study. Participants were instructed to download myAuth through Google Play. Participants were, on average, 25 years old and 14 of them were male. Over the course of the 2 weeks, these 24 participants generated 2167 question-answer responses.
  30. Now to the interesting stuff, the results! Note that there is a much more detailed description of the analysis and methods used to model capturable everyday memory in the paper. Here, I focus on high-level takeaways that directly inform AutoAuth.
  31. First, the question on everyone’s mind: how well could users answer these questions? 1381 out of the 2167 responses, or 64%, were answered correctly, with an additional 168 near misses. As expected, participants performed well but not perfectly. They generally get questions right, but it is hardly uncommon for them to get questions wrong.
  32. Moving on, the next question we wanted to answer was how much better recognition is than recall, as recall questions have better security properties. Unsurprisingly, multiple-choice questions are answered significantly more accurately than free-response questions, at 68% versus 62%. Even so, 68% accuracy is hardly stellar. There is still substantial inaccuracy that will need to be dealt with in making authentication decisions.
  33. We also wanted to answer whether or not there was a substantial learning effect. In other words, even if users are initially bad at answering these questions, do they get better at it? In fact, performance appeared to be stable over the short term, with a negligible learning effect. Indeed, there was no statistically significant difference between the accuracy of the first 20% and the last 20% of responses by users.
  34. Next, we wanted a better handle on how well users could answer different questions. (click) It seems that fact-based questions about Incoming Calls and Outgoing SMS Messages were answered most correctly at just over 80%, (click) while questions about what users searched for on the internet or what application they were using at a specific time were answered correctly at much lower rates: just 28% for application questions, and 18% for web search questions. (click) In general, it seems that users are good at answering questions about communication, (click) but bad at answering questions about phone and web usage.
  35. Finally, we wanted to know how much time-bucketing helps, as fact-based questions have better security properties. But, surprisingly, time-bucketing did not help. There was no statistically significant difference between how often users got fact-based and name-any questions correct. This is good news from a security perspective.
  36. So how can we turn all of what we just learned into an authentication system?
  37. Well, we know the straightforward approach would not work. AutoAuth is not like passwords, and we cannot expect that a user will answer every question presented correctly. On the other hand, users’ performance appears to be stable over the short term, and it seems that users make systematic, predictable errors.
  38. And there is the key insight. Given the systematic nature of user errors, incorrect answers can also be a signal. In other words, we can use both correct and incorrect answers as indicators that the attempting authenticator is, in fact, the user. A simple example would be that if I generally get questions about location correct, but questions about outgoing SMS messages incorrect, AutoAuth should look for that pattern in my responses.
  39. So, the operationalization of that insight is this mess that we call the confidence estimator: the cornerstone of AutoAuth. Note that my goal here is to simply provide you with high-level intuition. The details of how we arrived at this equation are specifically outlined in the paper; here I’ll just summarize each of these terms. (click) (click) Our estimator takes in three parameters: a user model derived from the training data, u; the observed sequence of question/answer responses from an authentication session, seq; and a notion of an ‘adversary’, û. The adversary model is a computational representation of an impersonator that we believe might be trying to crack the system. (click) Given these three parameters, our estimator outputs a confidence estimate from 0 to the prior probability that the attempting authenticator is the user. We compute this estimate as the product of two terms: one that encapsulates the relative likelihood that what we observe comes from the user and not the adversary, and a second that encapsulates how well what we observe matches with what we expect from the user’s prior performance. (click) More specifically, the first term is a Bayesian model that encapsulates the likelihood that seq comes from the user and not the adversary. We need an adversary model for this term because in order to compute the likelihood that an observed sequence of responses comes from the user, we also need the likelihood that it does *not* come from the user. Our adversary model is simply a simplification of all possible definitions of “not user”. (click) Finally, the second term encapsulates how well what we observe matches what we expect, given that the attempting authenticator is the user. For example, if a user generally gets questions about location correct, we expect the user to continue to get these questions correct in the observed sequence. If that is what we observe, then this value will be high. If what we expect is not what we observe, this value will be low.
  40. To summarize, AutoAuth takes in a sequence of autobiographical question-answer responses and an adversary model and outputs a confidence score from 0 to the prior probability that the authenticator is the user.
  41. That all seems great in theory, but how well did the model perform, actually?
  42. Specifically, I wanted to answer the question: if our field study users were actually using AutoAuth, what confidence scores would they be getting? What confidence scores would adversaries get? The first step towards answering these questions was to pick the adversaries to simulate, both weak and strong.
  43. In total, I simulated 5 adversaries based on plausible real-life counterparts. For this presentation, I will only talk about 3: 1 simple and 2 advanced. Please refer to the paper for the others.
  44. The Naïve adversary is given a 1/10 chance of guessing any question correct, simulating a random guess on a recognition question. This adversary might represent a complete stranger who steals the user’s phone.
  45. The Always Correct adversary, or ACA, always answers every question correctly. While the concept is simple, this adversary is quite advanced, potentially representing malware that has compromised the knowledge base but not the authenticator.
  46. The Empirical Knows Correct Adversary, or EKCA, is the most advanced adversary I consider. This adversary both knows the correct answer to every question and is smart about when he answers correctly. The EKCA has created an empirical probability distribution that any user will get any type of question correct from a separate set of users, and uses this empirical distribution to selectively answer correctly or incorrectly to best emulate the empirical average.
47. So, how well does AutoAuth do? Specifically, what sort of confidence scores do users get when our confidence estimator adopts the aforementioned adversaries?
48. So, we have this graph. The x-axis represents the number of questions answered in a single session, ranging from 1 to 13. The y-axis represents the authentication confidence output by our confidence estimator, ranging from 0 to 95, where 95 is the prior probability that the attempting authenticator is the user. This graph plots the mean confidence estimate our field-study users get from AutoAuth as they answer an increasing number of questions. The translucent shading around the plot is the 95% confidence interval. There are two things to pay attention to here. One is the direction of the trend: we want the plot to move up and to the right. After all, if real users got decreasing confidence estimates, our estimator would not be doing a very good job! The second is the actual confidence values. We can see that when actual users attempt to authenticate and the adversary is naïve, users quickly get very high confidence scores. Note that 80 is a fairly high confidence score, because the maximum value would imply that the data we observe has a 0% chance of coming from the adversary and a 100% chance of matching our expectations of the user. This is very unlikely in practice!
49. Here, I've superimposed the confidence scores that real users would obtain if modeled against an Always Correct Adversary. This adversary is simple, but represents a very salient threat: a stalker who carefully chronicles a user, or a malicious program that has direct access to the knowledge base. The results are quite promising. We can see that while the initial confidence is markedly lower than when modeled against the naïve adversary, users quickly obtain high confidence estimates even against this powerful adversary.
50. Finally, I've superimposed real users' confidence estimates when modeled against the most powerful adversary: the EKCA. Not surprisingly, the confidence estimates offered by AutoAuth are much more conservative when modeling against this adversary. It is, after all, an incredibly sophisticated adversary. Nevertheless, the trend is still upwards, suggesting that users can still achieve high confidence scores, albeit with a greater time investment. In general, it seems that AutoAuth always yields a reasonably high confidence score when actual users are trying to authenticate. These results are encouraging when you remember that the ACA and EKCA are fairly sophisticated adversaries that already have access to incredibly sensitive data about the user.
51. Next, we wanted to see how well adversaries could impersonate users, so we simulated the confidence scores our three adversaries would obtain when modeled against each other.
52. The next question is how well adversaries can impersonate users. This plot shows how the user and our three adversaries perform after answering five and eight questions in a single session, when modeled against a naïve adversary. The actual user's performance is also shown as a comparison point. We should see that the confidence score for the adversary being modeled against is low, while the user's confidence remains high. The bars for adversaries not modeled against may be low or high: ideally low, but it would not be surprising if they were high. We see exactly what we expect. The user performs very well, while the naïve adversary itself performs very poorly. However, advanced adversaries like the ACA and the EKCA also obtain high confidence scores when AutoAuth adopts the naïve adversary model.
53. But there is hope! Here we see the same graph, except with scores derived when modeled against the Always Correct Adversary. This graph is very encouraging, as we can see the ACA's confidence score drop dramatically: from near 80 to less than 20. In other words, when specifically modeled against, the Always Correct Adversary acquires very low confidence estimates from AutoAuth. Users themselves, however, continue to acquire high scores, suggesting a negligible usability hit when adopting this stronger model.
54. Lastly, we see the same graph with scores modeled against the EKCA. AutoAuth generally produces much more conservative estimates across the board when it adopts this powerful adversary model, as we would expect. Nevertheless, the user still performs well relative to impersonators, while the EKCA performs less well and obtains lower scores as it answers more questions. This is especially encouraging, because it suggests that the EKCA cannot simply answer more questions to get a higher score.
55. In sum, AutoAuth performs reasonably well. Users always get relatively high and increasing confidence estimates, no matter the adversary modeled against. Furthermore, AutoAuth easily holds up against simple adversaries like the naïve adversary, which is great when you consider that these are by far the more common adversaries. On the other hand, advanced adversaries can fool AutoAuth when it adopts a simple adversary model, but even these adversaries encounter roadblocks when specifically modeled against. These are all promising results, suggesting that AutoAuth might actually be useful in practice!
  56. So what have we learned from all of this?
57. One of the key points I hope you take away is that while people are not flawless at answering questions about capturable everyday memory, even their inaccuracies can be used as useful signals for authentication! Indeed, Autobiographical Authentication does just that and shows a lot of promise. It accounts for systematic response error and performs well even against some incredibly sophisticated adversaries.
58. But as with any study, there are limitations with this one that offer fruitful avenues for future work. For one thing, Autobiographical Authentication is slow: it takes users 22 seconds on average to answer each question. Additionally, AutoAuth requires constant device usage to replenish its knowledge base. Most of its nice properties only arise in environments where the same question never has to be asked twice. Finally, it remains unclear how users will react to this sort of authentication. Edge cases, such as when a user genuinely but uncharacteristically answers all questions correctly, might break usability unless they can be handled gracefully.
  59. With that, I’ll open the floor to questions.
60. Finally, a word on the practical use cases for AutoAuth as it stands. As AutoAuth is presently slow, one can imagine that, while it is not a replacement for passwords, it might be useful as a fallback: presently, challenge questions that query static facts are used as password-recovery challenges, and AutoAuth might be a more robust and secure alternative. AutoAuth is also very amenable to context-aware scalable authentication, which scales authentication to be more challenging or simpler based on how risky the context is; one can imagine scaling the minimum confidence threshold required for authentication with AutoAuth. Also, as AutoAuth is dynamic, many threats such as brute-force attacks and shoulder surfing become less potent. Finally, AutoAuth can also conceivably be used for tiered authentication, where not all data is treated equally: read access to text messages might require low confidence, while write access to security settings might require high confidence.
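The last two ideas lend themselves to a simple policy layer on top of the confidence score. Here is a minimal sketch under my own assumptions; the contexts, actions, and threshold values are hypothetical, not from the paper.

```python
# Scale the required confidence both by how risky the context is and by how
# sensitive the requested action is, then grant access only if AutoAuth's
# confidence clears the stricter of the two requirements.
CONTEXT_RISK = {"home": 0.40, "office": 0.55, "unknown_city": 0.80}
ACTION_SENSITIVITY = {"read_sms": 0.45, "install_app": 0.65,
                      "change_security_settings": 0.90}

def required_confidence(context: str, action: str) -> float:
    return max(CONTEXT_RISK[context], ACTION_SENSITIVITY[action])

def authorize(confidence: float, context: str, action: str) -> bool:
    return confidence >= required_confidence(context, action)

# Reading text messages at home needs far less confidence than changing
# security settings from an unfamiliar city.
print(authorize(0.50, "home", "read_sms"))                          # True
print(authorize(0.50, "unknown_city", "change_security_settings"))  # False
```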
61. So you might be wondering at this point: what does AutoAuth offer? Here are two of my favorites. First, AutoAuth is pretty neat in that it can be scaled by context. In other words, if you're in a risky context (for example, in an unknown city for the first time), AutoAuth might ask you to answer 5 or 6 questions; but if you're at home, it might ask you to answer just 1 question. Second, AutoAuth allows authentication to be dynamic and ever-changing. Thus, attacks built with static challenge-response in mind become much harder to execute. For example, even if someone sees me answering a question once, that doesn't necessarily give them insight into the question they would have to answer if they stole my phone.
62. This graph plots participants' mean confidence scores, along with the 95% standard errors, after answering a given number of questions within a single session. The different lines represent the user's confidence scores computed when modeled against different adversaries. The first encouraging result is that, no matter the adversary, the user's confidence score appears to increase as he answers more questions. Unsurprisingly, AutoAuth offers the highest confidence estimates when modeled against the weakest adversary: the naïve adversary. In this case, after just 5 questions, the user achieves a confidence score close to 80. AutoAuth offers the most conservative estimates when it adopts the strongest adversary model: the EKCA. Nevertheless, the trend is still upwards, suggesting that users can answer more questions to achieve higher confidence. In general, it seems that AutoAuth always yields a reasonably high confidence score when actual users are trying to authenticate.
63. So that's a cool idea, but how do we operationalize it? Well, what we have is a user model with training data, u. Let's say we have an observed sequence of question-answer responses, seq, that represents an authentication session, and we're trying to make an authentication decision. Given this information, we want to compute something like P(u | seq): the probability that the attempting authenticator is the user, given the observed sequence of responses.
64. Bayes' Theorem tells us a lot about how to get what we want from what we have. (click) The first term in the numerator is P(seq | u), the probability that we would observe the given sequence of responses from the user. This is easy to calculate from training data. Imagine I'm trying to log in to my phone, and it asks me: what did you eat for lunch yesterday? My phone knows, from the training data, that I answer this question correctly 70% of the time, so P(seq | u) is just that: 0.7. (click) The second term in the numerator, P(u), is simply the prior probability that the authenticator is the user. For personal mobile devices, this is just a high constant: something close to, but not quite equal to, 1. (click) The denominator, P(seq), represents the overall probability that we would observe this sequence of responses.
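Written out, the relationship just described is standard Bayes' rule:

```latex
P(u \mid seq) \;=\; \frac{P(seq \mid u)\, P(u)}{P(seq)}
```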
65. P(seq) is hard to compute because it requires knowledge of all possible impersonators. But we can break P(seq) down into two components that make it more manageable: the probability that the sequence comes from the user, plus the probability that the sequence does not come from the user, or P(seq | u hat). (click) u hat is what I call the adversary model. Rather than try to enumerate all possible non-users, we model a specific kind of non-user and use that model as a representation of who we consider our most likely impersonator.
66. The modified equation takes in an additional parameter, u hat, and breaks the denominator down into the two components I just mentioned.
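One plausible reading of this modified form, consistent with the decomposition just described and with the 0-to-prior range mentioned earlier (the paper gives the exact equation), is:

```latex
P(u \mid seq, \hat{u}) \;\approx\; \frac{P(seq \mid u)\, P(u)}{P(seq \mid u) + P(seq \mid \hat{u})}
```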
67. But you might have noticed that this simplification has introduced a problem. (click) To get a high score, an authenticator need only minimize the second term of the denominator: P(seq | u hat).
  68. In other words, a clever impersonator can get a high score simply by being unlike the adversary, even if he is nothing like the user.
69. We can fix this by adding an additional term: the bit-string similarity between the expected and actual correctness of the observed sequence of responses. I won't spend too much time explaining this term, but the important takeaway is that it encapsulates how well the observed sequence of responses matches our expectations if it were actually the user, on a scale of 0 to 1. Please refer to the paper for more details.
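As one simple instantiation of such a term (the paper defines the exact metric), the similarity between the expected correctness bit-string E and the observed one O over n questions could be written as:

```latex
\mathrm{sim}(E, O) \;=\; 1 - \frac{\mathrm{Hamming}(E, O)}{n} \;\in\; [0, 1]
```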
70. So, we finally arrive at the final equation. Our confidence in the user is the probability that the observed sequence comes from the user, given an adversary model, times the similarity between the expected and observed sequences of responses.
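Putting the pieces together, the estimator sketched in this talk has roughly the form below, where E_u(seq) is the correctness pattern we expect from the user and O(seq) is the one we observe; see the paper for the precise equation.

```latex
\mathrm{conf}(u, seq, \hat{u}) \;\approx\; \frac{P(seq \mid u)\, P(u)}{P(seq \mid u) + P(seq \mid \hat{u})} \times \mathrm{sim}\big(E_u(seq),\, O(seq)\big)
```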
71. To get a more detailed look at what affected a participant's ability to get a response correct or incorrect, I modeled the correctness of a response with a mixed-effects logistic regression, with the user as a random effect. In short, I used a mixed-effects model because I collected multiple responses per user, and because it allows each user to have his or her own baseline likelihood of getting a question correct. For more on the modeling technique, please refer to the paper.
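In standard notation, a mixed-effects logistic regression of this kind (written here as a generic sketch rather than the paper's exact specification) models the probability that user i answers question j correctly as:

```latex
\operatorname{logit} P(\text{correct}_{ij} = 1) \;=\; \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + b_i,
\qquad b_i \sim \mathcal{N}(0, \sigma_u^2)
```

Here x_ij collects the fixed-effect covariates (question type, answer uniqueness, demographics), and the per-user random intercept b_i gives each user their own baseline likelihood of answering correctly.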
72. These are the fixed-effect controls I included in the model. As you can see, I controlled for demographic variables, like age and gender, that might affect memory retention, and included several other features in the model, including question type and answer uniqueness. However, I'll primarily be focusing on a few of these factors here.
73. Another curious finding is that questions with more unique answers were answered incorrectly more often than questions with less unique answers. This seems unintuitive; surely, more unique events are more memorable! However, this finding suggests that users may not actually "remember" answers, but simply "know" them. In other words, if asked "Where were you on Thursday at 1pm?", I might say Newell Simon Hall because that's where I generally am on Thursdays at 1pm, whether or not I specifically remember being there at that time. This heuristic works well in the majority of cases, but breaks down when we stray from routine.
74. So that's a cool idea. How do we actually get to it? Imagine a simplified system with only two question types. We have some training data from a user, from which we know how often the user gets both types of questions correct (in the left table). We also have an attempted authentication into this user's smartphone (in the right table). How can we get from the raw question-answer responses to a confidence estimate?
75. The first thing we can do is calculate the probability that we would observe this question-answer response sequence from the user. We do that by simply plugging in and multiplying. The attempting authenticator got the first question, of type QT1, correct, and the second question, of type QT2, incorrect. Our training data tells us that there is a 42% chance we would observe this sequence from the actual user.
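As a worked illustration with hypothetical training values consistent with the 42% figure (say the user answers QT1 questions correctly 70% of the time and QT2 questions correctly 40% of the time):

```latex
P(seq \mid u) \;=\; P(\text{QT1 correct}) \times P(\text{QT2 incorrect}) \;=\; 0.7 \times (1 - 0.4) \;=\; 0.42
```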
76. Now, P(seq | u hat) can potentially be problematic as well, but we can simplify the problem by adopting a specific adversary model. In other words, u hat is not the set of all possible non-users, but a model of the kind of non-user we believe is our "adversary", or most likely impersonator.
77. For the location question, or "Where were you at <time>?", users were asked to select their location on a map. Any location selected within the error radius of the sensor reading at that time was considered correct.