Tailored, Machine Learning-driven Password Guessing Attacks and Mitigation
Georg Knabl
• self-employed IT-Consultant & Software Engineer at
• based in Graz, Austria
• areas of expertise
• machine learning implementations
• web development
• information security
The Problem with Human Passwords
A Human Attack Vector
• people use password creation schemes
• types
• machine-random (&CtAEaCp?b&v"s%)
• human-general (123456)
• human-individual (John1970!)
• human-random (randomly typed, 34ghjk34f3hjkHGFC)
• What about correct horse battery staple?
• issues
• reduced entropy
• attacker: knowing scheme (+ personal data) => password
• humans are limited in creativity
→ somebody else might have come up with the same scheme
→ schemes are publicly available in password leaks
Attacking Passwords
Traditional Approaches
• Hybrid or rule-based: dictionaries + word-mangling rules
• Markov Models: high-probability character sequences
• Masks: reduce the set to typical structures
• Brute-force: try every possible combination
[Figure: key space of the approaches (Dunning, 2016)]
• tool support: hashcat, John-the-Ripper, PACK, CeWL, CUPP, …
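To make the difference in key space concrete, the sketch below counts candidates for plain brute force and for a mask. It assumes hashcat-style mask notation (?l, ?u, ?d, ?s, ?a) with hashcat's built-in charset sizes; it only does the arithmetic and is not how hashcat itself enumerates candidates.

```python
# Sizes of hashcat-style built-in charsets; ?a covers all 95 printable ASCII characters.
CHARSET_SIZES = {"l": 26, "u": 26, "d": 10, "s": 33, "a": 95}


def brute_force_keyspace(length: int, charset_size: int = 95) -> int:
    """Candidates when every position may hold any character of the charset."""
    return charset_size ** length


def mask_keyspace(mask: str) -> int:
    """Candidates for a mask such as '?u?l?l?l?d?d'.

    '?x' placeholders multiply by the size of charset x; any other character is a
    fixed literal and contributes a factor of 1.
    """
    total = 1
    i = 0
    while i < len(mask):
        if mask[i] == "?" and i + 1 < len(mask):
            total *= CHARSET_SIZES[mask[i + 1]]
            i += 2
        else:
            i += 1  # fixed literal character
    return total


if __name__ == "__main__":
    # A "John12"-like structure: 1 uppercase, 3 lowercase, 2 digits (6 characters).
    print(mask_keyspace("?u?l?l?l?d?d"))  # 45697600
    print(brute_force_keyspace(6))        # 735091890625
```

For this 6-character structure, the mask shrinks the search space by a factor of roughly 16,000 compared with brute force over all printable ASCII characters.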
Dictionary Sources
• password leaks: rockyou.txt, exploit.in, …
• tailored lists
• CeWL: web scraping
• CUPP: pre-defined questions
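Example entries:
• scraped words (CeWL-style): Analytics, Website, Designs, Webdesign, Rebranding, passionately, simply, Factory, …
• mangled personal details (CUPP-style): smithJohn@*, smithJohn@@, smithJohn_1, smithSmithy, smith_, smith_01, smith_01050, …
• frequent leaked passwords: 123456, 12345, 123456789, password, iloveyou, princess, 1234567, 12345678, abc123, …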
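Hybrid (rule-based) attacks combine such dictionaries with word-mangling rules. A minimal sketch, assuming a small hypothetical rule set (case transforms plus common suffixes); real tools such as hashcat or John-the-Ripper use their own rule languages, which this does not reproduce.

```python
from itertools import product
from typing import Iterable, Iterator

# Hypothetical rule set: case transforms combined with common suffixes.
TRANSFORMS = [str, str.lower, str.capitalize]  # str() leaves the word unchanged
SUFFIXES = ["", "!", "1", "123", "1985"]


def mangle(words: Iterable[str]) -> Iterator[str]:
    """Yield each transform/suffix combination of every base word exactly once."""
    seen = set()
    for word, transform, suffix in product(words, TRANSFORMS, SUFFIXES):
        candidate = transform(word) + suffix
        if candidate not in seen:
            seen.add(candidate)
            yield candidate


if __name__ == "__main__":
    print(list(mangle(["smith", "John"])))
```

Each base word yields at most 3 × 5 candidates; duplicates (e.g. "smith" lowercased) are dropped, so the two words above produce 20 guesses.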
Machine-generated Text
Neural Networks
• analyze huge datasets
• learn hidden structures
• reproduce structures on new data
• supervised learning process:
train on data → generate model → use model to analyze/generate
Recurrent Neural Networks (RNN)
• learn, analyze, reproduce sequences
• password = sequence of characters
• password list: the next password starts after a newline
→ \n is just another character
(Olah, 2015)
RNN Tokenization
Character-to-index table:
0 a
1 b
2 c
3 d
4 e
… …
92 \n
Example: source data "abc" → tokens 0, 1, 2 → training → generation → tokens 2, 3, 4 → target data "cde"
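The character-to-index mapping can be sketched in a few lines. This assumes the vocabulary is built from a toy corpus (so it has 6 entries rather than the 93 on the slide) and, like the table, gives the newline separator the last id. It also shows the shifted input/target pairs that character-level models such as char-rnn are trained on (predict the next character).

```python
def build_vocab(text: str) -> dict[str, int]:
    """Give every distinct character an integer id; the newline separator gets the last id."""
    chars = sorted(set(text) - {"\n"})
    chars.append("\n")
    return {ch: idx for idx, ch in enumerate(chars)}


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    return [vocab[ch] for ch in text]


def decode(ids: list[int], vocab: dict[str, int]) -> str:
    inverse = {idx: ch for ch, idx in vocab.items()}
    return "".join(inverse[i] for i in ids)


def training_pairs(ids: list[int]) -> tuple[list[int], list[int]]:
    """Next-character prediction: the target sequence is the input shifted by one."""
    return ids[:-1], ids[1:]


if __name__ == "__main__":
    corpus = "abc\ncde\n"            # toy password list, one password per line
    vocab = build_vocab(corpus)      # {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, '\n': 5}
    print(encode("abc", vocab))      # [0, 1, 2]
    print(decode([2, 3, 4], vocab))  # cde
```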
char-rnn
• RNN predicts character sequences based on training text
• by Andrej Karpathy
• https://github.com/karpathy/char-rnn
(Karpathy, 2015)
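At generation time, a character-level model outputs a score for every character in the vocabulary, and the next character is sampled from the resulting distribution. The sketch below shows only that sampling step with a temperature parameter; the logits are made-up numbers standing in for a trained network's output, and the function names are illustrative, not char-rnn's API.

```python
import math
import random


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Turn raw scores into probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Draw the index of the next character from the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    vocab = ["a", "b", "c", "\n"]
    logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores from a trained model for one step
    rng = random.Random(42)
    for t in (0.2, 1.0, 2.0):
        print(t, [round(p, 3) for p in softmax(logits, t)])
    print(repr(vocab[sample_next(logits, 1.0, rng)]))
```

Low temperatures favor the most frequent patterns; high temperatures produce more varied but noisier candidates, which matters when generating long guess lists.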
[Figure: Works of Shakespeare, training data vs. generated output (Karpathy, 2015)]
[Figure: Linux Source Code, training data vs. generated output (Karpathy, 2015)]
[Figure: rockyou.txt, training data vs. generated output]
General Human Password Guessing
• Neural Networks outperform other methods at above 10^10 guesses
• (almost) infinite number of passwords
(Melicher et al., 2016)
Exploiting Individual Human Password Schemes
A Machine Learning Approach
Relevance
• most passwords have an individual context
• individual details are publicly available (OSINT)
• social media
→ harvester scripts
• website user tables
→ leaked database dumps
• …
exploit.in
Tailored Password Lists
[Figure: training data vs. generated output]
Output excerpt: John2050, 180374, 09091958, 06031982, 160883, soni, John!, john!, j0hn.5m17h, john.smith, Smith866, asdfghj, John50
Data Protection Compliance
• EU-GDPR (General Data Protection Regulation)
• significant fines
• up to €20 million or 4% of worldwide annual revenue, whichever is higher
• processing personal data requires a legal basis (e.g. consent)
• password lists contain personal information
→ using publicly available leaked data is illegal
• imbalance
• info-sec researcher: has to comply & find (less ideal) alternatives
• attacker: ignores regulations & trains on the best available data
Data Protection Compliance
• compliant solutions to collect data
• general passwords:
• use e.g. a top-100,000 passwords list
→ no personal details contained
• individual details + passwords:
• compliance based on "public interest"? (GDPR Art. 6 (1) (e))
• collect consent from users
→ requires broad access to user data
a) directly store & relate data until training is finished
→ requires password storage in plaintext (!!!)
b) only store tokenized password schemes without user relation
→ requires all relatable personal data to be known at password hashing time
Challenges
• generate password sequences ✓
• GDPR compliance ?
• recognize & relate individual structures ?
• How to relate personal data?
• same scheme, different character sequences
<first name><year of birth>!
John1985!, Jane1992!
• dealing with obfuscations ?
• e.g. Leetspeak, all upper/lower case
j0hn1985!, JOHN1985!, john1985!
Generating a Dataset Containing Individual Details
• starting point: any password leak that contains a personal identifier
• char-rnn requires > 50,000 entries for proper results
• e.g. exploit.in (797 million credentials): <email address>:<password>
• collect, match and attach personal details to entries
• e.g. using a social media harvester
• example result:
Gender | Username  | First Name | Last Name | Year of Birth | Password
f      | margarete | Judy       | Wells     | 1972          | Wells106
f      | sondra    | Lucia      | Morrow    | 1950          | cvbnm
f      | zakia     | Gale       | Weiss     | 1999          | syndikat
f      | eada      | Ana        | Elliott   | 1994          | Ana94
f      | karalee   | Denise     | Hanson    | 1965          | OLIVER
m      | agatha    | Edmond     | Daniels   | 1956          | Agatha
…
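Matching leak entries to personal details is essentially a join on the identifier. A minimal sketch, assuming the leak uses the `<email address>:<password>` format above and that harvested details are already keyed by lower-cased e-mail address; the function names and the data in the example are fictitious.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def parse_leak_line(line: str) -> Optional[Tuple[str, str]]:
    """Split '<email address>:<password>' at the first colon (passwords may contain ':')."""
    email, sep, password = line.rstrip("\n").partition(":")
    if not sep or "@" not in email:
        return None  # malformed entry
    return email.strip().lower(), password


def attach_details(lines: Iterable[str], details_by_email: Dict[str, dict]) -> Iterator[dict]:
    """Yield one training row per leak entry whose e-mail address has known details."""
    for line in lines:
        parsed = parse_leak_line(line)
        if parsed is None:
            continue
        email, password = parsed
        details = details_by_email.get(email)
        if details is not None:
            yield {**details, "Password": password}


if __name__ == "__main__":
    leak = ["jane.doe@example.com:Doe106", "garbage", "nobody@example.com:123456"]
    details = {"jane.doe@example.com": {"Gender": "f", "First Name": "Jane",
                                        "Last Name": "Doe", "Year of Birth": "1972"}}
    print(list(attach_details(leak, details)))
```

Entries without matching details are simply skipped, which is one reason a very large leak is needed to reach the > 50,000 usable rows mentioned above.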
Password Schemes Used
• Random: random choice from a top-X password list (e.g. 123456)
• Easy to Type: nearby characters on the keyboard (e.g. qwerty)
• Username: use the person's username (e.g. smithy)
• First Name + "!": use the person's first name plus an exclamation mark (e.g. John!)
• Lowercased First Name + "!": use the person's lowercased first name plus an exclamation mark (e.g. john!)
• Last Name + Random Int: use the person's last name plus a three-digit integer at the end (e.g. Smith758)
• Username Leetspeak: use the person's username in Leetspeak (e.g. 5m17hy)
• First Name + Year of Birth (4 digits): use the person's first name plus their year of birth (e.g. John1985)
• First Name + Year of Birth (2 digits): use the person's first name plus the last two digits of their year of birth (e.g. John85)
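The nine schemes can be expressed as small functions to generate such a fake dataset. This is a sketch under assumptions: the leetspeak table is inferred from the examples (smithy → 5m17hy), the top-X and keyboard lists are short stand-ins, and schemes are chosen uniformly at random, which may not match the distribution used in the original work.

```python
import random

# Leetspeak table inferred from the slide examples (assumption), applied after lowercasing.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "l": "1", "o": "0", "s": "5", "t": "7"})
TOP_PASSWORDS = ["123456", "password", "iloveyou"]  # stand-in for a top-X list
EASY_TO_TYPE = ["qwerty", "asdfghj", "zxcvbn"]      # stand-in keyboard walks

SCHEMES = {
    "Random": lambda p, rng: rng.choice(TOP_PASSWORDS),
    "Easy to Type": lambda p, rng: rng.choice(EASY_TO_TYPE),
    "Username": lambda p, rng: p["username"],
    "First Name + !": lambda p, rng: p["first_name"] + "!",
    "Lowercased First Name + !": lambda p, rng: p["first_name"].lower() + "!",
    "Last Name + Random Int": lambda p, rng: p["last_name"] + str(rng.randint(100, 999)),
    "Username Leetspeak": lambda p, rng: p["username"].lower().translate(LEET),
    "First Name + Year of Birth (4 digits)":
        lambda p, rng: p["first_name"] + str(p["year_of_birth"]),
    "First Name + Year of Birth (2 digits)":
        lambda p, rng: p["first_name"] + str(p["year_of_birth"])[-2:],
}


def fake_password(person: dict, rng: random.Random) -> tuple[str, str]:
    """Pick a scheme uniformly at random and apply it to the person's details."""
    name = rng.choice(list(SCHEMES))
    return name, SCHEMES[name](person, rng)


if __name__ == "__main__":
    rng = random.Random(7)
    person = {"username": "smithy", "first_name": "John",
              "last_name": "Smith", "year_of_birth": 1985}
    for _ in range(5):
        print(fake_password(person, rng))
```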
Tokenization
• replace personal details with column id
• column id is just another character
• problem: exact matching fails to match obfuscations or abbreviations
• John != j0hn
• 1986 != 86
#  | First Name | Year of Birth | Password | Resulting Password Tokens
1  | Max        | 1983          | Max1983! | column: First Name, column: Year of Birth, !
2  | John       | 1986          | John86!  | column: First Name, 8, 6, !
3  | Max        | 1987          | 123456   | 1, 2, 3, 4, 5, 6
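Exact-match tokenization as described above can be sketched as a greedy scan: at each position, try every column value (longest first), emit a column token on a match, and otherwise keep the single character. The `<Column>` token strings are a hypothetical representation; the original implementation encodes column ids in its own (matrix-based) way.

```python
def tokenize(password: str, details: dict[str, str]) -> list[str]:
    """Replace exact occurrences of personal details with column tokens such as '<First Name>'."""
    # (value, token) pairs, longest value first so longer details win; empty values skipped
    candidates = sorted(
        ((value, f"<{column}>") for column, value in details.items() if value),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    tokens: list[str] = []
    i = 0
    while i < len(password):
        for value, token in candidates:
            if password.startswith(value, i):
                tokens.append(token)
                i += len(value)
                break
        else:
            tokens.append(password[i])  # no detail matches here: keep the raw character
            i += 1
    return tokens


if __name__ == "__main__":
    print(tokenize("Max1983!", {"First Name": "Max", "Year of Birth": "1983"}))
    print(tokenize("John86!", {"First Name": "John", "Year of Birth": "1986"}))
```

The second call reproduces the limitation from the table: "86" is not recognized as the year of birth, and "j0hn" would not be recognized as "John" either.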
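Support Matching Using Data Variations
• add on-the-fly word mangling rules to columns
• Leetspeak
• lowercase
• uppercase
• …
Original row:
Gender | Username | First Name | Last Name
f      | tania    | Kara       | Rosales
…
Extended columns (original, Leetspeak, lowercase, uppercase):
Gender: f, f, f, F
Username: tania, 74n14, tania, TANIA
First Name: Kara, k4r4, kara, KARA
Last Name: Rosales, r054135, rosales, ROSALES
…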
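Extending each column with mangled variants can be sketched as below. The leetspeak table is inferred from the examples (tania → 74n14, Rosales → r054135) and is an assumption, as are the helper names; the resulting variant-to-column map is what lets a tokenizer recognize "j0hn" as the first name.

```python
# Leetspeak substitutions inferred from the examples above, applied after lowercasing.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "l": "1", "o": "0", "s": "5", "t": "7"})


def variations(value: str) -> list[str]:
    """Return [original, Leetspeak, lowercase, uppercase], as in the extended columns."""
    return [value, value.lower().translate(LEET), value.lower(), value.upper()]


def extend_columns(row: dict[str, str]) -> dict[str, str]:
    """Map every variant to its column name, e.g. '74n14' -> 'Username'.

    Duplicate variants (e.g. 'tania' twice) collapse into one key; if two columns
    share a variant, the column that appears later in the row wins.
    """
    extended: dict[str, str] = {}
    for column, value in row.items():
        for variant in variations(value):
            extended[variant] = column
    return extended


if __name__ == "__main__":
    row = {"Gender": "f", "Username": "tania", "First Name": "Kara", "Last Name": "Rosales"}
    for column, value in row.items():
        print(column, variations(value))
```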
Challenges
• generate password sequences ✓
• GDPR compliance ✓
→ use top-X password lists + fake rules
• recognize & relate individual structures ✓
→ column ids instead of individual details
• dealing with obfuscations ✓
→ on-the-fly word mangling rules to extend columns
Implementation
• Python application based on Sean Robertson's char-rnn.pytorch
• https://github.com/spro/char-rnn.pytorch
• adaptations (excerpt)
• matrix-based individual detail matching
• on-the-fly word-mangling rules
Training
[Figure: sample output at different training stages]
epoch 10: Whn, carickte, aanhls, cshscarn, suasso, ail, zpkoty, beigedl, 11883469, aw, aeeenl, aiseie, enal, faedni, bnoxtln
epoch 40: Wh, ronis25, 44353133, maty, 0598971, treames, bicken, ratont, tulie, stocker, shathos, netrer, derfa, tolei, dorled
epoch 280: Wh, ge, butter, jackout, 05081984, lllllll, sian, harder, chedle, raven, 11021985, supers, 17031988, spike, duddick
Attacking the Target
• collect data about the victim & generate a dataset
• use the trained model to generate a tailored password list
• the quality of the list depends heavily on
• selected training data
• hyperparameter configuration
Example target:
Gender | Username   | First Name | Last Name | Year of Birth
m      | john.smith | John       | Smith     | 2050
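Turning generated token sequences back into candidate passwords for a specific target amounts to substituting the target's details for the column tokens. A sketch, assuming the model output uses the same hypothetical `<Column>` token strings as a tokenized training set; `fill_template` and `tailored_list` are illustrative names, not part of the original application.

```python
import re
from typing import Iterable, Optional

TOKEN = re.compile(r"<([^<>]+)>")  # hypothetical column-token syntax, e.g. <First Name>


def fill_template(template: str, target: dict[str, str]) -> Optional[str]:
    """Replace each <Column> token with the target's value; None if a column is unknown."""
    if any(name not in target for name in TOKEN.findall(template)):
        return None
    return TOKEN.sub(lambda m: target[m.group(1)], template)


def tailored_list(templates: Iterable[str], target: dict[str, str], limit: int = 1000) -> list[str]:
    """Fill generated templates in model order, dropping duplicates, up to `limit` entries."""
    seen: set[str] = set()
    result: list[str] = []
    for template in templates:
        candidate = fill_template(template, target)
        if candidate is None or candidate in seen:
            continue
        seen.add(candidate)
        result.append(candidate)
        if len(result) >= limit:
            break
    return result


if __name__ == "__main__":
    target = {"Username": "john.smith", "First Name": "John",
              "Last Name": "Smith", "Year of Birth": "2050"}
    generated = ["<First Name><Year of Birth>", "180374", "<First Name>!", "<Last Name>866"]
    print(tailored_list(generated, target))
```

Dropping duplicates is a choice of this sketch; a raw sampled list can contain repeats when the model has learned only a few rules.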
Results & Qualitative Analysis
Scheme Adoption
Target:
Gender | Username   | First Name | Last Name | Year of Birth
m      | john.smith | John       | Smith     | 2050
Generated list (excerpt) with the scheme each entry reproduces:
John2050 (First Name + Year of Birth (4 digits): learned)
180374 (Random: stochastic character generation, mostly human dates)
09091958 (Random)
06031982 (Random)
160883 (Random)
soni
John! (First Name + "!": learned)
John! (duplicate because of few available rules)
[skipped until line 14]
john! (Lowercased First Name + "!": learned using word mangling)
[skipped until line 23]
j0hn.5m17h (Username Leetspeak: learned using word mangling)
[skipped until line 30]
john.smith (Username: learned)
[skipped until line 80]
Smith866 (Last Name + Random Int: partially learned + stochastic generation)
[skipped until line 85]
asdfghj (Easy to Type: learned)
[skipped until line 514]
John50 (First Name + Year of Birth (2 digits): partially learned + stochastic generation)
[...]
Proving Password Scheme Adoption
1. use a new fake dataset with the same schemes
2. loop through each entry and generate an individual password list (1000 entries)
3. check if the password is on that list
Example entry:
Gender | Username  | First Name | Last Name | Year of Birth | Password
f      | margarete | Judy       | Wells     | 1972          | Wells106
→ is Wells106 on the generated list?
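The three evaluation steps can be sketched as a hit-rate loop. `generate_list` is a hypothetical stand-in for the trained model plus template filling; it receives an entry's personal details with the password removed. The toy generator in the example only illustrates the interface.

```python
from typing import Callable, Iterable, List


def hit_rate(entries: Iterable[dict],
             generate_list: Callable[[dict], List[str]],
             list_size: int = 1000) -> float:
    """Share of entries whose real password is among the first `list_size` generated guesses."""
    hits = total = 0
    for entry in entries:
        details = {k: v for k, v in entry.items() if k != "Password"}  # hide the answer
        guesses = set(generate_list(details)[:list_size])
        hits += entry["Password"] in guesses
        total += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    def toy_generator(details: dict) -> List[str]:
        # Hypothetical: only knows the "Last Name + Random Int" scheme.
        return [details["Last Name"] + str(n) for n in range(100, 1000)]

    entries = [
        {"First Name": "Judy", "Last Name": "Wells", "Year of Birth": "1972", "Password": "Wells106"},
        {"First Name": "Lucia", "Last Name": "Morrow", "Year of Birth": "1950", "Password": "cvbnm"},
    ]
    print(hit_rate(entries, toy_generator))       # 0.5
    print(hit_rate(entries, toy_generator, 100))  # 0.5
```

Lowering `list_size` measures how early in the list passwords are found, which is the kind of figure reported as "about 70% within ~100 lines" in the results.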
Results
• 6 models with different configurations
• all models matched about 70% of the passwords within password lists of only ~100 lines
• optimized configurations increase matching efficiency
• recreated distributions of schemes
Mitigation
Mitigation Strategies
• generating your own model and checking the user's password against the generated lists
• the attacker's model and dataset are not available
→ password lists will differ
• long or complex passwords
• passwords might still be guessed if they contain personal information
• e.g. JohnSmith1985 is actually <column: firstname><column: lastname><column: year of birth>
• treating all human-like passwords as insecure
• requires classification of human likeliness
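A simple server-side check for the "personal information" case can reuse the variation idea: reject a password if any of the user's details, or a leetspeak form of them, occurs in it. A minimal sketch; the leetspeak table and the 3-character minimum are assumptions, and the check covers only detail-based schemes, not other human patterns.

```python
# Leetspeak table inferred from earlier examples (assumption), applied to lowercased values.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "l": "1", "o": "0", "s": "5", "t": "7"})


def detail_variants(value: str) -> set[str]:
    """Lowercased forms of a detail to look for: plain and Leetspeak."""
    low = value.lower()
    return {low, low.translate(LEET)}


def personal_details_in(password: str, details: dict[str, str], min_len: int = 3) -> list[str]:
    """Return the names of columns whose value (or a variant) occurs in the password."""
    pw = password.lower()
    return [
        column
        for column, value in details.items()
        if any(len(v) >= min_len and v in pw for v in detail_variants(value))
    ]


if __name__ == "__main__":
    details = {"First Name": "John", "Last Name": "Smith", "Year of Birth": "1985"}
    print(personal_details_in("JohnSmith1985", details))
    print(personal_details_in("j0hn!", details))
```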
Human Password Classification
• using machine learning to classify human likeliness
• dataset: 80k passwords labeled human + 80k labeled machine
• classifiers
• Logistic Regression
• Multinomial Naïve Bayes
• Linear Support Vector Machine
• Random Forest
• vectorizers
• TF-IDF
• Count
Sample (h = human, m = machine):
&CtAEaCp?b&v"s% m
-SUuf4TLtF m
mallrats h
bP0.}BO/L&{: m
^=c.rgH$z m
boxers h
j&uzHCutff_A{ m
656565 h
6>IB|~@4^n}K m
forever1 h
…
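The combination of a Count vectorizer with Multinomial Naïve Bayes can be sketched without external libraries. This is a minimal stand-in, not the code from the human-password-classifier repository: it counts character 1- to 3-grams (an assumption; the slide does not say which units the vectorizers used) and trains on the ten samples above, so its scores are illustrative only.

```python
import math
from collections import Counter


def char_ngrams(text: str, sizes=(1, 2, 3)) -> Counter:
    """Count character n-grams: a character-level version of a Count vectorizer."""
    counts = Counter()
    for n in sizes:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        class_sizes = Counter(labels)
        self.log_prior = {c: math.log(class_sizes[c] / len(labels)) for c in self.classes}
        return self

    def log_scores(self, text: str) -> dict:
        """Unnormalized log posterior per class; n-grams unseen in training are ignored."""
        features = char_ngrams(text)
        scores = {}
        for c in self.classes:
            denom = self.totals[c] + self.alpha * len(self.vocab)
            score = self.log_prior[c]
            for gram, count in features.items():
                if gram in self.vocab:
                    score += count * math.log((self.counts[c][gram] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict_proba(self, text: str) -> dict:
        scores = self.log_scores(text)
        peak = max(scores.values())
        exps = {c: math.exp(s - peak) for c, s in scores.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data: the labeled samples above (h = human, m = machine).
SAMPLES = [
    ('&CtAEaCp?b&v"s%', "m"), ("-SUuf4TLtF", "m"), ("mallrats", "h"),
    ("bP0.}BO/L&{:", "m"), ("^=c.rgH$z", "m"), ("boxers", "h"),
    ("j&uzHCutff_A{", "m"), ("656565", "h"), ("6>IB|~@4^n}K", "m"), ("forever1", "h"),
]

if __name__ == "__main__":
    texts, labels = zip(*SAMPLES)
    model = MultinomialNaiveBayes().fit(list(texts), list(labels))
    print(model.predict_proba("forever2"))
```

In practice, scikit-learn's `CountVectorizer(analyzer="char", ngram_range=(1, 3))` together with `MultinomialNB` covers the same idea with far less code; the hand-written version only makes the counting and smoothing visible.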
Results
accuracy human vs. machine-random: 99% correct
Sample predictions (password, human likelihood):
14061966 0.9961306540
y-JQ6{v;_yb|q 0.0000000000
ZBT4n#z-x 0.0000121259
longball 0.9920406811
vikings 0.9723564484
gunit 0.9683620674
.XP?]b36nP]l| 0.0000000000
8J9{Bd^ 0.0000107884
123india 0.9986476258
*[qg;t 0.0000058089
…
What about randomly-typed passwords?
• human-random passwords
• almost impossible for humans to distinguish
• previously trained model: 83% correct
• specifically trained model (human-random vs. machine-random): 94% correct
Examples: ,asgl213 / HGHfwjiofjiw!? / FEA452 / dciuowed7983zy_ / jksdgf644kjbndf / Xkkeelt7tad5z / sabjas012 / 123jfmvfkfn49fvk. / …
Demo
Conclusion
• machine learning can be used to efficiently attack passwords created by humans
• mitigation
• treat human passwords as insecure
• warn users or provide a password policy
→ use a machine learning model to identify human passwords
→ integrate on web servers & password storage services
Resources
• Thesis Machine Learning-driven Password Lists:
• https://www.researchgate.net/publication/328719001_Machine_Learning-driven_Password_Lists
• Human Password Classifier:
• https://github.com/georgknabl/human-password-classifier
• ready-to-use trained models available via e-mail
"The only secure password is the one you can't remember."
Troy Hunt (haveibeenpwned.com)
Contact
DI (FH) Georg Knabl, MSc
IT-Consultant & Software Engineer
georg.knabl@pageonstage.at
Sources
• Dunning, Julian (2016). Statistics Will Crack Your Password. Available from: https://p16.praetorian.com/blog/statistics-will-crack-your-password-mask-structure [Mar. 3, 2018]
• Karpathy, Andrej (2015). The Unreasonable Effectiveness of Recurrent Neural Networks. Available from: http://karpathy.github.io/2015/05/21/rnn-effectiveness/ [Nov. 10, 2017]
• Melicher, William, Blase Ur, Sean M. Segreti, Saranga Komanduri, Lujo Bauer, Nicolas Christin, and Lorrie Faith Cranor (2016). "Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks". In: 25th USENIX Security Symposium (USENIX Security 16). Vancouver: USENIX Association, pp. 175–191.
• Olah, Christopher (2015). Understanding LSTM Networks. Available from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ [Nov. 10, 2017]

MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 

Tailored, Machine Learning-driven Password Guessing Attacks and Mitigation

  • 1. Tailored, Machine Learning-driven Password Guessing Attacks and Mitigation Georg Knabl
  • 2. Georg Knabl
    • self-employed IT consultant & software engineer
    • based in Graz, Austria
    • areas of expertise: machine learning implementations, web development, information security
  • 5. A Human Attack Vector
    • people use password creation schemes
    • types:
      • machine-random (&CtAEaCp?b&v"s%)
      • human-general (123456)
      • human-individual (John1970!)
      • human-random (randomly typed, 34ghjk34f3hjkHGFC)
    • What about "correct horse battery staple"?
    • issues:
      • reduced entropy
      • an attacker who knows the scheme (plus personal data) can derive the password
      • humans are limited in creativity
        → somebody else might have come up with the same scheme
        → schemes are publicly available in password leaks
  • 7. Traditional Approaches
    • hybrid or rule-based: dictionaries plus word-mangling rules
    • Markov models: high-probability character sequences
    • masks: reduce the candidate set to typical password structures
    • brute-force: try every possible combination
    • (figure: key space, Dunning, 2016)
    • tool support: hashcat, John-the-Ripper, PACK, CeWL, CUPP, …
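Why masks help is easiest to see by counting candidates. A minimal sketch, assuming hashcat's built-in mask charsets (?l = 26 lowercase letters, ?u = 26 uppercase, ?d = 10 digits, ?s = 33 specials including space, ?a = the 95 printable ASCII characters); `mask_keyspace` is a hypothetical helper, not a hashcat or PACK function, and custom charsets are not handled.

```python
# Sizes of hashcat's built-in mask charsets (printable ASCII only).
CHARSET_SIZES = {
    "l": 26,  # lowercase letters
    "u": 26,  # uppercase letters
    "d": 10,  # digits
    "s": 33,  # space and printable special characters
    "a": 95,  # ?l?u?d?s combined
}


def mask_keyspace(mask: str) -> int:
    """Number of candidates covered by a mask such as '?u?l?l?l?l?d?d'."""
    total = 1
    i = 0
    while i < len(mask):
        if mask[i] == "?" and i + 1 < len(mask):
            total *= CHARSET_SIZES[mask[i + 1]]  # placeholder: one charset position
            i += 2
        else:
            i += 1  # literal character: exactly one choice
    return total


brute_force = mask_keyspace("?a" * 8)         # every 8-character printable string
human_like = mask_keyspace("?u?l?l?l?l?d?d")  # e.g. a "Smith85"-style structure
print(f"brute-force: {brute_force:,}")
print(f"mask:        {human_like:,}")
```

An 8-character brute-force run covers 95^8 (about 6.6 × 10^15) candidates, while the "capitalized five-letter word plus two digits" mask covers 26^5 · 100 (about 1.2 × 10^9).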
  • 8. Dictionary Sources
    • password leaks: rockyou.txt, exploit.in, …
    • tailored lists:
      • CeWL: web scraping
      • CUPP: pre-defined questions
    • example output:
      • CeWL: Analytics, Website, Designs, Webdesign, Rebranding, passionately, simply, Factory, …
      • CUPP: smithJohn@*, smithJohn@@, smithJohn_1, smithSmithy, smith_, smith_01, smith_01050, …
      • rockyou.txt: 123456, 12345, 123456789, password, iloveyou, princess, 1234567, 12345678, abc123, …
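Tools like CUPP derive candidates by combining the answers to their questions with common suffixes. The sketch below is a simplified, hypothetical version of that idea: the profile fields, the suffix list and `tailored_candidates` are assumptions, and CUPP's actual rules are more extensive. It concatenates profile words pairwise and appends each suffix, which already produces entries such as smithJohn_1 and smithSmithy.

```python
from itertools import permutations


def tailored_candidates(profile: dict[str, str], suffixes: list[str]) -> list[str]:
    """Combine profile words pairwise and append every suffix (CUPP-like, simplified)."""
    words = [w for w in profile.values() if w]
    bases = set(words)
    for first, second in permutations(words, 2):  # ordered pairs: smithJohn and Johnsmith
        bases.add(first + second)
    return sorted(base + suffix for base in bases for suffix in suffixes)


profile = {"first_name": "John", "last_name": "smith", "nickname": "Smithy"}  # hypothetical answers
suffixes = ["", "_", "_1", "@@", "1985"]
candidates = tailored_candidates(profile, suffixes)
print(len(candidates), candidates[:5])
```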
  • 10. Neural Networks
    • analyze huge datasets
    • learn hidden structures
    • reproduce structures on new data
    • supervised learning process: train on data → generate model → use model to analyze/generate
  • 11. Recurrent Neural Networks (RNN)
    • learn, analyze and reproduce sequences
    • password = sequence of characters
    • password list: the separator before the next password, \n, is just another character
    • (figure: Olah, 2015)
  • 12. RNN Tokenization
    • every character is mapped to an id: 0 a, 1 b, 2 c, 3 d, 4 e, …, 92 \n
    • source data "abc" → 0, 1, 2 → training / generation → target data "cde" → 2, 3, 4
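The mapping and the training targets can be sketched in a few lines. This is a minimal, hypothetical illustration: the vocabulary is built from the training passwords themselves (so the ids differ from the 0 a … 92 \n table above), passwords are joined with \n so the separator is just another character, and each target window is its input shifted by one position, which is the usual char-rnn setup and an assumption here.

```python
def build_vocab(passwords: list[str]) -> dict[str, int]:
    """Assign an integer id to every character, including the newline separator."""
    corpus = "\n".join(passwords) + "\n"
    return {ch: idx for idx, ch in enumerate(sorted(set(corpus)))}


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Turn a string into its list of character ids."""
    return [vocab[ch] for ch in text]


def decode(ids: list[int], vocab: dict[str, int]) -> str:
    """Map ids back to characters (used when turning generated ids into passwords)."""
    inverse = {idx: ch for ch, idx in vocab.items()}
    return "".join(inverse[i] for i in ids)


def training_pairs(ids: list[int], seq_len: int) -> list[tuple[list[int], list[int]]]:
    """Slide a window over the ids; each target is its input shifted by one position."""
    return [(ids[s:s + seq_len], ids[s + 1:s + seq_len + 1])
            for s in range(len(ids) - seq_len)]


passwords = ["abc", "abd"]
vocab = build_vocab(passwords)  # {'\n': 0, 'a': 1, 'b': 2, 'c': 3, 'd': 4}
ids = encode("\n".join(passwords) + "\n", vocab)
pairs = training_pairs(ids, seq_len=3)
print(ids)       # [1, 2, 3, 0, 1, 2, 4, 0]
print(pairs[0])  # ([1, 2, 3], [2, 3, 0])
```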
  • 13. char-rnn
    • RNN that predicts character sequences based on training text
    • by Andrej Karpathy
    • https://github.com/karpathy/char-rnn
    • (Karpathy, 2015)
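The generation step can be sketched without a neural network. In the sketch below a character bigram table (effectively the Markov-model approach from the traditional methods) stands in for the trained RNN; the loop has the same shape as RNN sampling: predict a distribution over the next character, sample with a temperature, feed the sample back, stop at \n. All names are hypothetical; this is not char-rnn's code.

```python
import math
import random
from collections import Counter, defaultdict


def train_bigrams(passwords: list[str]) -> dict[str, Counter]:
    """Count next-character frequencies; a newline marks start and end of each password."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for pw in passwords:
        seq = "\n" + pw + "\n"
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_next(counter: Counter, temperature: float, rng: random.Random) -> str:
    """Sample from softmax(log(count) / temperature): < 1 sharpens, > 1 flattens."""
    chars = list(counter)
    logits = [math.log(counter[c]) / temperature for c in chars]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(logit - top) for logit in logits]
    return rng.choices(chars, weights=weights, k=1)[0]


def generate(counts: dict[str, Counter], n: int, temperature: float = 1.0,
             seed: int = 0, max_len: int = 32) -> list[str]:
    """Sample n passwords, each starting after a newline and ending at the next one."""
    rng = random.Random(seed)
    passwords = []
    for _ in range(n):
        prev, pw = "\n", ""
        while len(pw) < max_len:
            nxt = sample_next(counts[prev], temperature, rng)
            if nxt == "\n":
                break
            pw += nxt
            prev = nxt
        passwords.append(pw)
    return passwords


model = train_bigrams(["john1985", "john85!", "smith1985"])
print(generate(model, 5, temperature=0.8))
```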
  • 15. Linux Source Code (figure: training data and generated output; Karpathy, 2015)
  • 17. General Human Password Guessing
    • neural networks outperform other methods above 10^10 guesses
    • they can generate an (almost) infinite number of passwords
    • (Melicher et al., 2016)
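Comparisons like this are based on guess numbers: how many guesses a method needs before each test password appears. A minimal sketch, assuming the guesser's output is available as an ordered list (`guess_numbers` and `guessing_curve` are hypothetical names); at 10^10 guesses one would not materialize the list like this.

```python
from typing import Optional


def guess_numbers(guesses: list[str], targets: list[str]) -> list[Optional[int]]:
    """1-based position at which each target is first guessed, or None if never."""
    first_seen: dict[str, int] = {}
    for position, guess in enumerate(guesses, start=1):
        first_seen.setdefault(guess, position)
    return [first_seen.get(t) for t in targets]


def guessing_curve(guesses: list[str], targets: list[str], budgets: list[int]) -> dict[int, float]:
    """Fraction of targets cracked within each guess budget."""
    numbers = guess_numbers(guesses, targets)
    return {b: sum(1 for n in numbers if n is not None and n <= b) / len(targets)
            for b in budgets}


guesses = ["123456", "password", "John1985", "qwerty", "John1985!"]
targets = ["John1985!", "qwerty", "Tr0ub4dor&3", "123456"]
print(guessing_curve(guesses, targets, [1, 4, 10]))
```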
  • 18. Exploiting Individual Human Password Schemes: A Machine Learning Approach
  • 19. Relevance
    • most passwords have an individual context
    • individual details are publicly available (OSINT):
      • social media → harvester scripts
      • website user tables → leaked database dumps
      • …
    • (figure: exploit.in)
  • 20. Tailored Password Lists
    • (figure: training → output)
    • example output: John2050, 180374, 09091958, 06031982, 160883, soni, John!, john!, j0hn.5m17h, john.smith, Smith866, asdfghj, John50
  • 21. Data Protection Compliance
    • EU-GDPR (General Data Protection Regulation)
    • significant fines: up to €20 million or 4% of worldwide annual revenue, whichever is higher
    • processing personal data requires a lawful basis, such as consent
    • password lists contain personal information
      → using publicly available leaked data is illegal
    • imbalance:
      • info-sec researcher: has to comply and find (less ideal) alternatives
      • attacker: ignores regulations and trains on the best available data
  • 22. Data Protection Compliance
    • compliant solutions to collect data:
      • general passwords: use e.g. a top-100,000 passwords list → no personal details contained
      • individual details + passwords:
        • compliance based on "public interest"? (GDPR Art. 6 (1) (e))
        • collect consent from users → requires broad access to user data
          a) directly store & relate data until training is finished → requires password storage in plaintext (!!!)
          b) only store tokenized password schemes without user relation → requires all relatable personal data to be known at password hashing time
  • 23. Challenges
    • generate password sequences ✓
    • GDPR compliance ?
    • recognize & relate individual structures ?
      • How to relate personal data?
      • same scheme, different character sequences: <first name><year of birth>! → John1985!, Jane1992!
    • dealing with obfuscations ?
      • e.g. Leetspeak, all upper/lower case: j0hn1985!, JOHN1985!, john1985!
  • 24. Generating a Dataset Containing Individual Details
    • starting point: any password leak that contains a personal identifier
      • char-rnn requires > 50,000 entries for proper results
      • e.g. exploit.in (797 million credentials): <email address>:<password>
    • collect, match and attach personal details to the entries
      • e.g. using a social media harvester
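The matching step can be sketched as a join on the email address. A minimal sketch, assuming the leak uses the `<email address>:<password>` format shown above and that harvested details are already keyed by email; the harvester itself is out of scope, and all names and records here are hypothetical.

```python
def parse_leak(lines: list[str]) -> list[tuple[str, str]]:
    """Parse '<email address>:<password>' lines.

    Splits at the first ':' only, so passwords may themselves contain ':'.
    Lines without a separator are skipped.
    """
    entries = []
    for line in lines:
        email, sep, password = line.rstrip("\n").partition(":")
        if sep and email.strip() and password:
            entries.append((email.strip().lower(), password))
    return entries


def attach_details(entries: list[tuple[str, str]],
                   details_by_email: dict[str, dict[str, str]]) -> list[dict[str, str]]:
    """Join leak entries with harvested details; entries without details are dropped."""
    rows = []
    for email, password in entries:
        details = details_by_email.get(email)
        if details is not None:
            rows.append({**details, "password": password})
    return rows


leak = [
    "alice@example.com:Meyer80",
    "bob@example.org:Bob!",
    "malformed line without separator",
    "carol@example.net:pass:word",
]
details = {"alice@example.com": {"first_name": "Alice", "last_name": "Meyer", "year_of_birth": "1980"}}
rows = attach_details(parse_leak(leak), details)
print(rows)
```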
  • 25. Generating a Dataset Containing Individual Details
    • example result:

      Gender | Username  | First Name | Last Name | Year of Birth | Password
      f      | margarete | Judy       | Wells     | 1972          | Wells106
      f      | sondra    | Lucia      | Morrow    | 1950          | cvbnm
      f      | zakia     | Gale       | Weiss     | 1999          | syndikat
      f      | eada      | Ana        | Elliott   | 1994          | Ana94
      f      | karalee   | Denise     | Hanson    | 1965          | OLIVER
      m      | agatha    | Edmond     | Daniels   | 1956          | Agatha
      …
  • 26. Password Schemes Used
    • Random: random choice from a top-X password list (e.g. 123456)
    • Easy to Type: nearby characters on the keyboard (e.g. qwerty)
    • Username: the person's username (e.g. smithy)
    • First Name + "!": the person's first name plus an exclamation mark (e.g. John!)
    • Lowercased First Name + "!": the person's lowercased first name plus an exclamation mark (e.g. john!)
    • Last Name + Random Int: the person's last name plus a three-digit integer at the end (e.g. Smith758)
    • Username Leetspeak: the person's username in Leetspeak (e.g. 5m17hy)
    • First Name + Year of Birth (4 digits): the person's first name plus their year of birth (e.g. John1985)
    • First Name + Year of Birth (2 digits): the person's first name plus their year of birth in two digits (e.g. John85)
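The schemes above can be expressed as small generator functions, which is one way to produce a fake training dataset from harmless profile data. A minimal sketch: the top-X and easy-to-type lists are short stand-ins, and the Leetspeak mapping (a→4, e→3, i→1, l→1, o→0, s→5, t→7) is inferred from the slide examples (smithy → 5m17hy, john.smith → j0hn.5m17h), not taken from the author's code.

```python
import random

TOP_PASSWORDS = ["123456", "password", "12345678", "qwerty", "iloveyou"]  # stand-in top-X list
EASY_TO_TYPE = ["qwerty", "asdfghj", "zxcvbn", "qwertz"]  # hypothetical keyboard walks
# Leetspeak mapping inferred from the examples on the slides.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "l": "1", "o": "0", "s": "5", "t": "7"})

# Each scheme takes a person's details and an RNG and returns one password.
SCHEMES = {
    "Random": lambda p, rng: rng.choice(TOP_PASSWORDS),
    "Easy to Type": lambda p, rng: rng.choice(EASY_TO_TYPE),
    "Username": lambda p, rng: p["username"],
    "First Name + !": lambda p, rng: p["first_name"] + "!",
    "Lowercased First Name + !": lambda p, rng: p["first_name"].lower() + "!",
    "Last Name + Random Int": lambda p, rng: p["last_name"] + str(rng.randint(100, 999)),
    "Username Leetspeak": lambda p, rng: p["username"].lower().translate(LEET),
    "First Name + Year of Birth (4 digits)": lambda p, rng: p["first_name"] + p["year_of_birth"],
    "First Name + Year of Birth (2 digits)": lambda p, rng: p["first_name"] + p["year_of_birth"][-2:],
}


def fake_password(person: dict[str, str], rng: random.Random) -> tuple[str, str]:
    """Pick a scheme uniformly at random and apply it; returns (scheme, password)."""
    scheme = rng.choice(list(SCHEMES))
    return scheme, SCHEMES[scheme](person, rng)


person = {"username": "smithy", "first_name": "John", "last_name": "Smith", "year_of_birth": "1985"}
rng = random.Random(42)
for _ in range(3):
    print(fake_password(person, rng))
```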
  • 27. Tokenization
    • replace personal details with their column id
    • a column id is just another character
    • problem: exact matching fails to match obfuscations or abbreviations
      • John != j0hn
      • 1986 != 86
    • examples:

      # | First Name | Year of Birth | Password | Resulting Password Tokens
      1 | Max        | 1983          | Max1983! | column: First Name, column: Year of Birth, !
      2 | John       | 1986          | John86!  | column: First Name, 8, 6, !
      3 | Max        | 1987          | 123456   | 1, 2, 3, 4, 5, 6
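Exact-match tokenization can be sketched as a greedy left-to-right scan. This is a simplified stand-in for the author's matrix-based matching: column tokens are rendered as `<column name>`, and trying longer values first is an assumption. It reproduces the three rows of the table above, including the failure on the abbreviated year.

```python
def tokenize(password: str, person: dict[str, str]) -> list[str]:
    """Replace exact occurrences of personal details with column tokens.

    Longer values are tried first; characters that match no detail stay as
    single-character tokens.
    """
    columns = sorted(((col, val) for col, val in person.items() if val),
                     key=lambda cv: len(cv[1]), reverse=True)
    tokens, i = [], 0
    while i < len(password):
        for col, val in columns:
            if password.startswith(val, i):
                tokens.append(f"<{col}>")
                i += len(val)
                break
        else:
            tokens.append(password[i])  # no detail starts here: keep the raw character
            i += 1
    return tokens


print(tokenize("Max1983!", {"First Name": "Max", "Year of Birth": "1983"}))
print(tokenize("John86!", {"First Name": "John", "Year of Birth": "1986"}))
print(tokenize("123456", {"First Name": "Max", "Year of Birth": "1987"}))
```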
  • 28. Support Matching Using Data Variations
    • add on-the-fly word-mangling rules to columns: Leetspeak, lowercase, uppercase, …
    • example (original → Leetspeak, lowercase, uppercase):
      • f → f, f, F
      • tania → 74n14, tania, TANIA
      • Kara → k4r4, kara, KARA
      • Rosales → r054135, rosales, ROSALES
      • …
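Extending each column with its variations before matching lets the same greedy scan recognize obfuscated details. A minimal sketch with the three rules from the slide; the Leetspeak mapping is inferred from the examples above (tania → 74n14, Rosales → r054135), and all names are hypothetical.

```python
# Leetspeak mapping inferred from the slide's examples.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "l": "1", "o": "0", "s": "5", "t": "7"})


def variations(value: str) -> set[str]:
    """Word-mangling rules: original, Leetspeak (of the lowercase form), lowercase, uppercase."""
    return {value, value.lower().translate(LEET), value.lower(), value.upper()}


def tokenize_with_variations(password: str, person: dict[str, str]) -> list[str]:
    """Greedy tokenization where any variation of a detail maps to the same column token."""
    candidates = [(col, form) for col, val in person.items() if val for form in variations(val)]
    candidates.sort(key=lambda cf: len(cf[1]), reverse=True)  # prefer the longest match
    tokens, i = [], 0
    while i < len(password):
        for col, form in candidates:
            if password.startswith(form, i):
                tokens.append(f"<{col}>")
                i += len(form)
                break
        else:
            tokens.append(password[i])  # no variation matches here: keep the raw character
            i += 1
    return tokens


person = {"First Name": "John", "Year of Birth": "1985"}
for pw in ["j0hn1985!", "JOHN1985!", "john1985!"]:
    print(pw, tokenize_with_variations(pw, person))
```

Abbreviations such as a two-digit year are still not matched with these three rules; they would need further entries in `variations`.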
  • 29. Challenges
    • generate password sequences ✓
    • GDPR compliance ✓
      → use top-X password lists + fake rules
    • recognize & relate individual structures ✓
      → column ids instead of individual details
    • dealing with obfuscations ✓
      → on-the-fly word-mangling rules to extend columns
  • 30. Implementation
    • Python application based on Sean Robertson's pytorch-char-rnn
      • https://github.com/spro/char-rnn.pytorch
    • adaptations (excerpt):
      • matrix-based individual detail matching
      • on-the-fly word-mangling rules
  • 32. Attacking the Target
    • collect data about the victim & generate a dataset
    • use the trained model to generate a tailored password list
    • the quality of the list depends heavily on:
      • selected training data
      • hyperparameter configuration
    • example target:

      Gender | Username   | First Name | Last Name | Year of Birth
      m      | john.smith | John       | Smith     | 2050
  • 33. Results & Qualitative Analysis
  • 34. Scheme Adoption
    • target:

      Gender | Username   | First Name | Last Name | Year of Birth
      m      | john.smith | John       | Smith     | 2050

    • generated list (excerpt, in order): John2050, 180374, 09091958, 06031982, 160883, soni, John!, John! [skipped until line 14], john! [skipped until line 23], j0hn.5m17h [skipped until line 30], john.smith [skipped until line 80], Smith866 [skipped until line 85], asdfghj [skipped until line 514], John50 [...]
    • how the schemes were adopted:
      • Random: stochastic character generation (mostly human dates)
      • First Name + Year of Birth (4 digits): learned
      • Username Leetspeak: learned using word mangling
      • Last Name + Random Int: partially learned + stochastic generation
      • Lowercased First Name + "!": learned using word mangling
      • First Name + "!": learned
      • Easy to Type: learned
      • Username: learned
      • First Name + Year of Birth (2 digits): partially learned + stochastic generation
    • the duplicate "John!" results from the few available rules
  • 35. Proving Password Scheme Adoption
    1. use a new fake dataset with the same schemes
    2. loop through each entry and generate an individual password list (1,000 entries)
    3. check whether the entry's password is on that list
    • example: f | margarete | Judy | Wells | 1972 | Wells106 → ?
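The evaluation loop can be sketched independently of the model. `generate_list` is a hypothetical callable standing in for the trained model (details in, candidate list out); the toy generator and the three records in the usage example are made up for illustration.

```python
from typing import Callable


def adoption_rate(dataset: list[dict[str, str]],
                  generate_list: Callable[[dict[str, str], int], list[str]],
                  list_size: int = 1000) -> float:
    """Share of entries whose password is on the list generated from their own details."""
    hits = 0
    for entry in dataset:
        # The generator only sees the personal details, never the password itself.
        details = {k: v for k, v in entry.items() if k != "password"}
        if entry["password"] in set(generate_list(details, list_size)):
            hits += 1
    return hits / len(dataset)


def toy_generator(details: dict[str, str], size: int) -> list[str]:
    """Hypothetical stand-in for the trained model: two fixed schemes."""
    guesses = [details["first_name"] + "!",
               details["last_name"] + details["year_of_birth"][-2:]]
    return guesses[:size]


dataset = [
    {"first_name": "Alice", "last_name": "Meyer", "year_of_birth": "1980", "password": "Meyer80"},
    {"first_name": "Bob", "last_name": "Novak", "year_of_birth": "1975", "password": "Bob!"},
    {"first_name": "Carol", "last_name": "Lang", "year_of_birth": "1999", "password": "sunshine"},
]
print(adoption_rate(dataset, toy_generator))
```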
  • 36. Results
    • 6 models with different configurations
    • all models find about 70% of the passwords in password lists of only ~100 lines
    • optimized configurations increase matching efficiency
    • recreated distributions of schemes
  • 38. Mitigation Strategies
    • generate your own model and check users' passwords against the generated lists
      → the attacker's model and dataset are not available, so the password lists will differ
    • long or complex passwords
      → passwords might still be guessed if they contain personal information
      • e.g. JohnSmith1985 is actually <column: first name><column: last name><column: year of birth>
    • treat all human-like passwords as insecure
      → requires classification of human likeliness
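The first two strategies can be combined into a server-side check at registration time, when the user's details are known. A minimal sketch, assuming the same simple lowercase and Leetspeak variations as above and a list produced by your own model (`own_generated_list`); the function names and the minimum detail length are hypothetical choices.

```python
# Leetspeak mapping inferred from the slide examples.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "l": "1", "o": "0", "s": "5", "t": "7"})


def personal_details_in(password: str, details: dict[str, str], min_len: int = 3) -> list[str]:
    """Columns whose value, lowercased or in Leetspeak, occurs inside the password."""
    lowered = password.lower()
    found = []
    for column, value in details.items():
        if len(value) < min_len:
            continue  # very short values (e.g. gender 'm') would match almost anything
        forms = {value.lower(), value.lower().translate(LEET)}
        if any(form in lowered for form in forms):
            found.append(column)
    return found


def is_weak(password: str, details: dict[str, str], own_generated_list: list[str]) -> bool:
    """Reject passwords found by our own model or containing personal details."""
    return password in set(own_generated_list) or bool(personal_details_in(password, details))


details = {"gender": "m", "username": "john.smith", "first_name": "John",
           "last_name": "Smith", "year_of_birth": "1985"}
print(personal_details_in("JohnSmith1985", details))
print(is_weak('&CtAEaCp?b&v"s%', details, ["John2050", "john!"]))
```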
  • 39. Human Password Classification
    • using machine learning to classify human likeliness
    • dataset: 80k human-labeled + 80k machine-labeled passwords
    • classifiers: Logistic Regression, Multinomial Naïve Bayes, Linear Support Vector Machine, Random Forest
    • vectorizers: TF-IDF, Count
    • sample (h = human, m = machine):

      &CtAEaCp?b&v"s%   m
      -SUuf4TLtF        m
      mallrats          h
      bP0.}BO/L&{:      m
      ^=c.rgH$z         m
      boxers            h
      j&uzHCutff_A{     m
      656565            h
      6>IB|~@4^n}K      m
      forever1          h
      …
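To stay dependency-free, the sketch below re-implements one of the listed combinations, count features with Multinomial Naïve Bayes, in plain Python. The character unigram and bigram features, the smoothing value and the tiny training set (the ten sample rows above) are assumptions for illustration; in scikit-learn terms it corresponds roughly to `CountVectorizer(analyzer="char", ngram_range=(1, 2), lowercase=False)` plus `MultinomialNB`, and it is not the author's published classifier.

```python
import math
from collections import Counter


def char_ngrams(text: str, sizes: tuple[int, ...] = (1, 2)) -> list[str]:
    """Character unigrams and bigrams, e.g. 'abc' -> a, b, c, ab, bc."""
    return [text[i:i + n] for n in sizes for i in range(len(text) - n + 1)]


class CharNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts, with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, texts: list[str], labels: list[str]) -> "CharNaiveBayes":
        self.doc_counts = Counter(labels)  # documents per class, used as priors
        self.gram_counts = {label: Counter() for label in self.doc_counts}
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(char_ngrams(text))
        self.vocab = set().union(*self.gram_counts.values())
        return self

    def log_scores(self, text: str) -> dict[str, float]:
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label, grams in self.gram_counts.items():
            denom = sum(grams.values()) + self.alpha * len(self.vocab)
            score = math.log(self.doc_counts[label] / n_docs)
            for gram in char_ngrams(text):
                score += math.log((grams[gram] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict_proba(self, text: str) -> dict[str, float]:
        """Normalize the log scores into class probabilities (softmax)."""
        scores = self.log_scores(text)
        top = max(scores.values())
        exps = {label: math.exp(s - top) for label, s in scores.items()}
        total = sum(exps.values())
        return {label: e / total for label, e in exps.items()}

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# The ten sample rows from the slide above (h = human, m = machine).
texts = ['&CtAEaCp?b&v"s%', "-SUuf4TLtF", "mallrats", "bP0.}BO/L&{:", "^=c.rgH$z",
         "boxers", "j&uzHCutff_A{", "656565", "6>IB|~@4^n}K", "forever1"]
labels = ["m", "m", "h", "m", "m", "h", "m", "h", "m", "h"]
clf = CharNaiveBayes().fit(texts, labels)
print(clf.predict("boxer"), clf.predict_proba("boxer")["h"])
```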
  • 40. Results
    • accuracy human vs. machine-random: 99% correct
    • sample predictions (probability of being human):

      14061966        0.9961306540
      y-JQ6{v;_yb|q   0.0000000000
      ZBT4n#z-x       0.0000121259
      longball        0.9920406811
      vikings         0.9723564484
      gunit           0.9683620674
      .XP?]b36nP]l|   0.0000000000
      8J9{Bd^         0.0000107884
      123india        0.9986476258
      *[qg;t          0.0000058089
      …
  • 41. What about randomly-typed passwords?
    • human-random passwords
      • almost impossible for humans to distinguish
      • previously trained model: 83% correct
      • specifically trained model (human-random vs. machine-random): 94% correct
    • examples:

      ,asgl213
      HGHfwjiofjiw!?
      FEA452
      dciuowed7983zy_
      jksdgf644kjbndf
      Xkkeelt7tad5z
      sabjas012
      123jfmvfkfn49fvk.
      …
  • 43. Conclusion
    • machine learning can be used to efficiently attack passwords created by humans
    • mitigation:
      • treat human passwords as insecure
      • warn users or provide a password policy
        → use a machine learning model to identify human passwords
        → integrate on web servers & password storage services
  • 44. Resources
    • Thesis "Machine Learning-driven Password Lists": https://www.researchgate.net/publication/328719001_Machine_Learning-driven_Password_Lists
    • Human Password Classifier: https://github.com/georgknabl/human-password-classifier
    • ready-to-use trained models available via e-mail
  • 45. "The only secure password is the one you can't remember." (Troy Hunt, haveibeenpwned.com)
  • 46. Contact: DI (FH) Georg Knabl, MSc, IT-Consultant & Software Engineer, georg.knabl@pageonstage.at
  • 47. Sources
    • Dunning, Julian (2016). Statistics Will Crack Your Password. Available from: https://p16.praetorian.com/blog/statistics-will-crack-yourpassword-mask-structure [Mar. 3, 2018]
    • Karpathy, Andrej (2015). The Unreasonable Effectiveness of Recurrent Neural Networks. Available from: http://karpathy.github.io/2015/05/21/rnn-effectiveness/ [Nov. 10, 2017]
    • Melicher, William, Blase Ur, Sean M. Segreti, Saranga Komanduri, Lujo Bauer, Nicolas Christin, and Lorrie Faith Cranor (2016). "Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks". In: 25th USENIX Security Symposium (USENIX Security 16). Vancouver: USENIX Association, pp. 175–191.
    • Olah, Christopher (2015). Understanding LSTM Networks. Available from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ [Nov. 10, 2017]