Using Sequence Statistics to Fight Advanced Persistent Threats
- 2. © 2016 MapR Technologies 2
Contact Information
Ted Dunning
Chief Applications Architect at MapR Technologies
Committer & PMC for Apache’s Drill, Zookeeper & others
VP of Incubator at Apache Foundation
Email tdunning@apache.org tdunning@maprtech.com
Twitter @ted_dunning
Hashtags today: #hs16dublin #mapr
- 3. © 2016 MapR Technologies 3
Agenda
• What’s this persistent threat stuff?
– What attackers do
– How they do it
• Examples
• Sequence statistics
– Really geeking with gas now!
• Detection techniques
• Specifics
• Summary
- 4. © 2016 MapR Technologies 4
Agenda of All Security Talks
• Terror
• Faint hope
• More terror
• Practical suggestions
• Summary
- 5. © 2016 MapR Technologies 5
Operation Ababil – Brobots on Parade
• Dork attack to find unpatched default Joomla sites
– Especially web servers with high bandwidth connections
– Basically just Google searches for default strings
– Joomla compromised into attack Brobot
• C&C network checks in occasionally
– Note C&C is incoming request and looks like normal web requests
• Later, on command, multiple Brobots direct 50-75 Gb/s of attack
– Attacks come from white-listed sites
- 6. © 2016 MapR Technologies 6
Attack Sequence
[Diagram, built up over slides 6 to 9: Source, First level C&C, Second level C&C (slide 6); Google added (slide 7); three Brobots added (slide 8); Target added (slide 9)]
- 10. © 2016 MapR Technologies 10
Outline of an Advanced Persistent Threat
• Advanced
– Common use of zero-day for preliminary attacks
– Often attributed to state-level actors
– Modern privateers blur the line
• Persistent
– Result of first attack is heavily muffled, no immediate exploit
– Remote access toolset installed (RAT)
• Threat
– On command, data is exfiltrated covertly or en masse
– Or the compromised host is used for other nefarious purpose
- 11. © 2016 MapR Technologies 11
APT in Summary
• Attack, penetrate, pivot, exfiltrate or exploit
• If you are a high-value target, attack is likely and stealthy
– High-value = telecom, banks, utilities, retail targets, web100
– … and all their vendors
– Conventional multi-factor auth is easily breached
• Penetration and pivot are critical counter-measure opportunities
– In 2010, RAT would contact command and control (C&C)
– In 2016, C&C looks like normal traffic
• Once exfiltration or exploit starts, you may no longer have a business
- 12. © 2016 MapR Technologies 12
So are we totally screwed?
Not entirely!
- 14. © 2016 MapR Technologies 14
Event Sequences Provide Clues
• Event sequences appear in many places
• Headers
– Header types, ordering in requests
• IP address accesses
– Source and destination, sequences of either
• TLS options
– Which options, which values, which algorithms
• Incoming component request ordering and timing
– Body first, CSS, scripts and images next
– But which are cached, what is round-trip time?
- 15. © 2016 MapR Technologies 15
Sequences and Cooccurrences
• All of these characteristics form symbolic sequences
• Current systems use hand-crafted rules about particular state
– But hand-crafting depends on human knowledge
• We can do much, much better by considering cooccurrence and ordering of symbols in these sequences
• Log-likelihood ratio test (jargon alert) is a key tool
- 16. © 2016 MapR Technologies 16
A core technique
• Many of these easy problems reduce to finding interesting coincidences
• This can be summarized as a 2 x 2 table
• Actually, many of these tables
          A      Other
B         k11    k12
Other     k21    k22
- 17. © 2016 MapR Technologies 17
How do you do that?
• This is well handled using the G-test
– See Wikipedia
– See http://bit.ly/surprise-and-coincidence
• Original application in linguistics now cited > 2000 times
• Available in ElasticSearch, in Solr, in Mahout
• Available in R, C, Java, Python
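The G-test score for one 2 x 2 table can be sketched in a few lines. This is a minimal sketch using the entropy form of the statistic; the function names (`x_log_x`, `entropy`, `llr`, `root_llr`) are illustrative choices, not the API of any of the libraries listed above.

```python
import math


def x_log_x(x):
    """Return x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a set of counts: N ln N - sum(k ln k)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio statistic for a 2 x 2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def root_llr(k11, k12, k21, k22):
    """Square root of G^2, which is easier to compare across tables."""
    return math.sqrt(llr(k11, k12, k21, k22))


if __name__ == "__main__":
    tables = [(13, 1000, 1000, 100000), (1, 0, 0, 10000),
              (10, 0, 0, 100000), (1, 0, 0, 2)]
    for table in tables:
        print(table, round(root_llr(*table), 2))
```

The score is symmetric in rows and columns, so it does not matter which event is called A and which is called B.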
- 19. © 2016 MapR Technologies 19
Which one is the anomalous co-occurrence?
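Table 1 (score 0.90)
          A       not A
B         13      1000
not B     1000    100,000

Table 2 (score 4.52)
          A       not A
B         1       0
not B     0       10,000

Table 3 (score 14.3)
          A       not A
B         10      0
not B     0       100,000

Table 4 (score 1.95)
          A       not A
B         1       0
not B     0       2

Scores are the square root of the LLR (G-test) statistic for each table.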
Dunning, Ted. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics 19(1), 1993.
- 20. © 2016 MapR Technologies 20
How to Count (header-like documents)
For each “document”:
    For each “word” A:
        left[A]++
        For each “word” B after that (within window):
            count[A,B]++
            right[B]++
            total++
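The counting loop above can be sketched in Python as follows. This is a sketch under stated assumptions: the function name `count_pairs` and the `window` parameter are illustrative, and `left[A]` is incremented once per pair (rather than once per word) so that the marginal counts always add up to `total`.

```python
from collections import Counter


def count_pairs(documents, window=3):
    """Count ordered pairs (A, B) where B follows A within `window` positions.

    documents: iterable of token lists, e.g. header names in request order.
    Returns (count, left, right, total).
    """
    count, left, right = Counter(), Counter(), Counter()
    total = 0
    for words in documents:
        for i, a in enumerate(words):
            for b in words[i + 1:i + 1 + window]:
                count[a, b] += 1   # k11 for the pair (A, B)
                left[a] += 1       # pairs with A on the left
                right[b] += 1      # pairs with B on the right
                total += 1         # all pairs seen
    return count, left, right, total


if __name__ == "__main__":
    docs = [["Host", "User-Agent", "Accept"],
            ["Host", "Accept", "User-Agent"]]
    count, left, right, total = count_pairs(docs, window=2)
    print(count.most_common(3), total)
```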
- 21. © 2016 MapR Technologies 21
• We wanted this 2 x 2 table for each A,B
• But we only counted k11 directly
• But we did count
k*1 = k11 + k21 (how many A’s we saw)
k1* = k11 + k12 (how many B’s we saw)
k** = k11 + k21 + k12 + k22 (how many pairs in total)
          A      Other
B         k11    k12
Other     k21    k22
- 22. © 2016 MapR Technologies 22
How to Count (continued)
Map<PriorityQueue> queue
for each pair (A,B)
    k11 = count[A,B]
    k1x = right[B]
    kx1 = left[A]
    kxx = total
    k12 = k1x - k11
    k21 = kx1 - k11
    k22 = kxx - k11 - k12 - k21
    queue.add(A, (LLR(k11,k12,k21,k22), B))
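The scoring step can be sketched as below: rebuild each 2 x 2 table from the pair count and the marginal counts, score it, and keep the best partners for each A. The names `top_pairs` and `k` are illustrative, `heapq.nlargest` stands in for the priority queue, and the sketch assumes the marginals were counted once per pair so that no cell goes negative.

```python
import heapq
import math


def llr(k11, k12, k21, k22):
    """G^2 statistic for a 2 x 2 table, computed from sums of k * ln(k / N)."""
    def h(*ks):
        n = sum(ks)
        return sum(k * math.log(k / n) for k in ks if k > 0)
    score = 2.0 * (h(k11, k12, k21, k22)
                   - h(k11 + k12, k21 + k22)
                   - h(k11 + k21, k12 + k22))
    return max(0.0, score)


def top_pairs(count, left, right, total, k=10):
    """For each A, return the k highest-scoring (score, B) tuples."""
    scored = {}
    for (a, b), k11 in count.items():
        k12 = right[b] - k11            # B seen without A
        k21 = left[a] - k11             # A seen without B
        k22 = total - k11 - k12 - k21   # neither A nor B
        scored.setdefault(a, []).append((llr(k11, k12, k21, k22), b))
    return {a: heapq.nlargest(k, pairs) for a, pairs in scored.items()}


if __name__ == "__main__":
    count = {("a", "b"): 9, ("a", "c"): 1,
             ("d", "b"): 1, ("d", "c"): 9, ("d", "e"): 10}
    left = {"a": 10, "d": 20}
    right = {"b": 10, "c": 10, "e": 10}
    for a, pairs in top_pairs(count, left, right, total=30, k=2).items():
        print(a, pairs)
```

Note that the plain LLR score is large for both unusually frequent and unusually rare pairs; a signed variant is needed if only over-represented pairs are of interest.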
- 23. © 2016 MapR Technologies 23
How to Count (cooccurrence)
for each (C,B)=(“context”, “word”):
    if (!filter(C) && !filter(B)):
        right[B]++
        for each A in history(C):
            count[A,B]++
            left[A]++
        history(C) += B
        total++
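A Python sketch of the history-based counting above, for a stream of (context, word) events such as (card, merchant) or (source IP, URL). The names `count_cooccurrence` and `ignore` are illustrative. Two assumptions differ from the pseudocode: all marginals are incremented once per pair so that the 2 x 2 tables stay consistent, and the history is unbounded, which a real system would cap by time or size.

```python
from collections import Counter, defaultdict


def count_cooccurrence(events, ignore=lambda symbol: False):
    """Count (A, B) pairs where A appeared earlier in the same context as B.

    events: iterable of (context, word) tuples in time order.
    ignore: predicate returning True for contexts or words to drop.
    """
    count, left, right = Counter(), Counter(), Counter()
    history = defaultdict(list)   # context -> words seen so far
    total = 0
    for context, b in events:
        if ignore(context) or ignore(b):
            continue
        for a in history[context]:
            count[a, b] += 1
            left[a] += 1
            right[b] += 1
            total += 1
        history[context].append(b)
    return count, left, right, total


if __name__ == "__main__":
    events = [("card1", "m0"), ("card1", "m5"),
              ("card2", "m0"), ("card2", "m5"), ("card3", "m7")]
    print(count_cooccurrence(events))
```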
- 24. © 2016 MapR Technologies 24
Seriously...
It really can be that simple
- 25. © 2016 MapR Technologies 25
Basic techniques
• Counting – often the hardest part
• LLR – the basic tool
• Order models
– Ordered cooccurrences
– Transition probabilities
– Recurrent neural networks
• Ploughing a quiet field
– Reimage servers often
– Force attackers to pivot repeatedly
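As one illustration of an order model, transition probabilities can be estimated from observed sequences and used to score new ones. This is a minimal first-order sketch; the names `fit_transitions` and `avg_log_prob`, the start and end markers, and the add-alpha smoothing are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count first-order transitions, including start and end markers."""
    counts = defaultdict(Counter)
    for seq in sequences:
        path = ["<s>", *seq, "</s>"]
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1
    return counts


def avg_log_prob(counts, seq, vocab_size, alpha=1.0):
    """Mean log-probability per transition, with add-alpha smoothing.

    Low values mean the ordering is unusual relative to the training data.
    """
    path = ["<s>", *seq, "</s>"]
    logp = 0.0
    for prev, nxt in zip(path, path[1:]):
        seen = counts.get(prev, Counter())
        p = (seen[nxt] + alpha) / (sum(seen.values()) + alpha * vocab_size)
        logp += math.log(p)
    return logp / (len(path) - 1)


if __name__ == "__main__":
    train = [["Host", "User-Agent", "Accept"]] * 50
    model = fit_transitions(train)
    print(avg_log_prob(model, ["Host", "User-Agent", "Accept"], vocab_size=5))
    print(avg_log_prob(model, ["Host", "Accept", "User-Agent"], vocab_size=5))
```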
- 26. © 2016 MapR Technologies 26
Example 1 - Ababil
[Diagram: Source, First level C&C, Second level C&C, three Brobots, Target. Annotation: "Defense has to happen here"]
- 27. © 2016 MapR Technologies 27
Spot the Important Difference?
Attacker request:
GET /personal/comparison-table.jsp?iODg2OQ=51a90 HTTP/1.1
Host: www.sometarget.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)
Accept-Encoding: deflate
Accept-Charset: UTF-8
Accept-Language: fr
Cache-Control: no-cache
Pragma: no-cache
Connection: Keep-Alive

Real request (lines truncated on the slide):
GET /photo.jpg HTTP/1.1
Host: lh4.googleusercontent.com
User-Agent: Mozilla/5.0 (Macint
Accept: image/png,image/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate,
Referer: https://www.google.com
Connection: keep-alive
If-None-Match: "v9"
Cache-Control: max-age=0
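To treat requests like these as symbol sequences, the header names can be pulled out in the order they were sent. This is a minimal sketch; `header_sequence` and `ATTACKER_REQUEST` are illustrative names, and the parser only handles simple single-line headers.

```python
ATTACKER_REQUEST = """\
GET /personal/comparison-table.jsp?iODg2OQ=51a90 HTTP/1.1
Host: www.sometarget.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)
Accept-Encoding: deflate
Accept-Charset: UTF-8
Accept-Language: fr
Cache-Control: no-cache
Pragma: no-cache
Connection: Keep-Alive
"""


def header_sequence(raw_request):
    """Return lower-cased header names in the order they appear."""
    lines = raw_request.strip().splitlines()
    names = []
    for line in lines[1:]:            # skip the request line (GET ... HTTP/1.1)
        if not line.strip():
            break                     # a blank line ends the header block
        name, sep, _ = line.partition(":")
        if sep:                       # ignore lines without a colon
            names.append(name.strip().lower())
    return names


if __name__ == "__main__":
    print(header_sequence(ATTACKER_REQUEST))
```

The resulting list is one "document" for pair counting, so unusual header combinations or orderings show up as rare pairs.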
- 29. © 2016 MapR Technologies 29
This could only be found at scale
- 30. © 2016 MapR Technologies 30
Overall Outline Again
[Diagram: Source, First level C&C, Second level C&C, three Brobots, Target. Annotation: "Tradecraft error!"]
- 31. © 2016 MapR Technologies 31
Large-corpus analysis of source IPs wins big
- 33. © 2016 MapR Technologies 33
Example 2 - Common Point of Compromise
• Scenario:
– Merchant 0 is compromised, leaks account data during compromise
– Fraud committed elsewhere during exploit
– High background level of fraud
– Limited detection rate for exploits
• Goal:
– Find merchant 0
• Meta-goal:
– Screen algorithms for this task without leaking sensitive data
- 34. © 2016 MapR Technologies 34
Example 2 - Common Point of Compromise
[Diagram: Merchant 0, skimmed data, Merchant n; arrows labelled "skim" and "exploit"]
• Card data is stolen from Merchant 0
• That data is used in frauds at other merchants
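One way to turn this scenario into a score is to build, for each merchant, a 2 x 2 table of "card was used at this merchant" against "card later had fraud" and apply the LLR test. The sketch below is an assumption about how such a breach score could be computed, not the exact method behind the talk's results; `breach_scores` and its inputs are illustrative, and the sign is added so that only over-represented fraud scores high.

```python
import math
from collections import defaultdict


def llr(k11, k12, k21, k22):
    """G^2 statistic for a 2 x 2 table."""
    def h(*ks):
        n = sum(ks)
        return sum(k * math.log(k / n) for k in ks if k > 0)
    return max(0.0, 2.0 * (h(k11, k12, k21, k22)
                           - h(k11 + k12, k21 + k22)
                           - h(k11 + k21, k12 + k22)))


def breach_scores(visits, fraud_cards):
    """Signed root-LLR per merchant.

    visits: iterable of (card, merchant) pairs.
    fraud_cards: set of cards with fraud reported later.
    """
    cards_by_merchant = defaultdict(set)
    all_cards = set()
    for card, merchant in visits:
        cards_by_merchant[merchant].add(card)
        all_cards.add(card)
    fraud = all_cards & set(fraud_cards)
    scores = {}
    for merchant, cards in cards_by_merchant.items():
        k11 = len(cards & fraud)                 # visited and defrauded
        k12 = len(fraud) - k11                   # defrauded, did not visit
        k21 = len(cards) - k11                   # visited, no fraud
        k22 = len(all_cards) - k11 - k12 - k21   # neither
        score = math.sqrt(llr(k11, k12, k21, k22))
        expected = len(cards) * len(fraud) / len(all_cards)
        scores[merchant] = score if k11 >= expected else -score
    return scores


if __name__ == "__main__":
    visits = [(f"c{i}", "m1") for i in range(100)]
    visits += [(f"c{i}", "m0") for i in range(20)]
    fraud = {f"c{i}" for i in range(15)} | {"c50", "c51"}
    print(breach_scores(visits, fraud))
```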
- 35. © 2016 MapR Technologies 35
Simulation Setup
[Figure: daily counts of compromises and frauds over days 0 to 100 (x-axis: day; y-axis: count, 0 to 500), with the compromise period and the exploit period marked]
- 36. © 2016 MapR Technologies 36
Simulation Strategy
• For each consumer
– Pick consumer parameters such as transaction rate, preferences
– Generate transactions until end of sim-time
• If merchant 0 during compromise time, possibly mark as compromised
• For all transactions, possibly mark as fraud; probability depends on history
• Merchants are selected using hierarchical Pitman-Yor
• Restate data
– Flatten transaction streams
– Sort by time
• Tunables
– Compromise probability, transaction rates, background fraud, detection probability
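To give a feel for the merchant selection step, here is a sketch of a single-level Pitman-Yor process drawn with the Chinese restaurant construction. It is a simplification: the simulation described above is hierarchical, and the function name, `discount`, `concentration` and `seed` values are assumptions chosen for illustration.

```python
import random


def pitman_yor_sequence(n, discount=0.5, concentration=1.0, seed=0):
    """Draw n merchant ids from a Pitman-Yor Chinese restaurant process.

    A new merchant is chosen with weight concentration + discount * K,
    an existing merchant k with weight counts[k] - discount, where K is
    the number of merchants seen so far.
    """
    rng = random.Random(seed)
    counts = []   # counts[k] = number of draws so far for merchant k
    draws = []
    for i in range(n):   # i = number of previous draws
        new_weight = concentration + discount * len(counts)
        u = rng.random() * (i + concentration)
        if u < new_weight:
            counts.append(1)
            draws.append(len(counts) - 1)
            continue
        u -= new_weight
        for k, c in enumerate(counts):
            u -= c - discount
            if u < 0:
                break
        counts[k] += 1   # falls back to the last merchant on round-off
        draws.append(k)
    return draws


if __name__ == "__main__":
    draws = pitman_yor_sequence(10000)
    print("distinct merchants:", len(set(draws)))
```

The draws have a heavy tail: a few merchants get most of the transactions while new merchants keep appearing, which is the behaviour wanted from simulated transaction data.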
- 38. © 2016 MapR Technologies 38
[Figure: LLR score for real data. x-axis: number of merchants (log scale, 10^0 to 10^6); y-axis: BreachScore (LLR), 0 to 80. Annotation: "Really truly bad guys"]
- 39. © 2016 MapR Technologies 39
Historical cooccurrence gives high S/N
- 40. © 2016 MapR Technologies 40
Summary
• The world can be seen as sequences of symbols
• We can find patterns
• Those patterns can nail opponents
• Many patterns only appear at scale
• You can do this
- 42. © 2016 MapR Technologies 42
Short Books by Ted Dunning & Ellen Friedman
• Published by O’Reilly in 2014 and 2015
• For sale from Amazon or O’Reilly
• Free e-books currently available courtesy of MapR
http://bit.ly/ebook-real-world-hadoop
http://bit.ly/mapr-tsdb-ebook
http://bit.ly/ebook-anomaly
http://bit.ly/recommendation-ebook
- 43. © 2016 MapR Technologies 43
Streaming Architecture
by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly)
Free copies at book
signing today
(oops… that was earlier)
http://bit.ly/mapr-ebook-streams
- 45. © 2016 MapR Technologies 45
Q&A
@mapr maprtech
tdunning@maprtech.com
Engage with us!
MapR
maprtech
mapr-technologies