The document describes a global botnet detector built to detect botnet activity across multiple countries in near real time. For each site, it aggregates web traffic by country, calculates correlations between the traffic from different countries, and flags coordinated spikes as potential botnet alerts. It then analyzes user behavior in the flagged countries to produce a list of suspected botnet participants and their threat scores. Applied to an attack investigation for a customer, the tool detected a real botnet attack and identified all of the users the customer deemed responsible. Future work could integrate it into a machine learning product and address limitations such as intra-country botnet activity.
2. ABOUT ME
HI THERE!
Data Scientist
at Distil Networks
Email: brenton.mallen@distilnetworks.com
Blog: carpefridiem.wordpress.com
Twitter: @BrentonMallen
3. MOTIVATION
WHAT’S MY MOTIVATION?
▸ Product of an investigation of a DDoS attack on a customer
▸ Wanted a means to be alerted
▸ Wanted a means to identify a group of users potentially responsible
4. BOTS & BOTNETS
WHAT ARE THESE BOTS I KEEP HEARING ABOUT?
▸ Automated code that pretends to be human
▸ Used to traverse them internets
▸ Not all bots are bad
5. BOTS & BOTNETS
IS A BOT REALLY ALL THAT DANGEROUS?
▸ Botnets can cause damage:
▸ DDoS
▸ Mass Security Breaches
▸ Mass Data Theft
9. BOTNET DETECTOR
WHAT ARE THE GOALS OF A BOTNET DETECTOR?
▸ Detect
▸ Presence of a Botnet
▸ Identify
▸ List of Suspects
10. BOTNET DETECTOR
WHAT TOOLS DO WE USE?
▸ Python
▸ Boto
▸ NumPy
▸ AWS
▸ Hadoop
▸ Hive
▸ MapReduce Streaming
1.25 Billion Logs = 600 GB of Data per Day
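The tools above pair Hadoop with MapReduce Streaming to cope with the daily log volume. A minimal sketch of what the per-country aggregation step could look like as a streaming mapper and reducer, assuming a hypothetical tab-separated log layout (Unix timestamp, site, country code) and a hypothetical 5-minute window; the real log schema and window size are not stated in the deck.

```python
from itertools import groupby

WINDOW_SECONDS = 300  # hypothetical 5-minute aggregation window


def map_line(line):
    """Turn one raw log line into a 'site|window|country<TAB>1' record.

    Assumes (hypothetically) tab-separated fields: epoch seconds, site, country code.
    Returns None for malformed lines so the caller can skip them.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return None
    try:
        ts = int(fields[0])
    except ValueError:
        return None
    site, country = fields[1], fields[2]
    window = ts - ts % WINDOW_SECONDS  # start of the window this request falls in
    return f"{site}|{window}|{country}\t1"


def reduce_records(lines):
    """Sum the counts for each key.

    Hadoop Streaming sorts mapper output by key before the reducer sees it,
    so identical keys are adjacent and groupby can sum them in one pass.
    """
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"
```

Each function would be wrapped in a small script that reads `sys.stdin` and prints its results, then passed to the streaming job as the mapper and the reducer respectively.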
11. BOTNET DETECTOR
HOW DO WE DETECT A BOTNET?
▸ Part 1: Detect - For a given site, for each time window:
AGGREGATE COUNTRY TRAFFIC
CHECK FOR COORDINATED TRAFFIC
PRODUCE ALERT
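The three detection steps can be sketched with NumPy. This assumes the aggregated traffic for one site and one time window arrives as a matrix of countries by sub-intervals, and it treats "coordinated" as two countries that both spike (a sub-interval at least `spike_z` standard deviations above their mean) while their series are strongly correlated. The function name and both thresholds are hypothetical; the deck does not say how coordination is scored.

```python
import numpy as np


def coordinated_pairs(counts, countries, corr_threshold=0.9, spike_z=3.0):
    """Flag country pairs whose traffic spikes together within one time window.

    counts: shape (n_countries, n_bins), request counts per country per sub-interval.
    Returns a list of (country_a, country_b, correlation) alerts.
    Thresholds are hypothetical defaults, not values from the source.
    """
    counts = np.asarray(counts, dtype=float)
    n_bins = counts.shape[1]
    # Standardize each country's series (population std) so large and small countries compare.
    mean = counts.mean(axis=1, keepdims=True)
    std = counts.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # a flat series becomes all zeros instead of dividing by zero
    z = (counts - mean) / std
    # A country "spikes" if some sub-interval sits spike_z standard deviations above its mean.
    spiking = z.max(axis=1) >= spike_z
    # The Pearson correlation of two series equals the mean product of their z-scores.
    corr = z @ z.T / n_bins
    alerts = []
    for i in range(len(countries)):
        for j in range(i + 1, len(countries)):
            if spiking[i] and spiking[j] and corr[i, j] >= corr_threshold:
                alerts.append((countries[i], countries[j], float(corr[i, j])))
    return alerts
```

Requiring both a spike and a high correlation keeps two countries with merely similar daily rhythms from raising an alert on their own.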
30. IDENTIFY PARTICIPANTS
HOW DO WE FIND THOSE RESPONSIBLE?
▸ Part 2: Identify Participants
▸ From Detection Phase
▸ Times of Alerts
▸ Participating Countries
▸ Requires User Fingerprint
▸ ID Based on Various User Configuration Parameters
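The IDs in the final output all have a `3` as the first hex digit of their third group, which is consistent with RFC 4122 version-3 (MD5, name-based) UUIDs. A minimal sketch of a fingerprint built that way, assuming the configuration parameters are hashed as one canonical string; the namespace and the example parameter names (`user_agent`, `screen`, `timezone`) are hypothetical, since the deck only says the ID is based on "various user configuration parameters".

```python
import uuid

# Hypothetical namespace; the source does not say which, if any, is used.
FINGERPRINT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "fingerprint.example")


def user_fingerprint(config):
    """Derive a stable, uppercase version-3 UUID from a dict of client configuration parameters."""
    # Sort keys so the same configuration always yields the same string, and so the same ID.
    canonical = "|".join(f"{key}={config[key]}" for key in sorted(config))
    return str(uuid.uuid3(FINGERPRINT_NAMESPACE, canonical)).upper()
```

Because the ID is a hash of the configuration rather than of an IP address, the same client keeps its ID when its traffic moves between countries, which is what the next step relies on.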
31. IDENTIFY PARTICIPANTS
HOW DO WE FIND THOSE RESPONSIBLE?
ISOLATE USERS IN COUNTRIES
CHECK FOR MULTI-COUNTRY PRESENCE
FIND COORDINATED USERS
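One possible reading of the first two steps, sketched in plain Python: keep requests that fall inside an alert window and come from one of that alert's flagged countries, then keep the fingerprints seen from at least two of those countries during the same alert. The input shapes and the `min_countries` cutoff are assumptions for illustration, not the author's implementation.

```python
from collections import defaultdict


def multi_country_users(requests, alerts, min_countries=2):
    """Map each suspect user to the set of alert indices where it appeared in multiple flagged countries.

    requests: iterable of (user_id, country, epoch_seconds)
    alerts: list of (start, end, frozenset_of_flagged_countries) from the detection phase
    min_countries: hypothetical cutoff for "multi-country presence"
    """
    seen = defaultdict(set)  # (alert index, user) -> flagged countries the user was seen from
    for user, country, ts in requests:
        for idx, (start, end, flagged) in enumerate(alerts):
            # Step 1: isolate users in the flagged countries during the alert window.
            if start <= ts < end and country in flagged:
                seen[(idx, user)].add(country)
    suspects = defaultdict(set)
    for (idx, user), countries in seen.items():
        # Step 2: keep users present in several flagged countries for the same alert.
        if len(countries) >= min_countries:
            suspects[user].add(idx)
    return dict(suspects)
```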
32. IDENTIFY PARTICIPANTS
[Figure: request counts over time for users in two flagged country pairs, Argentina - South Africa and Indonesia - Russian Federation, with threat scores 0.77 and 0.94; panels A1, A2, B1, B2]
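The slide pairs request-count plots with threat scores between 0 and 1 but does not define the score. As a hypothetical stand-in for the "find coordinated users" step, the sketch below scores a user by the Pearson correlation between that user's request counts and the flagged countries' aggregate counts over the same time bins, clipped to [0, 1] so that anti-correlated users score zero.

```python
import numpy as np


def threat_score(user_counts, country_counts):
    """Hypothetical threat score: correlation of a user's requests with the flagged traffic, clipped to [0, 1].

    Both arguments are request counts over the same sequence of time bins.
    """
    u = np.asarray(user_counts, dtype=float)
    c = np.asarray(country_counts, dtype=float)
    if u.std() == 0 or c.std() == 0:
        return 0.0  # a flat series carries no coordination signal (and correlation is undefined)
    r = np.corrcoef(u, c)[0, 1]
    return float(np.clip(r, 0.0, 1.0))
```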
33. IDENTIFY PARTICIPANTS
WHAT DOES THE FINAL OUTPUT LOOK LIKE?
ID                                    Threat Score
007E6ABE-A48C-3DE5-81E0-CBECBC2C96AB  0.82
07EF4DBE-EC0D-3BCE-A5BA-5910FF2457F5  0.97
0CCA9DA5-D63D-34E9-85A1-55154E5480E2  0.96
17C00FD8-E931-3789-AAC4-ED004C9143DB  0.90
22533F87-4B97-356A-95A4-84D5A8841F63  0.78
2E1C87C1-90BF-37BB-9A33-C482038AEE57  0.92
2F91B34E-AB15-389B-BCB6-8D913135D     0.95
3F6B5DF3-607E-3F1F-8050-2932B11D9E8A  0.94
46069A1E-F077-3F78-870A-C9BD7A0E1740  0.81
58A8DB25-2B99-3D2F-BA6D-50D1A8CFF3E9  0.77
58CBD814-CAC1-3644-8AB9-99A3C07A8E8F  0.70
6336DAC7-6508-3E79-9D99-37034A7C2E3F  0.83
655A6266-D316-360C-BAC1-76F26F3C0643  0.72
66C3A2B1-2953-3848-882C-591224C77E33  0.91
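A small helper that renders suspect scores in the two-column "ID, Threat Score" layout of the final output. The 0.7 cutoff is a hypothetical choice (it matches the lowest score in the sample), and rows are ordered by score instead of by ID as in the sample, which puts the strongest suspects first for review.

```python
def format_suspects(scores, min_score=0.7):
    """Render {user_id: threat_score} as an 'ID / Threat Score' table, highest score first.

    min_score is a hypothetical reporting cutoff, not a value stated in the source.
    """
    rows = sorted(
        ((uid, score) for uid, score in scores.items() if score >= min_score),
        key=lambda row: row[1],
        reverse=True,
    )
    lines = [f"{'ID':<36}  Threat Score"]
    lines += [f"{uid:<36}  {score:.2f}" for uid, score in rows]
    return "\n".join(lines)
```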
34. RECAP
WHAT DID WE DO?
DETECTED THE PRESENCE OF A BOTNET
SCRUTINIZED USERS FROM PARTICIPATING COUNTRIES
PRODUCED A LIST OF SUSPECT USERS
35. PERFORMANCE
HOW DOES IT PERFORM?
▸ Prototype - Looks at Past Data
▸ Applied to an attack investigation
▸ 10 alerts over the month in question
▸ Identified 100% of responsible users*
▸ Limited to Cross-Country Botnets
▸ Lacks Sub-Country Insight
* Deemed responsible by the customer
36. FUTURE WORK
WHERE DO WE GO FROM HERE?
▸ Integrate into ML product
▸ Extract Features from Suspects
▸ Address Pitfalls
▸ Inefficiencies Due to Sparsity, Intra-country Activity
▸ 24/7 Streaming Process Across All Customer Sites
▸ Utilize New Tools
▸ Spark, Storm, etc.
▸ Internal Platform
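Moving from past data to a 24/7 streaming process mainly changes how the per-country counts are maintained. A minimal pure-Python sketch, assuming events arrive in time order for each site: it keeps a fixed number of closed time bins per site and returns the rolling window whenever a bin closes, so the window could be fed to the detection step. The class name, bin size, and window length are hypothetical, and a real deployment on Spark or Storm would also have to handle late events and empty bins, which this sketch ignores.

```python
from collections import defaultdict, deque


class RollingCountryCounts:
    """Hypothetical streaming helper: last `n_bins` closed bins of per-country counts for each site."""

    def __init__(self, bin_seconds=60, n_bins=12):
        self.bin_seconds = bin_seconds
        self.history = defaultdict(lambda: deque(maxlen=n_bins))  # site -> closed bins, oldest first
        self.current = {}  # site -> (bin_start, {country: count}) for the bin still filling

    def add(self, site, country, ts):
        """Count one request; return the rolling window for `site` when a bin closes, else None.

        Assumes in-order timestamps per site; bins with no traffic are not recorded.
        """
        bin_start = ts - ts % self.bin_seconds
        start, counts = self.current.get(site, (bin_start, defaultdict(int)))
        closed = None
        if bin_start != start:
            # A new bin has begun: archive the finished one and report the window.
            self.history[site].append(dict(counts))
            closed = list(self.history[site])
            start, counts = bin_start, defaultdict(int)
        counts[country] += 1
        self.current[site] = (start, counts)
        return closed
```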