SlideShare a Scribd company logo
1 of 32
Download to read offline
BotMagnifier: Locating Spambots on the Internet




                             Gianluca Stringhini
                               Thorsten Holz
                              Brett Stone-Gross
                             Christopher Kruegel
                               Giovanni Vigna

                            USENIX Security Symposium


                               August 12, 2011
Spam is a big problem
Spam is sneaky
Spam is sneaky
Tracking Spambots is important


Botnets are responsible for 85% of worldwide spam




  ISPs and organizations can clean up their networks
  Existing blacklists (DNSBL) can be improved
  Mitigation efforts can be directed to the most aggressive botnets
Tracking Spambots is challenging

    The IP addresses of infected machines change frequently
    It is easy to recruit “new members” into a botnet
e

An approach is to set up spam traps. However, a few problems arise:


    Only a subset of the bots will send emails to the spam trap
    addresses
    Some botnets target only users located in certain countries
Basic Insight

Bots that belong to the same botnet share similarities


As a result, they will follow a similar behavior when sending spam


Commoditized botnets could appear as multiple botnets


By observing a portion of a botnet, it is possible to identify more bots
that belong to it
Our Approach
Our Approach
Our Approach
Our Approach
Our Approach
Our Approach
Our Approach
Input Datasets


How can we achieve this?



Our approach takes two datasets as input:
  The IP addresses of known spamming bots, grouped by spam
  campaign (seed pools)
  A log of email transactions carried out on the Internet, both
  legitimate and malicious (transaction log)
Our System

We implemented our approach in a tool, called BotMagnifier


We used a large spam trap to populate seed pools


We used the logs of a Spamhaus mirror as transaction log
  Each query to the Spamhaus mirror corresponds to an email
  We show how BotMagnifier also works when using other
  datasets as transaction logs
Our System


BotMagnifier is executed periodically


It takes as input a set of seed pools


At the end of each observation period, it outputs:
  The IP addresses of the bots in the magnified pools
  The name of the botnet that carried out each campaign
Phase I: Building Seed Pools

Set of IP addresses that participated in a specific spam campaign

Built using the data of a spam trap set up by a large US ISP

                       ≈ 1M messages / day


We consider messages with similar subject lines as part of the same
campaign

Design decisions:
  Minimum seed pool size: 1,000 IP addresses
  Observation period: 1 day
Phase II: Characterizing Bot Behavior

For each seed pool:
  We query the transaction log to find all the events that are
  associated with the IP addresses in it
  We analyze the set of destinations targeted and build a target set

Problem
The target sets of two botnets might have substantial overlaps


We extract the set of destinations that are unique to each seed pool
(characterizing set)
Phase III: Bot Magnification

Goal: find the IP addresses of previously-unknown bots


BotMagnifier considers an IP address x as behaving similarly to
the bots in a seed pool if:
  x sent emails to at least N destinations in the target set
  x never sent an email to a destination outside the target set
  x has contacted at least one destination in the characterizing set


How large should N be?
Threshold Computation
N should be greater for campaigns targeting a larger number of
destinations


                      N = k · |T (pi )|, 0 < k ≤ 1

where |T (pi )| is the size of the target set, and k is a parameter

Precision vs. Recall analysis on ten campaigns for which we had
ground truth (coming from Cutwail C&C servers)


               k = kb +      α
                          |T (pi )|   → kb = 8 · 10−4 , α = 10
Phase IV: Spam Attribution
We want to “label” spam campaigns based on the botnet that carried
them out

Running Malware Samples
We match the subject lines observed in the wild with the ones of the
bots we ran


Botnet Clustering
  IP overlap
  Destination distance
  Bot distance
Validation of the Approach
To validate our approach, we studied Cutwail, for which we had direct
data about the IP addresses of the infected machines
The C&C servers we analyzed accounted for approximately 30% of
the botnet
We ran the validation experiment for the period between July 28 and
August 16, 2010
For each of the 18 days:
  We selected a subset of the IP addresses referenced by the C&C
  servers
  With the help of the spam trap, we identified the campaigns
  carried out
  We generated the seed and magnified pools
BotMagnifier identified 144,317 IP addresses as bots. Of these,
33,550 were actually listed in the C&C databases (≈ 23%).
Overview of Tracking Results
We ran our system between September 28, 2010 and February 5, 2011
BotMagnifier tracked 2,031,110 bot IP addresses
Of these, 925,978 belonged to magnified pools, while the others
belonged to seed pools
1.6% estimated false positives


 Botnet     Total # of IP addresses    # of ”static“ IP addresses
  Lethic            887,852                     117,335
 Rustock            676,905                     104,460
 Cutwail            319,355                      34,132
 MegaD               68,117                      3,055
 Waledac             36,058                      3,450
Overview of Tracking Results
Overview of Tracking Results
Overview of Tracking Results
Overview of Tracking Results
Application of Results
Can BotMagnifier improve existing blacklists?

We analyzed the email logs from the UCSB CS mail server from
November 30, 2010 to February 8, 2011
  If a mail got delivered, the IP address was not blacklisted at the
  time
  The spam ratios computed by SpamAssassin provide us with
  ground truth
28,563 emails were marked as spam, 10,284 IP addresses involved.
295 of them were detected by BotMagnifier, for a total of 1,225
emails (≈ 4%)

We then looked for false positives. BotMagnifier wrongly
identified 12 out of 209,013 IP addresses as bots.
Data Stream Independence

We show how BotMagnifier can be used on alternative datasets,
too

We used the netflow logs from an ISP backbone routers
1.9M emails logged per day

We had to use new values for kb and α

The experiment lasted from January 20, 2011 to January 28, 2011.

BotMagnifier identified 36,739 in magnified pools. This grew the
seed pools by 38%.
Conclusions


We presented BotMagnifier, a tool for tracking and analyzing
spamming botnets


We showed that our approach is able to accurately identify and track
botnets


By using more comprehensive datasets, the magnification results
would get better
Thanks!


email: gianluca@cs.ucsb.edu
twitter: @gianlucaSB

More Related Content

Viewers also liked

The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...
The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...
The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...Gianluca Stringhini
 
Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages
Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web PagesShady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages
Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web PagesGianluca Stringhini
 
The Tricks of the Trade: What Makes Spam Campaigns Successful?
The Tricks of the Trade: What Makes Spam Campaigns Successful?The Tricks of the Trade: What Makes Spam Campaigns Successful?
The Tricks of the Trade: What Makes Spam Campaigns Successful?Gianluca Stringhini
 
Follow the Green: Growth and Dynamics on Twitter Follower Markets
Follow the Green: Growth and Dynamics on Twitter Follower MarketsFollow the Green: Growth and Dynamics on Twitter Follower Markets
Follow the Green: Growth and Dynamics on Twitter Follower MarketsGianluca Stringhini
 
EvilCohort: Detecting Communities of Malicious Accounts on Online Services
EvilCohort: Detecting Communities of Malicious Accounts on Online ServicesEvilCohort: Detecting Communities of Malicious Accounts on Online Services
EvilCohort: Detecting Communities of Malicious Accounts on Online ServicesGianluca Stringhini
 
The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...
The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...
The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...Gianluca Stringhini
 

Viewers also liked (6)

The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...
The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...
The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...
 
Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages
Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web PagesShady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages
Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages
 
The Tricks of the Trade: What Makes Spam Campaigns Successful?
The Tricks of the Trade: What Makes Spam Campaigns Successful?The Tricks of the Trade: What Makes Spam Campaigns Successful?
The Tricks of the Trade: What Makes Spam Campaigns Successful?
 
Follow the Green: Growth and Dynamics on Twitter Follower Markets
Follow the Green: Growth and Dynamics on Twitter Follower MarketsFollow the Green: Growth and Dynamics on Twitter Follower Markets
Follow the Green: Growth and Dynamics on Twitter Follower Markets
 
EvilCohort: Detecting Communities of Malicious Accounts on Online Services
EvilCohort: Detecting Communities of Malicious Accounts on Online ServicesEvilCohort: Detecting Communities of Malicious Accounts on Online Services
EvilCohort: Detecting Communities of Malicious Accounts on Online Services
 
The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...
The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...
The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...
 

Similar to BotMagnifier: Locating Spambots on the Internet

Monitoring the Spread of Active Worms in Internet
Monitoring the Spread of Active Worms in InternetMonitoring the Spread of Active Worms in Internet
Monitoring the Spread of Active Worms in InternetIOSR Journals
 
Guarding Against Large-Scale Scrabble In Social Network
Guarding Against Large-Scale Scrabble In Social NetworkGuarding Against Large-Scale Scrabble In Social Network
Guarding Against Large-Scale Scrabble In Social NetworkEditor IJCATR
 
Detecting Spam Zombies by Monitoring Outgoing Messages
Detecting  Spam Zombies  by  Monitoring  Outgoing  MessagesDetecting  Spam Zombies  by  Monitoring  Outgoing  Messages
Detecting Spam Zombies by Monitoring Outgoing Messages Gowtham Chandra
 
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"Jishnu Pradeep
 
Tracing Back The Botmaster
Tracing Back The BotmasterTracing Back The Botmaster
Tracing Back The BotmasterIJERA Editor
 
Detecting Spambot as an Antispam Technique for Web Internet BBS
Detecting Spambot as an Antispam Technique for Web Internet BBSDetecting Spambot as an Antispam Technique for Web Internet BBS
Detecting Spambot as an Antispam Technique for Web Internet BBSijsrd.com
 
Detection of Botnets using Honeypots and P2P Botnets
Detection of Botnets using Honeypots and P2P BotnetsDetection of Botnets using Honeypots and P2P Botnets
Detection of Botnets using Honeypots and P2P BotnetsCSCJournals
 
Botnet detection by Imitation method
Botnet detection  by Imitation methodBotnet detection  by Imitation method
Botnet detection by Imitation methodAcad
 
Synopsis viva presentation
Synopsis viva presentationSynopsis viva presentation
Synopsis viva presentationkirubavenkat
 
A Brief Incursion into Botnet Detection
A Brief Incursion into Botnet DetectionA Brief Incursion into Botnet Detection
A Brief Incursion into Botnet DetectionAnant Narayanan
 
miniproject.ppt.pptx
miniproject.ppt.pptxminiproject.ppt.pptx
miniproject.ppt.pptxAnush90
 
Spams meet cryptocurrencies
Spams meet cryptocurrenciesSpams meet cryptocurrencies
Spams meet cryptocurrenciesMatteo Romiti
 
Lab3code.c#include stdio.h#include stdlib.h#include.docx
Lab3code.c#include stdio.h#include stdlib.h#include.docxLab3code.c#include stdio.h#include stdlib.h#include.docx
Lab3code.c#include stdio.h#include stdlib.h#include.docxsmile790243
 
Towards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classificationTowards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classificationIJERA Editor
 
Honeypots.ppt1800363876
Honeypots.ppt1800363876Honeypots.ppt1800363876
Honeypots.ppt1800363876Momita Sharma
 

Similar to BotMagnifier: Locating Spambots on the Internet (20)

Monitoring the Spread of Active Worms in Internet
Monitoring the Spread of Active Worms in InternetMonitoring the Spread of Active Worms in Internet
Monitoring the Spread of Active Worms in Internet
 
Guarding Against Large-Scale Scrabble In Social Network
Guarding Against Large-Scale Scrabble In Social NetworkGuarding Against Large-Scale Scrabble In Social Network
Guarding Against Large-Scale Scrabble In Social Network
 
Detecting Spam Zombies by Monitoring Outgoing Messages
Detecting  Spam Zombies  by  Monitoring  Outgoing  MessagesDetecting  Spam Zombies  by  Monitoring  Outgoing  Messages
Detecting Spam Zombies by Monitoring Outgoing Messages
 
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
 
Tracing Back The Botmaster
Tracing Back The BotmasterTracing Back The Botmaster
Tracing Back The Botmaster
 
B0940509
B0940509B0940509
B0940509
 
Detecting Spambot as an Antispam Technique for Web Internet BBS
Detecting Spambot as an Antispam Technique for Web Internet BBSDetecting Spambot as an Antispam Technique for Web Internet BBS
Detecting Spambot as an Antispam Technique for Web Internet BBS
 
Detection of Botnets using Honeypots and P2P Botnets
Detection of Botnets using Honeypots and P2P BotnetsDetection of Botnets using Honeypots and P2P Botnets
Detection of Botnets using Honeypots and P2P Botnets
 
Botnet detection by Imitation method
Botnet detection  by Imitation methodBotnet detection  by Imitation method
Botnet detection by Imitation method
 
Botnets
BotnetsBotnets
Botnets
 
Synopsis viva presentation
Synopsis viva presentationSynopsis viva presentation
Synopsis viva presentation
 
A Brief Incursion into Botnet Detection
A Brief Incursion into Botnet DetectionA Brief Incursion into Botnet Detection
A Brief Incursion into Botnet Detection
 
2 dc meet new
2 dc meet new2 dc meet new
2 dc meet new
 
Botnets
BotnetsBotnets
Botnets
 
miniproject.ppt.pptx
miniproject.ppt.pptxminiproject.ppt.pptx
miniproject.ppt.pptx
 
Spams meet cryptocurrencies
Spams meet cryptocurrenciesSpams meet cryptocurrencies
Spams meet cryptocurrencies
 
Lab3code.c#include stdio.h#include stdlib.h#include.docx
Lab3code.c#include stdio.h#include stdlib.h#include.docxLab3code.c#include stdio.h#include stdlib.h#include.docx
Lab3code.c#include stdio.h#include stdlib.h#include.docx
 
Towards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classificationTowards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classification
 
Honeypots.ppt1800363876
Honeypots.ppt1800363876Honeypots.ppt1800363876
Honeypots.ppt1800363876
 
Botnets 101
Botnets 101Botnets 101
Botnets 101
 

Recently uploaded

Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 

Recently uploaded (20)

Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 

BotMagnifier: Locating Spambots on the Internet

  • 1. BotMagnifier: Locating Spambots on the Internet Gianluca Stringhini Thorsten Holz Brett Stone-Gross Christopher Kruegel Giovanni Vigna USENIX Security Symposium August 12, 2011
  • 2. Spam is a big problem
  • 5. Tracking Spambots is important Botnets are responsible for 85% of worldwide spam ISPs and organizations can clean up their networks Existing blacklists (DNSBL) can be improved Mitigation efforts can be directed to the most aggressive botnets
  • 6. Tracking Spambots is challenging The IP addresses of infected machines change frequently It is easy to recruit “new members” into a botnet e An approach is to set up spam traps. However, a few problems arise: Only a subset of the bots will send emails to the spam trap addresses Some botnets target only users located in certain countries
  • 7. Basic Insight Bots that belong to the same botnet share similarities As a result, they will follow a similar behavior when sending spam Commoditized botnets could appear as multiple botnets By observing a portion of a botnet, it is possible to identify more bots that belong to it
  • 15. Input Datasets How can we achieve this? Our approach takes two datasets as input: The IP addresses of known spamming bots, grouped by spam campaign (seed pools) A log of email transactions carried out on the Internet, both legitimate and malicious (transaction log)
  • 16. Our System We implemented our approach in a tool, called BotMagnifier We used a large spam trap to populate seed pools We used the logs of a Spamhaus mirror as transaction log Each query to the Spamhaus mirror corresponds to an email We show how BotMagnifier also works when using other datasets as transaction logs
  • 17. Our System BotMagnifier is executed periodically It takes as input a set of seed pools At the end of each observation period, it outputs: The IP addresses of the bots in the magnified pools The name of the botnet that carried out each campaign
  • 18. Phase I: Building Seed Pools Set of IP addresses that participated in a specific spam campaign Built using the data of a spam trap set up by a large US ISP ≈ 1M messages / day We consider messages with similar subject lines as part of the same campaign Design decisions: Minimum seed pool size: 1,000 IP addresses Observation period: 1 day
  • 19. Phase II: Characterizing Bot Behavior For each seed pool: We query the transaction log to find all the events that are associated with the IP addresses in it We analyze the set of destinations targeted and build a target set Problem The target sets of two botnets might have substantial overlaps We extract the set of destinations that are unique to each seed pool (characterizing set)
  • 20. Phase III: Bot Magnification Goal: find the IP addresses of previously-unknown bots BotMagnifier considers an IP address x as behaving similarly to the bots in a seed pool if: x sent emails to at least N destinations in the target set x never sent an email to a destination outside the target set x has contacted at least one destination in the characterizing set How large should N be?
  • 21. Threshold Computation N should be greater for campaigns targeting a larger number of destinations N = k · |T (pi )|, 0 < k ≤ 1 where |T (pi )| is the size of the target set, and k is a parameter Precision vs. Recall analysis on ten campaigns for which we had ground truth (coming from Cutwail C&C servers) k = kb + α |T (pi )| → kb = 8 · 10−4 , α = 10
  • 22. Phase IV: Spam Attribution We want to “label” spam campaigns based on the botnet that carried them out Running Malware Samples We match the subject lines observed in the wild with the ones of the bots we ran Botnet Clustering IP overlap Destination distance Bot distance
  • 23. Validation of the Approach To validate our approach, we studied Cutwail, for which we had direct data about the IP addresses of the infected machines The C&C servers we analyzed accounted for approximately 30% of the botnet We ran the validation experiment for the period between July 28 and August 16, 2010 For each of the 18 days: We selected a subset of the IP addresses referenced by the C&C servers With the help of the spam trap, we identified the campaigns carried out We generated the seed and magnified pools BotMagnifier identified 144,317 IP addresses as bots. Of these, 33,550 were actually listed in the C&C databases (≈ 23%).
  • 24. Overview of Tracking Results We ran our system between September 28, 2010 and February 5, 2011 BotMagnifier tracked 2,031,110 bot IP addresses Of these, 925,978 belonged to magnified pools, while the others belonged to seed pools 1.6% estimated false positives Botnet Total # of IP addresses # of ”static“ IP addresses Lethic 887,852 117,335 Rustock 676,905 104,460 Cutwail 319,355 34,132 MegaD 68,117 3,055 Waledac 36,058 3,450
  • 29. Application of Results Can BotMagnifier improve existing blacklists? We analyzed the email logs from the UCSB CS mail server from November 30, 2010 to February 8, 2011 If a mail got delivered, the IP address was not blacklisted at the time The spam ratios computed by SpamAssassin provide us with ground truth 28,563 emails were marked as spam, 10,284 IP addresses involved. 295 of them were detected by BotMagnifier, for a total of 1,225 emails (≈ 4%) We then looked for false positives. BotMagnifier wrongly identified 12 out of 209,013 IP addresses as bots.
  • 30. Data Stream Independence We show how BotMagnifier can be used on alternative datasets, too We used the netflow logs from an ISP backbone routers 1.9M emails logged per day We had to use new values for kb and α The experiment lasted from January 20, 2011 to January 28, 2011. BotMagnifier identified 36,739 in magnified pools. This grew the seed pools by 38%.
  • 31. Conclusions We presented BotMagnifier, a tool for tracking and analyzing spamming botnets We showed that our approach is able to accurately identify and track botnets By using more comprehensive datasets, the magnification results would get better