The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming Botnet Mitigation


Gianluca Stringhini

Major Area Exam

December 5, 2011

What is spam?

Spam is a big problem
Everyone receives spam
90-95% of emails are spam

Organic vs. Junk food
Spam vs. Ham
We need a definition a computer can understand
Unsolicited Bulk Email

Early days spam

Spam as a hobby
Businesses run from a home basement

CAN-SPAM Act (2003)
Does not forbid spam, but the spammer has to be nice.
$16k fine per violating email

The world is big
Not every country prosecutes spammers

Modern spam

Affiliate programs [Samosseiko 2009]
Are banks the weak link? [Levchenko 2011]
Figure source: Levchenko et al., Click Trajectories: End-to-End Analysis of the Spam Value Chain

Is Spam Profitable?

Yes, it is
Estimates between $300k and $1M a month for large affiliate
programs [Kanich 2008, Kanich 2011]

Relatively low risk
Small fish are the ones who get caught
The geographic dispersion makes coordinated actions difficult

How is Spam Delivered?
Botnets
Botnets are networks of compromised computers that act under the
control of a single entity (Botmaster)

What are botnets used for?
Running DoS
Stealing Information
Solving Captchas
Sending spam
Botnets are responsible for 85% of worldwide spam
Why botnets?
Botnets combine the best of two worlds: worms and IRC bots
Researchers and Botmasters are involved in an arms race

Botnet Evolution - Structure

SDBot 2002


IRC botnets
The C&C is an IRC server
Bots join a channel and get orders

Problems
Researchers can join the channel too
DNS sinkholing is possible


MyDoom 2004


Proprietary protocol botnets
The C&C uses a proprietary encrypted protocol
Two architectures:
Pull architecture
Push architecture

Problems
Researchers can reverse engineer the protocol
DNS sinkholing is still possible


Lethic 2007


Multiple tier botnets
The bots don’t connect directly to the C&C
The domains used by the proxies use Fast Flux

Fast Flux
Technique similar to round-robin DNS and CDNs
Gives high reliability to the botnet backbone
Many IP addresses associated with a domain
Low TTL, so the records change all the time


Problem
The domains used can still be sinkholed / blacklisted

The solution
Domain Generation Algorithms
Bots contact a domain according to a time-dependent algorithm
Used by Torpig (2008)
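
The idea of a time-dependent algorithm can be sketched as follows. This is a minimal illustration, not the algorithm used by Torpig or any other botnet: the hash construction, the `.example` TLD, and the function name `candidate_domains` are all assumptions. It shows why a defender who recovers the algorithm can compute upcoming rendezvous domains in advance and sinkhole or blacklist them.

```python
import hashlib
from datetime import date


def candidate_domains(day: date, count: int = 3, tld: str = ".example") -> list:
    """Derive `count` pseudo-random domain names from a calendar date.

    Hypothetical scheme: hash "<ISO date>-<index>" and keep 12 hex characters.
    Anyone who knows the scheme gets the same list for the same day.
    """
    domains = []
    for index in range(count):
        seed = f"{day.isoformat()}-{index}".encode("ascii")
        digest = hashlib.sha256(seed).hexdigest()
        domains.append(digest[:12] + tld)
    return domains


# A researcher can precompute the names for a future date.
tomorrow_names = candidate_domains(date(2011, 12, 6))
```

Because the output depends only on the date, the list is fully predictable. Mixing in an input that cannot be known ahead of time removes that predictability.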

Problems
The algorithm can be reverse engineered [StoneGross 2009a]
Botmasters can add non-determinism (e.g., Twitter trends)


Storm 2007


Peer-to-peer botnets
Bots with private IPs act as workers
Bots with public IPs act as proxies
Workers find proxies through an Overnet-based peer-to-peer protocol

Problem
Proxies are not under the control of the botmaster
Researchers can impersonate a proxy and infiltrate the botnet

Botnet Evolution - Infection model

Worm-like spread
The bot scans the network for vulnerabilities and propagates

Non-spreading bots
Infections are propagated through
Drive-by-download websites [Provos 2008, StoneGross 2011]
Email attachments

Pay-per-Install
The new trend is paying third parties for “installing” a certain number
of bots [Caballero 2011]

Botnet and Spam Mitigation

Many Possible Vantage Points

Host-based detection

Traditional anti-virus approach
Look for the presence of virus-specific instructions in the binaries
Antivirus scanners can be fooled by simple obfuscations
[Christodorescu 2003, Christodorescu 2004]

Obfuscations
NOP insertion and code transposition are usually enough
Metamorphic malware
Polymorphic malware
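
A small sketch of why byte-level signatures are fragile under NOP insertion. The signature and the three x86 instruction encodings are chosen only for illustration and do not come from any real antivirus product or malware sample; `matches_signature` and `insert_nops` are hypothetical helper names.

```python
NOP = b"\x90"  # single-byte x86 no-op instruction

# Illustrative instruction encodings: xor eax,eax / push eax / push imm32
INSTRUCTIONS = [bytes.fromhex("31c0"), bytes.fromhex("50"), bytes.fromhex("682f2f7368")]

# Hypothetical signature: the three instructions as one contiguous byte string
SIGNATURE = b"".join(INSTRUCTIONS)


def matches_signature(binary: bytes, signature: bytes = SIGNATURE) -> bool:
    """Naive scanner: flag the binary if the exact byte sequence occurs in it."""
    return signature in binary


def insert_nops(instructions: list) -> bytes:
    """Obfuscate by placing a NOP between consecutive instructions.

    The program behaves the same, but the byte sequence changes.
    """
    return NOP.join(instructions)


original = b"\x00\x00" + b"".join(INSTRUCTIONS) + b"\xc3"
obfuscated = b"\x00\x00" + insert_nops(INSTRUCTIONS) + b"\xc3"

detected_before = matches_signature(original)
detected_after = matches_signature(obfuscated)
```

Here `detected_before` is true and `detected_after` is false, although both byte strings encode the same behavior.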

Host-based detection
Static analysis
Take program semantics into account [Christodorescu 2003,
Christodorescu 2005]

Dynamic analysis
Model the behavior of a program (e.g., using system calls)
[Kolbitsch 2009]
Monitor access to sensitive information [Yin 2007]
Reverse engineering of the C&C protocol [Caballero 2009]

Problems
Program equivalence is undecidable!
Analysis of samples takes time and resources

Malicious Web Page Detection
Infections that happen through browser exploits are a big problem
Detecting Drive-by-Download pages
Malicious JavaScript can be detected by:
Emulation [Cova 2010]
Monitoring system changes [Provos 2008]
Hooking the JavaScript runtime [Curtsinger 2011, Heiderich 2011]
Look for common attack patterns (e.g., heap spray)
[Ratanaworabhan 2009]

Problems
The analysis could be detected
These systems might not detect newer attacks

Command and Control-based Detection

IRC server infiltration [AbuRajab 2006]
Protocol Reverse Engineering
Protocol reverse engineering by active probing [Cho 2010a]
This enables botnet infiltration [Stock 2009, Kreibich 2009,
Cho 2010b]

Botnet Takeovers
Reverse engineering of DGAs [StoneGross 2009a]
This enables C&C impersonation [StoneGross 2009a]

Honeypots
Running bots in virtual machines lets researchers learn important botnet
features [John 2009]

This can be used for
Blacklisting the domains that host C&C servers
[StoneGross 2009b]
Performing botnet takedowns [StoneGross 2011]

Problems
Bots might detect virtualization [Balzarotti 2010]
Containment problems arise [Kreibich 2011]

DNS Based Detection

Detecting infected IPs
DNS sinkhole [Dagon 2006]
Look for cached DNS results [AbuRajab 2006]

Detect Fast-Flux Domains
Fast-flux domains have very different characteristics from
legitimate ones [Holz 2008, Passerini 2008, Hu 2009]
IPs belong to different networks
TTL is low
Results change very frequently
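
The three characteristics above can be turned into simple features computed over repeated lookups of one domain. This is a rough sketch and not the detector from any of the cited papers: the `DnsAnswer` type, the /16 approximation of a "network" (published systems typically use ASNs), and all thresholds are assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DnsAnswer:
    ttl: int              # TTL of the returned A records, in seconds
    addresses: frozenset  # IPv4 addresses returned by one lookup


def network_of(address: str) -> str:
    """Approximate the network by the /16 prefix (a simplifying assumption)."""
    return ".".join(address.split(".")[:2])


def flux_features(answers: list) -> dict:
    """Summarize a sequence of lookups for the same domain."""
    all_addresses = set()
    for answer in answers:
        all_addresses |= answer.addresses
    changes = sum(
        1 for previous, current in zip(answers, answers[1:])
        if previous.addresses != current.addresses
    )
    return {
        "distinct_networks": len({network_of(a) for a in all_addresses}),
        "mean_ttl": mean(answer.ttl for answer in answers),
        "change_ratio": changes / max(len(answers) - 1, 1),
    }


def looks_fast_flux(answers, min_networks=3, max_ttl=300, min_change_ratio=0.5):
    """Flag a domain when all three indicators hold (illustrative thresholds)."""
    features = flux_features(answers)
    return (
        features["distinct_networks"] >= min_networks
        and features["mean_ttl"] <= max_ttl
        and features["change_ratio"] >= min_change_ratio
    )
```

CDNs and round-robin DNS also show some of these traits, so a real classifier needs more features than this sketch uses.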

DNS Based Detection

Detecting Malicious Domains
It is possible to build classifiers to detect malicious domains
Passive analysis of recursive DNS (RDNS) queries [Antonakakis 2010,
Bilge 2011]
Limitation: only a local view
Analysis at the authoritative server or TLD level
[Antonakakis 2011]
Limitation: it can be evaded by using diverse DNS servers

SMTP based Detection: Content Analysis
Rule-based Spam Detection
The nature of spam changes over time
Having a binary decision introduces problems.

Machine Learning
Bayesian Filtering: uses naïve Bayes [Sahami 1998,
Androutsopoulos 2000]
Support Vector Machines [Drucker 1999]

Problems
Feature selection has to be performed
“Good word” attacks are possible [Lowd 2005, Karlberger 2007]
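
A toy naïve Bayes filter makes both the technique and the “good word” attack concrete. It is a sketch under simplifying assumptions (whitespace tokenization, Laplace smoothing, a tiny hand-made training set) and is not the feature set or implementation of any cited paper; the class name `NaiveBayesFilter` is hypothetical.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Word-count naïve Bayes with Laplace smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def log_score(self, text: str, label: str) -> float:
        """log P(label) + sum of log P(word | label); needs both labels trained."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_messages = sum(self.message_counts.values())
        score = math.log(self.message_counts[label] / total_messages)
        for word in text.lower().split():
            smoothed = self.word_counts[label][word] + 1  # Laplace smoothing
            score += math.log(smoothed / (total_words + len(vocabulary)))
        return score

    def classify(self, text: str) -> str:
        return max(self.LABELS, key=lambda label: self.log_score(text, label))


spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday", "ham")

plain_spam = "buy cheap pills"
# Good word attack: pad the same message with words common in ham.
padded_spam = plain_spam + " monday monday monday meeting agenda lunch"
```

With this training set, `plain_spam` is classified as spam, while `padded_spam` is classified as ham because the appended ham-typical words outweigh the spam-typical ones.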

SMTP based Detection: Content Analysis
Assign a Reputation to Received Emails
Spam and ham exhibit different features [Hao 2009]

Building Signatures from Spam
[Pitsillidis 2010] ran bots and assigned templates to different botnets

Detect Spam by Looking at URLs
Study the URL structure [Xie 2008, Ma 2009]
Learning features from the landing page [Thomas 2011]

Problem
In general, content analysis is expensive

SMTP based Detection: IP Blacklisting
DNS-based blacklists
Mail servers can query the service to learn whether an individual IP is
a known spam source
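
By convention, a DNSBL lookup reverses the octets of the IPv4 address, appends the blacklist zone, and asks for an A record; an answer means the address is listed. The sketch below follows that convention. The zone `dnsbl.example.org` is a placeholder rather than a real service, and the helper names are assumptions.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str = "dnsbl.example.org") -> str:
    """Build the DNS name to query for an IPv4 address in a given DNSBL zone."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "dnsbl.example.org") -> bool:
    """Return True if the zone answers for this address (requires network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no record: the address is not listed
    return True


query = dnsbl_query_name("192.0.2.99")
# query is "99.2.0.192.dnsbl.example.org"
```

A mail server would run this check on the connecting IP before accepting a message. The lookup is cheap, which is why blacklists are attractive despite their coverage problems.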

Problems
Low coverage [Ramachandran 2006a, Sinha 2008]
Bot machines have dynamic IPs
What happens when IPv6 takes over?

Better Approaches
IP reputation [Ramachandran 2006b, Sinha 2010, Qian 2010]
Behavioral blacklisting [Ramachandran 2007, Stringhini 2011]

SMTP based Detection: Policies
Greylisting
If a delivery temporarily fails, spambots typically do not retry
Easy to bypass and prone to false positives [Levine 2005]
Multi-level greylisting [Janecek 2008]
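
A minimal greylisting sketch, assuming the common design in which the server tracks the (client IP, sender, recipient) triplet and rejects the first attempt with a temporary error. The class name, the 300-second delay, and the string return values are illustrative choices, not taken from the cited work.

```python
class Greylist:
    """Temporarily reject unknown triplets; accept them on a later retry."""

    def __init__(self, min_delay_seconds: int = 300):
        self.min_delay_seconds = min_delay_seconds
        self.first_seen = {}  # triplet -> timestamp of the first attempt

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        triplet = (client_ip, sender, recipient)
        # Record the first attempt; later calls return the stored timestamp.
        first_attempt = self.first_seen.setdefault(triplet, now)
        if now - first_attempt >= self.min_delay_seconds:
            return "accept"
        return "tempfail"  # the server would answer with an SMTP 4xx code


greylist = Greylist()
attempts = [
    greylist.check("192.0.2.1", "a@example.org", "b@example.net", now=0),
    greylist.check("192.0.2.1", "a@example.org", "b@example.net", now=10),
    greylist.check("192.0.2.1", "a@example.org", "b@example.net", now=600),
]
```

A compliant mail server queues the message and retries later, so only the third attempt is accepted. A spambot that never retries is filtered, and one that does retry bypasses the check, which is the weakness noted above.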

Sender Validation
Spam pretends to come from legitimate addresses
SPF, DomainKeys, DKIM [Leiba 2007]

The solution chosen by Google
User voting on spam and ham [Taylor 2006]

Main problem: spam hurts server performance!
Mail prioritization systems [Twining 2004, Venkataraman 2007]

Social Network Detection
Online Social Networks are very successful
Users are not as risk aware as they are with email spam

Miscreants create fake profiles to spread spam
Systems to detect fake profiles have been developed
[Benevenuto 2010, Lee 2010, Stringhini 2010, Yang 2011a,
Yang 2011b]

Real accounts that get compromised are more valuable
45% of social network users click on any link posted by their friends
[Bilge 2009]
89% of profiles sending malicious content on Facebook are
compromised [Gao 2010]

Intrusion Detection
Signature-based intrusion detection
Snort, Bro [Paxson 1998]

Problems
Constant need for new rules
Problems with encrypted traffic

Anomaly-based intrusion detection
The system learns the “normal” behavior of a network and flags
anomalies [Portnoy 2001, Kruegel 2002, Wang 2004]
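
As a very small illustration of learning “normal” behavior, the sketch below fits a mean and standard deviation to a training window and flags values that deviate strongly. The cited systems use far richer models; the feature (outbound SMTP connections per hour for one host), the sample values, and the threshold of three standard deviations are assumptions.

```python
import statistics


def fit_baseline(samples: list) -> tuple:
    """Learn the mean and sample standard deviation of the training window."""
    return statistics.mean(samples), statistics.stdev(samples)


def is_anomalous(value: float, baseline: tuple, threshold: float = 3.0) -> bool:
    """Flag values more than `threshold` standard deviations from the mean."""
    mean, stdev = baseline
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold


# Hypothetical training data: outbound SMTP connections per hour for one host.
baseline = fit_baseline([4, 5, 6, 5, 4, 6, 5])
quiet_hour = is_anomalous(5, baseline)
spam_burst = is_anomalous(400, baseline)
```

The result depends entirely on the training window. If the host was already infected during training, the burst becomes part of the baseline.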

Problems
What is “normal” behavior?
It is hard to get traffic that is free of infections

Network Edge Detection
Detecting Successful Infections
A botnet infection can be modeled as a set of communication flows [Gu 2007]
Problem: what’s the infection model of a botnet?

Detecting Malicious Activity
Correlation between C&C commands and malicious activity
[Gu 2008a]
How to identify C&C traffic?
Well-known protocols (e.g., IRC, HTTP) [Gu 2008b]
Look for malicious activity first [Wurzinger 2010]

Leverage Previous Knowledge
Detect hosts that contact the same IPs as infected machines
[Coskun 2010]
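
The intuition of leveraging known infections can be sketched with plain set overlap: hosts that share several destinations with known-infected machines become suspects. This is a simplification and not the algorithm of [Coskun 2010]; the function name, the `min_shared` threshold, and the sample data are assumptions.

```python
def suspicious_hosts(contacts: dict, known_infected: set, min_shared: int = 2) -> dict:
    """Map each non-infected host to the number of destinations it shares
    with known-infected hosts, keeping only hosts at or above `min_shared`."""
    infected_destinations = set()
    for host in known_infected:
        infected_destinations |= contacts.get(host, set())

    suspects = {}
    for host, destinations in contacts.items():
        if host in known_infected:
            continue
        shared = len(destinations & infected_destinations)
        if shared >= min_shared:
            suspects[host] = shared
    return suspects


# Hypothetical flow records: internal host -> set of contacted external IPs.
contacts = {
    "10.0.0.5": {"203.0.113.10", "203.0.113.11", "198.51.100.1"},
    "10.0.0.6": {"203.0.113.10", "203.0.113.11"},
    "10.0.0.7": {"198.51.100.1", "198.51.100.2"},
}
suspects = suspicious_hosts(contacts, known_infected={"10.0.0.5"})
```

In practice, popular destinations would have to be filtered out first, since every host shares them with infected machines.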

How About the Future?
The arms race between researchers and cybercriminals is far from
being over

Is security research like fighting the Hydra?

Future Directions

Botmasters will keep developing more sophisticated techniques

However, a functional botnet has to interact with legitimate services
DNS servers
SMTP servers
Web servers
Social Networks
This interaction cannot be obfuscated!

My Research

In my research, I focus on analyzing how bots interact with
legitimate third-party services

Bots can be distinguished from real users in the way they use such
services

The main reason is that bots have a different goal from real users:

Fast interaction vs. Good user experience

My Research
So far, I have been looking at:
Social Networks
How fake accounts differ from legitimate ones [ACSAC 2010]
How user behavior changes once an account is compromised
[In submission]

SMTP servers
Distinguishing bots:
based on the destinations they target [USENIX 2011]
based on the (wrong) way in which they implement SMTP
[Work in progress]

My Research

Other interesting areas:
Login patterns on Social Networks
Interaction with search engines (e.g., SEO)

What if bots started behaving like legitimate users / programs?
This conflicts with their goal!

Thanks!

email: gianluca@cs.ucsb.edu
twitter: @gianlucaSB
