BlueHat v17 || A Lustrum of Malware Network Communication: Evolution and Insights

A LUSTRUM OF MALWARE
NETWORK COMMUNICATION
EVOLUTION AND INSIGHTS
CHAZ LEVER, PhD CANDIDATE

WHY DO WE CARE ABOUT MALWARE?
What is malware?
• Quite simply, it is malicious software (e.g., viruses, spyware,
ransomware, and adware).
Why do we care?
• Used for illicit activities that affect individuals, enterprises, and
even governments.
• Reverse engineering malware is the foundation upon which
numerous security defenses are based.

WHAT IS MALWARE ANALYSIS?
What is malware analysis?
• Process of studying the functionality and potential impact of
malware samples.
• Static analysis examines malware without executing it.
• Dynamic analysis examines malware by running it in a controlled
sandbox.
Why is it important?
• This is how indicators of compromise (IOCs) and other
information are derived from actual malware samples.

MORE MALWARE, MORE PROBLEMS?
• Cyber attacks are on the rise.
• Malware has been at the center
of a number of these attacks.
• Despite access to more malware
samples than ever, malware-based
security products did not
prevent these threats.

HOW MUCH MALWARE IS THERE?
[Figure omitted; annotation: "Collection Issues"]

A LUSTRUM OF MALWARE
What did we do?
• Study the network signal extracted from malware over
half a decade.
What are we trying to understand?
• Are malware samples effective for use in early warning systems?
• What are the limitations of systems that rely on malware
samples for defense?

WHERE’S THE DATA?
*All datasets correspond to January 2011 through August 2015.

MALWARE CLASSIFICATION
What’s the goal?
• Cluster AV labels from VirusTotal by family.
• Link each family with both a type and the e2LDs (effective second-level domains) it queried.
• Use this information to provide extra context in later analysis.
What did we do?
• Modified AVClass¹ to output a type (i.e., PUP or malware) for each
sample.
• Ran it over our dataset of 23.9M VirusTotal reports.
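The labeling step above can be sketched as a plurality vote over tokens extracted from AV labels. This is a minimal sketch in the spirit of AVClass, not its actual code: the token lists, the four-character minimum for family tokens, the two-engine agreement rule, and the PUP threshold are all illustrative assumptions, and real AVClass additionally resolves family aliases using curated lists.

```python
import re
from collections import Counter

# Hypothetical, heavily abbreviated token lists for illustration only.
GENERIC_TOKENS = {"trojan", "win32", "generic", "malware", "variant", "agent",
                  "adware", "riskware", "application", "downloader", "heur"}
PUP_TOKENS = {"adware", "pup", "pua", "riskware", "grayware", "unwanted"}


def tokenize(label):
    """Split an AV label such as 'Trojan.Win32.Zbot.abc' into lowercase tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", label.lower()) if t]


def family_and_type(av_labels, min_votes=2, pup_threshold=0.5):
    """av_labels maps AV engine name -> detection label (None if undetected).

    Returns (family, sample_type): family is the plurality non-generic token
    (None if fewer than min_votes engines agree); sample_type is "PUP" when
    more than pup_threshold of the detecting engines use a PUP-style token.
    """
    labels = [label for label in av_labels.values() if label]
    votes = Counter()
    pup_engines = 0
    for label in labels:
        tokens = set(tokenize(label))
        if tokens & PUP_TOKENS:
            pup_engines += 1
        # Each engine casts at most one vote per candidate family token.
        for tok in tokens:
            if len(tok) >= 4 and not tok.isdigit() and tok not in GENERIC_TOKENS:
                votes[tok] += 1
    family = None
    if votes:
        tok, count = votes.most_common(1)[0]
        if count >= min_votes:
            family = tok
    is_pup = bool(labels) and pup_engines / len(labels) > pup_threshold
    return family, ("PUP" if is_pup else "malware")
```

For example, labels such as "Trojan.Win32.Zbot.abc" and "Generic.Malware.Zbot" from two or more engines would yield the family "zbot" with type "malware".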

GROWTH IN MALWARE BY TYPE
[Figure omitted; annotation: "Collection Issue"]

CLASSIFICATION RESULTS
• There are more malware families, but PUP families tend to
have more samples per family.
• Malware families tend to have more e2LDs per sample,
indicating greater domain polymorphism.
[Figures: Top Families by Sample; Top Families by e2LD]
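One way to compute the two per-type comparisons on this slide is sketched below. It assumes each sample has already been assigned a family and a type and that the set of e2LDs it queried is known. Defining "e2LDs per sample" as a family's distinct e2LDs divided by its sample count, and summarizing with medians, are our assumptions rather than the study's stated method.

```python
from collections import defaultdict
from statistics import median


def per_type_stats(samples):
    """samples: iterable of (sample_id, family, sample_type, e2lds) tuples,
    where e2lds is the set of e2LDs the sample queried."""
    fam_samples = defaultdict(set)  # (type, family) -> sample ids
    fam_e2lds = defaultdict(set)    # (type, family) -> distinct e2LDs
    for sample_id, family, sample_type, e2lds in samples:
        key = (sample_type, family)
        fam_samples[key].add(sample_id)
        fam_e2lds[key].update(e2lds)

    stats = {}
    for sample_type in {t for t, _ in fam_samples}:
        keys = [k for k in fam_samples if k[0] == sample_type]
        stats[sample_type] = {
            "families": len(keys),
            "median_samples_per_family": median(len(fam_samples[k]) for k in keys),
            # Distinct e2LDs per family, normalized by the family's sample count.
            "median_e2lds_per_sample": median(
                len(fam_e2lds[k]) / len(fam_samples[k]) for k in keys),
        }
    return stats
```

A family whose samples all contact one shared domain scores low on e2LDs per sample, while a family that rotates registered domains across samples scores high, which is the polymorphism signal the slide describes.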

CLEANING UP DATASETS
Invalid Domains
• Remove NX domains (names that returned NXDOMAIN) to reduce the effects of domain generation algorithms (DGAs)
• Reduction from 6.8M to 1.31M e2LDs
Benign Domains
• Remove popular domains from Alexa
• Remove known content delivery networks (CDN)
• Manually whitelist remaining domains
• Reduction from 1.31M to 1.29M e2LDs
Spam Domains
• Remove resolutions from binaries that perform many MX lookups
• Remove resolutions with mail-related keywords (e.g., mail, smtp, imap)
• Reduction from 1.29M to 329,348 e2LDs
Reverse Zone Delegations
• Remove reverse zone delegations (e.g., in-addr.arpa names), which often result from system-level processes and introduce a lot of
noise.
• Reduction from 329,348 to 327,514 e2LDs
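The filtering stages above can be expressed as a sequence of list filters over DNS resolution records. This is a minimal sketch: the `Resolution` record layout, the `benign_e2lds` set (standing in for the Alexa, CDN, and manual whitelists), and the `max_mx_lookups=100` cutoff are hypothetical; the slides do not state the actual threshold for "lots of MX lookups".

```python
from collections import Counter
from dataclasses import dataclass

# Keywords taken from the slide's examples.
MAIL_KEYWORDS = ("mail", "smtp", "imap")
REVERSE_ZONES = (".in-addr.arpa", ".ip6.arpa")


@dataclass(frozen=True)
class Resolution:
    sample: str     # sample hash
    fqdn: str       # lowercase queried name, e.g. "mx1.example.com"
    e2ld: str       # effective second-level domain, e.g. "example.com"
    qtype: str      # DNS query type, e.g. "A" or "MX"
    nxdomain: bool  # True if the response was NXDOMAIN


def clean(resolutions, benign_e2lds, max_mx_lookups=100):
    """Apply the four filtering stages in order.

    benign_e2lds is a hypothetical set combining popular (Alexa) domains,
    known CDNs, and manually whitelisted domains. max_mx_lookups is an
    illustrative cutoff, not the study's value.
    """
    resolutions = list(resolutions)
    # Count MX lookups per binary over all of its resolutions.
    mx_lookups = Counter(r.sample for r in resolutions if r.qtype == "MX")

    # 1. Invalid domains: drop names that returned NXDOMAIN.
    rs = [r for r in resolutions if not r.nxdomain]
    # 2. Benign domains.
    rs = [r for r in rs if r.e2ld not in benign_e2lds]
    # 3. Spam domains: drop everything from MX-heavy binaries, then
    #    drop names containing mail-related keywords.
    rs = [r for r in rs if mx_lookups[r.sample] <= max_mx_lookups]
    rs = [r for r in rs if not any(k in r.fqdn for k in MAIL_KEYWORDS)]
    # 4. Reverse zone delegations (IPv4 and IPv6 reverse lookups).
    rs = [r for r in rs if not r.fqdn.endswith(REVERSE_ZONES)]
    return rs
```

Ordering matters only for reporting the per-stage reductions; the final set of surviving resolutions is the same regardless of the order in which these independent filters run.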

DOMAIN POLYMORPHISM
• Most malware samples resolve fewer than 10 unique fully qualified
domain names (FQDNs).
• Most registered domains are queried by only a single, unique
malware sample.
• Evasion appears to happen at the registered domain.
Blacklisting domains may do little to prevent future
communication from new samples.
[Figure: example domain name, subdomain.example.com]
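The two polymorphism measurements on this slide can be sketched as follows. Extracting the registered domain (e2LD) requires the Public Suffix List; the tiny `PUBLIC_SUFFIXES` set here is a hypothetical stand-in that ignores the list's wildcard and exception rules, so `e2ld()` is illustrative only.

```python
from collections import defaultdict

# Hypothetical stand-in for the Public Suffix List; use the real list in practice.
PUBLIC_SUFFIXES = {"com", "net", "org", "info", "co.uk"}


def e2ld(fqdn):
    """Return the effective second-level domain: the longest matching public
    suffix plus one label ('a.b.example.co.uk' -> 'example.co.uk')."""
    labels = fqdn.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        # Scanning left to right tries the longest candidate suffix first.
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None


def polymorphism_stats(queries, fqdn_cutoff=10):
    """queries: iterable of (sample_id, fqdn) pairs from dynamic analysis.

    Returns (share of samples resolving fewer than fqdn_cutoff unique FQDNs,
             share of e2LDs queried by exactly one sample).
    """
    fqdns_per_sample = defaultdict(set)
    samples_per_e2ld = defaultdict(set)
    for sample, fqdn in queries:
        fqdns_per_sample[sample].add(fqdn)
        registered = e2ld(fqdn)
        if registered:
            samples_per_e2ld[registered].add(sample)
    few = sum(len(f) < fqdn_cutoff for f in fqdns_per_sample.values())
    single = sum(len(s) == 1 for s in samples_per_e2ld.values())
    return (few / max(len(fqdns_per_sample), 1),
            single / max(len(samples_per_e2ld), 1))
```

A high share of single-sample e2LDs is what makes per-domain blacklisting weak: a blocked domain rarely reappears in the next sample.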

MALWARE QUERYING DYNAMIC DNS
• Evasion happens at the child label (the label directly beneath the Dynamic DNS provider's domain).
• Queried by 8.6M (32%) distinct samples in our dataset.
Description: The 100 most popular Dynamic DNS domains queried by malware
samples.
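Because evasion on Dynamic DNS happens at the child label, a natural aggregation is to group queried names by provider base domain and count the distinct child labels beneath each one. In this sketch `DDNS_BASES` is an illustrative two-entry set, not the study's list of the 100 most popular Dynamic DNS domains.

```python
from collections import defaultdict

# Illustrative Dynamic DNS base domains, not the study's top-100 list.
DDNS_BASES = {"duckdns.org", "ddns.net"}


def child_label(fqdn, bases=DDNS_BASES):
    """Return (base, child) when fqdn sits under a DDNS base domain, where child
    is the label directly beneath the base ('foo' in 'www.foo.duckdns.org')."""
    labels = fqdn.lower().rstrip(".").split(".")
    for i in range(1, len(labels)):
        base = ".".join(labels[i:])
        if base in bases:
            return base, labels[i - 1]
    return None


def children_per_base(queries, bases=DDNS_BASES):
    """queries: iterable of (sample_id, fqdn).
    Returns {base: {child_label: set of sample ids}}."""
    out = defaultdict(lambda: defaultdict(set))
    for sample, fqdn in queries:
        hit = child_label(fqdn, bases)
        if hit:
            base, child = hit
            out[base][child].add(sample)
    return out
```

Many distinct children under one base, each used by few samples, is the child-label analogue of the registered-domain polymorphism seen elsewhere.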

MALWARE QUERYING CDNS
• The most popular CDNs are the usual suspects.
• Malware communication is hiding in plain sight.
Description: Complete list of all known CDN domains queried by malware samples in
our dataset.

MALWARE QUERYING DGA DOMAINS
• Over 12.5M (46%) of malware samples queried at least one NX domain.
• Before filtering, we found that 3M (44%) of all domains were in DGArchive.
• After filtering, we found that 55,396 (17%) of the filtered domains were in DGArchive.
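The percentages on this slide are simple ratios. The sketch below computes them from per-sample NX sets and a set of known DGA domains; `dga_domains` is a plain Python set assumed to have been assembled beforehand, not a DGArchive API. The final lines check the slide's figures, assuming the denominators are the 6.8M e2LDs before filtering and the 327,514 e2LDs left after it.

```python
def nx_and_dga_shares(nx_by_sample, domains, dga_domains):
    """nx_by_sample: dict sample_id -> set of NX domains that sample queried.
    domains: set of domains under consideration (before or after filtering).
    dga_domains: set of known DGA domains, e.g. assembled from DGArchive
    (how that set is obtained is outside this sketch).

    Returns (share of samples with at least one NX domain,
             share of domains that appear in dga_domains).
    """
    with_nx = sum(1 for nx in nx_by_sample.values() if nx)
    known_dga = len(domains & dga_domains)
    return with_nx / len(nx_by_sample), known_dga / len(domains)


# Sanity check of the slide's percentages (using the rounded 3M figure).
print(round(100 * 3_000_000 / 6_800_000))  # 44
print(round(100 * 55_396 / 327_514))       # 17
```

The drop from 44% to 17% is expected: removing NX domains in the first cleaning stage discards most DGA-generated names, which typically never resolve.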

MALWARE QUERYING SPAM DOMAINS
• Most spam-related malware
samples queried hundreds or
thousands of MX domains.
• The most popular spam-related
malware (i.e., MyDoom) is
over a decade old.

AN INCONVENIENT TRUTH
[Figure panels: (a) pDNS, (b) PBL, (c) Expired Domains]
Description: Time difference between when a domain was first seen in
passive DNS (pDNS), public blacklists (PBL), or an expired domain list
and when it was first seen through dynamic malware analysis.
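The time differences plotted in these panels can be computed per domain by subtracting first-seen dates. This sketch assumes first-seen dates have already been extracted for each source; the sign convention (positive means the other source saw the domain first) is our choice.

```python
from datetime import date


def lead_times(first_seen_malware, first_seen_other):
    """Both arguments map domain -> date first observed.

    Returns domain -> days by which the other source (passive DNS, a public
    blacklist, or an expired-domain list) preceded dynamic malware analysis.
    Positive values mean the other source saw the domain first.
    """
    common = first_seen_malware.keys() & first_seen_other.keys()
    return {d: (first_seen_malware[d] - first_seen_other[d]).days for d in common}


def share_seen_earlier(deltas):
    """Fraction of domains that the other source observed strictly earlier."""
    return sum(1 for v in deltas.values() if v > 0) / len(deltas) if deltas else 0.0


# Toy usage: pDNS saw a.com a year before any sandbox run did.
deltas = lead_times({"a.com": date(2014, 3, 1)}, {"a.com": date(2013, 3, 1)})
```

Each positive value is a window during which a network-based defense could, in principle, have acted before the malware sample was analyzed.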

LIFETIME OF DOMAINS
[Figure panels: (a) Malware, (b) PUP, (c) Unknown]
Description: Joint distribution of domain lifetime and resolution frequency
observed in passive DNS for PUP, malware, and unclassified domains.
Notice the similarities.
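A joint distribution like this one can be approximated with a two-dimensional histogram. In this sketch, lifetime is taken as the inclusive span in days between the first and last passive DNS observation, resolution frequency as the number of distinct days the domain resolved, and both axes use log2 bins; these definitions are assumptions, since the slide does not define either quantity.

```python
import math
from collections import Counter
from datetime import date


def lifetime_frequency_hist(pdns_days):
    """pdns_days: dict domain -> iterable of dates on which passive DNS saw it resolve.

    Returns a Counter keyed by (log2 lifetime bin, log2 frequency bin), where
    lifetime is the inclusive span in days between first and last observation
    and frequency is the number of distinct days the domain resolved.
    """
    hist = Counter()
    for days in pdns_days.values():
        days = set(days)
        if not days:
            continue
        lifetime = (max(days) - min(days)).days + 1
        frequency = len(days)
        hist[(int(math.log2(lifetime)), int(math.log2(frequency)))] += 1
    return hist


# Toy usage: b.net has a 4-day lifetime with 2 active days -> bin (2, 1).
example = {"a.com": [date(2014, 1, 1)], "b.net": [date(2014, 1, 1), date(2014, 1, 4)]}
hist = lifetime_frequency_hist(example)
```

Building one histogram per class (malware, PUP, unclassified) and plotting them side by side gives panels comparable to the three shown on this slide.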

INFRASTRUCTURE ANALYSIS
[Figure: 2012]
Notice the pockets of abuse.

INFRASTRUCTURE ANALYSIS
[Figure panels: 2012, 2013, 2014, 2015]
Pockets of abuse
across all years.

KEY TAKE-AWAYS
• Waiting for malware to be discovered results in long
windows of vulnerability and potentially limited efficacy.
• Network defenses have the potential to identify threats
before the malware sample is discovered.
• Malware analysis is still extremely useful, but it’s
important to understand the limitations.

THANK YOU
chazlever@gatech.edu
linkedin.com/in/chazlever
@chazlever
