2. 2
project background
PhD: “a botnet needle in a virtual haystack”
- a mechanism to capture botnet communication traffic in
virtualised environments such as Cloud Service Providers.
why?
•cloud providers are building block for IoT
•a great hosting platform for botnets
interesting built environment:
•tenant isolation, data privacy
•internal infrastructure is an attack surface
2
5. 5
packet capture
Three drawbacks in this
scenario:
1) port mirroring doubles
network bandwidth volumes
2) assumes the monitored
devices support mirroring
3) big fat pipe to send
traffic to a 3rd
party SOC
5CORE
SWITCH
CORE
SWITCH
EDGE
SWITCH
EDGE
SWITCH
A B
Third Party
Capture Probe
RSPAN
VLAN
EDGE
SWITCH
C
Third
Party
Managed
SOC
6. 6
history lesson
1980’s
Simple Network Management Protocol (SNMP)
• MIB information is limited, so use syslog
• Syslog is unstructured
1990’s
• 1991 – IETF proposed packet aggregation into flows
• 1993 – Disbanded due to lack of interest
• 1996 – Cisco patented NetFlow
• continued…
6
7. 7
history lesson
1996 – Cisco patented NetFlow
flow aggregates similar traffic based on an attributes tuple:
e.g. 5 field flow tuple: { sIP, dIP, sPort, dPort, protocol }
PCAP is a phone call
flow is the phone bill (who, when, how long)
7
9. 9
history lesson
1996 – Cisco patented NetFlow
2002 – NetFlow v5
2004 – NetFlow v9
NetFlow was designed for application to network management,
but has limitations when applied to threat detection:
•NFv5 has 18 fixed fields (only 10 useful!)
•header information ONLY
•transport layer is UDP only
•no support for: MPLS, IPv6, VLANs, MAC addresses
•[typically 1:50 sampling rates]
Cisco NFv9 supports (most) of these, but is proprietary.
9
10. 10
IP Flow Information eXport
2013 – IPFIX the flow export standard (RFC7011 - RFC7015)
IPFIX IS A FLOW EXPORT PROTOCOL IN ITS OWN RIGHT (not NFv10)
»Standards-based: vendor neutrality
»Extensible: NFv5 – fixed template: 18 fields
NFv9 - 79 fields (104 if Cisco)
IPFIX - 433 Information Elements (IANA)
»EEs create your own bespoke Enterprise Elements
»Security: security by design
»Future-proof: supports IPv6, MPLS and multi-cast
10
14. 14
ipfix v pcap
14
PCAP IPFIX so what?
SPAN mirroring Inline TAP
mirroring doubles network
bandwidth,
TAP is more efficient
dedicated
infrastructure
s/w probe on any
device
more control over data capture,
lower data volumes
plain Text
encryption, replay
protection
security by design,
can be sent over internet
unstructured data structured data easier search
TB data volumes MB data volumes 97% reduction data volumes
full packet: payload
L3/L7 templates-
capture
privacy, lawful inspection.
15. 15
case studies
further IPFIX templates:
• botprobe : botnets
• smtpprobe : spam traffic
• httpprobe : malicious http streams
• iotprobe : malicious IoT traffic
• icsprobe : malicious Industrial Control Systems traffic
if an attribute is present in a packet [header or payload], we can capture
it.
15
17. 17
adaptive capture
17
machine learning genetic algorithms for real-
time template
adaption as traffic profiles change.
AI
97%
Less Data SOC
ANALYST
Network
Traffic
(PCAP
Stream)
Passive
Inline
Capture
A
B
E
A
B
C
D
E
A
B
C
D
E
A
D
T1 T2
18. 18
threat detection
three key phases of a cyber attack:
infection
detection
response
average time to detect a cyberattack is 205 days (Gartner, 2016)
the cost of a cyber attack is reputational, not just financial.
18
20. 20
big data challenge
20
97% reduction in threat
intel. data volumes
1)SOC team reacts faster
to cyberattacks
2) protecting business assets
and reputation
21. 21
new opportunities
template extensibility + big data reduction =
• automated mitigation
• legal interception
• pre-event forensics
• pcap indexing [flow indexing]
• new detection algorithms [not just for botnets]
21
22. 22
we need you…
if you are interested in collaboration,
We’d love to talk with you:
adrian.winckles@anglia.ac.uk
www.botprobe.co.uk
22
Hinweis der Redaktion
From ARU, Cambridge. Research AI in threat detection.
PHD – capture botnet comms traffic (ie. C&C to bot – keep alives, attack traffic). This has morphed into a method to reduce big data volumes threat detection
Wanted to do something in botnet detection. Already a bunch of v good detection algos, so decided to look into how we can make the capture process more efficient bearing in mind this is a big data environment
We set our scenario in CSP because:
1. CSP integral Building block for IOT [ease of access to centralised storage of device data], starting to see device intelligence outsourced to the cloud to reduce cost of end devices (Routers, IoT devices etc)
2. Cloud is a great platform to host bots (spin up VMs with C&C (redeploy if take down), or VM with bot(s) – no AV to bypass and easy to clone)
3. Interesting build environ
Got us thinking what would happen if malware could abuse the vulnerabilities in hypervisors (esp QEMU) to attack the infrastructure.
So how are we going to capture this botnet traffic? Obvious way is via full packet capture…
WShark will see A to B if on separate core, but not if B on same edge switch, as wont pass core
So need Wireshark on each nw segment –
Could use Port Mirroring (SPAN in cisco talk)
Set up a SPAN port on a switch
Mirror takes an EXACT COPY of the packet and forwards it to a collection device (probably via mgmt vlan)
This collection device will send it to the analysis device (might be a 3rd party soc)
(might be compression in SPAN traffic, but generating a LOAD of extra traffic)
3 DRAWBACKS:
double the bandwidth (because you are mirroring)
1GB edge, what you got in the core? 10GB -> generate TBs of data!
2) assumes the monitored device supports span. What about iot hubs?
You’ll probably want to send all this SPAN traffic to a 3rd party SOC
(can be compressed but still need a big fat pipe)
(encryption to avoid MITM)
So what else could we use?
1980s – SNMP was standard for network management. Poll a proprietary MIB on a device every x minutes regardless of a change on the device: up/down status, alters etc.
SNMP limited – cannot carry large amounts of device data, so if wanted more granular information you would use syslog alongside SNMP
Syslog is unstructured, must be rearranged before querying
1991 – Internet Engineering Task Force – aggregation for internet accounting.
Netflow came from Cisco Express forwarding – route the first packet, switch the rest (a way to speed up the network)
It works by grouping together similar traffic streams based on similar attributes:
So, if over a fixed collection period, 10 traffic streams match this tuple, they are exported as one aggregated flow.
For example, the standard 5 field tuple that is used a lot to aggregate nw magmt traffic is …..
10 streams in collection period, match tuple => 1 stream.
10 MB to 1 MB
NFv5 common in nw mgmt. = 90%+ devices
Flow EXPORT
The probe observes packets via a TAP, and takes a COPY of the fields needed (not the whole packet) –> Not interrupt traffic flow
So 2 lots of data reduction
TAP not mirror
good probe do aggregation (more common in IPFIX, so NF probes aggregate but slower)
2002, 1st commercial release of netflow.
Don’t get me wrong, Netflow was designed for network management statistics and it is good for that.
Not so useful for threat detection.
18 fields:
nw mgmt. fields – 10 any use! [don’t need input interface, next hop device, ToS, AS (Autonomous systems)]
fixed at 48 bytes, you cannot chose what to capture.
HEADER only – no L7 <- v imp!!!
UDP - fast, but no encryption, susceptible to replay attacks, etc
1:50 = in large NW, NF not efficient (poor aggregation) so typically sample 1:50 - miss botnet comms?
NFv9 to overcome this, but still proprietary.
In 2013, IETF standardised IPFIX.
It was developed (using NFv9 as a base) to address the drawbacks of Netflow.
export protocol in its own right. Many people call NFv9 IPFIX, or call it NFv10
For threat detection – NFv9 is non standard fore runner to IPFIX
Biggest advantage in threat detection is vendor neutral
VENDOR NEUTRAL = multi vendor nw
EXTENSIBLE
EES(L3 – L7)
Security = encrypt, replay and drop packet protection
Also:
bi-directional flows, configurable cache timeouts
Protocol: NF is UDP; IPFIX supports UDP / TCP / SCTP (TLS)
Transport: TCP/SCTP (stream control transmission protocol) –congestion awareness, protects against replay attack
25 million pcap streams
30 botnet families (HTTP, IRC, Spam, etc)
300 IEs and EEs
Uses rudimentary AI to select common attributes = occurrence, and relevance (basic ML)
NFv5 – 48 bytes fixed, 10 fields
IPFIX – 43 bytes (core), 19 fields, L7
97% vol reduction
Python search for malware – 3mins v .2 sec
Repeated 30 expts (used NFv5 and/or PCAP) (2 used NFv9, 0 used IPFIX)
NFv5 + PCAP = lots of data
Benefits:
Data volume reduction
NO CHANGE TO ALGO
capture ANYTHING in a packet => new detection algorithms.
IPFIX benefits:
Inline tap – reduced data
Sw – deploy anywhere
Structured data – easier to search, deploy to graph/elk
MB data
L7
working on some case studies, with some organisations.
More IoT and ICS [Modbus, scada]
Software on devices.
Small footprint devices (tested on PI 0)
IPFIX = less data
Genetic algo
Real time adaption – T1 & T2 ABE -> AD
What about latency here as AI updates template? (cache pcap?)
Infection -> detection = 205 days. Longest period in attack cycle, nw most vulnerable
We want to reduce this infection to detection time.
How much does it cost your business per hour during a cyber breach? Its not just financial, it reputation damage and brand credibility.
Threat detection is a big data challenge
At botprobe, we tackle the challenges of network threat big data.
Can you see threat in big data?
SCRAP the CRAP. (97% reduction)
Analyst efficiency.
So we think our research opens up new opportunities that
Use template Extensibility to reduce big data.
Automated mitigation = less traffic for AI (needs more traffic when learning, but want less when detecting)
Interception = less traffic to store/index
Forensics = cannot do with PCAP (too much data)
New algos – capture anything in packet [L3 – L7]