An overview of Apache Metron, an open source platform for ingesting, enriching, triaging, and storing diverse cybersecurity feeds. Metron is built on top of Hadoop and scales horizontally on commodity hardware.
2. Agenda
Part 1 – Overview of Apache Metron
• Challenges with Today’s Security Tools to Combat Cyber Attacks
• Introduction to Apache Metron
• Personas and Core Themes
• Why Apache Metron?
Part 2 – Metron Architecture
• Telemetry Parsing
• Enrichment
• Threat Intelligence
• Alert Triage
• Index and Write to Storage
• Getting Started
3. The Good Guys
Security
Practitioner
I have too many tools I need to learn
I don’t have a centralized view of my data
My tools are too expensive
I can’t find enough talent
I can’t keep relying on static rules
I need to discover bad stuff quicker
Most of my alerts are false positives
I have too many manual tasks
SOC
Manager
Threat landscape too dynamic
More assets/users to manage
Attack surface increases
Legacy techniques don’t work anymore
Metron will make it easier and faster to find
the real issues I need to act on
Metron is a more cost effective way for my team
to deal with the fast moving threat landscape
4. The Bad Guys
Advanced
Persistent
Threat
Script
Kiddie
My techniques are predictable and known
My attack vectors are also known
You are not the only person I’ve attacked
I brag about what I did or will do
I set off a large number of alerts
I fumble around a lot
I am unique in the way I do things
I live on your network for about 300 days
I know what I am after and I look for it, slowly
Your rules will not detect me, I am too smart
I impersonate a legitimate user, but I don’t act like one
Metron can take everything that is known
about me and check for it in real time
Metron can model historical behavior of whoever I am
impersonating and flag me as I try to deviate
5. Problems With Existing Tools
Security
Information
Management
System
I am prohibitively expensive
I have vendor lock-in
I can’t deal with big data
I am not open
I am not extensible enough
Legacy
Point
Tools
I was built for 1995
I am super specialized
I don’t scale horizontally
I have a proprietary format
You need a PhD to operate me
Behavioral
Analytics
Tools
I am mostly vaporware
I was built by a small startup
I was modeled after a data set from 1999
I spam you with false positives
6. Apache Metron Vision
“Apache Metron is a Security Data
Analytics Platform (SDAP). As a
next generation security analytics
framework, it is designed to
consume and monitor network
traffic and machine data within an
enterprise. Apache Metron is
extensible and is designed to work
at a massive scale. It is not a SIEM
but rather the next evolution of a
SIEM.”
Apache Metron provides the following capabilities:
Extensible ingest to monitor any telemetry source
Extensible enrichment framework for any telemetry
stream
Hadoop-backed storage for telemetry stream with a
customizable retention time for cost effective archive
Automated real-time index for telemetry streams
enabling real-time search
Telemetry correlation and SQL query capability for data
stored in Hadoop backed by Hive
ODBC/JDBC compatibility and integration with existing
analytics tools
7. Use Case Setup
• On 4/10, a user named Ethan V at Company Foo submits a security ticket about a
potential phishing email.
• Details provided by Ethan V in the ticket:
• The email states that a signature is required on a new Docu-Sign document for a new stock
option grant to Ethan from internal Finance employee Sonja Lar
• There is a link in the email to the Docu-Sign document
• Ethan clicks on the link, and a login page appears
• Ethan enters his SSO credentials and submits
• On submission, nothing happens
• Ethan calls Sonja, but Sonja states she didn’t send the email
• Ethan is worried and files a help desk security ticket
• A security ticket is created and assigned to the SOC team
• A SOC analyst, James, picks up the case to investigate it.
8. Systems
Accessed for
Threat Scope
Systems
Accessed for
Forensics
Systems Accessed for Investigation/Context
SIEM
“Scope of Threat”
Workflow Steps
• Step 6: Searches SIEM for FireEye and
IronPort email events associated with
Sonja. The SIEM doesn’t have that info
• Step 6 Result: Need to log into FireEye
and IronPort
• Step 7: Log into FireEye Email Threat
Prevention Cloud & IronPort to find all
emails sent from Sonja from that malicious
IP
• Step 7 Result: Have a list of all users that
the phishing email was sent to. Can reset
the password for all those users
Maxmind
(IP Geo
DB)
AD
(Identity
Mgmt.)
Asset
Mgmt.
Inventory
Soltra
(Threat
Intel)
Story Unfolding
• Step 1 Insight: Anomalous
Event – Corp Gmail was
decommissioned on behalf of
exchange months back and only
few users are currently using it
• Step 2 Insight: Not possible
for the same user to be logging in
from Ireland & Southern Cali at
the same time.
• Step 3 Insight: Unauthorized
access is occurring from
Ireland
• Step 4 Insight: Seems like
Sonja is in Southern Cali but
someone else pretending to
be her is logging in from
unidentified Asset
• Step 5 Insight: Sonja’s
account has been
compromised. Shut it down;
Ethan’s credentials have
been reset. But which other
users are affected like Ethan?
• Step 6 Insight: SIEM doesn’t
have all the FireEye email events
I need to determine scope
• Step 7 Insight: Understand the
scope of the threat and can
contain it.
“Forensics”
Workflow Steps
• Step 8: Logs into Cisco IronPort to
determine when the attacker first
compromised Sonja’s Gmail account
• Step 8 Result: On 3/26, a user from
Ireland logged into Sonja’s Corp Gmail
Account
• Step 8 Insight: Understands
when Sonja’s Gmail Account
was first compromised
• Step 9: Logs into Intermedia, an email
archive system, to understand how the
account was compromised
• Step 9 Result: Sees a set of emails
where the attacker spoofed someone
else’s email address, “warmed her up” with
a few emails, and then sent an email with
a link that Sonja clicked on which stole
• Step 9 Insight: Understand
how Sonja’s account got
Systems Accessed for Remediation
Exchange
(Primary
Email
Service)
Corp Gmail
(Secondary
Email
Service)
AD & SSO
(Identity
Provider &
SSO)
Search
FireEye
(Email
Cloud
Security)
Cisco IronPort
(Email
On-Premise
Security)
Intermedia
(Email Archive)
9. Do Investigation, Find Scope and Perform Forensics Using only Metron
Systems Accessed for Remediation
Exchange
(Primary
Email
Service)
Corp Gmail
(Secondary
Email
Service)
AD & OKTA
(Identity
Provider &
SSO)
Maxmind
(IP Geo
DB)
AD
(Identity
Mgmt.)
Asset
Mgmt.
Inventory
Soltra
(Threat
Intel)
Systems Accessed for Investigation/Context
Systems
Accessed to
Determine Scope
FireEye
(Email
Cloud Security)
Cisco IronPort
(Email
On-Premise
Security)
Intermedia
(Email Archive)
Systems
Accessed for
Forensics
10. Challenges that Apache Metron Solves
60%: Percent of breaches that
happened in minutes
8 months: Average time an
advanced security breach goes
unnoticed
$400 million in estimated
financial loss in 2015
70%-90%: Percentage of
malware in breach unique to
organization
2015 Verizon Data Breach Investigations Report
• Too many manual steps in different tools
make investigations slow and expensive
• Too expensive to keep data for enough time to
understand history
• Too expensive to collect all the desired data to
understand context
• Not sure whether a targeted event can be detected
• Too many events to review in a timely manner
• Not enough staff to review events in a timely
manner
• Takes too long to detect a breach
• Hackers are getting more sophisticated
11. Why Metron? SOC Analyst Perspective
Looking through
alerts
25%
Collecting contextual
data
25%
Formulating a
Hypothesis
5%
Investigate
20%
Remediate
15%
Update Workflow
5%
Write Report
5%
ANALYST WORKFLOW
• Alerts Relevancy Engine
• Smarter ML alerts
• Centralized Alerts Console
• Enriched with threat intel data
• Fully enriched messages
• Single pane of glass UI
• Centralized real-time search
• All logs in one place
• Granular access to PCAP
• Replay old PCAP against new signatures
• Tag behavior for modelling by data scientists
• Raw messages used as evidentiary store
• Mine investigation history
• Asset inventory as an enrichment
• User identity as an enrichment
• Workflow engine
• Ticket clustering
Everything you need to know in one place
12. Why Metron? Data Scientist Perspective
Formulating a
Hypothesis
5%
Finding Data
20%
Cleaning Data
20%
Munging Data
20%
Visualizing Data
20%
Modelling Data
10%
Validating Model
5%
DATA SCIENCE WORKFLOW
• All my data is in the same place
• Data exposed through a variety of APIs
• Standard Access Control Policies
• Quickly see what I have
• Metron normalizes objects
• Partial schema validation on ingest
• Tagging on ingest
• Automatic data enrichment
• Automatic application of class labels
• Common Metron Objects
• Massively parallel computation framework
• Reusable Zeppelin Dashboards
• Real-time search + UI
• Integration with Python/R
• Integration with analytics tools
Reducing time from hypothesis to model
13. Agenda
Part 1 – Overview of Apache Metron
• Challenges with Today’s Security Tools to Combat Cyber Attacks
• Introduction to Apache Metron
• Personas and Core Themes
• Why Apache Metron?
Part 2 – Metron Architecture
• Telemetry Parsing
• Enrichment
• Threat Intelligence
• Alert Triage
• Index and Write to Storage
• Getting Started
15. Telemetry Parsing
Accept logs
Normalize log formats to the common Metron event format
Verify incoming data
Metron Stream Processing Pipeline: Telemetry Parsing → Enrichment → Threat Intel → Alert Triage → Index & Write
16. Log format to Metron Message Conversion
ORIGINAL LOG LINE
1475518070.281 832 127.0.0.1 TCP_MISS/200 448176 GET http://www.aliexpress.com/af/shoes.html? - DIRECT/104.116.248.248 text/html
METRON JSON MESSAGE
{"full_hostname":"www.aliexpress.com","code":200,"method":"GET","url":"http://www.aliexpress.com/af/shoes.html?","source.type":"squid","elapsed":832,"ip_dst_addr":"104.116.248.248","original_string":"1475518070.281 832 127.0.0.1 TCP_MISS/200 448176 GET http://www.aliexpress.com/af/shoes.html? - DIRECT/104.116.248.248 text/html","bytes":448176,"domain_without_subdomains":"aliexpress.com","action":"TCP_MISS","ip_src_addr":"127.0.0.1","timestamp":1475518070281}
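The conversion on this slide can be approximated in a few lines of Python. This is only a sketch of the idea, not Metron's own squid parser: the regular expression and the `parse_squid` helper are assumptions made for illustration, the output field names are copied from the slide's JSON, and `domain_without_subdomains` is left out because deriving it properly needs a public-suffix list.

```python
import re
from urllib.parse import urlparse

# Illustrative pattern for the squid access-log line shown on the slide.
SQUID_LINE = re.compile(
    r"(?P<ts>\d+)\.(?P<ms>\d{3})\s+(?P<elapsed>\d+)\s+(?P<ip_src_addr>\S+)\s+"
    r"(?P<action>[A-Z_]+)/(?P<code>\d+)\s+(?P<bytes>\d+)\s+(?P<method>\S+)\s+"
    r"(?P<url>\S+)\s+\S+\s+[A-Z_]+/(?P<ip_dst_addr>\S+)\s+\S+$"
)


def parse_squid(line):
    """Turn one raw squid log line into a flat, Metron-style message dict."""
    line = line.strip()
    match = SQUID_LINE.match(line)
    if match is None:
        raise ValueError("not a recognised squid access log line")
    f = match.groupdict()
    return {
        # Epoch seconds plus milliseconds become epoch milliseconds.
        "timestamp": int(f["ts"]) * 1000 + int(f["ms"]),
        "elapsed": int(f["elapsed"]),
        "ip_src_addr": f["ip_src_addr"],
        "action": f["action"],
        "code": int(f["code"]),
        "bytes": int(f["bytes"]),
        "method": f["method"],
        "url": f["url"],
        "full_hostname": urlparse(f["url"]).hostname,
        "ip_dst_addr": f["ip_dst_addr"],
        "source.type": "squid",
        # The raw line is kept so the original evidence is never lost.
        "original_string": line,
    }


SAMPLE = ("1475518070.281 832 127.0.0.1 TCP_MISS/200 448176 GET "
          "http://www.aliexpress.com/af/shoes.html? - DIRECT/104.116.248.248 text/html")
message = parse_squid(SAMPLE)
```

Keeping `original_string` alongside the normalized fields mirrors the slide's message, where the raw line travels with the parsed values.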
17. Parsing and Normalizing Topology
[Diagram: Sensor A → Native Format → Topic A (Apache Kafka) → Parser Topology A (Apache Storm) → Enriched Metron JSON]
• Each Telemetry source has:
• Kafka topic with original event content
• Storm Topology to normalize into common Metron event format
• All telemetry sources feed into single enrichment topic
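The fan-in described above (one topic and one parser per source, all feeding a single enrichment topic) can be pictured with a toy in-memory model. Everything here is an assumption for illustration: the `deque` objects merely stand in for Kafka topics, the plain functions stand in for Storm parser topologies, and the names `sensor_a`, `sensor_b`, and `enrichments` are hypothetical.

```python
from collections import deque

# In-memory stand-ins for Kafka topics (a real deployment uses Kafka and Storm).
topics = {
    "sensor_a": deque(),
    "sensor_b": deque(),
    "enrichments": deque(),
}


def make_parser(source_type):
    """Build a trivial per-source parser that wraps the raw event."""
    def parse(raw):
        return {"source.type": source_type, "original_string": raw}
    return parse


# One parser per telemetry source, mirroring one parser topology per topic.
parsers = {name: make_parser(name) for name in ("sensor_a", "sensor_b")}


def run_parsers():
    """Drain every source topic into the single shared enrichment topic."""
    for name, parse in parsers.items():
        while topics[name]:
            topics["enrichments"].append(parse(topics[name].popleft()))


topics["sensor_a"].extend(["raw a1", "raw a2"])
topics["sensor_b"].append("raw b1")
run_parsers()
```

After `run_parsers()` the source topics are empty and the enrichment topic holds three normalized messages, one per raw event.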
19. Telemetry Parser Implementation Options
• General Purpose Parsers
• Easy to create – no programming
• Grok
• Regular expression based parser extracts Metron event values
• CSV Parser
• Maps CSV columns to Metron events
• Java
• High performance for high throughput sources
• Complex formats not easily expressed as Regex
• Java class implements MessageParser interface
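As a rough picture of what the CSV option does, the sketch below maps CSV columns onto event fields. It is written in Python for brevity and is not Metron's CSV parser; the `parse_csv_line` helper, the column list, and the `example_csv` source name are hypothetical.

```python
import csv
import io


def parse_csv_line(line, columns, source_type):
    """Map one CSV line onto named event fields (values stay strings)."""
    row = next(csv.reader(io.StringIO(line)))
    if len(row) != len(columns):
        raise ValueError(f"expected {len(columns)} columns, got {len(row)}")
    message = dict(zip(columns, (value.strip() for value in row)))
    message["source.type"] = source_type
    message["original_string"] = line
    return message


# Hypothetical column layout for an illustrative sensor.
COLUMNS = ["timestamp", "ip_src_addr", "ip_dst_addr", "ip_dst_port"]
event = parse_csv_line("1475518070281,10.0.0.5,192.168.1.9,443", COLUMNS, "example_csv")
```

The column list is pure configuration, which is why this style of parser needs no programming once the generic mapping code exists.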
20. Physical Architecture
[Diagram: Sensors A, B ... N send native-format events to Apache Kafka topics A, B ... N. Parser Topologies A, B ... N in Apache Storm convert them to the normalized Metron format and pass them to the Enrichment/Threat Intel Topology, which writes out to Index + HDFS. A PCAP Probe feeds the PCAP Topology, which stores PCAP on HDFS for the Metron PCAP Service.]
21. Enrichment
Add extra information to parsed event
Add context to event to save Security Analyst time
Score event for triage
Metron Stream Processing Pipeline: Telemetry Parsing → Enrichment → Threat Intel → Alert Triage → Index & Write
24. Enrichment Options
• Geo
• Add geolocation information for IPs (latitude, longitude, city, country, etc.)
• Host
• Add information from known hosts configuration
• HBase
• Threat intelligence information
• Stellar
• Apply Stellar Expressions to event
• Flexibility and extensibility
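A lookup-style enrichment such as geo or host can be sketched as a table lookup that adds namespaced fields to the event. The in-memory tables, their contents, and the `enrich` helper below are hypothetical; the `enrichments.geo.<field>.<attribute>` key shape follows the sample event shown later in this deck, and reusing it for host data is an assumption.

```python
# Hypothetical lookup tables; a real deployment would use a geo database
# and a known-hosts configuration instead of literals.
GEO_DB = {"104.116.248.248": {"city": "Cambridge", "country": "US"}}
KNOWN_HOSTS = {"127.0.0.1": {"asset_type": "workstation"}}


def enrich(event, field, kind, table):
    """Return a copy of the event with lookup results added as new fields."""
    enriched = dict(event)
    info = table.get(event.get(field))
    if info:
        for attribute, value in info.items():
            enriched[f"enrichments.{kind}.{field}.{attribute}"] = value
    return enriched


parsed = {"ip_src_addr": "127.0.0.1", "ip_dst_addr": "104.116.248.248"}
with_geo = enrich(parsed, "ip_dst_addr", "geo", GEO_DB)
with_host = enrich(with_geo, "ip_src_addr", "host", KNOWN_HOSTS)
```

Each enrichment only adds keys, so several of them can be chained without one overwriting another's output.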
25. Stellar Enrichments
• DSL for simple computations and transformations on message variables
• Capabilities
• Reference event field
• Boolean: and, or, not
• Real/Integer Arithmetic: *, /, +, -
• Comparison: <, >, <=, >=
• If else: if var1 < 10 then 'less than 10' else '10 or more'
• Check field exists: exists
• Functions: MAP_GET, SPLIT, STARTS_WITH, etc.
• Documentation
• https://github.com/apache/incubator-metron/tree/master/metron-platform/metron-common
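To make the capabilities above concrete, here is a Python analogue of the kind of computation such expressions perform on an event. This is not Stellar syntax and not how Metron evaluates it; the helper names and the sample event are hypothetical, and the if/else mirrors the slide's example with `elapsed` standing in for `var1`.

```python
def exists(event, field):
    """Analogue of an 'exists' check: the field is present and not null."""
    return event.get(field) is not None


def elapsed_bucket(event):
    """Analogue of: if elapsed < 10 then 'less than 10' else '10 or more'."""
    return "less than 10" if event["elapsed"] < 10 else "10 or more"


def top_level_domain(event):
    """Derive a simple last-label TLD from full_hostname, or None if absent."""
    host = event.get("full_hostname") or ""
    return host.rsplit(".", 1)[-1] if "." in host else None


sample_event = {"full_hostname": "www.aliexpress.com", "elapsed": 832}
derived = {
    "elapsed_bucket": elapsed_bucket(sample_event),
    "top_level_domain": top_level_domain(sample_event),
    "has_url": exists(sample_event, "url"),
}
```

The point of a DSL is that transformations like these live in configuration rather than in compiled parser code.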
27. Event with top_level_domain Stellar Enrichment and geo enrichment
{"adapter.threatinteladapter.end.ts":"1475617327962","full_hostname":"www.aliexpress.com","code":200,"enrichmentsplitterbolt.splitter.end.ts":"1475617327621","top_level_domain":"com","enrichments.geo.ip_dst_addr.city":"Cambridge" …}
28. Threat Intelligence
• Threat Indicators
• Malicious domain watchlist
• Malicious IP watchlist
• MD5 signatures
• Triaging
• Structured Threat Information eXpression (STIX)
• Threat Intelligence in machine format
• May be exchanged by TAXII
• Trusted Automated eXchange of Indicator Information (TAXII)
• Describes how TI is exchanged
• Automated standard exchange interface of threat intelligence
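Checking an event against watchlists of known indicators can be sketched as set membership tests. The watchlist contents, the `apply_threat_intel` helper, and the `is_alert` and `threat_hits` output fields are assumptions for illustration; in practice the indicators would be loaded from threat intelligence feeds rather than hard-coded.

```python
# Hypothetical indicator sets (documentation-reserved domain and IP ranges).
MALICIOUS_DOMAINS = {"evil.example"}
MALICIOUS_IPS = {"203.0.113.7"}


def apply_threat_intel(event):
    """Return a copy of the event, flagged if any field matches a watchlist."""
    flagged = dict(event)
    hits = []
    if flagged.get("domain_without_subdomains") in MALICIOUS_DOMAINS:
        hits.append("malicious_domain")
    for field in ("ip_src_addr", "ip_dst_addr"):
        if flagged.get(field) in MALICIOUS_IPS:
            hits.append(f"malicious_ip:{field}")
    if hits:
        flagged["is_alert"] = True
        flagged["threat_hits"] = hits
    return flagged


clean = apply_threat_intel({"ip_dst_addr": "104.116.248.248",
                            "domain_without_subdomains": "aliexpress.com"})
alert = apply_threat_intel({"ip_dst_addr": "203.0.113.7",
                            "domain_without_subdomains": "evil.example"})
```

Because the check is a constant-time lookup per field, everything that is already known about an attacker can be tested against each event as it streams through.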
32. Scoring Event
If alert = true, then event is a threat
Calculate one or more risk scores
Aggregate all scores to get event score
SUM, MEAN, MAX, etc.
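The scoring steps above can be sketched as a list of rules whose individual scores are combined by a chosen aggregator. The rules, their scores, the `is_alert` field name, and the `triage_score` helper are all hypothetical; only the SUM, MEAN, and MAX aggregations come from the slide.

```python
from statistics import mean

AGGREGATORS = {"SUM": sum, "MEAN": mean, "MAX": max}

# Hypothetical triage rules: (name, predicate over the event, risk score).
RULES = [
    ("destination outside home country",
     lambda e: e.get("enrichments.geo.ip_dst_addr.country") not in (None, "US"), 10),
    ("unknown source asset",
     lambda e: "enrichments.host.ip_src_addr.asset_type" not in e, 5),
    ("large transfer",
     lambda e: e.get("bytes", 0) > 10_000_000, 20),
]


def triage_score(event, aggregator="MAX"):
    """Score only alerted events; combine every matching rule's score."""
    if not event.get("is_alert"):
        return 0
    scores = [score for _, matches, score in RULES if matches(event)]
    return AGGREGATORS[aggregator](scores) if scores else 0


suspicious = {"is_alert": True,
              "enrichments.geo.ip_dst_addr.country": "IE",
              "bytes": 448176}
```

With this event two rules match (scores 10 and 5), so MAX gives 10, SUM gives 15, and MEAN gives 7.5. The choice of aggregator decides whether many weak signals can outrank one strong signal.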
34. Model as a Service
• Security Analysis Models applied during enrichment and threat intelligence
• REST microservices implementing a specified interface
• Machine learning or other model
• Train model with event history stored in Hadoop
• Register with discovery service
• Referenced in Stellar enrichments
• MAAS_GET_ENDPOINT
• MAAS_MODEL_APPLY
• System load balances across instances
• More Information
https://github.com/apache/incubator-metron/tree/master/metron-analytics/metron-maas-service
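A model exposed as a REST microservice might look like the sketch below, built only on Python's standard library. The `/apply` path, the `host` query parameter, the JSON response shape, and the toy digit-ratio "model" are assumptions for illustration and do not represent the interface Metron specifies; registration with the discovery service and load balancing are not shown.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def score_host(host):
    """Toy model: flag hostnames whose first label is mostly digits."""
    label = host.split(".")[0]
    if not label:
        return {"is_malicious": "legit"}
    digit_ratio = sum(ch.isdigit() for ch in label) / len(label)
    return {"is_malicious": "malicious" if digit_ratio > 0.3 else "legit"}


class ModelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path != "/apply":
            self.send_error(404)
            return
        # Read the hostname to score from the query string.
        host = parse_qs(parsed.query).get("host", [""])[0]
        body = json.dumps(score_host(host)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block and serve the model on localhost; call this to start the service."""
    HTTPServer(("127.0.0.1", port), ModelHandler).serve_forever()
```

Keeping the model behind a plain HTTP endpoint means it can be written and retrained in any language, independently of the streaming pipeline that calls it.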
36. Indexing and Writing
• Store events for future reference
• Forensics
• Training machine learning models
• Reprocess with new threat indicators
Metron Stream Processing Pipeline: Telemetry Parsing → Enrichment → Threat Intel → Alert Triage → Index & Write