1 © Hortonworks Inc. 2011–2018. All rights reserved.
Solving Cybersecurity at Scale
Laurence Da Luz & Mo Kamel
What Are We Talking About?
Cybersecurity Challenges
Solving Cybersecurity At Scale
Anatomy of Apache Metron
Use Case Walkthrough
CYBERSECURITY IS A BIG DATA PROBLEM
Big Traffic, Big Trouble
Complexity Problem
• Too many point solutions
• Too many dashboards
• Too hard to correlate data across silos
• Cybersecurity staff overwhelmed with too many alerts
Big Traffic, Big Trouble
+ Capability Problem
• Huge volumes and limited storage
• Inconsistent data from multiple sources
• Real-time context is crucial
• Missing advanced analytics
• Alert fatigue – “many false positives”
Big Traffic, Big Trouble
+ People Problem
• Global skills shortage
• Staff inefficiency and high cost
• Low-value work of data gathering and cleansing
• Scaling by adding people is impractical
Big Traffic, Big Trouble
+ Security Problem
• Distracting and advanced attacks
• Lack of security context
• Asset Classifications
• Prioritization and Scoring
• Full access to historical data
A Community Solution
Open Source Solution
• Volume
• Variety
• Value
• Automation
• Real-time
• Threat Intel
Advanced Use Cases
Open Source Solution
• User behavior
• Entity behavior
• Advanced Analytics
Solving Cybersecurity at Scale
An architecture for real-time cybersecurity analytics
[Figure: Real-time processing cybersecurity engine, a cybersecurity stream processing pipeline on Hortonworks DataFlow and the Hortonworks Data Platform]
• Telemetry data sources: security endpoint devices (FireEye, Palo Alto, BlueCoat, etc.); machine-generated logs (AD, app/web server, firewall, VPN, etc.); IDS (Suricata, Snort, etc.); network data (PCAP, NetFlow, Bro, etc.); threat intelligence feeds (Soltra, OpenTAXII, third-party feeds)
• Telemetry data collectors: performance network ingest probes; real-time enrichment/threat intel streams; other
• Telemetry ingest buffer
• Stream processing pipeline: telemetry parsers, enrichment, threat intel, profiler, alert triage, indexers and writers
• Data services & integration layer: data vault, real-time search, evidentiary store, threat intelligence platform, Model as a Service, community models, data science workbench, PCAP forensics modules
• Collect security device and machine-generated logs
• Extendable data model
• Enrichment on ingest for extra context
• Behavior profiling and advanced windowing
• Flexible deployment of data science
• Alerting and triage (exposed to the SOC)
Hortonworks Cybersecurity Platform runs as an application on top of HDF and HDP.
Context is everything
Enrichments
• User and group data, internal business sources
• Geospatial data, worldwide shared threat intelligence
• Model predictions, via the Model as a Service framework
• Time
Time is context | Time matters
The Profiler
• A generalized solution for extracting model features and aggregations over time from high-throughput streaming data
• Generates a profile describing the behavior of an entity: a host, user, subnet, or application
• A foundational component for both security model building and alerting in HCP
Profile behavior across windows in time (t = 1, 2, 3, …, n) and across multiple devices.
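The core profiling idea, one piece of state per entity per time window, can be sketched in a few lines of Python. This is a minimal illustration, not Metron's Profiler: it assumes each event is a dict carrying a Unix `timestamp` field, and the function `build_profiles` and its parameters are hypothetical. The `init` and `update` callables loosely mirror the `init` and `update` expressions of the profile definitions shown in Step 5 of the walkthrough.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, Tuple


def build_profiles(
    events: Iterable[Dict[str, Any]],
    entity_field: str,
    period_seconds: int,
    init: Callable[[], Any],
    update: Callable[[Any, Dict[str, Any]], Any],
) -> Dict[Tuple[Hashable, int], Any]:
    """Aggregate one state per (entity, time window) pair.

    Windows are fixed-length and aligned to the Unix epoch.
    """
    profiles: Dict[Tuple[Hashable, int], Any] = {}
    for event in events:
        entity = event[entity_field]
        period = int(event["timestamp"] // period_seconds)  # index of the window
        key = (entity, period)
        state = profiles[key] if key in profiles else init()
        profiles[key] = update(state, event)
    return profiles


if __name__ == "__main__":
    # Hypothetical login events: count logins per user per 15-minute window.
    logins = [
        {"user": "alice", "timestamp": 0},
        {"user": "alice", "timestamp": 120},
        {"user": "bob", "timestamp": 300},
        {"user": "alice", "timestamp": 1000},
    ]
    print(build_profiles(logins, "user", 900, lambda: 0, lambda s, e: s + 1))
```

Keying state by window index is what later allows individual windows to be selected, skipped, or merged at read time.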
… how do we perform behavioral profiling at real-time scale?
There are many different types of data sketches, but their general characteristics include:
• Stream friendly – each item is examined only once and can quickly update a small sketch data structure
• Scalable – effective for queries that otherwise do not scale well: count distinct, quantiles, most frequent items
• Approximate – faster results within mathematically proven error bounds
• Bounded – fixed-size computation and predictable space usage
[Figure: one sketch per period (0<t<1, 1<t<2, 2<t<3, …, n-1<t<n) merged into combined sketches, either over a contiguous period (0<t<3) or over a selection of periods (0<t<1 + 2<t<3 + …)]
Data sketches provide fast, approximate answers to queries about the underlying data.
Data sketches are combinable. This allows us to slice and dice the windows and recombine them at read time, picking and mixing sketches (for example, skipping certain days or hours).
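To make the "combinable" property concrete, here is a minimal HyperLogLog sketch in Python. It is a simplified, illustrative implementation (SHA-1 as the hash, linear counting for small cardinalities, no large-range correction), not the sketch library Metron uses, and the class and method names are hypothetical. Merging two sketches is a register-wise maximum, which is exactly what allows per-period sketches to be recombined during a read.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality sketch (illustrative only)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                      # index bits; the bias constant below assumes p >= 7
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item: str) -> int:
        # First 8 bytes of SHA-1 as a 64-bit hash; any well-mixed hash would do
        return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        idx = x >> (64 - self.p)                       # top p bits select a register
        rest = x & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # 1-based position of leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        """Union of two sketches: register-wise maximum (both must share p)."""
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def cardinality(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)          # bias constant for m >= 128
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)   # linear counting for small sets
        return estimate


if __name__ == "__main__":
    # Hypothetical per-hour sketches of the servers one user contacted
    hour_0, hour_2 = HyperLogLog(), HyperLogLog()
    for i in range(400):
        hour_0.add(f"server-{i}")
    for i in range(200, 700):
        hour_2.add(f"server-{i}")
    # Skip hour 1 entirely: merge only the windows of interest
    print(round(hour_0.merge(hour_2).cardinality()))
```

Because a merged sketch is indistinguishable from one built over the union of the inputs, "distinct servers in hours 0 and 2" never requires rescanning raw events.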
Streaming Analytics at Scale
Algorithms out of the box
Profiles
• HyperLogLog (cardinality) – How many servers does this user usually talk to?
• Bloom filters – Have we seen this domain before?
• T-Digest (distribution) – Personalized baselining and statistics
• Counters and descriptive statistics – Quick results and triggers for more intensive calculations
• Mixed-period windows – Accounting for holidays, typical working periods, and seasons
Approximation algorithms – specialized algorithms that can produce results orders of magnitude faster, within mathematically proven error bounds, making them ideal for real-time analytics
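The "have we seen this domain before?" question maps naturally onto a Bloom filter. Below is a minimal Python sketch, assuming SHA-256-based double hashing and the textbook sizing formulas. It is illustrative and not the implementation Metron ships, and the class name and parameters are hypothetical. A Bloom filter never returns a false negative, but it can return a false positive at roughly the configured rate.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter for set membership with a tunable false-positive rate."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01) -> None:
        # Textbook sizing: m = -n*ln(p) / (ln 2)^2 bits, k = (m/n)*ln 2 hash functions
        self.m = max(8, math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m  # double hashing: position_i = h1 + i*h2 (mod m)

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

    def merge(self, other: "BloomFilter") -> "BloomFilter":
        """Filters with identical parameters combine by bitwise OR, like sketches over windows."""
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("can only merge filters with identical m and k")
        out = BloomFilter.__new__(BloomFilter)
        out.m, out.k = self.m, self.k
        out.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return out
```

Usage is a membership test such as `"example.com" in seen_domains`. A "no" is definitive, while a "yes" may occasionally be wrong, which is acceptable for a cheap first-pass signal.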
Natural Language Processing (finding likely non-human behavior with machine learning)
• Typosquat (misspellings, homoglyphs)
• DGA (Domain Generation Algorithm)
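As a rough illustration of typosquat detection, the sketch below flags a domain as a likely imitation of a protected domain when it matches after homoglyph normalization or lies within a small edit distance. The homoglyph tables, the `max_distance` threshold, and the function names are hypothetical simplifications. The slide describes a machine-learning approach, so this rule-based check is a swapped-in technique for illustration only.

```python
from typing import Iterable, List, Optional

# Hypothetical, deliberately tiny homoglyph tables (real lists are far larger).
MULTI_CHAR_GLYPHS = [("rn", "m"), ("vv", "w")]
SINGLE_CHAR_GLYPHS = {"0": "o", "1": "l", "3": "e", "5": "s"}


def normalize(domain: str) -> str:
    """Map look-alike character sequences onto a canonical form."""
    d = domain.lower()
    for src, dst in MULTI_CHAR_GLYPHS:
        d = d.replace(src, dst)
    return "".join(SINGLE_CHAR_GLYPHS.get(c, c) for c in d)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def typosquat_target(candidate: str, protected: Iterable[str], max_distance: int = 1) -> Optional[str]:
    """Return the (lowercase) protected domain that `candidate` appears to imitate, or None."""
    legit_domains: List[str] = list(protected)
    candidate = candidate.lower()
    if candidate in legit_domains:
        return None  # it is one of the legitimate domains itself
    for legit in legit_domains:
        if normalize(candidate) == normalize(legit):
            return legit  # homoglyph substitution, e.g. "rn" posing as "m"
        if levenshtein(candidate, legit) <= max_distance:
            return legit  # simple misspelling
    return None
```

For example, `typosquat_target("rnicrosoft.com", ["microsoft.com"])` returns `"microsoft.com"`. In practice, a signal like this would feed a triage score rather than block traffic outright.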
Streaming similarity and anomaly detection
• Median Absolute Deviation
• TLSH (Locality Sensitive Hashing) – Finding events similar to known-bad events
• GeoHash similarity
• Robust PCA
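Median absolute deviation (MAD) is attractive for streaming baselines because it is robust to the very outliers it is meant to detect. A minimal Python sketch follows. It assumes the common Iglewicz-Hoaglin modified z-score with a 3.5 cut-off, which is one conventional choice rather than something the slide specifies, and the function names are hypothetical.

```python
import statistics
from typing import Sequence


def modified_z_score(history: Sequence[float], value: float) -> float:
    """Robust outlier score: 0.6745 * |x - median| / MAD (Iglewicz-Hoaglin)."""
    median = statistics.median(history)
    mad = statistics.median(abs(x - median) for x in history)
    if mad == 0:
        # Degenerate history (most values equal the median): any deviation is extreme
        return 0.0 if value == median else float("inf")
    return 0.6745 * abs(value - median) / mad


def is_outlier(history: Sequence[float], value: float, threshold: float = 3.5) -> bool:
    """Flag `value` when its modified z-score exceeds `threshold`."""
    return modified_z_score(history, value) > threshold
```

For example, with a baseline of `[10, 11, 9, 10, 12, 10, 11]` connections per minute, a burst of 40 scores about 20 and is flagged, while 11 scores about 0.67 and is not.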
UNDER THE HOOD: ANATOMY OF THE DATA ENGINEERING PIPELINE
Architecture & Capabilities
• Pipelines are created and deployed via the Metron framework; no custom Storm code is required
• An extendable domain-specific language (DSL), Stellar, is used across Metron for querying, transformation, and configuring rules
• Core pipeline components: NiFi, Kafka, Storm, Spark, Solr
• Access and visualization: Metron UI and Zoomdata (partner)
• Generated alerts can be integrated with external systems
Acquire
• NiFi (and MiNiFi) acquire raw data and handle routing
• Devices generate raw log messages in a variety of data formats, from disparate systems and sources
Acquire → Normalize
• Out-of-the-box device parsers: ASA, Bro, FireEye, Palo Alto, …
• Convert all data from raw source logs into a common JSON format, which simplifies downstream analytics across devices
• General-purpose format parsers: Grok, Regex, CSV, JSON
• Custom Java-based parsers
Acquire → Normalize → Enrich
• Geo enrichment, HBase lookups for custom enrichments, and MaaS add additional information to the raw source data during streaming
• Assess against threat feeds, and alert based on severity
Acquire → Normalize → Enrich → Profile
• The Profiler generates feature sets that are stored in HBase
• The Profiler is a separate pipeline that listens on all streaming events
• The pipeline is specialized to understand a series of actions in time across multiple devices
• Windowed features can be looped back for triage and alerting
• Batch profiling is also supported, which can “seed” a feature set from historical data
Acquire → Normalize → Enrich → Profile → Security Data Lake
• Data is indexed in Solr near-term for random access and hot tiering
• Data is stored in HDFS long-term for historical access and analytics
• SOC analysts perform security monitoring and threat hunting
• Security data scientists train against historical trends to improve alerting models
Acquire → Normalize → Enrich → Profile → Security Data Lake
An end-to-end streaming data pipeline enables real-time action against cyber threats in a repeatable pattern.
USE CASE WALKTHROUGH
Deploying a Use Case in Apache Metron
What is Squid?
• Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently requested web pages.
What does a Squid access log look like?
• When you make an outbound HTTP connection to http://www.cnn.com, the following entry is added to a file called access.log:
Squid Logs - Use Case Walkthrough
Fields highlighted in the entry:
• Unix epoch time
• IP of the host where the connection was made
• Domain name of the outbound connection
What Metron will do to the Squid telemetry in real time:
• Convert the Unix epoch time to a timestamp
• Asset enrichment to enrich the IP (hostname, type of device)
• WHOIS enrichment to look up domain name information
• Threat intel to cross-reference the address with an intel feed to see if there is a hit
• Index the event into Solr and persist it in HDFS (Security Data Lake)
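Three of these steps can be illustrated on an already-parsed event with a short Python sketch. The in-memory tables below are hypothetical stand-ins for the HBase-backed enrichment and threat-intel stores, and the field names (`ip_src_addr`, `domain`, `asset.*`, `is_alert`) are assumptions for illustration. WHOIS lookup and indexing are omitted because they need external services.

```python
from datetime import datetime, timezone
from typing import Any, Dict

# Hypothetical lookup tables standing in for asset and threat-intel enrichment stores.
ASSET_TABLE = {"192.0.2.15": {"hostname": "ws-finance-07", "device_type": "workstation"}}
THREAT_INTEL_DOMAINS = {"bad-domain.example"}


def enrich_squid_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Apply timestamp conversion, asset enrichment, and a threat-intel check."""
    enriched = dict(event)
    # 1. Unix epoch seconds -> ISO-8601 timestamp in UTC
    enriched["timestamp_iso"] = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc).isoformat()
    # 2. Asset enrichment keyed on the source IP (no fields added if the IP is unknown)
    for key, value in ASSET_TABLE.get(event["ip_src_addr"], {}).items():
        enriched[f"asset.{key}"] = value
    # 3. Threat-intel cross-reference on the destination domain
    enriched["is_alert"] = event["domain"] in THREAT_INTEL_DOMAINS
    return enriched
```

Each step only adds fields and never removes them, so downstream stages can rely on the original parsed fields still being present.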
Deploying a Use Case in Apache Metron
Step 1 – NiFi TailFile
Step 2 – Define Parser
Step 3 – Enrichment Config
Step 4 – Configure Alerts
Step 5 – Configure Profiler
Step 1 – Telemetry Ingest
Streaming from NiFi to Kafka
Data is tailed from the Squid access.log file by NiFi's TailFile processor and streamed into Kafka.
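The essence of tailing a log file is remembering a read position between polls. The Python sketch below illustrates only that idea. It is not NiFi's TailFile implementation, it does not publish to Kafka, and the function name is hypothetical. A real collector would persist the offset durably and hand each line to a Kafka producer.

```python
import os
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines appended to `path` since byte `offset`, plus the new offset.

    A trailing partial line is left unread until its newline arrives.
    """
    if os.path.getsize(path) < offset:
        offset = 0  # file was truncated or replaced by a smaller one: start over
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    last_newline = data.rfind(b"\n")
    if last_newline == -1:
        return [], offset  # no complete line yet
    complete = data[: last_newline + 1]
    return complete.decode("utf-8").splitlines(), offset + last_newline + 1
```

Calling this in a polling loop, and storing the returned offset after each batch is delivered, gives at-least-once delivery of whole lines.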
Step 2 – Configuring the Squid Parser
Defining a Grok Filter for the Squid data
• Grok parser: configuration-driven
• Regex-based abstraction
• Grok is suitable for structured or semi-structured logs
• Contains pre-defined mappings (for example, the pre-defined Grok mappings for IP addresses)
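Grok patterns are ultimately named regular expressions, so a parser for Squid's native access.log format can be approximated directly in Python. This sketch is a stand-in for a Grok configuration and not Metron's Squid parser. The field names (`ip_src_addr`, `action`, `code`, and so on) and the sample lines in the test are illustrative assumptions, and only the leading fields of the native format are captured.

```python
import re
from typing import Any, Dict, Optional
from urllib.parse import urlsplit

# Named groups play the role of Grok's %{PATTERN:field} captures.
SQUID_NATIVE = re.compile(
    r"^(?P<timestamp>\d+\.\d+)\s+(?P<elapsed>\d+)\s+"
    r"(?P<ip_src_addr>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<action>[A-Z_]+)/(?P<code>\d{3})\s+(?P<bytes>\d+)\s+"
    r"(?P<method>[A-Z]+)\s+(?P<url>\S+)"
)


def parse_squid_line(line: str) -> Optional[Dict[str, Any]]:
    """Parse one native-format access.log line into a flat dict, or None if it does not match."""
    match = SQUID_NATIVE.match(line)
    if match is None:
        return None  # a real pipeline would route this line to an error queue
    event: Dict[str, Any] = match.groupdict()
    event["timestamp"] = float(event["timestamp"])  # Unix epoch seconds
    for field in ("elapsed", "code", "bytes"):
        event[field] = int(event[field])
    url = event["url"]
    # CONNECT requests log "host:port" instead of a full URL
    event["domain"] = urlsplit(url).hostname or url.split(":")[0]
    return event
```

Producing a flat dict with consistent field names is the "normalize" step described earlier: every downstream enrichment can rely on `ip_src_addr` and `domain` regardless of the source device.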
Step 3 – Configuring Streaming Enrichment
Enriching events with GEO data
• Leverage the out-of-the-box GEO enrichment
• Custom enrichment sources are also supported (stored in HBase and configuration-driven)
Enriching against a DGA model
$METRON_HOME/bin/maas_deploy.sh -zq node1:2181 -lmp $HOME/mock_dga -hmp /user/$USER/models -mo ADD -m 512 -n dga -v 1.0 -ni 1
• Mock DGA Python model to detect malicious domains
• Deploy it to Metron Model as a Service (MaaS)
• Call the model in-stream within parser or enrichment configs
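The source of the mock DGA model is not shown on the slide. As a stand-in, the Python sketch below labels a domain with simple lexical heuristics: length, character entropy, and digit and vowel ratios. The thresholds and the `dga_label` name are hypothetical and chosen for illustration. A real model would be trained on labeled domains and then served through MaaS as described above.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character in `text`."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def dga_label(domain: str) -> str:
    """Return 'malicious' or 'legit' using toy lexical heuristics (not a trained model)."""
    labels = domain.lower().rstrip(".").split(".")
    # Score the label just left of the TLD, e.g. "cnn" in "www.cnn.com"
    name = labels[-2] if len(labels) >= 2 else labels[0]
    if len(name) < 10:
        return "legit"  # short names carry too little signal for these heuristics
    digit_ratio = sum(ch.isdigit() for ch in name) / len(name)
    vowel_ratio = sum(ch in "aeiou" for ch in name) / len(name)
    looks_random = shannon_entropy(name) >= 3.5 and (digit_ratio >= 0.2 or vowel_ratio < 0.2)
    return "malicious" if looks_random else "legit"
```

Entropy alone would misfire on long dictionary words (for example, "stackoverflow" has an entropy above 3.5), which is why the sketch also requires a digit-heavy or vowel-poor label.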
Step 4 – Configuring Alerts
Defining severity ratings based on threat triage rules
• Raise an alert if our DGA model finds a detection
• Set our score rating to 100 on a hit
• Multiple alert rules are supported
• An aggregator is defined for when multiple conditions are met
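The triage model, in which rules each contribute a score when they fire and an aggregator combines those scores, can be expressed in a few lines of Python. This is a conceptual sketch, not Metron's triage engine. Python lambdas stand in for Stellar rule expressions, and the field names (`is_malicious`, `geo_outlier`) and the aggregator table are assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class RiskRule:
    name: str
    predicate: Callable[[dict], bool]  # stands in for a Stellar rule expression
    score: float


# Hypothetical aggregator table: how scores combine when several rules fire.
AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "MAX": max,
    "MIN": min,
    "SUM": sum,
    "MEAN": mean,
}


def triage(event: dict, rules: List[RiskRule], aggregator: str = "MAX") -> Optional[float]:
    """Return the aggregated threat score of all rules that fire, or None if none do."""
    scores = [rule.score for rule in rules if rule.predicate(event)]
    if not scores:
        return None
    return AGGREGATORS[aggregator](scores)


RULES = [
    RiskRule("DGA hit", lambda e: e.get("is_malicious") == "malicious", 100),
    RiskRule("Geographic outlier", lambda e: bool(e.get("geo_outlier")), 10),
]
```

With `MAX`, a single strong indicator dominates, so an event that is both a DGA hit and a geographic outlier scores 100 rather than 110. This keeps weak circumstantial rules from inflating the score.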
• Enrichment from GeoIP lookup
• Enrichment from DGA Python model (from Model as a Service)
Step 5 – Configuring Profiler
Finding geographic anomalies in user login behavior - an authentication log example
Profile 1: Track locations by user
• geohashes of the locations the user has logged in from
• multiset of geohashes per user (mapping occurrence counts)
{
  "profile": "locations_by_user",
  "foreach": "user",
  "onlyif": "hash != null && LENGTH(hash) > 0",
  "init": {
    "s": "MULTISET_INIT()"
  },
  "update": {
    "s": "MULTISET_ADD(s, hash)"
  },
  "result": "s"
}
Profile 2: Track geo distribution from centroid
• Statistical distribution of the distance between the login location and the geographic centroid of the user’s previous logins from within the last 5 minutes
{
  "profile": "geo_distribution_from_centroid",
  "foreach": "'global'",
  "onlyif": "geo_distance != null",
  "init": {
    "s": "STATS_INIT()"
  },
  "update": {
    "s": "STATS_ADD(s, geo_distance)"
  },
  "result": "s"
}
These profiles help us detect when a user logs in from vastly different geographic locations within a short period of time.
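The quantities these profiles rely on (geohashes, a centroid of previous login locations, and the distance from it) can be sketched in plain Python. This is an illustrative reimplementation under stated assumptions: standard base-32 geohash encoding, a count-weighted spherical centroid of geohash cell centres, and haversine distance on a 6371 km sphere. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Tuple

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per character."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, n_bits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits = bits << 1
            rng[1] = mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:
            chars.append(BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


def geohash_decode(gh: str) -> Tuple[float, float]:
    """Return the (lat, lon) centre of a geohash cell."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    use_lon = True
    for ch in gh:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return (lat_rng[0] + lat_rng[1]) / 2, (lon_rng[0] + lon_rng[1]) / 2


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float, radius_km: float = 6371.0) -> float:
    """Great-circle distance between two points on a sphere."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def weighted_centroid(hashes: Counter) -> Tuple[float, float]:
    """Spherical centroid of geohash cell centres, weighted by occurrence count (the multiset)."""
    x = y = z = 0.0
    total = sum(hashes.values())
    for gh, count in hashes.items():
        lat, lon = map(math.radians, geohash_decode(gh))
        x += count * math.cos(lat) * math.cos(lon)
        y += count * math.cos(lat) * math.sin(lon)
        z += count * math.sin(lat)
    x, y, z = x / total, y / total, z / total
    return math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x))


def distance_from_centroid_km(history: Counter, current_hash: str) -> float:
    """Distance of the current login cell from the centroid of the user's previous logins."""
    clat, clon = weighted_centroid(history)
    lat, lon = geohash_decode(current_hash)
    return haversine_km(clat, clon, lat, lon)
```

Averaging 3-D unit vectors rather than raw latitude/longitude pairs avoids wrong centroids near the antimeridian, where naive averaging of longitudes breaks down.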
Step 5 – Configuring Profiler
Compute the threat given global context and per-user context
{
  "threatIntel": {
    "fieldMap": {
      "stellar" : {
        "config" : [
          "geo_distance_distr := STATS_MERGE(PROFILE_GET('geo_distribution_from_centroid', 'global', PROFILE_FIXED(4, 'HOURS')))",
          "dist_median := STATS_PERCENTILE(geo_distance_distr, 50.0)",
          "dist_sd := STATS_SD(geo_distance_distr)",
          "geo_outlier := ABS(dist_median - geo_distance) >= 5*dist_sd",
          "is_alert := exists(is_alert) && is_alert",
          "is_alert := is_alert || (geo_outlier != null && geo_outlier == true)",
          "geo_distance_distr := null"
        ]
      }
    },
• Get the statistical distribution of the ‘geo_distance’ field for all users
• Decide whether geo_distance is an outlier by testing how many standard deviations it is from the median
• Update ‘is_alert’ accordingly. If it is true, the alert level needs to be triaged
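The outlier test in the Stellar configuration above can be restated in Python to make the arithmetic explicit. This is a hedged translation, not Metron code: it assumes STATS_SD returns the sample standard deviation and treats the 50th percentile as the median. The function name is hypothetical.

```python
import statistics
from typing import Optional, Sequence


def is_geo_outlier(distances: Sequence[float], geo_distance: Optional[float], k: float = 5.0) -> Optional[bool]:
    """True if geo_distance is at least k standard deviations from the median distance.

    Returns None when there is not enough data, mirroring the null checks in the rules.
    """
    if geo_distance is None or len(distances) < 2:
        return None
    dist_median = statistics.median(distances)
    dist_sd = statistics.stdev(distances)  # sample standard deviation
    # Note: if dist_sd is 0, every value (even the median itself) satisfies the test
    return abs(dist_median - geo_distance) >= k * dist_sd
```

Because the distribution is global across all users, a single user's unusual login distance is judged against everyone's typical spread, which suits the low, circumstantial score assigned during triage.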
Step 5 – Configuring Profiler
Triage the threat
"triageConfig" : {
  "riskLevelRules" : [
    {
      "name" : "Geographic Outlier",
      "comment" : "Determine if the user's geographic distance from the centroid of the historic logins is an outlier as compared to all users.",
      "rule" : "geo_outlier != null && geo_outlier",
      "score" : 10,
      "reason" : "FORMAT('user %s has a distance (%f) from the centroid of their last logins that is at least 5 std deviations (sd = %f) from the median (%f)', user, geo_distance, dist_sd, dist_median)"
    }
  ],
  "aggregator" : "MAX"
}
• Because this is only a circumstantial indicator, we give it a threat score of only 10
• In a normal system, many rules would triage the threat; in this case, the maximum score is taken
• We need to ensure the SOC analyst has enough context to make a decision here
Key Takeaways
• Cybersecurity is a big data problem, and we need a community-driven approach to solve it
• Modern cybersecurity challenges require a modern data architecture to facilitate real-time response
• Apache Metron provides an extensible, repeatable, and configuration-driven framework for real-time cybersecurity at scale
Thank you.

Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mehr von DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Kürzlich hochgeladen (20)

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

Solving Cybersecurity at Scale

  • 1. 1 © Hortonworks Inc. 2011–2018. All rights reserved. Solving Cybersecurity at Scale Laurence Da Luz & Mo Kamel
  • 2. What Are We Talking About? • Cybersecurity Challenges • Solving Cybersecurity at Scale • Anatomy of Apache Metron • Use Case Walkthrough
  • 3. CYBERSECURITY IS A BIG DATA PROBLEM
  • 4. Big Traffic, Big Trouble: Complexity Problem • Too many point solutions • Too many dashboards • Too hard to correlate data across silos • Cybersecurity staff overwhelmed with too many alerts
  • 5. Big Traffic, Big Trouble + Capability Problem • Huge volumes and limited storage • Inconsistent data from multiple sources • Real-time context is crucial • Missing advanced analytics • Alert fatigue: “many false positives”
  • 6. Big Traffic, Big Trouble + People Problem • Skills shortage around the globe • Staff inefficiency and high cost • Low-value work of data gathering and cleansing • Scaling by adding people is impractical
  • 7. Big Traffic, Big Trouble + Security Problem • Distracting and advanced attacks • Lack of security context • Asset classification • Prioritization and scoring • Full access to historical data
  • 9. A Community Solution: Open Source Solution • Volume • Variety • Value • Automation • Real time • Threat intel
  • 10. Advanced Use Cases: Open Source Solution • User behavior • Entity behavior • Advanced analytics
  • 12. Solving Cybersecurity at Scale: An architecture for real-time cybersecurity analytics. [Architecture diagram, labeled REAL-TIME PROCESSING / CYBER SECURITY ENGINE.] Telemetry data sources: security endpoint devices (FireEye, Palo Alto, Blue Coat, etc.); machine-generated logs (AD, app/web server, firewall, VPN, etc.); IDS (Suricata, Snort, etc.); network data (PCAP, NetFlow, Bro, etc.); threat intelligence feeds (Soltra, OpenTAXII, third-party feeds). Telemetry data collectors: performance network ingest probes; real-time enrich/threat intel streams; other. Telemetry ingest buffer. Cyber security stream processing pipeline: telemetry parsers, enrichment, threat intel, profiler, alert triage, indexers and writers. Data services and integration layer: data vault, real-time search, evidentiary store, threat intelligence platform, model as a service, community models, data science workbench, PCAP forensics modules. Platform: Hortonworks DataFlow (HDF) and Hortonworks Data Platform (HDP).
  • 13. Solving Cybersecurity at Scale: An architecture for real-time cybersecurity analytics. [The slide 12 architecture diagram, annotated with the following callouts.] • Collect security device and machine-generated logs • Extendable data model • Enrichment on ingest for extra context • Behavior profiling and advanced windowing • Flexible deployment of data science • Alerting and triage (exposed to the SOC) • Hortonworks Cybersecurity Platform runs as an application on top of HDF and HDP
  • 17. Context is everything: Enrichments • User and group data, internal business sources • Geospatial data, worldwide shared threat intelligence • Model predictions, via the Model as a Service framework • Time
  • 18. Time is context | Time matters: The Profiler • A generalized solution for extracting model features and aggregations over time from high-throughput streaming data • Generates a profile describing the behavior of an entity: a host, user, subnet, or application • A foundational component for both security model building and alerting in HCP (Hortonworks Cybersecurity Platform). Profile behavior across windows in time (t = 1, 2, 3, …, n) and across multiple devices.
  • 19. Time is context | Time matters: The Profiler. … how do we perform behavioral profiling at real-time scale? There are many types of data sketches, but their general characteristics include: • Stream-friendly: each item is examined only once and quickly updates a small sketch data structure • Scalable: effective for queries that otherwise do not scale well, such as count distinct, quantiles, and most frequent items • Approximate: faster results within mathematically proven error bounds • Fixed-size compute and predictable space usage. Data sketches provide fast, approximate answers to queries about the underlying data. Data sketches are combinable: this lets us slice and dice the windows, recombine them at read time, and pick and mix sketches (skip certain days, hours, etc.). [Diagram: one sketch per period (0<t<1, 1<t<2, 2<t<3, …, n-1<t<n); a combined sketch for 0<t<3; a combined sketch for 0<t<1 + 2<t<3 + ….]
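To make the "combinable" property concrete, here is a minimal HyperLogLog (one of the Profiler sketches listed on slide 20) in Python. It is an illustrative implementation, not Metron's: each period gets its own sketch, and merging takes the register-wise maximum, so periods can be combined or skipped at read time. The precision `p=12`, the SHA-1 based hash, and the `server-N` data are assumptions made for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog distinct-count sketch (illustrative, not Metron's code)."""

    def __init__(self, p: int = 12):
        self.p = p                      # index bits (assumed precision)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item: str) -> int:
        # Stable 64-bit hash from SHA-1 (Python's hash() is salted per process).
        return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        idx = x >> (64 - self.p)                        # top p bits choose a register
        rest = x & ((1 << (64 - self.p)) - 1)           # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1    # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Register-wise max: identical to a sketch built over both inputs.
        if other.p != self.p:
            raise ValueError("sketches must share the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)    # small-range correction
        return raw


# One sketch per period: distinct servers a user contacted (hypothetical data).
periods = [HyperLogLog() for _ in range(3)]
for t, sketch in enumerate(periods):
    for i in range(1000):
        sketch.add(f"server-{t * 700 + i}")   # consecutive periods overlap

all_three = periods[0].merge(periods[1]).merge(periods[2])   # true count: 2400
skip_middle = periods[0].merge(periods[2])                    # true count: 2000
print(round(all_three.estimate()), round(skip_middle.estimate()))
```

Because merging is exact at the register level, a Profiler-style store only needs to keep one small sketch per period; any combination of periods can then be estimated without revisiting the raw events.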
  • 20. Streaming Analytics at Scale: Algorithms out of the box. Profiles: • HyperLogLog (cardinality): How many servers does this user usually talk to? • Bloom filters: Have we seen this domain before? • T-Digest (distribution): personalized baselining and statistics • Counters and descriptive statistics: quick results and triggers for more intensive calculations • Mixed-period windows: accounting for holidays, typical working periods, and seasons. Approximation algorithms: specialized algorithms that can produce results orders of magnitude faster within mathematically proven error bounds, which makes them ideal for real-time analytics. Natural language processing (finding likely non-human behavior with machine learning): • Typosquat (misspellings, homoglyphs) • DGA (domain generation algorithm). Streaming similarity and anomaly detection: • Mean absolute deviation • TLSH (locality-sensitive hashing): finding events similar to known bad • GeoHash similarity • Robust PCA
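The Bloom filter question on this slide ("Have we seen this domain before?") can be sketched as follows. This is a generic Bloom filter, not Metron's implementation; the bit-array size, the number of hash functions, and the double-hashing scheme are assumptions chosen for the example. A Bloom filter can return false positives but never false negatives, which suits a "first time seen" check.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: 'possibly seen' or 'definitely not seen'."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 5):
        self.num_bits = num_bits          # assumed size of the bit array
        self.num_hashes = num_hashes      # assumed number of hash functions
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: k bit positions derived from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen_domains = BloomFilter()
for domain in ["cnn.com", "google.com", "hortonworks.com"]:
    seen_domains.add(domain)

for domain in ["cnn.com", "xkqjzv-update.biz"]:
    verdict = "probably seen before" if seen_domains.might_contain(domain) else "never seen"
    print(domain, "->", verdict)
```

The filter's size is fixed up front, so memory stays constant no matter how many domains stream past; the trade-off is a false-positive rate that grows as more items are added.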
  • 21. UNDER THE HOOD: ANATOMY OF THE DATA ENGINEERING PIPELINE
  • 22. Architecture & Capabilities
  • 23. Architecture & Capabilities • Pipelines are created and deployed via the Metron framework; no custom Storm code is required • An extendable domain-specific language (DSL), Stellar, is used across Metron for querying, transformation, and configuring rules • Core pipeline components: NiFi, Kafka, Storm, Spark, Solr • Access and visualization: Metron UI and Zoomdata (partner) • Generated alerts can be integrated with external systems
  • 24. Architecture & Capabilities: Acquire. Devices generate raw log messages in data formats from a variety of disparate systems and sources. NiFi (and MiNiFi) acquire the raw data and handle routing.
  • 25. Architecture & Capabilities: Normalize. Convert all data from raw source logs into a common JSON format, which simplifies downstream analytics across devices. • Out-of-the-box device parsers: ASA, Bro, FireEye, Palo Alto, … • General-purpose format parsers: Grok, regex, CSV, JSON • Custom Java-based parsers
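A minimal sketch of normalization, assuming two hypothetical device formats (a CSV firewall line and a key=value VPN line). Both are mapped onto shared field names. `ip_src_addr`, `ip_dst_addr`, `source.type`, and `original_string` follow naming commonly seen in Metron messages, but the parsers themselves and the millisecond `timestamp` convention are illustrative assumptions, not Metron's parser code.

```python
import json

# Hypothetical raw lines from two different devices.
FIREWALL_CSV = "1528766038,10.0.0.5,198.51.100.20,443,ALLOW"
VPN_KV = "ts=1528766040 user=alice src=10.0.0.7 dst=198.51.100.21"


def parse_firewall_csv(line: str) -> dict:
    ts, src, dst, port, action = line.split(",")
    return {"timestamp": int(ts) * 1000,          # epoch millis (assumed convention)
            "ip_src_addr": src, "ip_dst_addr": dst,
            "ip_dst_port": int(port), "action": action.lower()}


def parse_vpn_kv(line: str) -> dict:
    fields = dict(pair.split("=", 1) for pair in line.split())
    return {"timestamp": int(fields["ts"]) * 1000,
            "ip_src_addr": fields["src"], "ip_dst_addr": fields["dst"],
            "user": fields["user"]}


PARSERS = {"firewall": parse_firewall_csv, "vpn": parse_vpn_kv}


def normalize(source_type: str, line: str) -> str:
    """Turn one raw log line into a JSON document with shared field names."""
    record = PARSERS[source_type](line)
    record["source.type"] = source_type
    record["original_string"] = line      # keep the raw text for later forensics
    return json.dumps(record, sort_keys=True)


for source, raw in [("firewall", FIREWALL_CSV), ("vpn", VPN_KV)]:
    print(normalize(source, raw))
```

Once both devices emit the same `ip_src_addr` field, a downstream enrichment or query can treat them uniformly, which is the point the slide makes about simplifying analytics across devices.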
  • 26. Architecture & Capabilities: Enrich. Geo enrichment, HBase lookups for custom enrichments, and MaaS add information to the raw source during streaming. Assess against threat feeds, and alert based on severity. [Diagram: a normalized record gains enrichment fields, and a matching record is tagged alert: true, sev: 1.]
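The enrich-then-assess flow can be sketched with in-memory dictionaries standing in for a GeoIP database, an HBase-backed asset table, and a threat feed. All lookup data, rule names, and scores below are hypothetical. Triage here scores each matching rule and aggregates the scores (maximum by default) into a single `threat_score`, which is one plausible way to turn several signals into an alert severity; the `is_alert` field name is likewise an assumption.

```python
# Hypothetical stand-ins: a GeoIP database, an HBase-backed asset table,
# and a threat intelligence feed would normally back these lookups.
GEO_DB = {"203.0.113.9": {"country": "NL"}, "198.51.100.20": {"country": "US"}}
ASSETS = {"10.0.0.5": {"hostname": "fin-laptop-17", "criticality": "high"}}
THREAT_FEED = {"203.0.113.9"}

# Triage rules as (name, predicate, score); names and scores are illustrative.
RULES = [
    ("threat intel hit", lambda e: e["threatintel_hit"], 80),
    ("critical source asset", lambda e: e.get("src_criticality") == "high", 20),
    ("non-US destination", lambda e: e.get("dst_country", "US") != "US", 10),
]


def enrich(event: dict) -> dict:
    enriched = dict(event)
    geo = GEO_DB.get(event["ip_dst_addr"])
    if geo:
        enriched["dst_country"] = geo["country"]
    asset = ASSETS.get(event["ip_src_addr"])
    if asset:
        enriched["src_hostname"] = asset["hostname"]
        enriched["src_criticality"] = asset["criticality"]
    enriched["threatintel_hit"] = event["ip_dst_addr"] in THREAT_FEED
    return enriched


def triage(event: dict, aggregate=max) -> dict:
    # Collect the score of every rule that matches, then aggregate them.
    scores = [score for _name, matches, score in RULES if matches(event)]
    result = dict(event)
    result["is_alert"] = bool(scores)
    result["threat_score"] = aggregate(scores) if scores else 0
    return result


suspicious = {"ip_src_addr": "10.0.0.5", "ip_dst_addr": "203.0.113.9"}
print(triage(enrich(suspicious))["threat_score"])                  # 80 (max of 80, 20, 10)
print(triage(enrich(suspicious), aggregate=sum)["threat_score"])   # 110
```

Swapping the aggregator changes how aggressively overlapping signals escalate: `max` keeps the score anchored to the strongest single signal, while `sum` rewards events that trip several rules at once.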
  • 27. Architecture & Capabilities: Profile. The Profiler is a separate pipeline that listens on all streaming events. It is specialized to understand a series of actions in time across multiple devices, and it generates feature sets that are stored in HBase. Windowed features can be looped back for triage and alerting. Batch profiling is also supported and can “seed” a feature set from historical data.
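A minimal sketch of a windowed profile and of looping its values back for alerting. Events are counted per entity per fixed period, and the newest period is scored against earlier periods using the mean absolute deviation listed on slide 20. The 15-minute period, the event counts, and the alert threshold of 3.5 are assumptions for illustration; Metron's Profiler is driven by configuration rather than by code like this.

```python
from collections import defaultdict

PERIOD_SECONDS = 900   # hypothetical 15-minute profile period


def build_profiles(events):
    """Count events per (entity, period): a tiny stand-in for a profile."""
    profiles = defaultdict(lambda: defaultdict(int))
    for ev in events:
        period = ev["timestamp"] // PERIOD_SECONDS     # timestamps in seconds here
        profiles[ev["ip_src_addr"]][period] += 1
    return profiles


def mean_abs_dev_score(history, value):
    """Distance of `value` from the historical mean, in mean-absolute-deviation units."""
    mean = sum(history) / len(history)
    mad = sum(abs(x - mean) for x in history) / len(history)
    if mad == 0:
        return 0.0 if value == mean else float("inf")
    return (value - mean) / mad


# Hypothetical traffic: roughly 10 events per period, then a burst of 60.
events = [
    {"ip_src_addr": "10.0.0.5", "timestamp": period * PERIOD_SECONDS + i}
    for period, count in enumerate([10, 12, 9, 11, 10, 13, 8, 11, 60])
    for i in range(count)
]

counts = build_profiles(events)["10.0.0.5"]
history = [counts[p] for p in range(8)]
score = mean_abs_dev_score(history, counts[8])
print(round(score, 1), "alert" if score > 3.5 else "ok")   # 39.6 alert
```

Scoring against the entity's own history gives a per-host baseline: 60 events in a period would be unremarkable for a busy server but stands out sharply against this host's usual 8 to 13.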
  • 28. Architecture & Capabilities: Security Data Lake. Data is indexed in Solr near-term for random access and hot tiering. Data is stored in HDFS long-term for historical access and analytics. SOC analysts perform security monitoring and threat hunting. Security data scientists train against historical trends to improve alerting models.
  • 29. Architecture & Capabilities: Acquire, Normalize, Enrich, Profile, Security Data Lake. The end-to-end streaming data pipeline enables real-time action against cyber threats in a repeatable pattern.
  • 30. USE CASE WALKTHROUGH
  • 31. Deploying a Use Case in Apache Metron: Squid Logs, Use Case Walkthrough. What is Squid? • Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently requested web pages. What does a Squid access log look like? • When you make an outbound HTTP connection to http://www.cnn.com, an entry is added to a file called access.log. [Screenshot of an example entry, with callouts: Unix epoch time; IP of the host where the connection was made; domain name of the outbound connection.]
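Because the example entry on the slide is a screenshot, the sketch below uses a made-up line in Squid's native access.log format (time, elapsed, client address, result code/status, bytes, method, URL, user, hierarchy code/peer, content type) and extracts the three fields the slide highlights. The IP addresses and values are invented, and HTTPS CONNECT entries, which log `host:port` instead of a full URL, are not handled.

```python
from urllib.parse import urlsplit

# Made-up line in Squid's native access.log format (not the entry from the slide).
LINE = ("1528766038.500     70 10.0.0.5 TCP_MISS/200 98 GET "
        "http://www.cnn.com/ - HIER_DIRECT/203.0.113.10 text/html")

FIELDS = ["time", "elapsed_ms", "client_ip", "result_status", "bytes",
          "method", "url", "user", "hierarchy_peer", "content_type"]


def parse_squid_native(line: str) -> dict:
    # split() with no argument collapses the padding before the elapsed column.
    record = dict(zip(FIELDS, line.split()))
    return {
        "epoch_seconds": float(record["time"]),        # Unix epoch time
        "ip_src_addr": record["client_ip"],            # host that made the connection
        "domain": urlsplit(record["url"]).hostname,    # domain of the outbound connection
        "method": record["method"],
        "status": int(record["result_status"].split("/")[1]),
    }


print(parse_squid_native(LINE))
```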
  • 32. Squid Logs: Use Case Walkthrough (Deploying a Use Case in Apache Metron)
    What is Squid?
    • Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently requested web pages.
    What does a Squid access log look like?
    • When you make an outbound HTTP connection to http://www.cnn.com, an entry is appended to a file called access.log.
    [Image: a sample access.log entry, annotated with the Unix epoch time, the IP of the host where the connection was made, and the domain name of the outbound connection]
    What Metron will do to the Squid telemetry in real time:
    • Convert the Unix epoch time to a timestamp
    • Asset enrichment: enrich the IP (hostname, type of device)
    • WHOIS enrichment: look up domain name information
    • Threat intel: cross-reference the address with an intel feed to see if there is a hit
    • Index the event into Solr and persist it in HDFS (Security Data Lake)
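To make the five real-time steps above concrete, here is a minimal Python sketch that applies them to one already-parsed Squid record. Everything in it is hypothetical: the asset inventory, WHOIS table, and threat-intel feed are small in-memory stand-ins for real enrichment sources, the field names are assumptions, and two lists stand in for the Solr index and HDFS. It illustrates the order of operations, not Metron's API.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

Event = Dict[str, Any]

# Hypothetical reference data standing in for real enrichment sources.
ASSET_INVENTORY = {"10.0.0.5": {"hostname": "finance-laptop-07", "type": "laptop"}}
WHOIS = {"example.com": {"registrar": "Hypothetical Registrar Inc."}}
THREAT_INTEL_DOMAINS = {"bad.example"}

solr_index: List[Event] = []    # stand-in for the near-term, searchable tier
hdfs_archive: List[Event] = []  # stand-in for the long-term, historical tier


def registered_domain(host: str) -> str:
    # Naive: keep the last two labels (ignores suffixes such as .co.uk).
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def process(event: Event) -> Event:
    out = dict(event)
    # 1. Convert Unix epoch seconds to an ISO-8601 UTC timestamp.
    epoch = float(event["timestamp"])
    out["timestamp_iso"] = datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()
    # 2. Asset enrichment keyed on the source IP.
    for key, value in ASSET_INVENTORY.get(event["ip_src_addr"], {}).items():
        out[f"asset.{key}"] = value
    # 3. WHOIS enrichment keyed on the registered domain.
    domain = registered_domain(event["domain"])
    for key, value in WHOIS.get(domain, {}).items():
        out[f"whois.{key}"] = value
    # 4. Threat intel: flag a hit if the domain is on the feed.
    out["threat_intel_hit"] = domain in THREAT_INTEL_DOMAINS
    # 5. "Index" into the hot tier and persist to the archive.
    solr_index.append(out)
    hdfs_archive.append(out)
    return out
```

For example, `process({"timestamp": "1461576382.642", "ip_src_addr": "10.0.0.5", "domain": "www.example.com"})` returns the record with an ISO timestamp, `asset.*` and `whois.*` fields, and `threat_intel_hit` set to `False`.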
  • 33. Deploying a Use Case in Apache Metron
  • 34. Deploying a Use Case in Apache Metron
    • Step 1: NiFi TailFile
    • Step 2: Define Parser
    • Step 3: Enrichment Config
    • Step 4: Configure Alerts
    • Step 5: Configure Profiler
  • 35. Step 1: Telemetry Ingest (Streaming from NiFi to Kafka)
    Data is tailed from the Squid access.log files with NiFi's TailFile processor and streamed to Kafka.
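Tailing a log means remembering how far you have already read and emitting only what was appended since. The sketch below illustrates that idea in plain Python; it is not NiFi's TailFile implementation. The function name `read_new_lines` is hypothetical, rollover handling is reduced to "restart if the file shrank", and the Kafka publish step is left out.

```python
import os
import tempfile
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines appended after `offset`, plus the new offset.

    A trailing partial line (no newline yet) is left for the next call so
    that half-written log records are never emitted.
    """
    if os.path.getsize(path) < offset:
        # File shrank (truncated or replaced): restart from the beginning.
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    last_newline = data.rfind(b"\n")
    if last_newline == -1:
        return [], offset
    lines = data[: last_newline + 1].decode("utf-8").splitlines()
    return lines, offset + last_newline + 1


# Usage: simulate Squid appending to access.log between two polls.
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".log") as tmp:
    tmp.write(b"line one\nline two\npartial")
    log_path = tmp.name
batch1, pos = read_new_lines(log_path, 0)
with open(log_path, "ab") as f:
    f.write(b" line\nline four\n")
batch2, pos = read_new_lines(log_path, pos)
os.remove(log_path)
# Each batch would then be published to a Kafka topic (not shown).
```

Holding back the incomplete last line is the key design choice: a record that Squid is still writing is only emitted once its newline arrives, so the parser downstream never sees a truncated entry.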
  • 36. Step 2: Configuring the Squid Parser
    Defining a Grok filter for the Squid data:
    • The Grok parser is configuration driven
    • Regex-based abstraction
    • Grok is suitable for structured or semi-structured logs
    • Contains predefined mappings (for example, the predefined Grok mapping for IP addresses)
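Grok works by expanding named pattern references such as `%{IP:ip_src_addr}` into regular-expression capture groups. The sketch below assumes a tiny, simplified pattern table (real Grok definitions are broader; for example, the real IP pattern also covers IPv6) and a Squid pattern modeled on the one used in Metron's Squid example. Treat the pattern and the sample log line as illustrative, not as the exact production configuration.

```python
import re
from typing import Dict, Optional

# Simplified stand-ins for a few predefined Grok patterns.
BASE_PATTERNS: Dict[str, str] = {
    "NUMBER": r"[+-]?(?:\d+(?:\.\d+)?)",
    "INT": r"[+-]?\d+",
    "WORD": r"\w+",
    "NOTSPACE": r"\S+",
    "IP": r"(?:\d{1,3}\.){3}\d{1,3}",  # IPv4 only in this sketch
}

GROK_TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(pattern: str) -> str:
    """Expand %{NAME} and %{NAME:field} tokens into regex fragments."""
    def expand(m: re.Match) -> str:
        name, field = m.group(1), m.group(2)
        body = BASE_PATTERNS[name]
        # A field name becomes a named capture group; otherwise non-capturing.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_TOKEN.sub(expand, pattern)


SQUID = (
    r"%{NUMBER:timestamp}[^0-9]*%{INT:elapsed} %{IP:ip_src_addr} "
    r"%{WORD:action}/%{NUMBER:code} %{NUMBER:bytes} %{WORD:method} "
    r"%{NOTSPACE:url}[^0-9]*(%{IP:ip_dst_addr})?"
)
SQUID_RE = re.compile(grok_to_regex(SQUID))


def parse(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields, or None if the line does not match."""
    m = SQUID_RE.match(line)
    return m.groupdict() if m else None


# Illustrative line in Squid's native access.log layout.
SAMPLE = ("1461576382.642    161 127.0.0.1 TCP_MISS/200 103701 GET "
          "http://www.cnn.com/ - DIRECT/199.27.79.73 text/html")
```

`parse(SAMPLE)` yields fields such as `timestamp`, `ip_src_addr`, `url`, and `ip_dst_addr`; the point of the abstraction is that the configuration only names patterns and fields, while the regex plumbing is generated.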
  • 37. Step 3: Configuring Streaming Enrichment
    Enriching events with GEO data:
    • Leverage the out-of-the-box GEO enrichment.
    • Custom enrichment sources are also supported; they are stored in HBase and configuration driven.
    Enriching against a DGA (domain generation algorithm) model:
    • A mock DGA Python model detects malicious domains.
    • Deploy it to Metron Model as a Service (MaaS):
      $METRON_HOME/bin/maas_deploy.sh -zq node1:2181 -lmp $HOME/mock_dga -hmp /user/$USER/models -mo ADD -m 512 -n dga -v 1.0 -ni 1
    • Call the model in-stream from the parser or enrichment configs.
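The internals of the mock DGA model are not shown in the talk. As a stand-in, the sketch below swaps in a simple heuristic, not the talk's model: algorithmically generated domains often have long, high-entropy labels, so a label of at least 10 characters with a Shannon entropy of 3.5 bits per character or more is flagged. The thresholds, the `is_malicious` field, and the `dga.` prefix are assumptions; in the pipeline described here, a real model would be served by MaaS and called from the enrichment config.

```python
import math
from collections import Counter
from typing import Dict, Union


def shannon_entropy(s: str) -> float:
    """Bits per character of the string's character distribution."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def second_level_label(host: str) -> str:
    # Naive: 'www.example.com' -> 'example'.
    labels = host.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def score_domain(host: str, min_len: int = 10,
                 threshold: float = 3.5) -> Dict[str, Union[str, float]]:
    """Hypothetical DGA stand-in: long, high-entropy labels look generated."""
    label = second_level_label(host)
    h = shannon_entropy(label)
    verdict = "malicious" if len(label) >= min_len and h >= threshold else "legit"
    return {"is_malicious": verdict, "entropy": round(h, 3)}


def enrich_with_dga(event: Dict[str, object]) -> Dict[str, object]:
    # Merge the model output into the event under a 'dga.' prefix.
    out = dict(event)
    out.update({f"dga.{k}": v for k, v in score_domain(str(event["domain"])).items()})
    return out
```

A heuristic like this misfires on legitimate long or random-looking names (CDN hostnames, for example), which is exactly why the talk routes the decision through a deployable model rather than hard-coding a rule.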
  • 38. Step 4: Configuring Alerts
    Defining severity ratings based on threat triage rules:
    • Raise an alert if our DGA model finds a detection.
    • Set the score to 100 on a hit.
    • Multiple alert rules are supported; an aggregator defines how scores are combined when multiple conditions are met.
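Triage boils down to: evaluate every rule against the event, collect the scores of the rules that fire, and combine them with the configured aggregator. The sketch below assumes Python predicates in place of Stellar rule expressions; the rule names, field names, and the second "Large transfer" rule are hypothetical. MAX is the aggregator used later in this deck; MIN, SUM, and MEAN are included as illustrative alternatives.

```python
from statistics import mean
from typing import Any, Callable, Dict, List, NamedTuple, Optional

Event = Dict[str, Any]


class Rule(NamedTuple):
    name: str
    predicate: Callable[[Event], bool]
    score: float


AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "MAX": max,
    "MIN": min,
    "SUM": sum,
    "MEAN": mean,
}


def triage(event: Event, rules: List[Rule], aggregator: str = "MAX") -> Optional[float]:
    """Combine the scores of every rule that fires; None if none fire."""
    hits = [rule.score for rule in rules if rule.predicate(event)]
    if not hits:
        return None
    return AGGREGATORS[aggregator](hits)


# Hypothetical rules: a DGA hit scores 100, as on the slide.
RULES = [
    Rule("DGA detection", lambda e: e.get("is_malicious") == "malicious", 100),
    Rule("Large transfer", lambda e: e.get("bytes", 0) > 10_000_000, 20),
]
```

With both rules firing, MAX yields 100, SUM yields 120, and MEAN yields 60; the aggregator choice decides whether several weak signals can add up or whether the strongest single signal dominates.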
  • 39. [Screenshot: an enriched event showing the enrichment from the GeoIP lookup and the enrichment from the DGA Python model (from Model as a Service)]
  • 40. Step 5: Configuring Profiler
    Finding geographic anomalies in user login behavior: an authentication log example.
    Profile 1: Track locations by user
    • Geohashes of the locations the user has logged in from
    • A multiset of geohashes per user (mapping each geohash to its occurrence count)
      {
        "profile": "locations_by_user",
        "foreach": "user",
        "onlyif": "hash != null && LENGTH(hash) > 0",
        "init": { "s": "MULTISET_INIT()" },
        "update": { "s": "MULTISET_ADD(s, hash)" },
        "result": "s"
      }
    Profile 2: Track geo distribution from centroid
    • Statistical distribution of the distance between the login location and the geographic centroid of the user's previous logins within the last 5 minutes
      {
        "profile": "geo_distribution_from_centroid",
        "foreach": "'global'",
        "onlyif": "geo_distance != null",
        "init": { "s": "STATS_INIT()" },
        "update": { "s": "STATS_ADD(s, geo_distance)" },
        "result": "s"
      }
    These profiles help us track whether a user is logging in from vastly different geographic locations within a short period of time.
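Profile 2 needs each login's distance from the centroid of the user's earlier logins. In Metron this is typically done with Stellar geohash functions; the sketch below instead shows the underlying geometry in plain Python. It assumes locations have already been decoded from geohashes to (latitude, longitude) pairs and weights each location by its count from the Profile 1 multiset. The function names are hypothetical.

```python
import math
from typing import Mapping, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def weighted_centroid(points: Mapping[LatLon, int]) -> LatLon:
    """Centroid of login locations weighted by how often each occurred.

    Averages 3-D unit vectors rather than raw lat/lon, so points on
    either side of the antimeridian are handled sensibly.
    """
    x = y = z = 0.0
    total = 0
    for (lat, lon), count in points.items():
        phi, lam = math.radians(lat), math.radians(lon)
        x += count * math.cos(phi) * math.cos(lam)
        y += count * math.cos(phi) * math.sin(lam)
        z += count * math.sin(phi)
        total += count
    if total == 0:
        raise ValueError("need at least one login location")
    x, y, z = x / total, y / total, z / total
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlam = math.radians(b[1] - a[1])
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))
```

The resulting `geo_distance` for each login is what Profile 2 accumulates into its global statistical distribution.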
  • 41. Step 5: Configuring Profiler
    Compute the threat given global context and per-user context:
      {
        "threatIntel": {
          "fieldMap": {
            "stellar": {
              "config": [
                "geo_distance_distr := STATS_MERGE(PROFILE_GET('geo_distribution_from_centroid', 'global', PROFILE_FIXED(4, 'HOURS')))",
                "dist_median := STATS_PERCENTILE(geo_distance_distr, 50.0)",
                "dist_sd := STATS_SD(geo_distance_distr)",
                "geo_outlier := ABS(dist_median - geo_distance) >= 5*dist_sd",
                "is_alert := exists(is_alert) && is_alert",
                "is_alert := is_alert || (geo_outlier != null && geo_outlier == true)",
                "geo_distance_distr := null"
              ]
            }
          }
        }
      }
    • Get the statistical distribution of the 'geo_distance' field for all users.
    • Decide whether geo_distance is an outlier by testing how many standard deviations it lies from the median.
    • Update 'is_alert' accordingly. If it is true, the alert level must be triaged.
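The outlier test in the Stellar config above can be mirrored in plain Python to make its arithmetic explicit: merge the stored per-period samples, take the median and standard deviation, and flag the event when it lies at least 5 standard deviations from the median. This sketch assumes raw samples are available (Metron's profiles store statistical summaries, and its STATS_* functions may compute percentiles approximately); it uses `statistics.stdev`, the sample standard deviation. The function names are hypothetical.

```python
from statistics import median, stdev
from typing import Any, Dict, Iterable, List, Optional


def is_geo_outlier(periods: Iterable[List[float]],
                   geo_distance: Optional[float],
                   k: float = 5.0) -> Optional[bool]:
    """True if geo_distance lies at least k std deviations from the median."""
    # Merge the per-period samples into one distribution.
    merged = [d for period in periods for d in period]
    if geo_distance is None or len(merged) < 2:
        return None  # not enough context to decide
    dist_median = median(merged)
    dist_sd = stdev(merged)  # sample standard deviation
    return abs(dist_median - geo_distance) >= k * dist_sd


def apply_geo_rule(event: Dict[str, Any],
                   periods: Iterable[List[float]]) -> Dict[str, Any]:
    out = dict(event)
    out["geo_outlier"] = is_geo_outlier(periods, event.get("geo_distance"))
    # Keep an existing alert, or raise one when the outlier test fires.
    out["is_alert"] = bool(event.get("is_alert")) or out["geo_outlier"] is True
    return out
```

One consequence of the `>=` comparison, shared with the Stellar expression: if every historical distance is identical, the standard deviation is 0 and any login, even one exactly at the median, counts as an outlier, so a minimum-spread guard may be worth adding in practice.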
  • 42. Step 5: Configuring Profiler
    Triage the threat:
      "triageConfig": {
        "riskLevelRules": [
          {
            "name": "Geographic Outlier",
            "comment": "Determine if the user's geographic distance from the centroid of the historic logins is an outlier as compared to all users.",
            "rule": "geo_outlier != null && geo_outlier",
            "score": 10,
            "reason": "FORMAT('user %s has a distance (%f) from the centroid of their last login is 5 std deviations (%f) from the median (%f)', user, geo_distance, dist_sd, dist_median)"
          }
        ],
        "aggregator": "MAX"
      }
    • Because this is only a circumstantial indicator, we give it a threat score of only 10.
    • In a real deployment, many rules would triage the threat; with the MAX aggregator, the highest score is taken.
    • We need to ensure the SOC analyst has enough context to make a decision.
  • 43. Key Takeaways
    • Cybersecurity is a big data problem, and we need a community-driven approach to solve it.
    • Modern cybersecurity challenges require a modern data architecture to enable real-time response.
    • Apache Metron provides an extensible, repeatable, configuration-driven framework for real-time cybersecurity at scale.
  • 44. Thank you.