Graphistry presented techniques for accelerating security investigations using graph technologies. They demonstrated how generating a virtual hypergraph from multiple data sources allows analysts to easily pivot over the data. They also discussed how automating common investigation tasks using the graph model can scale workflows. Graphistry uses GPUs to enable interactive analysis of large datasets. Their goal is to "100X" productivity by enabling analysts to more quickly extract insights and forage for relevant data through virtual hypergraph queries and automation.
100X Investigations - Graphistry / Microsoft BlueHat
1. G R A P H I S T R Y info@graphistry.com
G R A P H I S T R Y
100X Investigations
Graph Workshop, BlueHat 2019, Seattle
Leo Meyerovich, CEO
2.
… First: Demo - Announcing Graphistry with MS Azure & Sentinel!
3.
100X’ing Investigation bottlenecks with graph tech
Foraging: Model for pivoting & automation
Compute: GPUs for everyone!
Sense making: How to visually graph
Graph projects: Less fail, more win
4.
100X Data foraging with graph:
EASY PIVOTING WITH VIRTUAL HYPERGRAPHS
+ SCALE WITH INVESTIGATION AUTOMATION
5.
Foraging Insight 1: The world as a virtual hypergraph
• 1K – 1M devices
• 1K – 1B users
• Digital activities: Payments, logins, clicks, …
• APIs: Software eating everything
6.
No code – Graph as a lingua franca for querying
Search, Drill, Command, Expand, Connect Dots
Enable analysts to work with
your DBs, APIs, & Enterprise
as one big & uniform virtual graph
7.
IP=10.16.0.8; msg=Malware.Object;
time=2 Nov 2017 19:32:00 UTC;
vendor=FireEye; Product=Web MPS NX
Data foraging today
17.
knowing: tools x tables x fields
gathering: complete, fresh, fidelity… APIs??
stitching together into a story
… for each incident type x info source!
Foraging is TOUGH
18.
The Dream: SOC-in-a-Box
[Diagram: SOC-IN-A-BOX, with labels alert, Correlator, Data Lake, context, incident, Orchestrator, autoresponse]
19.
Insight: Everything speaks Logs & APIs for Events & Entities
[Diagram: SOC-IN-A-BOX, with labels alert, Correlator, Data Lake, context, incident, Orchestrator, Controls, autoresponse, Case Manager, case, UI, API, Virtual hypergraph*]
Hypergraph: Link events to many entities
Virtual: Dynamically pivot over DBs, APIs
*More useful than REST: Search, expand, …
20.
Turn cols to nodes; link via Event nodes
Fetch log hits (subgraph)
Filter, cluster, act, & repeat
Example: JSON Log API <> Virtual Hypergraph
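The column-to-node idea on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not Graphistry's implementation: the function name `to_hypergraph`, the node id format, and the second log row are assumptions made up for the example. The first row reuses the FireEye alert shown earlier in the deck.

```python
from typing import Dict, List, Tuple


def to_hypergraph(
    events: List[Dict[str, str]], entity_cols: List[str]
) -> Tuple[Dict[str, str], List[Tuple[str, str, str]]]:
    """Turn each log row into an event node linked to one node per entity column.

    Hypothetical helper for illustration only.
    """
    nodes: Dict[str, str] = {}               # node id -> node type
    edges: List[Tuple[str, str, str]] = []   # (event id, entity id, column name)
    for i, row in enumerate(events):
        event_id = f"event::{i}"
        nodes[event_id] = "event"
        for col in entity_cols:
            if col in row:
                # Rows that share a value share the same entity node.
                entity_id = f"{col}::{row[col]}"
                nodes[entity_id] = col
                edges.append((event_id, entity_id, col))
    return nodes, edges


logs = [
    # Alert from the "Data foraging today" slide.
    {"IP": "10.16.0.8", "msg": "Malware.Object", "vendor": "FireEye",
     "time": "2 Nov 2017 19:32:00 UTC"},
    # Made-up second alert, added only to show two events sharing entities.
    {"IP": "10.16.0.8", "msg": "Malware.Binary", "vendor": "FireEye",
     "time": "2 Nov 2017 19:40:00 UTC"},
]

nodes, edges = to_hypergraph(logs, entity_cols=["IP", "msg", "vendor"])
```

Because both events link to the same `IP::10.16.0.8` and `vendor::FireEye` nodes, the two alerts become connected through their shared entities, which is what makes pivoting possible.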
21.
100X Data foraging with graph:
EASY PIVOTING WITH VIRTUAL HYPERGRAPHS
+ SCALE WITH INVESTIGATION AUTOMATION
22.
Demo: Malware 360
2. Auto-expand virtual graph
23.
100X foraging with virtual graph generated queries
Checks more data sources; tracks more clues; in less time
Every analyst can now do SecOps:
“Record & replay” and Share Templates
Generated query for 1 Splunk pivot call
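The generated query itself is not reproduced in this transcript. As a rough sketch of the idea, a pivot from a set of known entity values to a search string might look like the following. The function name `build_pivot_query`, the index name, the field names, and the query template are all hypothetical and are not taken from Graphistry.

```python
from typing import Dict, List


def build_pivot_query(index: str, entities: Dict[str, List[str]],
                      max_rows: int = 1000) -> str:
    """Build a Splunk-style search matching any of the given field values.

    Hypothetical template for illustration; a real generated query would
    also carry time ranges, field projections, and source-specific logic.
    """
    clauses = []
    for field in sorted(entities):
        for value in entities[field]:
            escaped = value.replace('"', '\\"')  # escape embedded quotes
            clauses.append(f'{field}="{escaped}"')
    if not clauses:
        raise ValueError("need at least one entity value to pivot on")
    return f'search index="{index}" ({" OR ".join(clauses)}) | head {max_rows}'


query = build_pivot_query(
    "alerts",  # assumed index name
    {"IP": ["10.16.0.8"], "msg": ["Malware.Object"]},
    max_rows=100,
)
```

Keeping the template in one function is what makes "record & replay" plausible: the same pivot can be rerun with different entity values or shared with another analyst.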
24.
Management perspective: 80/20 rule for covering functional KPIs
80% of DATA
endpoint logs & alerts
user logs & alerts
server logs & alerts
network logs & alerts
service logs & alerts
ticket APIs
…
80% of INCIDENTS
malware
phishing
cloud tenant breach
app server takeover
device theft
offboarding
…
80% of TASKS
high-fidelity quick check
investigative deep dive
mitigation/containment/report
table top training
automation
...
Overdue to make investigation structured & predictable!
• Incident SLA
• Investigation depth (burnout!)
• Satellite team methodology
• …
25.
Last month: Azure
Next month: Kusto & Sentinel
Reach out!
info@graphistry.com
26.
100X Sense making with graph
27.
Low-dimensional UIs are good but sometimes too much work for too little insight
29.
Case study: Classic ML + graph analytics
PROJECT ARTEMIS
Massage parlor records & reviews:
• Normal
• Maybe illicit business
• Maybe human trafficking
30.
UMAP: Classic ML likes numbers, times, pixel RGBs, scores, …
@leland_mcinnes
31.
RAPIDS UMAP layout
TensorFlow categorization
Graphistry visual analytics
Splunk data lake
regular review
potential illicit activity
potential trafficking
41K Reviews => 400 flagged
32.
Graph: Top 5 most suspicious companies, their records, and hits on their metadata
Explainable & key entities *pop*
Graph for correlating entities across events
33.
Correlated macro view better than disconnected alerts & tickets!
DEMO: 1w of FireEye HX over 546 IPs & 22 users
34.
Quickly popping insights
Color by time, data source; Expand 2 hops; Expand by community
Color by rank, betweenness, …; Visual data cleaning; Model tuning
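"Expand 2 hops" amounts to a bounded breadth-first search from the selected nodes. The sketch below shows that idea in plain Python. The function name `expand` and the sample edges are made up for illustration, and this is not how Graphistry implements the feature.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def expand(edges: Iterable[Tuple[str, str]], seeds: List[str],
           hops: int = 2) -> Dict[str, int]:
    """Return every node within `hops` undirected steps of the seeds,
    mapped to its hop distance."""
    adjacency: Dict[str, set] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # do not walk past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Made-up sample: alert -> IP -> user -> device -> second alert.
sample_edges = [
    ("alert1", "10.0.0.1"),
    ("10.0.0.1", "alice"),
    ("alice", "laptop7"),
    ("laptop7", "alert2"),
]
within_two_hops = expand(sample_edges, seeds=["alert1"], hops=2)
```

Starting from `alert1`, two hops reach the IP and the user but stop before the device, which keeps an expansion from pulling in the whole graph.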
35.
100X Compute:
GPUs for everyone
What if we could easily compute over full datasets in under a second?
36.
Hunting:
Finally possible to do 1M+ events/entities w/ web UIs!
Ex: Bro/Zeek (secrepo.com)
37.
GPUs for everyone
2014/2015: GPU Dataframes; Graphistry NSF SBIR
2016/2017: GOAI, Apache Arrow + Nvidia, MapD, Blazing, …
2018/2019: RAPIDS + Databricks, Ursa, …
Shared GPU format, portability (Docker, …)
Dataframes, SQL, ML, graph, spatial, & infra (IO, multi-gpu)
38.
Faster Speeds, Real-World Benefits
[Chart panels: cuIO/cuDF (Load and Data Preparation), cuML (XGBoost), End-to-End. Axis: time in seconds (shorter is better). Series: cuIO/cuDF (Load and Data Prep), Data Conversion, XGBoost. Values shown: 8762, 6148, 3925, 3221, 322, 213]
Benchmark: 200GB CSV dataset; data prep includes joins, variable transformations
CPU Cluster Configuration: CPU nodes (61 GiB memory, 8 vCPUs, 64-bit platform), Apache Spark
DGX Cluster Configuration: 5x DGX-1 on InfiniBand network
my_gdf.groupby(['src_ip', 'dest_ip'])['time'].plot()
39.
cuGraph
Multi-GPU PageRank Performance
PageRank portion of the HiBench benchmark suite
HiBench Scale   Vertices       Edges             CSV File (GB)   # of GPUs   PageRank for 3 Iterations (secs)
Huge            5,000,000      198,000,000       3               1           1.1
BigData         50,000,000     1,980,000,000     34              3           5.1
BigData x2      100,000,000    4,000,000,000     69              6           9.0
BigData x4      200,000,000    8,000,000,000     146             12          18.2
BigData x8      400,000,000    16,000,000,000    300             16          31.8
Graph().add_edges(my_df).pagerank()
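The one-liner above is slide shorthand rather than an exact API call. To make the benchmarked computation concrete, here is a small CPU-only sketch of PageRank by power iteration, run for 3 iterations as in the table. It is plain Python, not cuGraph. The damping factor of 0.85 is the conventional default and is an assumption here, as are the sample edges.

```python
from typing import Dict, List, Tuple


def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85,
             iterations: int = 3) -> Dict[str, float]:
    """Power-iteration PageRank over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_degree = {node: 0 for node in nodes}
    for src, _ in edges:
        out_degree[src] += 1

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if out_degree[node] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank


# Made-up directed graph: a cycle a -> b -> c -> a, plus d -> a.
sample_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
ranks = pagerank(sample_edges, iterations=3)
```

Each iteration touches every edge once, which is why the table reports runtime against edge count and why the work spreads well across GPUs.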
40.
graph = netflow_df.sql("""
SELECT
sum(bytes),
min(time),
max(time)
GROUP BY src_ip, dest_ip
""")
graphistry.plot(graph)
BlazingSQL’s C++ skips cuDF’s Python Numba JIT…
so _great_ for subsecond interactivity!
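The slide's snippet is shorthand and omits the FROM clause and the grouping columns in the SELECT list. As a runnable stand-in for the same aggregation, the sketch below uses Python's built-in sqlite3 instead of BlazingSQL, so it runs on CPU only. The table name `netflow`, the column aliases, and the sample rows are assumptions for illustration.

```python
import sqlite3

# In-memory table standing in for the netflow dataframe on the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE netflow (src_ip TEXT, dest_ip TEXT, bytes INTEGER, time INTEGER)"
)
conn.executemany(
    "INSERT INTO netflow VALUES (?, ?, ?, ?)",
    [
        # Made-up flows: (source, destination, bytes, epoch seconds).
        ("10.0.0.1", "10.0.0.2", 500, 100),
        ("10.0.0.1", "10.0.0.2", 1500, 160),
        ("10.0.0.3", "10.0.0.2", 40, 120),
    ],
)

# Collapse individual flows into one weighted edge per (src_ip, dest_ip) pair.
edges = conn.execute(
    """
    SELECT src_ip, dest_ip,
           sum(bytes) AS total_bytes,
           min(time)  AS first_seen,
           max(time)  AS last_seen
    FROM netflow
    GROUP BY src_ip, dest_ip
    ORDER BY src_ip, dest_ip
    """
).fetchall()
conn.close()
```

Each resulting row is one graph edge with its traffic volume and active time window, which is the shape a graph visualizer needs as input.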
41.
Closing remarks: Scaling graph _projects_
Avoid failure to launch by avoiding infra & NIH:
1d-1mo: Cloud, viz, on-the-fly compute, notebooks, API connectors
3mo-never: Graph DB, Kafka ingest, Hadoop, on-prem, custom analytics, custom UIs
Useful by design: Make user+problem #1 driver, not infra
Win ROI politics w/ cupcake principle: Big projects start as small projects
Lower switching costs by augmenting vs. replacing
Everyone is used to the status quo and uninterested in avoidable work.
Start w/ good champions: Ideally innovative, influential, technical, & has time; grow from there
Gartner: “85% of data science projects fail.”
42.
100X investigations
Modeling: Virtual graph, hypergraphs, & automations
Insight: Graph viz + graph stats + ML
GPUs for full data pipeline! Try RAPIDS ecosystem: cudf, blazingsql, cugraph, graphistry, …
Use data project best practices: Less fail, faster win
info@graphistry.com
• Now in Azure
• Contact for Kusto/Sentinel!
• GPU graph viz & investigation automation