The market is overflowing with vendors building products in which graphs are used in the Detection phase. This session showcases the collaborative efforts between Azure Security Data Science, Microsoft Research, Azure Security Assurance and Microsoft’s Threat Intelligence Center to explore the idea of using graphs during and after the Incident Response phase, when the IOCs have been (or are in the process of being) collected. By the end of the session, the audience will be able to gain insights from their own incident response process using open source tools and take steps towards automating it.
Transforming Incident Response to Intelligent Response using Graphs
1. TRANSFORMING INCIDENT RESPONSE
TO INTELLIGENT RESPONSE USING
GRAPHICAL ANALYSIS
RAM SHANKAR SIVA KUMAR
SECURITY DATA WRANGLER
MICROSOFT (AZURE SECURITY DATA SCIENCE)
PETER CAP
SENIOR THREAT ANALYST
MICROSOFT (THREAT INTELLIGENCE CENTER)
5. Team | Person | Expertise
Microsoft Threat Intelligence Center | Peter Cap, Abhijeet Hatekar | Security Incident Response
Microsoft Research | Danyel Fisher | Visualization
Azure Security | Thomas Garnier | Engineering
Azure Security Data Science | Ram Shankar Siva Kumar | Data Science
Sharepoint Online | Matt Swann | Security
6. BOTTOM LINE UPFRONT
Close the Incident Response loop with the data owners
Using simple graph measures and matching algorithms, we can gain insights into the Incident Response process
7. AGENDA
How graphs are currently used in the industry
Current pain points in Incident Response
Demo!
How graphs can help
Conclusion
12. PAIN POINTS
Investigation spans days to months
Query different log sources, minting different IOCs
Fighting fires all the time
Is there a story?
What is the big picture?
What was the most “important” log source/IOC?
Are there any patterns in how we use our logs?
17. SYSTEM COMPONENTS
1) Data Aggregator: Collect the required information as your investigation proceeds
Result is a table of IOCs and log sources
2) Data Clean up: Convert into XML format with appropriate tags
3) Ingesting into visualization platform: d3.js
4) Incorporating the necessary libraries for computation:
18. MODELING DATA WITH GRAPHS…
Graphs are suitable for capturing arbitrary relations between the various elements.
Data Instance -> Graph Instance
Element -> Vertex
Element’s Attributes -> Vertex Label
Relation Between Two Elements -> Edge
Type Of Relation -> Edge Label
Graphs provide enormous flexibility for modeling the underlying data, as they allow the modeler to decide on what the elements should be and the type of relations to be modeled.
Source: Lectures by George Karypis
19. Graphs in IR
INTELLIGENT RESPONSE USING GRAPHS
Contextual Visualization
• Is there a story?
• What is the big picture?
Graph Theoretic Measures
• Which log source/IOC was critical to the investigation?
Graph Mining
• Is there a pattern to our log usage?
24. FUTURE WORK
Once we have collected a corpus of response graphs, can we tell if the attack at hand resembles previous attacks?
• Motivation: Finding inherent regularities in data across the DIFFERENT graphs
• Step 1: Store all IR graphs in a graph database
• Step 2: Examine if the query graph at hand is part of the graph database using sub
[Figure: query graph, graph database]
Source: Lectures by George Karypis
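One way to picture Step 2 is a small labeled subgraph match: does the pattern seen in the current investigation occur inside any stored IR graph? The sketch below is only an illustration under stated assumptions. The slide does not say which matching algorithm was intended, the backtracking matcher is a generic textbook approach, and the incident names, vertex ids and labels are all hypothetical.

```python
def find_embedding(query, target):
    """Return a mapping query-vertex -> target-vertex if query occurs in target, else None.

    Each graph is (labels, edges): labels maps vertex -> label and edges is a set
    of (source, destination) pairs. Labels must agree and every query edge must
    exist between the mapped target vertices.
    """
    q_labels, q_edges = query
    t_labels, t_edges = target
    q_vertices = list(q_labels)

    def extend(mapping):
        if len(mapping) == len(q_vertices):
            return dict(mapping)
        qv = q_vertices[len(mapping)]  # next unmapped query vertex
        for tv in t_labels:
            if tv in mapping.values() or t_labels[tv] != q_labels[qv]:
                continue
            mapping[qv] = tv
            # Check every query edge whose endpoints are both mapped so far.
            ok = all((mapping[a], mapping[b]) in t_edges
                     for a, b in q_edges if a in mapping and b in mapping)
            if ok:
                found = extend(mapping)
                if found is not None:
                    return found
            del mapping[qv]  # backtrack
        return None

    return extend({})


# Hypothetical corpus of stored IR graphs; labels are vertex types, not real data.
DATABASE = {
    "incident_A": ({"a1": "ip", "a2": "proxy_log", "a3": "domain", "a4": "auth_log"},
                   {("a1", "a2"), ("a2", "a3"), ("a1", "a4")}),
    "incident_B": ({"b1": "account", "b2": "auth_log", "b3": "ip"},
                   {("b1", "b2"), ("b2", "b3")}),
}
# Hypothetical pattern from the investigation at hand.
QUERY = ({"q1": "ip", "q2": "proxy_log", "q3": "domain"},
         {("q1", "q2"), ("q2", "q3")})

matches = [name for name, graph in DATABASE.items()
           if find_embedding(QUERY, graph) is not None]
print(matches)
```

Backtracking like this is exponential in the worst case, so it only suits small response graphs. A real corpus would more likely lean on a graph database or a frequent-subgraph-mining library.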
25. WORDS OF WISDOM
Open Source Tools:
yEd – For graph drawing and Layout
Gephi – For graph analysis
Neo4j – For graph database
Scale:
• Need to do some sort of clustering
Cyclic graphs:
• Some of the analysis breaks. You can cheat by introducing duplicate nodes
Play around and try a lot of things!
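The duplicate-node trick mentioned under "Cyclic graphs" could look roughly like this. It is one possible reading, not the speakers' implementation: a depth-first search finds each edge that closes a cycle and redirects it to a fresh copy of its target. The function name, the "(dup)" suffix and the sample vertices are hypothetical.

```python
def break_cycles(graph):
    """Return an acyclic copy of graph where each back edge points to a duplicate node.

    graph maps vertex -> list of successors. A back edge (u, v) found by the
    depth-first search is replaced by (u, "v (dup)"). The duplicate has no
    outgoing edges, so no cycle can pass through it.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {v: WHITE for v in graph}
    result = {v: [] for v in graph}

    def visit(u):
        color[u] = GREY
        for v in graph[u]:
            if color[v] == GREY:  # back edge: v is an ancestor of u
                dup = f"{v} (dup)"
                result.setdefault(dup, [])
                result[u].append(dup)
            else:
                result[u].append(v)
                if color[v] == WHITE:
                    visit(v)
        color[u] = BLACK

    for v in graph:
        if color[v] == WHITE:
            visit(v)
    return result


# Hypothetical cyclic process graph: the DNS log leads back to the starting IP.
CYCLIC = {
    "ip": ["proxy_log"],
    "proxy_log": ["domain"],
    "domain": ["dns_log"],
    "dns_log": ["ip"],
}
acyclic = break_cycles(CYCLIC)
for vertex, successors in acyclic.items():
    print(vertex, "->", successors)
```

The duplicate keeps the fact that the IOC was rediscovered while leaving a graph that hierarchical layouts and path-based measures can handle. Scores for the original and its duplicate then have to be read together.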
26. CONCLUSION
There are three benefits to using graphs in IR
1. Contextual visualization
2. Simple graph measures to close feedback with data owners
3. Graph Mining to find inherent patterns in the Incident Response process
10/14/2015 26
27. ADDITIONAL RESOURCES
1) Kuramochi, Michihiro, and George Karypis. "Finding frequent patterns in a large sparse graph*." Data mining and knowledge discovery 11.3 (2005): 243-271.
http://glaros.dtc.umn.edu/gkhome/fetch/papers/sigramDMKD05.pdf
2) Jiang, Chuntao, Frans Coenen, and Michele Zito. "A survey of frequent subgraph mining algorithms." The Knowledge Engineering Review 28.01 (2013): 75-105.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.2712&rep=rep1&type=pdf
3) Template code for Centrality measures
http://nodexl.codeplex.com/SourceControl/latest
4) Template code for Cola Visualization - http://marvl.infotech.monash.edu/webcola/
5) Blog post by John Lambert
Our first cross-company Red-Blue engagement. Microsoft is super siloed. This was a big joint exercise: 6 red teams going after 8 online services. It spanned 50 participants and supporting roles from the red and blue teams across all Online Services.
Ram and I are from the blue team, and our task was to find out what the red team had done.
Good news: Because all the blue teams worked together we were able to catch the red team in record time.
Bad news: We were struggling to articulate what the big picture was. At the end of the investigation, across the 8 online services, we had used 18 log sources to find 73 pieces of evidence. Which log source was super important in driving the investigation? What are the key takeaways from the investigation?
We wanted a tool that could answer all these questions
HeatRay paper from Microsoft Research. Tool developed by John Lambert and Matt Thomilson looked for EoP… done all the way back in 2009
Nodes represent machines, arrows represent connection from one machine to another.
The Strata conference this year was flooded with “Using graphs for catching APT” talks built on attack graphs.
Expectation is that everything is sorted for you
Each dog is an IOC – Now go and chase them.
Question is: Can we transform this chaotic process into a structured process?
Resolution: 1080p, Chrome Zoom: 75%
The process is represented as a directed graph. A round vertex represents the tool used to mine for a new IOC; a rectangular vertex represents the IOC that was minted from the round vertex. Since this is a directed graph, the edges are arrows, and the direction of the arrowhead denotes the direction of the result: an arrow coming into a vertex means its source was fed in as input; an arrow going out of a vertex points to the result of the analysis.
Graphs are a natural extension of Data Analysis. Every element of a graph can be mapped back to retro data-mining
Some of the vertices can be attributes themselves (computer has file; file has attr1; file has attr2)
- You need to identify the drivers; initially elements can be anything
- What is the importance of the relationship between two elements? (Relationships are NOT apparent in a list view)
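The round/rectangular convention above can be made concrete with a small sketch. It is a minimal illustration that assumes each investigation step is recorded as a triple (input IOC, tool or log source, resulting IOC). The step list and every name in it are hypothetical, not taken from the engagement.

```python
from collections import defaultdict

# Hypothetical investigation steps: (input IOC or None, tool/log source, resulting IOC).
STEPS = [
    (None, "ids_alert_log", "ip:10.0.0.5"),
    ("ip:10.0.0.5", "proxy_log", "domain:evil.example"),
    ("ip:10.0.0.5", "auth_log", "account:svc_backup"),
    ("account:svc_backup", "process_log", "file:dropper.exe"),
]


def build_process_graph(steps):
    """Return (shape, edges) for the directed process graph.

    shape maps vertex -> "round" (tool) or "rect" (IOC);
    edges maps vertex -> set of successor vertices.
    """
    shape = {}
    edges = defaultdict(set)
    for ioc_in, tool, ioc_out in steps:
        shape[tool] = "round"
        shape[ioc_out] = "rect"
        edges[tool].add(ioc_out)     # outgoing arrow: result of the analysis
        if ioc_in is not None:
            shape[ioc_in] = "rect"
            edges[ioc_in].add(tool)  # incoming arrow: IOC fed in as input
    return shape, dict(edges)


shape, edges = build_process_graph(STEPS)
for src in sorted(edges):
    for dst in sorted(edges[src]):
        print(f"{src} ({shape[src]}) -> {dst} ({shape[dst]})")
```

Keeping one vertex per log source rather than one per query is a design choice made here so that a heavily used log source accumulates edges. That is what graph measures such as centrality later rely on.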
Link to slide 10 (Pain points slide)
Hierarchical representation shows a “timeline view” – from the IOC that started this investigation, all the way to catching the adversary
Flow Layout – Imagine the nodes are objects connected to each other using springs. This layout ensures that the graph settles into a stable arrangement, just as a physical system of masses and springs does
You get hierarchical and Flow for Free if you use d3.js
Cola Layout - cola.js is an open-source JavaScript library. CoLa achieves higher quality layout because it converges to a local minimum;
It is much more stable in interactive applications (no "jitter");
it allows user specified constraints such as alignments and grouping;
-> We said “all directed edges should point down”
Degree centrality: An important node is involved in a large number of interactions
High in-degree = prominent (like many people nominating the same person for an award, or a highly cited paper); high out-degree = influential (Twitter follower)
Betweenness centrality: "An important node will lie on a high proportion of paths between other nodes in the network."
Size of the node proportional to the score
Source: Lectures by Jiawei Han & Micheline Kamber
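Both measures are small enough to compute by hand on a toy graph. The sketch below is not the NodeXL template code listed in the resources. It is a standard-library illustration on a hypothetical process graph, with betweenness computed by Brandes' algorithm for unweighted directed graphs and left unnormalized.

```python
from collections import deque

# Hypothetical process graph: vertex -> list of successors (all names are made up).
GRAPH = {
    "ids_alert_log": ["ip:10.0.0.5"],
    "ip:10.0.0.5": ["proxy_log", "auth_log"],
    "proxy_log": ["domain:evil.example"],
    "auth_log": ["account:svc_backup"],
    "domain:evil.example": [],
    "account:svc_backup": [],
}


def degrees(graph):
    """Return (in_degree, out_degree) dictionaries for a directed graph."""
    out_deg = {v: len(succ) for v, succ in graph.items()}
    in_deg = {v: 0 for v in graph}
    for succ in graph.values():
        for w in succ:
            in_deg[w] += 1
    return in_deg, out_deg


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted, directed)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        preds = {v: [] for v in graph}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating path dependencies.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


in_deg, out_deg = degrees(GRAPH)
between = betweenness(GRAPH)
most_central = max(between, key=between.get)
print("highest betweenness:", most_central, between[most_central])
```

In a visualization these scores would drive node size, so the log source or IOC that most paths run through stands out at a glance.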
What to expect in your current investigation based on previous investigation
This is typically tribal knowledge – objective way of “doing investigation” – This step is hard vs. easy
Have some concrete examples (if we have time)
But must show the class of solutions to the problems we faced