CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache Spark
Roy Levin, Microsoft
#UnifiedDataAnalytics #SparkAISummit
Session goals
• Present an easy-to-use framework that produces cyber-security anomalies
• Explain how recommendation systems are used to find anomalous resource access
• Show how we evaluated the framework to demonstrate its usefulness
Agenda
• Motivation
• Formulation & Models
• Scalability for Large Datasets
• Evaluation
• Summary
Centralized, cloud-native Security Information & Event Management (SIEM) system
Build Your Own ML (BYOML)
1. Log data from cloud resources
2. Process logs from Azure Databricks cluster
3. Author custom security analytics
General Anomaly Detector
A general anomaly detector applied to a dataset surfaces anomalies of many kinds:
• Fault detection
• System health monitoring
• Security incidents
• …
We would like to capture only security-related anomalies
anomalous access
• Train and apply on a simple-to-construct dataset
– Avoid writing and maintaining complex rules and logic
– Avoid the need to analyze multiple complex datasets such as:
  • Org charts
  • RBAC tables
  • Cloud architectures
Formulation & Models
• Given a user and resource pair (u, r)
• Provide an anomaly score of user u accessing resource r
• If the anomaly score is above some threshold, surface the event
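The thresholding step in the formulation above can be sketched as follows. This is only an illustration: the event tuples, the scores, and the `threshold` value are assumptions made for the example, not values from the talk.

```python
from typing import Iterable, List, Tuple

# (user, resource, anomaly_score); all names and scores here are hypothetical.
ScoredAccess = Tuple[str, str, float]


def surface_anomalies(events: Iterable[ScoredAccess],
                      threshold: float) -> List[ScoredAccess]:
    """Keep events whose score is strictly above the threshold, highest first."""
    flagged = [event for event in events if event[2] > threshold]
    return sorted(flagged, key=lambda event: event[2], reverse=True)


events = [
    ("u1", "r1", 0.2),
    ("u2", "r7", 7.9),
    ("u3", "r2", -0.4),
    ("u4", "r5", 3.1),
]
flagged = surface_anomalies(events, threshold=3.0)
```

How the threshold is chosen is left open here; in practice it would be tuned against the score distribution of the tenant's data.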
The straightforward approach
But users access new resources quite often, so this is just not good enough
Create a profile per user and resource and see if access deviates from that profile
Intuition:
• Take a recommendation system and use it for anti-recommendations
Recommendation Engines
Movie Recommendations: Model Training Phase

Movie                   Roy(1)  Inbal(2)  Hasan(3)  Lior(4)  Anat(5)  Arnon(6)
The Godfather (1)         -        4         -         5        -        -
The Dark Knight (2)       3        -         -         -        2        5
Pulp Fiction (3)          5        3         5         4        4        5
40 Year Old Virgin (4)    2        4         -         -        3        3
Analyze That (5)          3        5         4         -        4        -
Anger Management (6)      3        5         -         -        -        5
Black Hawk Down (7)       5        -         -         4        -        -
Movie Recommendations: Model Training Phase

Movie                   Roy(1)  Inbal(2)  Hasan(3)  Lior(4)  Anat(5)  Arnon(6)
The Godfather (1)         ?        4         ?         5        ?        ?
The Dark Knight (2)       3        ?         ?         ?        2        5
Pulp Fiction (3)          5        3         5         4        4        5
40 Year Old Virgin (4)    2        4         ?         ?        3        3
Analyze That (5)          3        5         4         ?        4        ?
Anger Management (6)      3        5         ?         ?        ?        5
Black Hawk Down (7)       5        ?         ?         4        ?        ?

The ratings matrix is approximated by the product of two factor matrices whose entries (?) are learned during training:
• Movie factors: one row per movie (x1, x2, …, xm) over latent features f1, f2, f3 (e.g., Romance, Action, Comedy)
• User factors: one column per user (θ1, θ2, …) over the same features f1, f2, f3
Movie Recommendations: Model Apply Phase
The learned movie factors (x1, …, xm) and user factors (one θ column per user) over features f1, f2, f3 (Romance, Action, Comedy) are used to predict the unknown (?) ratings.
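The training and apply phases above can be illustrated with a small matrix factorization. This is a minimal sketch in plain Python, assuming stochastic gradient descent with L2 regularization; the talk does not say which optimizer or hyperparameters were used, so `epochs`, `lr`, `reg`, and `seed` are illustrative choices. The ratings are the known cells of the movie table, indexed from zero.

```python
import random

# Known ratings from the slide as (movie_index, user_index) -> rating.
# Movies 0..6 are the table rows, users 0..5 the columns; absent pairs are "?".
RATINGS = {
    (0, 1): 4, (0, 3): 5,
    (1, 0): 3, (1, 4): 2, (1, 5): 5,
    (2, 0): 5, (2, 1): 3, (2, 2): 5, (2, 3): 4, (2, 4): 4, (2, 5): 5,
    (3, 0): 2, (3, 1): 4, (3, 4): 3, (3, 5): 3,
    (4, 0): 3, (4, 1): 5, (4, 2): 4, (4, 4): 4,
    (5, 0): 3, (5, 1): 5, (5, 5): 5,
    (6, 0): 5, (6, 3): 4,
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def factorize(ratings, n_movies, n_users, n_features=3,
              epochs=3000, lr=0.01, reg=0.02, seed=0):
    """Learn movie factors X and user factors Theta from the known ratings."""
    rng = random.Random(seed)
    X = [[rng.uniform(0.1, 1.0) for _ in range(n_features)]
         for _ in range(n_movies)]
    # Stored as one row per user; the slide draws this as one column per user.
    Theta = [[rng.uniform(0.1, 1.0) for _ in range(n_features)]
             for _ in range(n_users)]
    for _ in range(epochs):
        for (i, j), rating in ratings.items():
            err = rating - dot(X[i], Theta[j])
            for k in range(n_features):
                x_ik, t_jk = X[i][k], Theta[j][k]
                X[i][k] += lr * (err * t_jk - reg * x_ik)
                Theta[j][k] += lr * (err * x_ik - reg * t_jk)
    return X, Theta


def rmse(ratings, X, Theta):
    """Root-mean-square error over the known ratings only."""
    squared = [(r - dot(X[i], Theta[j])) ** 2 for (i, j), r in ratings.items()]
    return (sum(squared) / len(squared)) ** 0.5


# Training phase: fit the factors. Apply phase: predict an unknown cell.
X, Theta = factorize(RATINGS, n_movies=7, n_users=6)
train_rmse = rmse(RATINGS, X, Theta)
predicted = dot(X[0], Theta[0])  # the "?" cell for the first movie and first user
```

A recommender would surface the unknown cells with the highest predicted rating; the anti-recommendation idea looks at the opposite end, at pairs the model predicts to be unlikely.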
Back to Anomalous Resource Access
• Let us re-examine our data:
– User-resource pairs with the number of times accessed
• The standard CF model assumes explicit item ratings, which raises some problems:
– A rating is not really what we have in the input
  • Although more accesses by a user to a resource likely mean the user should be allowed access
– We do not really have negative rating indications either, i.e., there is no explicit indicator saying that a user should not have access to some resource
  • What we do have is missing access
Linear Scaling

Raw access counts:
user1 user2 user3 user4 user5 user6
resource1 1200 1500
resource2 900 301 1
resource3 1500 599 1 902 1205 1500
resource4 299 1200 895 901
resource5 601 1500 1200 1203
resource6 603 1499 1495
resource7 1499 1200

Scaled ratings:
user1 user2 user3 user4 user5 user6
resource1 9 10
resource2 8 6 5
resource3 10 7 5 8 9 10
resource4 6 9 8 8
resource5 7 10 9 9
resource6 7 10 10
resource7 10 9
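The numbers on the slide are consistent with a linear map from the observed count range [1, 1500] onto the rating range [5, 10], followed by rounding. The talk does not state the target range or the rounding, so both are inferred assumptions in this minimal sketch; `linear_scale` is a hypothetical helper name.

```python
def linear_scale(count, min_count, max_count, low=5.0, high=10.0):
    """Map count linearly from [min_count, max_count] onto [low, high]."""
    if max_count == min_count:
        return high  # degenerate range: every count gets the top rating
    fraction = (count - min_count) / (max_count - min_count)
    return low + fraction * (high - low)


# A few access counts taken from the slide.
raw_counts = [1, 301, 599, 900, 1200, 1500]
scaled = [round(linear_scale(c, min(raw_counts), max(raw_counts)))
          for c in raw_counts]
# scaled == [5, 6, 7, 8, 9, 10], matching the ratings shown for these counts
```

Keeping the lowest observed rating well above 1 leaves room at the bottom of the scale for the negative samples introduced next.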
Random Negative Samples
user1 user2 user3 user4 user5 user6
resource1 1 9 10
resource2 8 1 6 5
resource3 10 7 5 8 9 10
resource4 6 9 1 8 8
resource5 7 10 9 9
resource6 7 10 10
resource7 10 9 1
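Random negative sampling can be sketched as picking user-resource pairs with no observed access and giving them the lowest rating (1, as on the slide). How many negatives to draw and how they are sampled is not stated in the talk, so `n_negatives`, the uniform sampling, and the helper name `add_negative_samples` are assumptions.

```python
import random


def add_negative_samples(ratings, users, resources, n_negatives,
                         low_rating=1, seed=0):
    """Return a copy of ratings with randomly chosen unobserved pairs rated low."""
    rng = random.Random(seed)
    unobserved = sorted((u, r) for u in users for r in resources
                        if (u, r) not in ratings)
    chosen = rng.sample(unobserved, min(n_negatives, len(unobserved)))
    augmented = dict(ratings)
    for pair in chosen:
        augmented[pair] = low_rating
    return augmented


# Hypothetical scaled ratings for a tiny tenant.
ratings = {("user1", "res1"): 9, ("user2", "res1"): 10, ("user1", "res2"): 8}
users = ["user1", "user2", "user3"]
resources = ["res1", "res2"]
augmented = add_negative_samples(ratings, users, resources, n_negatives=2)
```

The negatives give the model explicit low-rating examples, which the raw access log never contains.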
Adjusting for user & resource bias and creating an anomaly score
Scalability for Large Datasets
• Actually, we are given a (tenant-id, user, resource) triplet (tid, u, r)
• Provide an anomaly score of user u accessing resource r, per tenant
• Note: access within each tenant is isolated
• Goals:
– Process tenants in parallel
– Cope with data from large tenants
• Create a Pandas UDF (PUDF) which uses the Surprise Python library to run the CF algorithm locally on each worker node
• The provided PUDF works on Pandas DataFrames that are created per group when apply is called
• The method is applied as follows:
– df.groupBy(tid_colname).apply(my_pudf)
* Surprise (Simple Python RecommendatIon System Engine): http://surpriselib.com/
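The contract of the grouped apply above is that the function receives all rows of one tenant and returns scored rows for that tenant. The sketch below imitates that contract in plain Python rather than Spark, so it runs without a cluster. `group_apply` and `score_tenant` are hypothetical names, and `score_tenant` uses mean-centering as a stand-in for training a CF model with Surprise.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

Row = Tuple[str, str, str, float]  # (tenant_id, user, resource, rating)


def score_tenant(rows: List[Row]) -> List[Row]:
    """Stand-in for the per-tenant model: subtract the tenant's mean rating.

    In the talk, this is where a CF model would be trained locally on the
    tenant's data; that part is not reproduced here.
    """
    mean = sum(row[3] for row in rows) / len(rows)
    return [(tid, user, res, rating - mean) for tid, user, res, rating in rows]


def group_apply(rows: List[Row],
                fn: Callable[[List[Row]], List[Row]]) -> List[Row]:
    """Group rows by tenant and apply fn to each group independently."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    scored = []
    for tenant in sorted(groups):
        scored.extend(fn(groups[tenant]))
    return scored


rows = [
    ("t1", "u1", "r1", 10.0),
    ("t2", "u9", "r3", 5.0),
    ("t1", "u2", "r1", 6.0),
]
scored = group_apply(rows, score_tenant)
```

Because each group is processed independently, tenants can be handled in parallel, which is what the Spark grouped apply provides across worker nodes.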
• Problem: the data from some tenants may be too large to fit into the memory of a single worker node
• Solution: before applying, count the number of entries per tenant
– If the entries fit in memory, apply the PUDF method
– If not, apply Spark CF per tenant, one by one
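The routing decision above can be sketched as a per-tenant count followed by a split. The cutoff `max_rows_in_memory` is a hypothetical parameter; the talk does not say how the in-memory limit is determined.

```python
from collections import Counter


def route_tenants(rows, max_rows_in_memory):
    """Split tenant ids into those small enough for the local (PUDF) path
    and those that need the distributed (Spark CF) path."""
    counts = Counter(row[0] for row in rows)  # row[0] is the tenant id
    small = sorted(t for t, n in counts.items() if n <= max_rows_in_memory)
    large = sorted(t for t, n in counts.items() if n > max_rows_in_memory)
    return small, large


# Hypothetical (tenant_id, user, resource) triplets.
rows = [
    ("t1", "u1", "r1"), ("t1", "u2", "r1"), ("t1", "u3", "r2"),
    ("t2", "u9", "r3"),
]
small, large = route_tenants(rows, max_rows_in_memory=2)
```

Small tenants would then be scored together through the grouped apply, while each large tenant is handled on its own.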
• Training produces a model, which is basically:
– A DataFrame mapping (tenant-id, user) and (tenant-id, resource) pairs to their corresponding latent feature vectors
• Applying the model requires:
– Joining with the respective user/resource to retrieve the vectors
– Applying a dot product
* Note: the model can be applied with Structured Streaming
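Applying the model as described above amounts to two lookups and a dot product per triplet. In this minimal sketch, plain dictionaries stand in for the two DataFrame joins, and the vectors are made up for illustration. Triplets with no matching vector are dropped, as an inner join would do; how the talk handles unseen users or resources is not stated.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def apply_model(triplets, user_vectors, resource_vectors):
    """Score (tenant, user, resource) triplets by the dot product of the
    latent vectors looked up for (tenant, user) and (tenant, resource)."""
    scored = []
    for tid, user, res in triplets:
        u_vec = user_vectors.get((tid, user))
        r_vec = resource_vectors.get((tid, res))
        if u_vec is None or r_vec is None:
            continue  # no vector for this pair, like an inner join
        scored.append((tid, user, res, dot(u_vec, r_vec)))
    return scored


# Hypothetical latent vectors keyed by (tenant, user) and (tenant, resource).
user_vectors = {("t1", "u1"): [1.0, 2.0]}
resource_vectors = {("t1", "r1"): [3.0, 4.0], ("t1", "r2"): [0.5, 0.0]}
triplets = [("t1", "u1", "r1"), ("t1", "u1", "r2"), ("t1", "u2", "r1")]
scored = apply_model(triplets, user_vectors, resource_vectors)
```

The dot product here is the model's predicted rating; turning it into an anomaly score involves the bias adjustment mentioned earlier, which is not reproduced in this sketch.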
Evaluation
Experiments for Azure Sentinel AI
1. Synthetic dataset
2. Actual file share data from a large customer
  • Users accessing shared network files
For training
For testing: add cross-group access
Results
100%, i.e., all 100 cross-group accesses receive the top-100 anomaly scores!
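The result above is a statement about the top-k ranking: every one of the k highest-scoring accesses is an injected cross-group access. A check of this kind can be sketched as follows; the function name `top_k_hit_rate`, the pairs, and the scores are hypothetical and much smaller than the experiment in the talk.

```python
def top_k_hit_rate(scores, injected, k):
    """Fraction of the k highest-scoring pairs that are injected anomalies."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for pair in ranked if pair in injected) / k


# Hypothetical anomaly scores per (user, resource) pair.
scores = {
    ("u1", "r1"): -0.3,
    ("u2", "r2"): 0.1,
    ("u1", "r9"): 7.8,   # injected cross-group access
    ("u5", "r3"): 8.0,   # injected cross-group access
    ("u4", "r4"): 0.9,
}
injected = {("u1", "r9"), ("u5", "r3")}
hit_rate = top_k_hit_rate(scores, injected, k=2)
```

A hit rate of 1.0 with k equal to the number of injected accesses corresponds to the 100% figure reported on the slide.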
Actual Attack Description
[Figure: File Share SMB server; shares on Machine 1, Machine 2, …, Machine n]
58% of companies have over 100,000 folders open to everyone within the network
(source: Varonis cybersecurity data security and analytics)
Algorithm Training
[Figure: shares on Machine 1, Machine 2, …, Machine n]
Testset (2 days after training)
[Figure: shares on Machine 1, Machine 2, …, Machine n]
Results: anomaly scores per dataset

Dataset               Mean    Stddev  Min      Max    Count
Entire test set       0.05    1.16    -19.21   8.07   3.8M
Unseen valid access   -0.28   0.38    -1.2     1.18   410
Restricted access     7.81    0.11    7.44     8.07   400
Summary
from sentinel_ai.peer_anomaly.spark_collaborative_filtering import AccessAnomaly

# AccessAnomaly is just an estimator; the four arguments are column names
# (strings) supplied by the caller.
access_anomaly = AccessAnomaly(
    tenant_colname,
    user_colname,
    res_colname,
    score_colname
)
# Fit on the training triplets, then score the test triplets.
anom_model = access_anomaly.fit(training_dataset_scored_triplets)
scored_test_dataset_triplets = anom_model.transform(test_dataset_triplets)
scored_test_dataset_triplets.show()
https://github.com/Azure/Azure-Sentinel-BYOML
• Introduced an Access Anomaly Detection framework for cyber security and showed how it fits into the BYOML pillar of Azure Sentinel
– An anti-recommendation is an access anomaly
– The code has been open sourced
• The framework provides a simple-to-use API that lets security analysts surface access anomalies
• Call to action: experiment with the framework, continue this line of research, and suggest and add more algorithms
Weitere ähnliche Inhalte

Was ist angesagt?

Automated Production Ready ML at Scale
Automated Production Ready ML at ScaleAutomated Production Ready ML at Scale
Automated Production Ready ML at ScaleDatabricks
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment Databricks
 
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowDatabricks
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingDatabricks
 
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold XinUnifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold XinDatabricks
 
Tactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherTactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherDatabricks
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkDatabricks
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsDatabricks
 
Splice Machine's use of Apache Spark and MLflow
Splice Machine's use of Apache Spark and MLflowSplice Machine's use of Apache Spark and MLflow
Splice Machine's use of Apache Spark and MLflowDatabricks
 
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...Databricks
 
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...Databricks
 
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...Spark Summit
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...Jose Quesada (hiring)
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Databricks
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...Databricks
 
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...Databricks
 
From Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterFrom Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterDatabricks
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageDatabricks
 
How Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-ShmaHow Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-ShmaSpark Summit
 
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...Databricks
 

Was ist angesagt? (20)

Automated Production Ready ML at Scale
Automated Production Ready ML at ScaleAutomated Production Ready ML at Scale
Automated Production Ready ML at Scale
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflow
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and Tracking
 
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold XinUnifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
 
Tactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherTactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark Together
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new Directions
 
Splice Machine's use of Apache Spark and MLflow
Splice Machine's use of Apache Spark and MLflowSplice Machine's use of Apache Spark and MLflow
Splice Machine's use of Apache Spark and MLflow
 
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
 
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
 
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
 
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
 
From Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterFrom Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim Hunter
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
 
How Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-ShmaHow Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-Shma
 
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
 

Ähnlich wie CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache Spark

2019 FRSecure CISSP Mentor Program: Class Nine
2019 FRSecure CISSP Mentor Program: Class Nine2019 FRSecure CISSP Mentor Program: Class Nine
2019 FRSecure CISSP Mentor Program: Class NineFRSecure
 
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial EmulationScott Sutherland
 
Big Data LDN 2017: Serving Predictive Models with Redis
Big Data LDN 2017: Serving Predictive Models with RedisBig Data LDN 2017: Serving Predictive Models with Redis
Big Data LDN 2017: Serving Predictive Models with RedisMatt Stubbs
 
Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...
Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...
Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...Precisely
 
2018 FRSecure CISSP Mentor Program Session 9
2018 FRSecure CISSP Mentor Program Session 92018 FRSecure CISSP Mentor Program Session 9
2018 FRSecure CISSP Mentor Program Session 9FRSecure
 
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slapDEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slapFelipe Prado
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for CybersecurityVMware Tanzu
 
How i'm going to own your organization v2
How i'm going to own your organization v2How i'm going to own your organization v2
How i'm going to own your organization v2RazorEQX
 
Mining software vulns in SCCM / NIST's NVD
Mining software vulns in SCCM / NIST's NVDMining software vulns in SCCM / NIST's NVD
Mining software vulns in SCCM / NIST's NVDLoren Gordon
 
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...Machine Learning for Your Enterprise: Operations and Security for Mainframe E...
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...Precisely
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionSri Ambati
 
EmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
EmPOW: Integrating Attack Behavior Intelligence into Logstash PluginsEmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
EmPOW: Integrating Attack Behavior Intelligence into Logstash PluginsFaithWestdorp
 
BSIDES-PR Keynote Hunting for Bad Guys
BSIDES-PR Keynote Hunting for Bad GuysBSIDES-PR Keynote Hunting for Bad Guys
BSIDES-PR Keynote Hunting for Bad GuysJoff Thyer
 
RIoT (Raiding Internet of Things) by Jacob Holcomb
RIoT  (Raiding Internet of Things)  by Jacob HolcombRIoT  (Raiding Internet of Things)  by Jacob Holcomb
RIoT (Raiding Internet of Things) by Jacob HolcombPriyanka Aash
 
[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies台灣資料科學年會
 
2020 FRSecure CISSP Mentor Program - Class 9
2020 FRSecure CISSP Mentor Program - Class 92020 FRSecure CISSP Mentor Program - Class 9
2020 FRSecure CISSP Mentor Program - Class 9FRSecure
 
Cyber Threat Ranking using READ
Cyber Threat Ranking using READCyber Threat Ranking using READ
Cyber Threat Ranking using READZachary S. Brown
 
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...Mike Spaulding
 
Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You Neo4j
 

Ähnlich wie CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache Spark (20)

2019 FRSecure CISSP Mentor Program: Class Nine
2019 FRSecure CISSP Mentor Program: Class Nine2019 FRSecure CISSP Mentor Program: Class Nine
2019 FRSecure CISSP Mentor Program: Class Nine
 
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation
 
Big Data LDN 2017: Serving Predictive Models with Redis
Big Data LDN 2017: Serving Predictive Models with RedisBig Data LDN 2017: Serving Predictive Models with Redis
Big Data LDN 2017: Serving Predictive Models with Redis
 
Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...
Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...
Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...
 
2018 FRSecure CISSP Mentor Program Session 9
2018 FRSecure CISSP Mentor Program Session 92018 FRSecure CISSP Mentor Program Session 9
2018 FRSecure CISSP Mentor Program Session 9
 
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slapDEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for Cybersecurity
 
How i'm going to own your organization v2
How i'm going to own your organization v2How i'm going to own your organization v2
How i'm going to own your organization v2
 
Mining software vulns in SCCM / NIST's NVD
Mining software vulns in SCCM / NIST's NVDMining software vulns in SCCM / NIST's NVD
Mining software vulns in SCCM / NIST's NVD
 
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...Machine Learning for Your Enterprise: Operations and Security for Mainframe E...
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...
 
Become a Cloud Security Ninja
Become a Cloud Security NinjaBecome a Cloud Security Ninja
Become a Cloud Security Ninja
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
EmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
EmPOW: Integrating Attack Behavior Intelligence into Logstash PluginsEmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
EmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
 
BSIDES-PR Keynote Hunting for Bad Guys
BSIDES-PR Keynote Hunting for Bad GuysBSIDES-PR Keynote Hunting for Bad Guys
BSIDES-PR Keynote Hunting for Bad Guys
 
RIoT (Raiding Internet of Things) by Jacob Holcomb
RIoT  (Raiding Internet of Things)  by Jacob HolcombRIoT  (Raiding Internet of Things)  by Jacob Holcomb
RIoT (Raiding Internet of Things) by Jacob Holcomb
 
[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies
 
2020 FRSecure CISSP Mentor Program - Class 9
2020 FRSecure CISSP Mentor Program - Class 92020 FRSecure CISSP Mentor Program - Class 9
2020 FRSecure CISSP Mentor Program - Class 9
 
Cyber Threat Ranking using READ
Cyber Threat Ranking using READCyber Threat Ranking using READ
Cyber Threat Ranking using READ
 
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
 
Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You
 

Mehr von Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache Spark

  • 2. Roy Levin, Microsoft CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache Spark #UnifiedDataAnalytics #SparkAISummit
  • 3. Session goals: • Present an easy-to-use framework that surfaces cyber-security anomalies • Explain how recommendation systems are used to find anomalous resource access • Show how we evaluated the framework to demonstrate its usefulness
  • 4. Agenda: Motivation; Formulation & Models; Scalability for Large Datasets; Evaluation; Summary
  • 5. A centralized, cloud-native Security Information & Event Management (SIEM) system. Build Your Own ML (BYOML): 1. Log data from cloud resources 2. Process logs from Azure Databricks cluster 3. Author custom security analytics
  • 6. General Anomaly Detector: Dataset → fault detection, system health monitoring, security incidents, … We would like to capture only security-related anomalies.
  • 8. Anomalous access • Train and apply on a simple-to-construct dataset – Avoid writing and maintaining complex rules and logic – Avoid the need to analyze multiple complex datasets such as org charts, RBAC tables, and cloud architectures
  • 11. • Given a user & resource pair (u, r) • Provide an anomaly score for user u accessing resource r • If the anomaly score is above some threshold, surface the event
  • 12. The straightforward approach. But users access new resources quite often, so this is just not good enough.
  • 13. Create a profile per user and resource and check whether an access deviates from that profile
  • 14. Intuition: take a recommendation system and use it for anti-recommendations
  • 16. Model Training Phase (Movie Recommendations): ratings per movie and user ("-" = not rated)
         Movie                 Roy  Inbal  Hasan  Lior  Anat  Arnon
         The Godfather         -    4      -      5     -     -
         The Dark Knight       3    -      -      -     2     5
         Pulp Fiction          5    3      5      4     4     5
         40 Year Old Virgin    2    4      -      -     3     3
         Analyze That          3    5      4      -     4     -
         Anger Management      3    5      -      -     -     5
         Black Hawk Down       5    -      -      4     -     -
  • 17. Model Training Phase (Movie Recommendations): missing ratings are marked "?"
         Movie                 Roy  Inbal  Hasan  Lior  Anat  Arnon
         The Godfather         ?    4      ?      5     ?     ?
         The Dark Knight       3    ?      ?      ?     2     5
         Pulp Fiction          5    3      5      4     4     5
         40 Year Old Virgin    2    4      ?      ?     3     3
         Analyze That          3    5      4      ?     4     ?
         Anger Management      3    5      ?      ?     ?     5
         Black Hawk Down       5    ?      ?      4     ?     ?
       The rating matrix is factorized into a feature vector x1 … xm per movie and a preference vector θ per user over the latent features f1, f2, f3 (for example Romance, Action, Comedy). All entries of these vectors are unknown ("?") before training.
  • 19. Model Apply Phase (Movie Recommendations): the same rating matrix and the same movie (x) and user (θ) latent-feature matrices as on slide 17.
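The training and apply phases above can be sketched with a small matrix factorization. This is a minimal illustration only: the slides say "CF" without naming the solver, so the alternating least squares (ALS) loop, the rank of 3, the regularization value, and all function names below are assumptions, not the talk's implementation.

```python
import numpy as np

# Ratings from the movie example; 0 marks a missing rating.
# Rows: movies, columns: users (Roy, Inbal, Hasan, Lior, Anat, Arnon).
R = np.array([
    [0, 4, 0, 5, 0, 0],  # The Godfather
    [3, 0, 0, 0, 2, 5],  # The Dark Knight
    [5, 3, 5, 4, 4, 5],  # Pulp Fiction
    [2, 4, 0, 0, 3, 3],  # 40 Year Old Virgin
    [3, 5, 4, 0, 4, 0],  # Analyze That
    [3, 5, 0, 0, 0, 5],  # Anger Management
    [5, 0, 0, 4, 0, 0],  # Black Hawk Down
], dtype=float)


def regularized_loss(R, mask, X, Theta, reg):
    """Squared error on observed cells plus an L2 penalty on both factor matrices."""
    err = (R - X @ Theta.T)[mask]
    return float(err @ err + reg * ((X ** 2).sum() + (Theta ** 2).sum()))


def als(R, rank=3, reg=0.1, iters=20, seed=0):
    """Hypothetical helper: fit movie vectors X and user vectors Theta by ALS."""
    mask = R > 0
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(R.shape[0], rank))
    Theta = rng.normal(scale=0.1, size=(R.shape[1], rank))
    ridge = reg * np.eye(rank)
    losses = [regularized_loss(R, mask, X, Theta, reg)]
    for _ in range(iters):
        # Training phase, step 1: solve each movie vector with user vectors fixed.
        for i in range(R.shape[0]):
            A = Theta[mask[i]]
            X[i] = np.linalg.solve(A.T @ A + ridge, A.T @ R[i, mask[i]])
        # Training phase, step 2: solve each user vector with movie vectors fixed.
        for j in range(R.shape[1]):
            A = X[mask[:, j]]
            Theta[j] = np.linalg.solve(A.T @ A + ridge, A.T @ R[mask[:, j], j])
        losses.append(regularized_loss(R, mask, X, Theta, reg))
    return X, Theta, losses


X, Theta, losses = als(R)
# Apply phase: the dot product of a movie vector and a user vector predicts any cell,
# including the ones that were missing during training.
predictions = X @ Theta.T
```

Each inner solve is an exact ridge-regression minimizer for one row or column, which is why the regularized loss cannot increase from one iteration to the next.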
  • 21. • Let us re-examine our data: – User-resource pairs with the number of times accessed • The standard collaborative filtering (CF) model assumes explicit item ratings, which raises some problems: – A rating is not really what we have in the input (although more accesses by a user to a resource likely mean they should be allowed access) – We do not really have negative rating indications either, i.e., there is no explicit indicator saying that a user should not have access to some resource; what we do have is missing access
  • 22. Linear Scaling: raw access counts are mapped to ratings (non-empty cells of each row, left to right over user1 … user6)
         Resource    Access counts                     Scaled ratings
         resource1   1200, 1500                        9, 10
         resource2   900, 301, 1                       8, 6, 5
         resource3   1500, 599, 1, 902, 1205, 1500     10, 7, 5, 8, 9, 10
         resource4   299, 1200, 895, 901               6, 9, 8, 8
         resource5   601, 1500, 1200, 1203             7, 10, 9, 9
         resource6   603, 1499, 1495                   7, 10, 10
         resource7   1499, 1200                        10, 9
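The linear scaling step can be sketched as follows. The slide does not state the target range; the range [5, 10], the count bounds 1 and 1500, and the rounding are inferred from the numbers shown on the slide, and the function name is hypothetical.

```python
def scale_count(count, min_count=1, max_count=1500, low=5.0, high=10.0):
    """Map a raw access count linearly from [min_count, max_count] to [low, high].

    The default bounds are assumptions read off the slide's example values.
    """
    fraction = (count - min_count) / (max_count - min_count)
    return low + fraction * (high - low)


# A few counts that appear on the slide, rounded to the nearest integer rating.
counts = [1, 301, 599, 900, 1200, 1500]
ratings = [round(scale_count(c)) for c in counts]
```

Keeping the lowest observed rating well above 1 leaves room at the bottom of the scale for cells that represent no access at all.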
  • 23. Random Negative Samples: the scaled rating matrix from slide 22, before negative samples are added.
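  • 24. Random Negative Samples: some randomly chosen empty cells receive the rating 1 (non-empty cells of each row, left to right over user1 … user6)
         Resource    Ratings
         resource1   1, 9, 10
         resource2   8, 1, 6, 5
         resource3   10, 7, 5, 8, 9, 10
         resource4   6, 9, 1, 8, 8
         resource5   7, 10, 9, 9
         resource6   7, 10, 10
         resource7   10, 9, 1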
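A minimal sketch of the negative sampling idea, assuming ratings are held in a dictionary keyed by (user, resource). The function name, the toy data, and the uniform sampling strategy are illustrative assumptions; the slide only shows that some empty cells are filled with the rating 1.

```python
import random


def add_negative_samples(ratings, users, resources, n_samples,
                         negative_rating=1, seed=0):
    """Fill n_samples randomly chosen unseen (user, resource) pairs with a low rating."""
    rng = random.Random(seed)
    unseen = [(u, r) for u in users for r in resources if (u, r) not in ratings]
    chosen = rng.sample(unseen, min(n_samples, len(unseen)))
    augmented = dict(ratings)  # keep the observed ratings untouched
    for pair in chosen:
        augmented[pair] = negative_rating
    return augmented


# Toy data (hypothetical): three observed accesses out of nine possible pairs.
users = ["u1", "u2", "u3"]
resources = ["r1", "r2", "r3"]
observed = {("u1", "r1"): 9, ("u2", "r1"): 10, ("u3", "r2"): 8}
augmented = add_negative_samples(observed, users, resources, n_samples=2)
```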
  • 25. Adjusting for user & resource bias and creating an anomaly score (rating matrix as on slide 24).
  • 27. • Actually, we are given a (tenant-id, user, resource) triplet (tid, u, r) • Provide an anomaly score for user u accessing resource r, per tenant • Note: access within each tenant is isolated • Goals: – Process tenants in parallel – Cope with data from large tenants
  • 28. • Create a Pandas UDF (PUDF) that uses the Surprise Python library to run the CF algorithm locally on each worker node • The provided PUDF works on pandas DataFrames that are created per group when apply is called • The method is applied as follows: – df.groupBy(tid_colname).apply(my_pudf) * Surprise: Simple Python RecommendatIon System Engine, http://surpriselib.com/
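The per-tenant pattern can be tried locally with plain pandas before moving to Spark. In this sketch, score_tenant is a hypothetical stand-in that only normalizes counts within a tenant; the real PUDF would train a CF model (for example with Surprise) on the same per-group DataFrame. Column names are assumptions.

```python
import pandas as pd


def score_tenant(pdf: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for the per-tenant step: sees only one tenant's rows at a time."""
    out = pdf.copy()
    out["rating"] = out["count"] / out["count"].max()
    return out


def apply_per_tenant(df: pd.DataFrame, tid_colname: str, fn) -> pd.DataFrame:
    """Local analogue of grouping by tenant and applying fn to each group."""
    parts = [fn(group) for _, group in df.groupby(tid_colname)]
    return pd.concat(parts, ignore_index=True)


df = pd.DataFrame({
    "tid": ["A", "A", "B", "B"],
    "user": ["u1", "u2", "u3", "u4"],
    "res": ["r1", "r1", "r2", "r3"],
    "count": [10, 5, 2, 4],
})
result = apply_per_tenant(df, "tid", score_tenant)
# On a Spark DataFrame in PySpark 3.x, the same function could be passed to
# df.groupBy("tid").applyInPandas(score_tenant, schema=...).
```

Because each call receives a single tenant's rows, nothing computed for one tenant can leak into another, which matches the isolation requirement stated earlier.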
  • 29. • Problem: the data from some tenants may be too large to fit into the memory of a single worker node • Solution: before applying, count the number of entries per tenant – If the entries fit in memory, apply the PUDF method – If not, apply Spark CF per tenant, one by one
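The routing decision can be sketched as a simple count-and-split. The threshold max_rows_in_memory and the function name are hypothetical; the slides do not say how the memory limit is determined.

```python
from collections import Counter


def plan_tenants(tenant_ids, max_rows_in_memory):
    """Split tenants into those small enough for the PUDF path and the rest."""
    counts = Counter(tenant_ids)
    small = sorted(t for t, n in counts.items() if n <= max_rows_in_memory)
    large = sorted(t for t, n in counts.items() if n > max_rows_in_memory)
    return small, large


# One tenant id per access record (toy data).
rows = ["A"] * 3 + ["B"] * 10 + ["C"] * 2
small, large = plan_tenants(rows, max_rows_in_memory=5)
# small tenants would go through the grouped PUDF in one pass;
# large tenants would each be trained with Spark CF, one by one.
```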
  • 30. • Training produces a model which is basically a DataFrame mapping (tenant-id, user) and (tenant-id, resource) pairs to their corresponding latent feature vectors • Applying the model requires: – Joining with the respective user/resource to retrieve the vectors – Applying a dot product * Note: the model can be applied with Structured Streaming
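The apply step (two joins followed by a dot product) can be sketched in pandas. The column names, the toy vectors, and the name "affinity" are assumptions; how the dot product is turned into the final anomaly score is not shown on this slide and is left out here.

```python
import pandas as pd

# Hypothetical trained model: latent vectors keyed by (tenant, user) and (tenant, resource).
user_vecs = pd.DataFrame({
    "tid": ["t1", "t1"],
    "user": ["u1", "u2"],
    "user_vec": [[1.0, 0.0], [0.5, 0.5]],
})
res_vecs = pd.DataFrame({
    "tid": ["t1", "t1"],
    "res": ["r1", "r2"],
    "res_vec": [[2.0, 1.0], [0.0, 4.0]],
})
events = pd.DataFrame({"tid": ["t1", "t1"], "user": ["u1", "u2"], "res": ["r2", "r1"]})


def apply_model(events, user_vecs, res_vecs):
    """Join each event with its user and resource vectors, then take the dot product."""
    joined = (events
              .merge(user_vecs, on=["tid", "user"], how="left")
              .merge(res_vecs, on=["tid", "res"], how="left"))
    joined["affinity"] = [
        sum(a * b for a, b in zip(u, r))
        for u, r in zip(joined["user_vec"], joined["res_vec"])
    ]
    return joined[["tid", "user", "res", "affinity"]]


scored = apply_model(events, user_vecs, res_vecs)
```

Joining on the tenant id as well as the user or resource keeps vectors from different tenants apart even when identifiers collide.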
  • 32. Experiments for Azure Sentinel AI: 1. Synthetic dataset 2. Actual file share data from a large customer (users accessing shared network files)
  • 34. Add cross-group access for testing
  • 35. Results: 100%, i.e., all 100 cross-group accesses receive the top-100 anomaly scores!
  • 36. Actual attack description: a file share (SMB) server with shares on Machine 1, Machine 2, …, Machine n. 58% of companies have over 100,000 folders open to everyone within the network (source: Varonis cybersecurity data security and analytics)
  • 38. Test set (2 days after training): shares on Machine 1, Machine 2, …, Machine n
  • 39. Results: anomaly scores per dataset
         Dataset               Mean    Stddev  Min     Max    Count
         Entire test set       0.05    1.16    -19.21  8.07   3.8M
         Unseen valid access   -0.28   0.38    -1.2    1.18   410
         Restricted access     7.81    0.11    7.44    8.07   400
  • 41. Usage (https://github.com/Azure/Azure-Sentinel-BYOML):
         from sentinel_ai.peer_anomaly.spark_collaborative_filtering import AccessAnomaly

         access_anomaly = AccessAnomaly(  # it is just an estimator
             tenant_colname, user_colname, res_colname, score_colname
         )
         anom_model = access_anomaly.fit(training_dataset_scored_triplets)
         scored_test_dataset_triplets = anom_model.transform(test_dataset_triplets)
         scored_test_dataset_triplets.show()
  • 42. • Introduced an Access Anomaly Detection framework for cyber security and showed how it fits into the BYOML pillar of Azure Sentinel – An anti-recommendation is an access anomaly – The code has been open sourced • The framework provides a simple-to-use API allowing security analysts to surface access anomalies • Call to action: experiment with the framework, continue this line of research, suggest and add more algorithms