Choosing the Learner
Binary Classification
Regression
Multiclass Classification
Unsupervised
Ranking
Anomaly Detection
Collaborative Filtering
Sequence Prediction
Reinforcement Learning
Representation Learning
Choosing the Learning Task
•Binary Classification
•Anomaly Detector
•Ranking
Defining Data Input
• Data Loaders (text, binary, SVM light, Transpose loader)
•Data type
Applying Data Transforms
•Cleaning Missing data
•Dealing with categorical data
•Dealing with text data
•Data Normalization
Choosing the Learner
•Binary Classification
•Regression
•Multi class
•Unsupervised
•Ranking
•Anomaly Detection
•Collaborative Filtering
•Sequence Prediction
Choosing Output
•Save Features of a model?
•Save the model as text?
•Save Model as binary?
•Save the per-instance results?
Choosing Run Options
•Run Locally?
•Run distributed on HPC cluster?
•Are all paths in the experiment node-accessible?
•Priority?
•Max Concurrent Process?
View Results
•Too large?
•Sampled
•Right size
•Load data
•Histogram
•Per feature
•Sampled Instances
Debug and Visualize Errors
•Error in Data
•Error in Learner
•Error in Optimizer
•Error in Experimentation setup
Analyze Model Predictions
•Root cause analysis
•Grading
Operationalizing Security
Data Science
Ram Shankar Siva Kumar (@ram_ssk)
Andrew Wicker
Microsoft
Security Data Science Projects are different
• Traditional Programming Projects: spec/prototype → implement → ship
• Data Science Projects: at each stage: relabel, refeaturize, retrain
• With data-driven features, all components drift:
• Learner: more accurate/faster/lower-memory-footprint/…
• Features: there are always better ones
• Data: all distributions drift
• Security Projects: at each stage: assess threat, build detections, respond
• All components drift:
• Threat: new attacks come out constantly
• Detection: newer log sources
• Response: better tooling, newer TSGs (troubleshooting guides)
Intro Model Evaluation Model Deployment Model Scale-out Conclusion
So wait… when do we ship??
You ship when your solution is operational
• Security Experts
• Engineers
• Legal
• Service Engineers
• Product Managers
• Machine Learning Experts
Operational is more than your “model is working”…
Detect unusual user activity to prevent data exfiltration
Detect unusual user activity using Application logs, with false positive rate < 1%, for all Azure customers, in near real-time
Operationalize Security Data Science: Components
Detect unusual user activity => The Problem
using Application logs, => Data
with false positive rate < 1%, => Model Evaluation
for all Azure customers => Model Deployment
in near real-time => Model Scale-out
Model Evaluation
How do you know your system works?
Model Evaluation Metrics
• E.g., false positives
• Makes your customer (and ergo, your business) happy
• How to measure this?
Model Usage Metrics
• E.g., call rate
• How much is the model in use?
• Makes your division happy
• Collected by your pipeline after deployment
Model Validation Metrics
• E.g., MSE, reconstruction error, …
• How well does the model generalize?
• Makes the data scientist happy
• Comes pre-built with the ML framework (scikit-learn, CNTK)
Model Evaluation: How to gather an evaluation dataset?
• Good: Use Benchmark datasets
• List of curated datasets - www.secrepo.com
• Con: Remember – attackers have ‘em too!
• Better: Use previous Indicators of Compromise
• Honeypots, commercial IOC feeds
• Steps:
• Gather confirmed IOCs
• “Backprop” them through the generated alerts
• This will help you calculate FP and FN
• Best: Curate your own dataset
(Options are ordered from least to most specialized.)
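The "Better" steps above (gather confirmed IOCs, trace them back through the generated alerts, count FP and FN) can be sketched as set arithmetic. This is only a minimal sketch: it assumes alerts and IOCs are keyed by the same hypothetical entity identifier (here a host name), and it treats every scored entity without a confirmed IOC as benign, which will not hold exactly in practice.

```python
def evaluate_alerts(alerted, confirmed_iocs, all_entities):
    """Compare generated alerts against confirmed IOCs.

    alerted        -- set of entity ids the model alerted on (hypothetical ids)
    confirmed_iocs -- set of entity ids confirmed as compromised
    all_entities   -- set of every entity the model scored
    """
    true_pos = alerted & confirmed_iocs      # alerted and confirmed
    false_pos = alerted - confirmed_iocs     # alerted, but no confirmed IOC
    false_neg = confirmed_iocs - alerted     # confirmed IOC the model missed
    # Assumption: entities without a confirmed IOC count as benign.
    negatives = all_entities - confirmed_iocs
    return {
        "tp": len(true_pos),
        "fp": len(false_pos),
        "fn": len(false_neg),
        "fp_rate": len(false_pos) / len(negatives) if negatives else 0.0,
        "precision": len(true_pos) / len(alerted) if alerted else 0.0,
    }


# Illustrative data only: ten scored hosts, three alerts, three confirmed IOCs.
entities = {f"host{i}" for i in range(1, 11)}
result = evaluate_alerts(
    alerted={"host1", "host2", "host3"},
    confirmed_iocs={"host2", "host3", "host4"},
    all_entities=entities,
)
```

The same counts feed both the false positive rate and the precision targets used elsewhere in the deck.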
Curating your own dataset options
1. Inject Fake Malicious data
[Diagram: Synthetic data, Storage, Model, Alerting System]
How: Label data as “eviluser” and check if “eviluser” pops
to the top of the reports every day
Pro: Low overhead—you don’t have to depend on a red
team to test your detection
Con: The injected data may not be representative of true
attacker activity
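The daily "does eviluser pop to the top?" check can be automated. A minimal sketch, assuming the daily report is available as a mapping from user name to suspiciousness score; the function names, the score format, and the `top_k` cutoff are illustrative assumptions, not part of the original system.

```python
def injected_user_rank(daily_scores, injected_user="eviluser"):
    """Return the 1-based rank of the injected user in a daily report,
    or None if the user does not appear at all."""
    if injected_user not in daily_scores:
        return None
    # Highest score first.
    ranked = sorted(daily_scores, key=daily_scores.get, reverse=True)
    return ranked.index(injected_user) + 1


def canary_passes(daily_scores, top_k=3, injected_user="eviluser"):
    """True when the injected user lands within the top_k entries."""
    rank = injected_user_rank(daily_scores, injected_user)
    return rank is not None and rank <= top_k


# Illustrative report: scores are made up for the example.
report = {"alice": 0.12, "bob": 0.40, "eviluser": 0.97, "carol": 0.05}
rank = injected_user_rank(report)
ok = canary_passes(report)
```

A day on which `canary_passes` returns False is a cheap signal that the pipeline or the model regressed, without waiting for a red team.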
Curating your own dataset options
2. Employ Commonly Used Attacker Tools
How: Spin up a malicious process using Metasploit, PowerSploit, or Veil in your environment. Look for traces in your logs.
Pro: Easy to implement; your development team, with a short tutorial, can run the tool, which generates attack data in the logs.
Con: The machine learning system will only learn to detect known attacker toolkits and will not generalize over the attack methodology.
[Diagram: Tainted data, Storage, Model, Alerting System]
Curating your own dataset options
3. Red Team pentests your environment
How: A red team attacks the system, and the logs from those attacks are collected as tainted data.
Pro: Closest technique to real-world attacks
Con: Red team engagements are point-in-time exercises and expensive
[Diagram: Tainted data, Storage, Model, Alerting System]
Growing your dataset: Generative Adversarial Networks
Source: https://medium.com/@devnag/generative-adversarial-networks-gans-in-50-lines-of-code-pytorch-e81b79659e3f#.djcfc6eo0
Source: http://www.evolvingai.org/ppgn
Model Deployment
Tailoring alerts based on customers' geographic location
Azure has data centers all around the world!
Localization affects Model Building
• Privacy Laws vary across the board
• IP address is treated as EII in some regions vs. not EII in other regions
• “Anyone logging into the corporate network at midnight during the weekend is anomalous”
• Weekend in Middle East != Weekend in Americas
• Seasonality varies
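The weekend point can be made concrete by making the "weekend" part of the rule a per-region setting. A minimal sketch: the region keys, the weekend sets, and the "before 05:00" cutoff are illustrative assumptions (several Middle Eastern countries observe a Friday-Saturday weekend, but this varies by country and over time).

```python
from datetime import datetime

# Illustrative weekend definitions using datetime.weekday() numbering
# (Monday=0 ... Sunday=6). The mapping itself is an assumption.
WEEKEND_DAYS = {
    "americas": {5, 6},      # Saturday, Sunday
    "middle_east": {4, 5},   # Friday, Saturday (varies by country)
}


def is_weekend_night_login(local_time, region):
    """Flag a login that happens in the small hours of the local weekend.

    local_time -- datetime already converted to the user's local time
    region     -- key into WEEKEND_DAYS (hypothetical region names)
    """
    is_weekend = local_time.weekday() in WEEKEND_DAYS[region]
    is_night = local_time.hour < 5
    return is_weekend and is_night


friday_midnight = datetime(2017, 3, 17, 0, 30)   # a Friday
sunday_midnight = datetime(2017, 3, 19, 0, 30)   # a Sunday
```

With these settings the same Friday 00:30 login is flagged for `"middle_east"` but not for `"americas"`, and the Sunday login is the reverse.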
Option 1: Shotgun Deployment
• How: Deploy the same model code across different regions
• Pros:
• Easy deployment
• Uniform metrics
• Single TSG to debug all service incidents
• Cons:
• Lose macro trends in favor of micro trends
• Model-region incompatibility
[Diagram: the same Model deployed in Region 1, Region 2, and Region 3]
Option 2: Tiered Modeling
• How:
• Federated Models
• Each region is modeled separately
• Results are scrubbed according to compliance laws and privacy agreements
• Scrubbed results are used as input to “Model Prime”
• Model Prime
• Results are collated to search for global trends
• Pros:
• Bespoke modeling for every region
• Balance between Micro and Macro modeling
• Cons:
• Complicated Deployment
• Depending on the agreements, Model Prime may not be possible
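The tiered flow (regional models, scrubbing, then Model Prime) can be sketched end to end. This is a toy illustration under stated assumptions: the regional "model" is a simple count threshold standing in for a real detector, and the per-region `EXPORTABLE_FIELDS` policy, field names, and event data are all hypothetical.

```python
from collections import Counter

# Hypothetical per-region policy: which alert fields may leave the region.
EXPORTABLE_FIELDS = {
    "region1": {"action", "ip"},
    "region2": {"action"},   # e.g. IP address is restricted here
    "region3": {"action"},
}


def regional_model(events, threshold=3):
    """Stand-in for a region-specific model: raise one alert for every
    (user, action) pair seen at least `threshold` times."""
    counts = Counter((e["user"], e["action"]) for e in events)
    flagged = {key for key, n in counts.items() if n >= threshold}
    alerts = []
    for e in events:
        key = (e["user"], e["action"])
        if key in flagged:
            alerts.append(dict(e))   # copy the first matching event
            flagged.discard(key)     # one alert per flagged pair
    return alerts


def scrub(alerts, region):
    """Keep only the fields the region's policy allows to be exported."""
    allowed = EXPORTABLE_FIELDS[region]
    return [{k: v for k, v in a.items() if k in allowed} for a in alerts]


def model_prime(scrubbed_by_region, min_regions=2):
    """Collate scrubbed results: actions alerted on in several regions."""
    regions_per_action = Counter()
    for alerts in scrubbed_by_region.values():
        for action in {a["action"] for a in alerts}:
            regions_per_action[action] += 1
    return sorted(a for a, n in regions_per_action.items() if n >= min_regions)


# Illustrative events only.
raw = {
    "region1": [{"user": "u1", "ip": "10.0.0.1", "action": "mass_download"}] * 3,
    "region2": [{"user": "u7", "ip": "10.0.1.9", "action": "mass_download"}] * 4,
    "region3": [{"user": "u9", "ip": "10.0.2.5", "action": "login"}] * 3,
}
scrubbed = {region: scrub(regional_model(ev), region) for region, ev in raw.items()}
global_trends = model_prime(scrubbed)
```

Model Prime only ever sees the scrubbed dictionaries, so user identifiers never leave their region; whether even the remaining fields may be pooled depends on the agreements in place.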
[Diagram: Region 1, Region 2, and Region 3 with Model 1, Model 2, and Model 3; Scrubbed Results; Model Prime]
Model Scale-Out
A Case Study
Detecting Malicious Activities
Detect risky or malicious activity => The Problem
in SharePoint Online activity logs => Data
with precision > 90% => Model Evaluation
for all SPO users => Model Deployment
in near real-time => Model Scale-out
Exploratory Analysis
• Typical data science work:
• Sample data
• Script for preprocessing data
• Summary statistics
• Script for evaluating approaches
• All done locally on dev machine using R/Python
• Facilitates quick turnaround
• Avoids having to debug at scale
Model Evaluation
• Labels from known incidents and investigations
• Inject labels by mimicking malicious activity
• SPO team helps us understand the malicious activity
• Red team helps us simulate the malicious activity
• > 90% precision
Model: Bayesian Network
• Probabilistic Graphical Model
• Related to GMM, CRF, MRF
• Represents variables and conditional independence assertions in a directed acyclic graph
• Directed edges encode conditional dependencies
• Conditional probability distributions for each variable
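To show how the conditional probability tables combine, the following sketch runs exact inference by enumeration on the burglary/alarm network whose nodes the slide's diagram names. The probability values are the usual textbook illustration for this example and are an assumption here; they do not come from the talk, and the production model is not reproduced.

```python
from itertools import product

# Illustrative conditional probability tables (assumed textbook values).
P_B = 0.001                      # P(Burglary)
P_E = 0.002                      # P(Earthquake)
P_A = {                          # P(Alarm | Burglary, Earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls | Alarm)


def bernoulli(p, value):
    """Probability of `value` for a binary variable with P(True) = p."""
    return p if value else 1.0 - p


def joint(b, e, a, j, m):
    """Joint probability, factored along the edges of the DAG."""
    return (bernoulli(P_B, b) * bernoulli(P_E, e)
            * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_J[a], j) * bernoulli(P_M[a], m))


def posterior_burglary(j=True, m=True):
    """P(Burglary | JohnCalls=j, MaryCalls=m), summing out the
    unobserved variables Earthquake and Alarm."""
    weights = {True: 0.0, False: 0.0}
    for b, e, a in product([True, False], repeat=3):
        weights[b] += joint(b, e, a, j, m)
    return weights[True] / (weights[True] + weights[False])


p = posterior_burglary(j=True, m=True)
```

With these tables, both neighbors calling raises the burglary probability from 0.1% to roughly 28%, which is the kind of evidence-driven score update the model relies on.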
[Diagram: example Bayesian network with nodes Burglary, Earthquake, Alarm, John Calls, Mary Calls]
Initial Prototype – v0.1
• One activity model for all users
• Run model in cloud environment with Azure Worker Role
• Storage accounts for input data and output scores
• Pros:
• Easy to manage
• Small memory footprint
• Cons:
• Does not scale
• Low throughput
[Diagram: User 1, User 2, User 3; Data; Azure Worker Role running the Activity Model; Scores]
Improved Approach
• One model for each user
• Personalized activity suspiciousness
• Cluster low-activity users for better model results
• Replace storage accounts with Azure Event Hubs
• Low-latency, cloud-scale “queues”
[Diagram: User 1, User 2, User 3; Event Hub; Azure Worker Role with Model 1, Model 2, Model 3, … Model n; Event Hub; Scores]
Model Scale-Out: Memory
• Millions of per-user models
• More than can fit in worker role memory
• Store models in storage account
• Load as needed
[Diagram: User 1, User 2, User 3; Event Hub; Azure Worker Role with Model 1, Model 2, Model 3, … Model n; Event Hub; Scores; Model Storage]
Model Scale-Out: Latency
[Diagram: User 1, User 2, User 3; Event Hub; Azure Worker Role with Model 1, Model 2, Model 3, … Model n; Event Hub; Scores; Model Storage; Redis Cache]
• Model storage account adds too much latency
• Redis cache minimizes model loading latency
• LRU policy as we process user activity events
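The LRU idea can be illustrated without any infrastructure. The sketch below is an in-process stand-in, not Redis and not the production design: `load_model` is a hypothetical callable representing the slow fetch from the model storage account, and the capacity is arbitrary.

```python
from collections import OrderedDict


class ModelCache:
    """Small LRU cache sitting in front of a slower per-user model store."""

    def __init__(self, load_model, capacity):
        self._load_model = load_model   # hypothetical slow loader
        self._capacity = capacity
        self._models = OrderedDict()    # least recently used first
        self.misses = 0

    def get(self, user_id):
        if user_id in self._models:
            # Cache hit: mark this user's model as most recently used.
            self._models.move_to_end(user_id)
            return self._models[user_id]
        # Cache miss: pay the storage latency once, then keep the model.
        self.misses += 1
        model = self._load_model(user_id)
        self._models[user_id] = model
        if len(self._models) > self._capacity:
            self._models.popitem(last=False)   # evict least recently used
        return model

    def cached_users(self):
        return list(self._models)


# Process a stream of user activity events with room for two models.
cache = ModelCache(load_model=lambda uid: {"user": uid}, capacity=2)
for uid in ["u1", "u2", "u1", "u3"]:
    cache.get(uid)
```

After this event stream the cache holds `u1` and `u3`: `u2` was evicted because `u1` had been touched more recently, which matches the intent of keeping active users' models hot.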
Data Compliance
• Models cannot use certain PII
• Balkanized cloud environments
• Tiered model development
• Resolve user information for UX
• UserID -> User Name
[Diagram (Data Compliance): User 1, User 2, User 3; Event Hub; Azure Worker Role with Model 1, Model 2, Model 3, … Model n; Event Hub; Scores; Model Storage with Redis Cache; User Account DB with Redis Cache]
Cloud Resource Competition
[Diagram: Signal 1, Signal 2, Signal 3, … Signal m; User Account DB; Redis Cache]
From v0.1 to v1.0
Conclusion
Operationalize Security Data Science: Components
=> Model Evaluation
=> Model Deployment
=> Model Scale-out
The Rand Test
Test to see if your Security Data Science solution is operational
Answer Yes/No to the following:
1) Do you have an established pipeline to collect relevant security data?
2) Do you have established SLAs/data contracts with partner teams?
3) Can you seamlessly update the model with new features and re-train?
4) Did you evaluate the model with real attack data?
5) Does your model respect different privacy laws, across all regions?
6) Do you account for model localization?
7) Is your model scalable, end to end?
8) Do you hold live site meetings about your solution?
9) Can security responders leverage the model for insights during an investigation?
10) Do you have a framework to collect feedback from security analysts on the results?
By @ram_ssk, Andrew Wicker
Score: Yes = 1 point
(Scale from 10 down through 5 to 0, labeled in order: “All systems Operational!”, “Houston! We have a problem”, “One small step…”)

Creativity and Curiosity - The Trial and Error of Data Science
 
Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OScalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2O
 
Performance Issue? Machine Learning to the rescue!
Performance Issue? Machine Learning to the rescue!Performance Issue? Machine Learning to the rescue!
Performance Issue? Machine Learning to the rescue!
 
How to automate Machine Learning pipeline ?
How to automate Machine Learning pipeline ?How to automate Machine Learning pipeline ?
How to automate Machine Learning pipeline ?
 
Data analytics, a (short) tour
Data analytics, a (short) tourData analytics, a (short) tour
Data analytics, a (short) tour
 
Machine Learning & Apache Mahout
Machine Learning & Apache MahoutMachine Learning & Apache Mahout
Machine Learning & Apache Mahout
 
How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?
 
Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)
 
Tutorial Mahout - Recommendation
Tutorial Mahout - RecommendationTutorial Mahout - Recommendation
Tutorial Mahout - Recommendation
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
WhyR? Analiza sentymentu
WhyR? Analiza sentymentuWhyR? Analiza sentymentu
WhyR? Analiza sentymentu
 
AP computer barron book ppt AP CS A.pptx
AP computer barron book ppt AP CS A.pptxAP computer barron book ppt AP CS A.pptx
AP computer barron book ppt AP CS A.pptx
 
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth LoganMulti Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
 
Testing-Tools-Magnitia-Content.pdf
Testing-Tools-Magnitia-Content.pdfTesting-Tools-Magnitia-Content.pdf
Testing-Tools-Magnitia-Content.pdf
 
Learning from data
Learning from dataLearning from data
Learning from data
 
MongoDB: How We Did It – Reanimating Identity at AOL
MongoDB: How We Did It – Reanimating Identity at AOLMongoDB: How We Did It – Reanimating Identity at AOL
MongoDB: How We Did It – Reanimating Identity at AOL
 
Making powerful science: an introduction to NGS data analysis
Making powerful science: an introduction to NGS data analysisMaking powerful science: an introduction to NGS data analysis
Making powerful science: an introduction to NGS data analysis
 
Kaggle nlp approaches
Kaggle nlp approachesKaggle nlp approaches
Kaggle nlp approaches
 
Classification of URLs
Classification of URLsClassification of URLs
Classification of URLs
 
Data analysis patterns, tools and data types in genomics
Data analysis patterns, tools and data types in genomicsData analysis patterns, tools and data types in genomics
Data analysis patterns, tools and data types in genomics
 

Kürzlich hochgeladen

Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...tanu pandey
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 

Kürzlich hochgeladen (20)

Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 

Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

  • 2. Choosing the Learner: Binary Classification, Regression, Multiclass Classification, Unsupervised, Ranking, Anomaly Detection, Collaborative Filtering, Sequence Prediction, Reinforcement Learning, Representation Learning
  • 3. Choosing the Learning Task: Binary Classification, Anomaly Detector, Ranking. Defining Data Input: Data Loaders (text, binary, SVM light, Transpose loader), Data type. Applying Data Transforms: Cleaning missing data, Dealing with categorical data, Dealing with text data, Data normalization. Choosing the Learner: Binary Classification, Regression, Multiclass, Unsupervised, Ranking, Anomaly Detection, Collaborative Filtering, Sequence Prediction. Choosing Output: Save features of a model? Save the model as text? Save the model as binary? Save the per-instance results? Choosing Run Options: Run locally? Run distributed on an HPC cluster? Are all paths in the experiment node-accessible? Priority? Max concurrent processes? View Results: Too large? Sampled, Right size, Load data, Histogram, Per feature, Sampled instances. Debug and Visualize Errors: Error in data, Error in learner, Error in optimizer, Error in experimentation setup. Analyze Model Predictions: Root cause analysis, Grading.
  • 4. Choosing the Learning Task: Binary Classification, Anomaly Detector, Ranking. Defining Data Input: Data Loaders (text, binary, SVM light, Transpose loader), Data type. Applying Data Transforms: Cleaning missing data, Dealing with categorical data, Dealing with text data, Data normalization. Choosing the Learner: Binary Classification, Regression, Multiclass, Unsupervised, Ranking, Anomaly Detection, Collaborative Filtering, Sequence Prediction. Choosing Output: Save features of a model? Save the model as text? Save the model as binary? Save the per-instance results? Choosing Run Options: Run locally? Run distributed on an HPC cluster? Are all paths in the experiment node-accessible? Priority? Max concurrent processes? View Results. Debug and Visualize Errors. Analyze Model Predictions.
  • 5. Operationalizing Security Data Science. Ram Shankar Siva Kumar (@ram_ssk), Andrew Wicker, Microsoft
  • 6. Security Data Science Projects are different • Traditional programming projects: spec/prototype → implement → ship • Data science projects: at each stage, relabel, refeaturize, retrain • With data-driven features, all components drift: • Learner: more accurate, faster, lower memory footprint, … • Features: there are always better ones • Data: all distributions drift • Security projects: at each stage, assess threat, build detections, respond • All components drift: • Threat: new attacks constantly come out • Detection: newer log sources • Response: better tooling, newer TSGs So wait… when do we ship? Intro Model Evaluation Model Deployment Model Scale-out Conclusion
  • 7. You ship when your solution is operational. Roles shown: Security Experts, Engineers, Legal, Service Engineers, Product Managers, Machine Learning Experts
  • 8. Operational is more than your “model is working”… • Detect unusual user activity to prevent data exfiltration • Detect unusual user activity using application logs, with false positive rate < 1%, for all Azure customers, in near real-time
  • 9. Operationalize Security Data Science: Components • Detect unusual user activity => The Problem • using application logs => Data • with false positive rate < 1% => Model Evaluation • for all Azure customers => Model Deployment • in near real-time => Model Scale-out
  • 10. Model Evaluation: How do you know your system works?
  • 13. Model Evaluation Metrics • E.g.: false positive rate • Makes your customer (and ergo, your business) happy • How to measure this? Model Usage Metrics • E.g.: call rate • How much is the model in use? • Makes your division happy • Collected by your pipeline after deployment Model Validation Metrics • E.g.: MSE, reconstruction error, … • How well does the model generalize? • Makes the data scientist happy • Comes pre-built with ML frameworks (scikit-learn, CNTK)
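To make two of the metrics named on this slide concrete, here is a minimal sketch in plain Python. The labels, predictions, and scores are made-up illustrative values, not data from the talk; in practice a framework such as scikit-learn would normally supply these functions.

```python
def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN), with labels 1 (malicious) and 0 (benign)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0


def mean_squared_error(y_true, y_score):
    """Average squared gap between the true label and the model score."""
    return sum((t - s) ** 2 for t, s in zip(y_true, y_score)) / len(y_true)


# Hypothetical evaluation set: four benign events, two malicious ones.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0]               # thresholded decisions
y_score = [0.1, 0.6, 0.2, 0.0, 0.9, 0.4]  # raw model scores

fpr = false_positive_rate(y_true, y_pred)
mse = mean_squared_error(y_true, y_score)
print(f"FPR={fpr:.2f}  MSE={mse:.2f}")
```

The false positive rate is the number the customer feels, while MSE mainly tells the data scientist how well the model fits; the two can move independently.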
  • 14. Model Evaluation: How to gather an evaluation dataset? (options ordered from least to most specialized) • Good: Use benchmark datasets • List of curated datasets: www.secrepo.com • Con: Remember, attackers have them too! • Better: Use previous Indicators of Compromise (IOCs) • Sources: honeypots, commercial IOC feeds • Steps: • Gather confirmed IOCs • “Backprop” them through the generated alerts • This will help you calculate false positives (FP) and false negatives (FN) • Best: Curate your own dataset
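One possible reading of the "Better" steps is a set comparison between the entities the system alerted on and the confirmed IOCs. The sketch below assumes that both can be reduced to comparable identifiers (here, IP addresses from documentation ranges) and that the IOC list is complete for the period, which real feeds rarely are; the function name and data are hypothetical.

```python
def score_alerts(alerted_entities, confirmed_iocs):
    """Compare alerted entities against confirmed IOCs.

    Returns (true_positives, false_positives, false_negatives) as sets.
    An alert without a matching IOC counts as a false positive only under
    the assumption that the IOC list is complete.
    """
    alerted = set(alerted_entities)
    iocs = set(confirmed_iocs)
    return alerted & iocs, alerted - iocs, iocs - alerted


# Hypothetical data.
alerts = ["203.0.113.7", "198.51.100.23", "192.0.2.44"]
iocs = ["203.0.113.7", "192.0.2.99"]

tp, fp, fn = score_alerts(alerts, iocs)
print(f"TP={len(tp)} FP={len(fp)} FN={len(fn)}")
```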
  • 15. Curating your own dataset, option 1: Inject fake malicious data • How: Label data as “eviluser” and check whether “eviluser” pops to the top of the reports every day • Pro: Low overhead; you don’t have to depend on a red team to test your detection • Con: The injected data may not be representative of true attacker activity (Diagram labels: Synthetic data, Storage, Model, Alerting System)
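The daily "eviluser" check can be sketched as a ranking test on the report scores. The report format (a mapping from user to suspiciousness score) and the `top_k` cutoff are assumptions made for illustration; only the "eviluser" label comes from the slide.

```python
def synthetic_user_rank(scores, synthetic_user="eviluser"):
    """1-based rank of the injected user when sorted by descending score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked.index(synthetic_user) + 1


def detection_ok(scores, top_k=3, synthetic_user="eviluser"):
    """True if the injected user appears within the top_k entries of the report."""
    return (synthetic_user in scores
            and synthetic_user_rank(scores, synthetic_user) <= top_k)


# Hypothetical daily report: user -> suspiciousness score.
daily_scores = {"alice": 0.12, "bob": 0.40, "eviluser": 0.97, "carol": 0.05}

rank = synthetic_user_rank(daily_scores)
healthy = detection_ok(daily_scores)
print(f"eviluser rank={rank}, detection healthy={healthy}")
```

A check like this can run after every scoring job, so a silent regression in the pipeline shows up as the injected user dropping out of the top of the report.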
  • 16. Curating your own dataset, option 2: Employ commonly used attacker tools • How: Spin up a malicious process using Metasploit, PowerSploit, or Veil in your environment, then look for traces in your logs • Pro: Easy to implement; with a short tutorial, your development team can run the tool, which generates attack data in the logs • Con: The machine learning system will only learn to detect known attacker toolkits and will not generalize over the attack methodology (Diagram labels: Tainted Data, Storage, Model, Alerting System)
  • 17. Curating your own dataset, option 3: Red team pentests your environment • How: A red team attacks the system, and we collect the logs from the attacks as tainted data • Pro: Closest technique to real-world attacks • Con: Red team exercises are point-in-time and expensive (Diagram labels: Tainted Data, Storage, Model, Alerting System)
  • 18. Growing your dataset: Generative Adversarial Networks • Source: https://medium.com/@devnag/generative-adversarial-networks-gans-in-50-lines-of-code-pytorch-e81b79659e3f#.djcfc6eo0 • Source: http://www.evolvingai.org/ppgn
  • 19. Model Deployment: Tailoring alerts based on customers’ geographic location
  • 20. Azure has data centers all around the world!
  • 21. Localization affects Model Building • Privacy Laws vary across the board • IP address is treated as EII in some regions vs. not EII in other regions • “Anyone logging into corporate network at midnight during the weekend is anomalous” • Weekend in Middle East != Weekend in Americas • Seasonality varies
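The weekend example can be illustrated with a region-aware rule. The region names and weekend days below are hypothetical configuration (actual weekends vary by country and have changed over time), and timestamps are assumed to already be in the customer's local time.

```python
from datetime import datetime

# datetime.weekday(): Monday=0 ... Sunday=6.
# Hypothetical per-region configuration, not taken from the talk.
WEEKEND_DAYS = {
    "sat_sun_region": {5, 6},  # Saturday-Sunday weekend
    "fri_sat_region": {4, 5},  # Friday-Saturday weekend
}


def is_weekend(local_time, region):
    """True if the local timestamp falls on that region's configured weekend."""
    return local_time.weekday() in WEEKEND_DAYS[region]


friday_night = datetime(2024, 4, 12, 23, 30)  # a Friday
sunday_night = datetime(2024, 4, 14, 23, 30)  # a Sunday

for region in WEEKEND_DAYS:
    print(region, is_weekend(friday_night, region), is_weekend(sunday_night, region))
```

The same login time is a weekday event in one region and a weekend event in another, so a single hard-coded "weekend" feature would mislabel one of them.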
  • 22. Option 1: Shotgun Deployment • How: Deploy the same model code across different regions • Pros: • Easy deployment • Uniform metrics • Single TSG to debug all service incidents • Cons: • Lose macro trends in favor of micro trends • Model-region incompatibility (Diagram: Region 1, Region 2, and Region 3 each run a copy of the same Model)
  • 23. Option 2: Tiered Modeling • How: • Federated models • Each region is modeled separately • Results are scrubbed according to compliance laws and privacy agreements • Scrubbed results are used as input to “Model Prime” • Model Prime • Results are collated to search for global trends • Pros: • Bespoke modeling for every region • Balance between micro and macro modeling • Cons: • Complicated deployment • Depending on the agreements, Model Prime may not be possible (Diagram: Models 1 to 3, one per region, send scrubbed results to Model Prime)
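The tiered flow (regional model, scrubbing, Model Prime) can be sketched as three small functions. Everything here is a stand-in: the threshold detectors, the field names, the allowed-field list, and the "seen in at least two regions" rule are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def regional_model(events, threshold):
    """Stand-in for a per-region detector: keep events scoring above the threshold."""
    return [e for e in events if e["score"] > threshold]


def scrub(alerts, allowed_fields=("alert_type",)):
    """Drop every field not cleared to leave the region."""
    return [{k: a[k] for k in allowed_fields} for a in alerts]


def model_prime(scrubbed_by_region, min_regions=2):
    """Collate scrubbed results; report alert types seen in several regions."""
    regions_per_type = Counter()
    for alerts in scrubbed_by_region.values():
        for alert_type in {a["alert_type"] for a in alerts}:
            regions_per_type[alert_type] += 1
    return sorted(t for t, n in regions_per_type.items() if n >= min_regions)


# Hypothetical regional events and per-region thresholds.
events = {
    "region1": [{"user": "u1", "alert_type": "mass_download", "score": 0.9},
                {"user": "u2", "alert_type": "odd_login", "score": 0.3}],
    "region2": [{"user": "u3", "alert_type": "mass_download", "score": 0.8}],
    "region3": [{"user": "u4", "alert_type": "odd_login", "score": 0.95}],
}
thresholds = {"region1": 0.5, "region2": 0.7, "region3": 0.9}

scrubbed = {r: scrub(regional_model(ev, thresholds[r])) for r, ev in events.items()}
global_trends = model_prime(scrubbed)
print(global_trends)
```

Only the scrubbed records cross the regional boundary, which is what lets the top tier look for global trends without seeing user identifiers.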
  • 25. Detecting Malicious Activities • Detect risky or malicious activity => The Problem • in SharePoint Online (SPO) activity logs => Data • with precision > 90% => Model Evaluation • for all SPO users => Model Deployment • in near real-time => Model Scale-out
  • 26. Exploratory Analysis • Typical data science work: • Sample data • Script for preprocessing data • Summary statistics • Script for evaluating approaches • All done locally on a dev machine using R/Python • Facilitates quick turnaround • Avoids having to debug at scale
  • 27. Model Evaluation • Labels from known incidents and investigations • Inject labels by mimicking malicious activity • SPO team helps us understand the malicious activity • Red team helps us simulate the malicious activity • > 90% precision
  • 28. Model: Bayesian Network • Probabilistic graphical model • Related to GMM, CRF, MRF • Represents variables and conditional independence assertions in a directed acyclic graph • Directed edges encode conditional dependencies • Conditional probability distributions for each variable (Example graph nodes: Burglary, Earthquake, Alarm, John Calls, Mary Calls)
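The Burglary/Earthquake/Alarm graph on this slide is the classic textbook example, and exact inference on it fits in a few lines. The slide gives no probabilities; the conditional probability tables below are the commonly quoted textbook values and are used here only as an illustrative assumption. The sketch computes P(Burglary | John calls, Mary calls) by enumerating the joint distribution.

```python
from itertools import product

# Conditional probability tables (illustrative textbook values, not from the talk).
P_B = 0.001                       # P(Burglary)
P_E = 0.002                       # P(Earthquake)
P_A = {(True, True): 0.95,        # P(Alarm | Burglary, Earthquake)
       (True, False): 0.94,
       (False, True): 0.29,
       (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls | Alarm)


def bern(p, value):
    """Probability that a Boolean variable with P(True)=p takes `value`."""
    return p if value else 1.0 - p


def joint(b, e, a, j, m):
    """Joint probability factorized along the directed edges of the graph."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


def posterior_burglary(j=True, m=True):
    """P(Burglary | JohnCalls=j, MaryCalls=m), summing out Earthquake and Alarm."""
    weights = {True: 0.0, False: 0.0}
    for b, e, a in product([True, False], repeat=3):
        weights[b] += joint(b, e, a, j, m)
    return weights[True] / (weights[True] + weights[False])


p = posterior_burglary()
print(f"P(Burglary | both call) = {p:.3f}")
```

Enumeration is exponential in the number of hidden variables, so it only suits small graphs like this one; it is shown here because it makes the factorization explicit.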
  • 29. Initial Prototype – v0.1 • One activity model for all users • Run model in cloud environment with Azure Worker Role • Storage accounts for input data and output scores • Pros: • Easy to manage • Small memory footprint • Cons: • Does not scale • Low throughput (Diagram labels: User 1, User 2, User 3, Data, Azure Worker Role, Activity Model, Scores)
  • 30. Improved Approach • One model for each user • Personalized activity suspiciousness • Cluster low-activity users for better model results • Replace storage accounts with Azure Event Hubs • Low-latency, cloud-scale “queues” (Diagram labels: User 1, User 2, User 3, Event Hub, Azure Worker Role, Model 1, Model 2, Model 3, …, Model n, Event Hub, Scores)
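One way to picture "one model per user, with low-activity users clustered" is a routing function that picks which model scores an event. The event-count cutoff, the key format, and the precomputed cluster assignments are all hypothetical; the slide does not say how users are clustered.

```python
def assign_model_key(user_id, event_counts, cluster_of, min_events=50):
    """Pick the model that should score this user's events.

    Users with at least `min_events` historical events get a personal model;
    everyone else shares the model of their (precomputed) cluster.
    """
    if event_counts.get(user_id, 0) >= min_events:
        return f"user:{user_id}"
    return f"cluster:{cluster_of.get(user_id, 'default')}"


# Hypothetical activity history and cluster assignments.
event_counts = {"user1": 1200, "user2": 7, "user3": 0}
cluster_of = {"user2": "occasional_readers"}

keys = {u: assign_model_key(u, event_counts, cluster_of) for u in event_counts}
print(keys)
```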
  • 31. Model Scale-Out: Memory • Millions of per-user models • More than can fit in worker role memory • Store models in a storage account • Load as needed (Diagram labels: User 1, User 2, User 3, Event Hub, Azure Worker Role, Model 1, Model 2, Model 3, …, Model n, Model Storage, Event Hub, Scores)
  • 32. Model Scale-Out: Latency • Model storage account adds too much latency • Redis cache minimizes model loading latency • LRU policy as we process user activity events (Diagram labels: User 1, User 2, User 3, Event Hub, Azure Worker Role, Model 1, Model 2, Model 3, …, Model n, Model Storage, Redis Cache, Event Hub, Scores)
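The load-as-needed pattern with an LRU policy can be sketched with an in-process cache. This is a stand-in for illustration only: it does not use the Redis API, the class and parameter names are hypothetical, and `load_model` represents whatever call fetches a model from the slower model storage.

```python
from collections import OrderedDict


class LRUModelCache:
    """Keeps at most `capacity` models in memory, evicting the least recently used."""

    def __init__(self, load_model, capacity):
        self._load_model = load_model   # slow path: fetch from model storage
        self._capacity = capacity
        self._models = OrderedDict()    # user_id -> model, oldest first
        self.misses = 0

    def get(self, user_id):
        if user_id in self._models:
            self._models.move_to_end(user_id)   # mark as most recently used
            return self._models[user_id]
        self.misses += 1
        model = self._load_model(user_id)
        self._models[user_id] = model
        if len(self._models) > self._capacity:
            self._models.popitem(last=False)    # evict least recently used
        return model


# Hypothetical model storage: user_id -> serialized model.
storage = {f"user{i}": f"model-for-user{i}" for i in range(1, 4)}
cache = LRUModelCache(storage.__getitem__, capacity=2)

for uid in ["user1", "user2", "user1", "user3", "user2"]:
    cache.get(uid)

cached_users = list(cache._models)
print(cache.misses, cached_users)
```

Because events for an active user tend to arrive in bursts, an LRU policy keeps that user's model hot while idle users' models fall back to storage.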
  • 33. Data Compliance • Models cannot use certain PII • Balkanized cloud environments • Tiered model development • Resolve user information for UX • UserID -> User Name
  • 34. Data Compliance (Diagram labels: User 1, User 2, User 3, Event Hub, Azure Worker Role, Model 1, Model 2, Model 3, …, Model n, Model Storage, Redis Cache, Event Hub, Scores, User Account DB, Redis Cache)
  • 35. Cloud Resource Competition (Diagram labels: Signal 1, Signal 2, Signal 3, …, Signal m, User Account DB, Redis Cache)
  • 36. Cloud Resource Competition (Diagram labels: Signal 1, Signal 2, Signal 3, …, Signal m, User Account DB, Redis Cache)
  • 37. From v0.1 to v1.0
  • 39. Operationalize Security Data Science: Components => Model Evaluation => Model Deployment => Model Scale-out
  • 40. The Rand Test: a test to see if your Security Data Science solution is operational. Answer Yes/No to the following: 1) Do you have an established pipeline to collect relevant security data? 2) Do you have established SLAs/data contracts with partner teams? 3) Can you seamlessly update the model with new features and re-train? 4) Did you evaluate the model with real attack data? 5) Does your model respect different privacy laws, across all regions? 6) Do you account for model localization? 7) Is your model scalable, end to end? 8) Do you hold live site meetings about your solution? 9) Can security responders leverage the model for insights during an investigation? 10) Do you have a framework to collect feedback from security analysts on the results? Score: each Yes = 1 point. Scale: 10 = All systems Operational! 5 = Houston! We have a problem. 0 = One small step… By @ram_ssk, Andrew Wicker
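Scoring the test is simple arithmetic: one point per "Yes". The sketch below assumes the three labels pair with the scale points 10, 5, and 0 in the order they appear in the transcript; how scores in between should be labeled is not stated, so the code only attaches a label at those exact points.

```python
# Scale labels as read from the transcript order (an assumption).
SCALE_LABELS = {
    10: "All systems Operational!",
    5: "Houston! We have a problem",
    0: "One small step…",
}


def rand_test_score(answers):
    """Count the Yes answers to the ten questions (True = Yes)."""
    if len(answers) != 10:
        raise ValueError("The Rand Test has exactly 10 questions")
    return sum(1 for a in answers if a)


# Hypothetical self-assessment: questions 4, 5, 6, 9, and 10 answered No.
answers = [True, True, True, False, False, False, True, True, False, False]

score = rand_test_score(answers)
label = SCALE_LABELS.get(score, "between scale points")
print(score, label)
```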