Statistical Learning Based Anomaly Detection @ Twitter
Arun Kejariwal 
(@arun_kejariwal) 
Joint work with Jordan Hochenbaum and Owen Vallis 
November 2014
Internet trends
• Real-time [1]
[1] http://techcrunch.com/2014/05/05/amazon-extends-its-shopping-cart-to-twitter/
Twitter: Global Town Square
Data Fidelity
• Data-driven decision making
  ◦ Evolving product landscape
• Data partners
  ◦ Nielsen
  ◦ Dataminr
• Operational
  ◦ Performance and Availability
Data Fidelity: Challenges
• Anomalies
  ◦ Exogenic factors
    ▪ User behavior
    ▪ Events
    ▪ Data center
  ◦ Endogenic factors
    ▪ Agile development
      - Fail fast
    ▪ Data collection
• Millions of time series [1,2]
  ◦ Scalability
[1] http://strata.oreilly.com/2013/09/how-twitter-monitors-millions-of-time-series.html
[2] http://strataconf.com/strata2014/public/schedule/detail/32431
Anomaly Detection: Why Bother?
• Analyze User Engagement
  ◦ Events
    ▪ Super Bowl, Japanese New Year
  ◦ Year-over-year analysis (input to forecasting)
• Identify Attacks
  ◦ DoS
  ◦ Malware attacks
• Identify Bots
  ◦ Separating actual users from spam
Anomaly Detection
• Visual
  ◦ Prone to errors
  ◦ Not scalable
    ▪ Machine-generated data: 11% of the digital universe in 2005 to > 40% by 2020 [1]
    ▪ Cloud Infrastructure 2013-2017 CAGR ~50% [2]
• Algorithmic approach
  ◦ Automate!
[1] http://www.emc.com/about/news/press/2012/20121211-01.htm
[2] http://www.forbes.com/sites/gilpress/2013/12/12/16-1-billion-big-data-market-2014-predictions-from-idc-and-iia/
Anomaly Detection: Background
• Over 50 years of research [1]
  ◦ Statistics
    ▪ Extreme Value Theory
    ▪ Robust Statistics, Grubbs' Test, ESD
  ◦ Econometrics
  ◦ Finance
    ▪ Value at Risk (VaR)
  ◦ Signal Processing
  ◦ Music Information Retrieval
  ◦ Networking
  ◦ E-Commerce
  ◦ Performance Regression
[Photos: Jon from Etsy; Toufic from Metafor]
[1] “Anomaly Detection” by Chandola et al. ACM Computing Surveys, 2009.
Anomaly Detection: Overview
• Definition
  ◦ “An anomaly is an observation that deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism” [1,2]
[1] “Identification of Outliers” by Douglas M. Hawkins. London: Chapman and Hall, 1980.
[2] “Outlier Analysis” by Charu C. Aggarwal. Springer, 2013.
Anomaly Detection
• Characterization
  ◦ Magnitude
  ◦ Width
  ◦ Frequency
  ◦ Direction
Anomaly Detection (contd.)
• Two flavors
  ◦ Global
    ▪ Max Value
  ◦ Local
    ▪ Intra-day
[Figure labels: Global, Local]
Anomaly Detection (contd.)
• Traditional Approaches
  ◦ Metrics
    ▪ Mean μ
    ▪ Standard deviation σ
  ◦ Rule of thumb
    ▪ μ + 3*σ
  ◦ Which time series?
    ▪ Raw
    ▪ Moving Averages
      - SMA, EWMA, PEWMA
[Figure annotation: 3 * σ]
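The μ + 3σ rule of thumb can be sketched in a few lines of Python. This is a minimal illustration applied to the raw series, not the code used at Twitter; the function name `three_sigma_anomalies` and the toy data are assumptions made for the example.

```python
import statistics

def three_sigma_anomalies(values, k=3.0):
    """Return indices whose distance from the mean exceeds k standard deviations."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]

# Toy series (hypothetical): values cycle through 10..14, with one spike at index 25.
series = [10.0 + (i % 5) for i in range(50)]
series[25] = 40.0

print(three_sigma_anomalies(series))  # [25]
```

Note that the spike itself pulls μ upward and inflates σ, so the threshold it is compared against is already contaminated by the anomaly.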
Anomaly Detection (contd.)
• Impact of multi-modal distribution
  ◦ Shifts μ by ~0.2%
  ◦ Inflates σ by 4.5%
    ▪ Misses quite a few anomalies
  ◦ What do multiple modes correspond to?
    ▪ Seasonality
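A small sketch of why a multi-modal (seasonal) series hides local anomalies from the 3σ rule. The data are synthetic and hypothetical: an hourly series with a low mode near 10 and a high mode near 100, plus one injected value of 60 during the low period. The shift and inflation percentages on the slide come from the speakers' data and are not reproduced here.

```python
import statistics

PERIOD = 24  # hypothetical hourly data with a daily cycle

def toy_series(days=7):
    """Two modes: ~10 during hours 0-11, ~100 during hours 12-23, plus small jitter."""
    values = []
    for i in range(days * PERIOD):
        base = 10.0 if i % PERIOD < 12 else 100.0
        jitter = (i * 7) % 5 - 2  # deterministic values in -2..2
        values.append(base + jitter)
    return values

series = toy_series()
series[30] = 60.0  # local anomaly: far above the ~10 level expected at that hour

mu = statistics.mean(series)
sigma = statistics.pstdev(series)  # computed across both modes
low_mode = [v for i, v in enumerate(series) if i % PERIOD < 12 and i != 30]

flagged = [i for i, v in enumerate(series) if abs(v - mu) > 3 * sigma]
print(round(sigma, 1), round(statistics.pstdev(low_mode), 1))
print(flagged)  # [] : the local anomaly at index 30 is missed
```

The spread measured across both modes is far larger than the spread within either mode, so a value that is extreme for its time of day still sits well inside μ ± 3σ.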
Anomaly Detection (contd.)
• Robust Statistics
  ◦ MAD (Median Absolute Deviation)
    ▪ Breakdown point
      - Median 50% vs. Mean 0%
  ◦ σ_MAD = K · MAD
    ▪ K = 1.4826 for normally distributed data
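The MAD-based scale estimate can be illustrated as follows. This is a minimal sketch; the helper names `mad` and `sigma_mad` and the five-point data set are assumptions for the example.

```python
import statistics

K = 1.4826  # consistency constant for normally distributed data

def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def sigma_mad(values):
    """Robust estimate of the standard deviation: K * MAD."""
    return K * mad(values)

data = [1.0, 2.0, 3.0, 4.0, 100.0]  # one gross outlier

print(statistics.mean(data), statistics.median(data))  # 22.0 3.0
print(statistics.stdev(data), sigma_mad(data))         # classical vs. robust spread

# Robust z-scores: distance from the median in units of sigma_MAD.
center = statistics.median(data)
robust_z = [abs(v - center) / sigma_mad(data) for v in data]
```

A single outlier drags the mean from the middle of the data to 22, while the median stays at 3, which is the breakdown-point contrast the slide refers to.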
Anomaly Detection (contd.)
• Limitations of using MAD
Anomaly Detection (contd.)
• Grubbs' Test
  ◦ Critical value is derived from the data at a chosen significance level (α)
• Limitations
  ◦ Assumes data distribution is normal
  ◦ Good for detecting ONLY 1 outlier
  ◦ Seasonality unaware
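For reference, the two-sided form of Grubbs' test as it is commonly stated (the slide does not spell it out, so the notation here is an assumption): with sample mean x̄, sample standard deviation s, and N observations,

```latex
G = \frac{\max_{i} \lvert x_i - \bar{x} \rvert}{s},
\qquad
G_{\text{crit}} = \frac{N-1}{\sqrt{N}}
\sqrt{\frac{t^{2}_{\alpha/(2N),\,N-2}}{N - 2 + t^{2}_{\alpha/(2N),\,N-2}}}
```

where t_{α/(2N), N−2} is the upper critical value of the t-distribution with N − 2 degrees of freedom at significance level α/(2N). The hypothesis of no outliers is rejected when G > G_crit. Because x̄ and s are computed from the same data, both depend on N and on the outlier itself.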
Anomaly Detection (contd.)
• Generalized ESD (Extreme Studentized Deviate) [1]
  ◦ Critical value (λ_i) re-calculated every iteration
  ◦ Largest i such that R_i > λ_i determines # of anomalies
  ◦ An upper bound on the number of anomalies is an input parameter
• Limitations
  ◦ Generalized ESD assumes a “normal” distribution
  ◦ Seasonality unaware
[1] Rosner, Bernard. “Percentage Points for a Generalized ESD Many-Outlier Procedure.” Technometrics 25, no. 2 (1983): 165–172.
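The iteration described above can be sketched as follows. This is an illustrative implementation of the classical (mean and standard deviation) generalized ESD test, not the speakers' code. The function names and toy data are assumptions, and the t-distribution quantile is computed numerically only to keep the sketch dependency-free; in practice a statistics library routine such as SciPy's `scipy.stats.t.ppf` would normally be used instead.

```python
import math
import statistics

def t_cdf(t, df, steps=1000):
    """Student-t CDF for t >= 0 via Simpson's rule on the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    pdf = lambda x: math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Invert the CDF by bisection; assumes 0.5 <= p < 1 and a quantile below 200."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def generalized_esd(values, max_anomalies, alpha=0.05):
    """Return indices of detected anomalies, most extreme first."""
    n = len(values)
    remaining = list(enumerate(values))
    removed, n_anomalies = [], 0
    for i in range(1, max_anomalies + 1):
        xs = [v for _, v in remaining]
        mean, sd = statistics.mean(xs), statistics.stdev(xs)
        if sd == 0:
            break
        # Test statistic R_i: largest studentized deviation among remaining points.
        pos = max(range(len(xs)), key=lambda j: abs(xs[j] - mean))
        r_i = abs(xs[pos] - mean) / sd
        # Critical value lambda_i, recomputed every iteration.
        p = 1 - alpha / (2 * (n - i + 1))
        t = t_quantile(p, n - i - 1)
        lam_i = (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))
        removed.append(remaining.pop(pos)[0])
        if r_i > lam_i:
            n_anomalies = i  # largest i with R_i > lambda_i
    return removed[:n_anomalies]

# Toy data (hypothetical): 30 well-behaved points plus two large values.
data = [10.0 + 0.5 * (i % 5) for i in range(30)] + [50.0, 45.0]
print(generalized_esd(data, max_anomalies=5))  # [30, 31]
```

Here `max_anomalies` is the upper bound from the slide: the test always runs that many iterations and then reports the largest i whose statistic exceeded its critical value.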
Our Approach
Anomaly Detection (contd.)
• Addressing Seasonality
  ◦ Key Idea
    ▪ Time Series Decomposition
Anomaly Detection (contd.)
• Determining seasonal component
  ◦ Regression on sub-cycle plots [1]
[1] “STL: A Seasonal-Trend Decomposition Procedure Based on Loess” by Cleveland et al. Journal of Official Statistics, Vol. 6, Issue 1, 1990.
Anomaly Detection (contd.)
• Impact of removing the seasonal and trend components
  ◦ Transforms our multi-modal data into unimodal data
    ▪ Amenable to ESD/MAD!
The decomposed residual becomes unimodal, which significantly shrinks σ.
The original multi-modal raw data has a much larger σ, leading ESD to miss many of the outliers.
Anomaly Detection (contd.)
• Challenges remain!
  ◦ Trend smoothing distortion creates “phantom” anomalies
Anomaly Detection (contd.)
• Marrying Robust Statistics with Seasonal Decomposition
  ◦ Median is free from distortion
Anomaly Detection (contd.)
• Applying ESD on the Residual
  ◦ Decomposition exposes anomalies
Anomaly Detection (contd.)
• Recap
  ◦ Extract the seasonal component using STL
    ▪ Filters out periodic spikes
  ◦ Residual = Raw - Seasonal_raw - Median_raw
  ◦ Run ESD on residual (using median and MAD)
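The recap can be sketched end to end under two loudly stated simplifications: the seasonal component is approximated by a per-phase median instead of STL, and the final step uses a fixed robust z-score threshold (median and MAD) instead of the full ESD iteration. All function names, the threshold of 3, and the synthetic data are assumptions for illustration.

```python
import statistics

K = 1.4826  # MAD-to-sigma constant for normally distributed data

def seasonal_component(values, period):
    """Stand-in for STL's seasonal term: per-phase median, centered on the overall median."""
    overall = statistics.median(values)
    phase_median = [statistics.median(values[p::period]) for p in range(period)]
    return [phase_median[i % period] - overall for i in range(len(values))]

def residual(values, period):
    """Residual = Raw - Seasonal - Median, following the recap."""
    med = statistics.median(values)
    seasonal = seasonal_component(values, period)
    return [v - s - med for v, s in zip(values, seasonal)]

def robust_anomalies(resid, threshold=3.0):
    """Simplified stand-in for the ESD step: robust z-score using median and MAD."""
    med = statistics.median(resid)
    mad = statistics.median([abs(r - med) for r in resid])
    return [i for i, r in enumerate(resid) if abs(r - med) > threshold * K * mad]

# Hypothetical hourly series: ~10 for hours 0-11, ~100 for hours 12-23, small jitter.
PERIOD = 24
series = [(10.0 if i % PERIOD < 12 else 100.0) + ((i * 7) % 5 - 2)
          for i in range(7 * PERIOD)]
series[30] = 60.0  # local anomaly during the low period

print(robust_anomalies(residual(series, PERIOD)))  # [30]
```

Once the periodic pattern is subtracted, the residual is unimodal with a small spread, so the injected value stands out even though it lies inside the range of the raw series.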
Anomaly Detection (contd.)
• Illustrative example
Anomaly Detection (contd.)
• Applications
  ◦ Three perspectives
    ▪ Capacity
      - CPU utilization
      - Garbage collection
      - Network activity
    ▪ User behavior
      - Events
        · Impressions
        · Link clicks
      - Spam
    ▪ Forecasting
Anomaly Detection (contd.)
• Deployed in production
  ◦ Used by a large number of services at Twitter
  ◦ Automatic e-mail notification
    ▪ Only sent if anomalies are present
    ▪ Anomalies annotated
    ▪ CSV with anomaly locations attached
Open Sourcing
• Skyline from Etsy
  ◦ https://github.com/etsy/skyline/blob/master/src/analyzer/algorithms.py
• Coming soon!
  ◦ R package
Join the Flock
Like problem solving? Like challenges? Be at cutting edge. Make an impact.
• We are hiring!!
  ◦ https://twitter.com/JoinTheFlock
  ◦ https://twitter.com/jobs
  ◦ Contact us: @arun_kejariwal

A Year of the Servo Reboot: Where Are We Now?Igalia
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Recently uploaded (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Statistical Learning Based Anomaly Detection @ Twitter

  • 1. Statistical Learning Based Anomaly Detection @ Twitter
      Arun Kejariwal (@arun_kejariwal)
      Joint work with Jordan Hochenbaum and Owen Vallis
      November 2014
  • 2. Internet trends
      • Real-time [1]
      [1] http://techcrunch.com/2014/05/05/amazon-extends-its-shopping-cart-to-twitter/
  • 3. Twitter: Global Town Square
  • 4. Data Fidelity
      • Data-driven decision making
          - Evolving product landscape
      • Data partners
          - Nielsen
          - Dataminr
      • Operational
          - Performance and availability
  • 5. Data Fidelity: Challenges
      • Anomalies
          - Exogenic factors
              - User behavior
              - Events
              - Data center
          - Endogenic factors
              - Agile development
                  - Fail fast
              - Data collection
      • Millions of time series [1,2]
          - Scalability
      [1] http://strata.oreilly.com/2013/09/how-twitter-monitors-millions-of-time-series.html
      [2] http://strataconf.com/strata2014/public/schedule/detail/32431
  • 6. Anomaly Detection: Why Bother?
      • Analyze user engagement
          - Events
              - Super Bowl, Japanese New Year
          - Year-over-year analysis (input to forecasting)
      • Identify attacks
          - DoS
          - Malware attacks
      • Identify bots
          - Separating actual users from spam
  • 7. Anomaly Detection
      • Visual
          - Prone to errors
          - Not scalable
              - Machine-generated data: 11% of the digital universe in 2005 to > 40% by 2020 [1]
              - Cloud infrastructure: 2013-2017 CAGR ~50% [2]
      • Algorithmic approach
          - Automate!
      [1] http://www.emc.com/about/news/press/2012/20121211-01.htm
      [2] http://www.forbes.com/sites/gilpress/2013/12/12/16-1-billion-big-data-market-2014-predictions-from-idc-and-iia/
  • 8. Anomaly Detection: Background
      • Over 50 years of research [1]
          - Statistics
              - Extreme Value Theory
              - Robust statistics, Grubbs' test, ESD
          - Econometrics
          - Finance
              - Value at Risk (VaR)
          - Signal processing
          - Music information retrieval
          - Networking
          - E-commerce
          - Performance regression
      Jon from Etsy, Toufic from Metafor
      [1] “Anomaly Detection” by Chandola et al. ACM Computing Surveys, 2009.
  • 9. Anomaly Detection: Overview
      • Definition
          - “An anomaly is an observation that deviates so much from other observations so as to arouse suspicions that it was generated by a different mechanism” [1,2]
      [1] “Identification of Outliers” by Douglas M. Hawkins. London: Chapman and Hall, 1980.
      [2] “Outlier Analysis” by Charu C. Aggarwal. Springer, 2013.
  • 10. Anomaly Detection
      • Characterization
          - Magnitude
          - Width
          - Frequency
          - Direction
  • 11. Anomaly Detection (contd.)
      • Two flavors
          - Global
              - Max value
          - Local
              - Intra-day
      (Figure panels: Global, Local)
  • 12. Anomaly Detection (contd.)
      • Traditional approaches
          - Metrics
              - Mean μ
              - Standard deviation σ
          - Rule of thumb
              - μ + 3*σ
          - Which time series?
              - Raw
              - Moving averages
                  - SMA, EWMA, PEWMA
      (Figure annotation: 3 * σ)
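The rule of thumb on slide 12 can be sketched in a few lines of Python. This is a minimal illustration on made-up numbers, not the code used at Twitter: the function names `three_sigma_anomalies` and `ewma`, the two-sided check (the slide only shows μ + 3σ), the smoothing factor and the sample data are all assumptions.

```python
from statistics import mean, pstdev


def three_sigma_anomalies(series, k=3.0):
    """Return indices of points more than k standard deviations from the mean."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return []  # a constant series has no spread to compare against
    return [i for i, x in enumerate(series) if abs(x - mu) > k * sigma]


def ewma(series, alpha=0.3):
    """Exponentially weighted moving average; alpha is an assumed smoothing factor."""
    smoothed = []
    for x in series:
        previous = smoothed[-1] if smoothed else x
        smoothed.append(alpha * x + (1 - alpha) * previous)
    return smoothed


# Hypothetical data: a flat series with one spike, then with two spikes.
one_spike = [10.0] * 20 + [50.0]
two_spikes = [10.0] * 8 + [50.0, 50.0]

print(three_sigma_anomalies(one_spike))
print(three_sigma_anomalies(two_spikes))
```

On this toy data the single spike is flagged, while the two spikes in the shorter series inflate σ enough that neither is flagged. The same check can be run on `ewma(series)` instead of the raw series.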
  • 13. Anomaly Detection (contd.)
      • Impact of multi-modal distribution
          - μ shift ~ 0.2%
          - Inflates σ by 4.5%
              - Miss quite a few anomalies
          - What do multiple modes correspond to?
              - Seasonality
  • 14. Anomaly Detection (contd.)
      • Robust statistics
          - MAD (median absolute deviation)
              - Robust breakdown point
                  - Median 50% vs. mean 0%
          - σ_MAD
              - K = 1.4826 for normally distributed data
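A minimal sketch of the MAD-based scale estimate from slide 14, assuming the usual definition σ_MAD = K * median(|x - median(x)|). The function names, the threshold of 3 and the sample data are illustrative assumptions, not taken from the talk.

```python
from statistics import median

K = 1.4826  # consistency constant for normally distributed data (from the slide)


def sigma_mad(series):
    """Robust estimate of the standard deviation based on the MAD."""
    med = median(series)
    return K * median([abs(x - med) for x in series])


def mad_anomalies(series, k=3.0):
    """Return indices of points more than k robust standard deviations from the median."""
    med = median(series)
    scale = sigma_mad(series)
    if scale == 0:
        # More than half of the points are identical, so the MAD collapses to zero.
        return []
    return [i for i, x in enumerate(series) if abs(x - med) > k * scale]


# Hypothetical data: two large values among otherwise similar points.
data = [9.0, 10.0, 11.0, 10.0, 9.0, 11.0, 10.0, 10.0, 50.0, 50.0]
print(sigma_mad(data))
print(mad_anomalies(data))
```

Because the median and the MAD barely move when the two large values are added, both are flagged here.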
  • 15. Anomaly Detection (contd.)
      • Limitations of using MAD
  • 16. Anomaly Detection (contd.)
      • Grubbs' test
          - Critical value is derived from the data using a chosen significance level (α)
      • Limitations
          - Assumes the data distribution is normal
          - Good for detecting ONLY 1 outlier
          - Seasonality unaware
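For reference, the standard two-sided form of Grubbs' test is given below. It is stated from the general statistics literature rather than from the slides, so the notation is an assumption: x̄ and s are the sample mean and sample standard deviation of the n observations, and t is the upper α/(2n) critical value of the t distribution with n - 2 degrees of freedom.

```latex
G = \frac{\max_{i} \lvert x_i - \bar{x} \rvert}{s},
\qquad
G_{\mathrm{crit}} = \frac{n-1}{\sqrt{n}}
  \sqrt{\frac{t_{\alpha/(2n),\,n-2}^{2}}{\,n - 2 + t_{\alpha/(2n),\,n-2}^{2}\,}}
```

The hypothesis of no outliers is rejected when G > G_crit, which is why the test flags at most one point per run.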
  • 17. Anomaly Detection (contd.)
      • ESD (Generalized Extreme Studentized Deviate) [1]
          - Critical value (λi) re-calculated every iteration
          - Largest i such that Ri > λi determines # of anomalies
          - An upper bound on the number of anomalies is an input parameter
      • Limitations
          - Generalized ESD assumes a “normal” distribution
          - Seasonality unaware
      [1] Rosner, Bernard. “Percentage Points for a Generalized ESD Many-Outlier Procedure.” Technometrics 25, no. 2 (1983): 165–172.
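The iteration described on slide 17 can be sketched as follows. This is a hedged illustration of the generalized ESD procedure, not the production implementation. To stay within the Python standard library, the Student-t quantile is approximated with a Cornish-Fisher expansion; that approximation, the function names and the sample data are assumptions, and a statistics library should supply the exact quantile in real use.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def t_quantile(p, dof):
    """Approximate Student-t quantile (Cornish-Fisher expansion; illustration only)."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / dof + g2 / dof**2


def generalized_esd(series, max_anomalies, alpha=0.05):
    """Return the indices flagged by the generalized ESD test."""
    n = len(series)
    max_anomalies = min(max_anomalies, n - 2)  # keep the degrees of freedom positive
    remaining = list(enumerate(series))
    removed = []
    n_anomalies = 0
    for i in range(1, max_anomalies + 1):
        values = [x for _, x in remaining]
        mu, s = mean(values), stdev(values)
        if s == 0:
            break
        # Test statistic R_i: the most extreme studentized deviation left.
        pos, (idx, x) = max(enumerate(remaining), key=lambda item: abs(item[1][1] - mu))
        r_i = abs(x - mu) / s
        # Critical value lambda_i, re-calculated every iteration.
        p = 1 - alpha / (2 * (n - i + 1))
        t = t_quantile(p, n - i - 1)
        lam_i = (n - i) * t / sqrt((n - i - 1 + t**2) * (n - i + 1))
        removed.append(idx)
        del remaining[pos]
        if r_i > lam_i:
            n_anomalies = i  # the largest i with R_i > lambda_i wins
    return sorted(removed[:n_anomalies])


# Hypothetical data: 20 points around 10 with two injected outliers.
data = [10.1, 9.8, 10.3, 9.9, 10.0, 16.0, 9.7, 10.4, 9.6, 10.0,
        10.1, 9.9, 15.0, 9.8, 10.0, 10.3, 9.7, 10.1, 9.9, 10.0]
print(generalized_esd(data, max_anomalies=3))
```

The upper bound `max_anomalies` is the input parameter the slide mentions. The test removes that many candidates but reports only those up to the largest i for which R_i exceeds λ_i.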
  • 19. Anomaly Detection (contd.)
      • Addressing seasonality
          - Key idea
              - Time series decomposition
  • 20. Anomaly Detection (contd.)
      • Determining seasonal component
          - Regression on sub-cycle plots [1]
      [1] “STL: A Seasonal-Trend Decomposition Procedure Based on Loess” by Cleveland et al. Journal of Official Statistics, Vol. 6, Issue 1, 1990.
  • 21. Anomaly Detection (contd.)
      • Impact of removing the seasonal and trend components
          - Transforms our multi-modal data into unimodal data
              - Amenable to ESD/MAD!
      The decomposed residual becomes unimodal, which significantly shrinks σ. The original multi-modal raw data has a much larger σ, which leads ESD to miss many of the outliers.
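The effect described on slide 21 can be reproduced on a toy series. The sketch below is an assumption-laden illustration: it estimates the seasonal component as the median of each position in the cycle, which is a simple stand-in for STL and not STL itself, and the series has no trend. All names and data are hypothetical.

```python
from statistics import mean, median, pstdev


def seasonal_component(series, period):
    """Per-phase median of the cycle (a simple stand-in for STL, not STL itself)."""
    phase_medians = [median(series[p::period]) for p in range(period)]
    return [phase_medians[i % period] for i in range(len(series))]


def three_sigma(series):
    """Indices of points more than three standard deviations from the mean."""
    mu, sigma = mean(series), pstdev(series)
    return [i for i, x in enumerate(series) if abs(x - mu) > 3 * sigma]


# Hypothetical bimodal series: four cycles of six samples, low then high,
# with a small deterministic wobble and one injected anomaly at index 8.
pattern = [10.0, 10.0, 10.0, 100.0, 100.0, 100.0]
raw = [pattern[j] + ((day + j) % 3 - 1) for day in range(4) for j in range(6)]
raw[8] += 20.0

seasonal = seasonal_component(raw, period=6)
residual = [x - s for x, s in zip(raw, seasonal)]

print(pstdev(raw), three_sigma(raw))
print(pstdev(residual), three_sigma(residual))
```

On the raw series the two modes dominate σ, so the injected point at index 8 goes unnoticed. On the residual σ is far smaller and the same point stands out.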
  • 22. Anomaly Detection (contd.)
      • Challenges remain!
      Trend smoothing distortion creates “phantom” anomalies
  • 23. Anomaly Detection (contd.)
      • Marrying robust statistics with seasonal decomposition
      Median is free from distortion
  • 24. Anomaly Detection (contd.)
      • Applying ESD on the residual
      Decomposition exposes anomalies
  • 25. Anomaly Detection (contd.)
      • Recap
          - Extract the seasonal component using STL
              - Filters out periodic spikes
          - Residual = Raw - Seasonal_raw - Median_raw
          - Run ESD on residual (using median and MAD)
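The three recap steps can be strung together as below. This is a sketch under stated assumptions, not the authors' implementation: the seasonal component is a zero-centred per-phase median standing in for STL, the Student-t quantile is a Cornish-Fisher approximation chosen to avoid external dependencies, and every function name and data value is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, median


def t_quantile(p, dof):
    """Approximate Student-t quantile (Cornish-Fisher expansion; illustration only)."""
    z = NormalDist().inv_cdf(p)
    return z + (z**3 + z) / (4 * dof) + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * dof**2)


def robust_residual(raw, period):
    """Residual = raw - seasonal - median(raw), with a per-phase-median seasonal."""
    level = median(raw)
    centered = [x - level for x in raw]
    phase = [median(centered[p::period]) for p in range(period)]
    seasonal = [phase[i % period] for i in range(len(raw))]
    return [x - s - level for x, s in zip(raw, seasonal)]


def esd_median_mad(residual, max_anomalies, alpha=0.05):
    """Generalized ESD with the median and MAD in place of the mean and std."""
    n = len(residual)
    max_anomalies = min(max_anomalies, n - 2)
    remaining = list(enumerate(residual))
    removed, n_anomalies = [], 0
    for i in range(1, max_anomalies + 1):
        values = [x for _, x in remaining]
        med = median(values)
        scale = 1.4826 * median([abs(x - med) for x in values])
        if scale == 0:
            break
        pos, (idx, x) = max(enumerate(remaining), key=lambda item: abs(item[1][1] - med))
        r_i = abs(x - med) / scale
        p = 1 - alpha / (2 * (n - i + 1))
        t = t_quantile(p, n - i - 1)
        lam_i = (n - i) * t / sqrt((n - i - 1 + t * t) * (n - i + 1))
        removed.append(idx)
        del remaining[pos]
        if r_i > lam_i:
            n_anomalies = i
    return sorted(removed[:n_anomalies])


# Hypothetical bimodal series with one injected anomaly at index 8.
pattern = [10.0, 10.0, 10.0, 100.0, 100.0, 100.0]
raw = [pattern[j] + ((day + j) % 3 - 1) for day in range(4) for j in range(6)]
raw[8] += 20.0

print(esd_median_mad(robust_residual(raw, period=6), max_anomalies=2))
```

Using the median and MAD inside the ESD loop keeps the scale estimate from being inflated by the very points the test is trying to find.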
  • 26. Anomaly Detection (contd.)
      • Illustrative example
  • 27. Anomaly Detection (contd.)
      • Applications
          - Three perspectives
              - Capacity
                  - CPU utilization
                  - Garbage collection
                  - Network activity
              - User behavior
                  - Events
                      - Impressions
                      - Link clicks
                  - Spam
              - Forecasting
  • 28. Anomaly Detection (contd.)
      • Deployed in production
          - Used by a large number of services at Twitter
          - Automatic e-mail notification
              - Only sent if anomalies are present
              - Anomalies annotated
              - CSV with anomaly locations attached
  • 29. Open Sourcing
      • Skyline from Etsy
          - https://github.com/etsy/skyline/blob/master/src/analyzer/algorithms.py
      • Coming soon!
          - R package
  • 30. Join the Flock
      Like problem solving? Like challenges? Be at the cutting edge. Make an impact.
      • We are hiring!!
          - https://twitter.com/JoinTheFlock
          - https://twitter.com/jobs
          - Contact us: @arun_kejariwal