Kappa Architecture In The Telco Industry
Ignacio Mulas Viela
Nicolas Seyvet
Ericsson Internal | 2011-10-19 | Page 4
Once Upon A Time…
Flink-forward | Ignacio Mulas | 12-October-2015
Flink Meetup | Ignacio Mulas | 12-October-2015
Flink Meetup | Ignacio Mulas | 26-November-2015
Strata London | Ignacio Mulas & Nicolas Seyvet | 3 – June – 2016
“I want an advanced real-time analytics system to
monitor my cloud infrastructure.”
… By your most precious client
High Level View
› Data source
– Events (metrics, logs) from physical and virtual servers
› Analytics:
– Real-time
– Statistical analysis
– Anomaly or novelty detection
[Diagram labels: Data source, …, Analytics]
Data Set
› Bounded: a start and an end
– Finite, ingestion stops
› Unbounded: a start but no end
– Infinite, ever-growing
[Diagram: a bounded stream of events t0, t1, t2, t3, … ending at tn; an unbounded stream t0, t1, t2, t3, … continuing to t∞]
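The speaker notes for this slide describe bounding unbounded data with windows. A minimal sketch of that idea in plain Python follows; the event shape, the `unbounded_source` generator and the `tumbling_windows` helper are hypothetical names chosen for illustration, and the code assumes events arrive in timestamp order.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, payload); hypothetical shape


def unbounded_source() -> Iterator[Event]:
    """Endless stream: one synthetic event per second, never terminates."""
    for t in itertools.count():
        yield (t, f"event-{t}")


def tumbling_windows(stream: Iterable[Event], size: int) -> Iterator[List[Event]]:
    """Cut an in-order stream into consecutive fixed-size time windows.

    Each emitted list is a bounded data set carved out of the stream.
    Windows without events are emitted as empty lists.
    """
    current: List[Event] = []
    window_end = None
    for ts, payload in stream:
        if window_end is None:
            window_end = (ts // size + 1) * size
        while ts >= window_end:      # the stream moved past the window: emit it
            yield current
            current = []
            window_end += size
        current.append((ts, payload))
    if current:                      # only reached when the input is bounded
        yield current


# Take three 5-second windows from a stream that never ends.
first_three = list(itertools.islice(tumbling_windows(unbounded_source(), 5), 3))
```

Because the helper is a generator, it works unchanged on a finite list, which is the sense in which a bounded data set is a special case of an unbounded one.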
Lambda Architecture
› Twitter’s Nathan Marz
› But
– Two independent pipelines
– Complex maintenance
– Complex merge
Kappa Architecture
The (Short) Evolution
› New model to abstract data processing
– MillWheel, Spark Streaming, Dataflow, Stratosphere (Flink)
› Stream engines
› Correctness
– Strong consistency
– Exactly-once processing
› Resilience, fault tolerance
› Tools that can deal with time *
› APIs
Kappa Architecture: Principles
› Everything is a stream
› Immutable data sources
› Single analytics framework
› Stream replay
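The principles of immutable sources and stream replay can be illustrated with a small in-memory sketch. It is not how Kafka or Flink implement replay; `EventLog`, `run_job` and the `(host, cpu_load)` event shape are assumptions made only for this example.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, float]  # (host, cpu_load); hypothetical event shape


class EventLog:
    """Append-only log: events are never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset: int = 0) -> List[Event]:
        return list(self._events[offset:])


def run_job(log: EventLog, reduce_fn: Callable[[float, float], float]) -> Dict[str, float]:
    """Rebuild a job's state by replaying the whole log from offset 0."""
    state: Dict[str, float] = {}
    for host, load in log.read_from(0):
        state[host] = reduce_fn(state[host], load) if host in state else load
    return state


log = EventLog()
for event in [("h1", 0.2), ("h2", 0.9), ("h1", 0.7)]:
    log.append(event)

peak_v1 = run_job(log, max)  # first version of the job: peak load per host
low_v2 = run_job(log, min)   # evolved job: same log replayed, different view
```

Since the log is never modified, replaying it gives the same result every time, and a changed job can derive a new view from the same history.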
Basics
› Stream representation
– Unbounded data set composed of a sequence of events
› Data pipeline:
– Sequence of transformations on an unbounded data set that generates another set with more insightful data
– Like UNIX pipes
[Diagram labels: …, Pub/Sub]
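The comparison with UNIX pipes can be sketched with Python generators, where each stage consumes one stream and produces another. The log line format and the stage names (`parse`, `only_errors`, `enrich`) are hypothetical and only serve the illustration.

```python
from typing import Any, Dict, Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Turn raw 'host level message' lines into structured events."""
    for line in lines:
        host, level, message = line.split(" ", 2)
        yield {"host": host, "level": level, "message": message}


def only_errors(events: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Filter stage: keep only ERROR events."""
    return (e for e in events if e["level"] == "ERROR")


def enrich(events: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Transformation stage: add a derived field to every event."""
    for e in events:
        yield {**e, "length": len(e["message"])}


raw = ["h1 INFO service started", "h2 ERROR disk full", "h1 ERROR out of memory"]

# Equivalent in spirit to: cat raw | parse | only_errors | enrich
result = list(enrich(only_errors(parse(raw))))
```

Each stage is lazy, so the same composition would also work on an input that never ends; `list(...)` is used here only because the sample input is finite.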
Our Stack
[Diagram labels: Data sources, …, Kafka, raw data, results job 1, …, Flink, Analytics job 1, Analytics job 2, …, Elasticsearch, Kibana]
First Data Pipeline
[Diagram: Raw data → Statistical analysis → Enriched data → Dashboard]
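The speaker notes name frequency, gradients, median and standard deviation as the statistics of interest. The sketch below shows one way such an enrichment step could look for a single window of raw values; `enrich_window`, its field names and the simple first-to-last gradient are assumptions, not the authors' implementation.

```python
import statistics
from typing import Dict, List


def enrich_window(values: List[float], window_seconds: float) -> Dict[str, float]:
    """Summarise one non-empty window of raw metric values."""
    return {
        "frequency": len(values) / window_seconds,               # events per second
        "median": statistics.median(values),
        "std_dev": statistics.pstdev(values),                    # population std dev
        "gradient": (values[-1] - values[0]) / window_seconds,   # first-to-last slope
    }


stats = enrich_window([10.0, 12.0, 11.0, 15.0], window_seconds=2.0)
```

An empty window would raise `statistics.StatisticsError`, so a real pipeline would need to decide how to treat windows without events.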
Time
› Event time, which is when an event occurred
› Processing time, which is when an event is observed in the system
[Plot: event time against processing time; labels: reality, skew]
Time drifts
Unordered events
Event Time
e: event, tp: processing time, te: event time
[Diagram: events e0, e1, e2, e3, … at times t0, t1, t2, t3, shown as <tp0,e0> <tp1,e1> <tp2,e2> <tp3,e3>; EventTimeExstractor() and enableTimestamps() give <te0,e0> <te1,e1> <te2,e2> <te3,e3>; window() groups them into w0, w1, w2; execution time: Te0 + window + watermark]
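The interplay of event time, windows and a watermark can be sketched in plain Python. This is not Flink's API: `event_time_windows`, the fixed `max_delay` watermark rule and the decision to set late events aside are all assumptions made for the illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str]  # (event time te in seconds, payload)


def event_time_windows(
    events: Iterable[Event], size: int, max_delay: int
) -> Tuple[Dict[int, List[str]], List[Event]]:
    """Assign events to tumbling windows by event time.

    The watermark trails the largest event time seen so far by `max_delay`.
    A window [start, start + size) closes once the watermark reaches its end;
    events that arrive for a closed window are returned separately as late.
    """
    open_windows: Dict[int, List[str]] = defaultdict(list)
    closed: Dict[int, List[str]] = {}
    late: List[Event] = []
    watermark = float("-inf")
    for te, payload in events:           # iteration order = processing order
        start = (te // size) * size
        if start + size <= watermark:    # its window has already been closed
            late.append((te, payload))
            continue
        open_windows[start].append(payload)
        watermark = max(watermark, te - max_delay)
        for s in sorted(w for w in open_windows if w + size <= watermark):
            closed[s] = open_windows.pop(s)
    for s in sorted(open_windows):       # bounded sample input: flush the rest
        closed[s] = open_windows[s]
    return closed, late


arrivals = [(1, "e0"), (4, "e1"), (12, "e2"), (3, "e3"), (17, "e4"), (8, "e5"), (25, "e6")]
windows, late_events = event_time_windows(arrivals, size=10, max_delay=5)
```

In this run `e3` arrives out of order but before the watermark passes its window, so it is still counted, whereas `e5` arrives after its window has closed and ends up in `late_events`.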
2nd Client meeting…
“I want an advanced real-time analytics system to
monitor my cloud infrastructure.”
… By your most precious client
It is nice, but… I cannot look at thousands of
numbers simultaneously, can you do better?
Advanced Data Pipeline
› Machine learning
– Automatically detect anomalies in the infrastructure
– Learn using raw and advanced metrics
› … add a new transformation to my unbounded data!
[Diagram labels: Data source, …, Stats, ML, analytics]
Bayesian Detector
› Unsupervised machine learning
› Create a statistical model for “normal” behavior
– Poisson: count-based parameters
– Gaussian: value-based parameters
› Model adapts over time
[Diagram labels: OK, ANOMALY, ANOMALY]
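The slide does not spell out the detector, so the following is only a simplified stand-in for the Gaussian (value-based) case: a running mean and variance maintained with Welford's online update, with a z-score threshold deciding between OK and ANOMALY. The class name, the threshold of 3 standard deviations and the warm-up length are assumptions.

```python
import math
from typing import List


class GaussianDetector:
    """Online Gaussian model of "normal" values, updated on every sample."""

    def __init__(self, threshold: float = 3.0, warmup: int = 5) -> None:
        self.threshold = threshold  # allowed distance from the mean, in std devs
        self.warmup = warmup        # samples to observe before judging
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def observe(self, x: float) -> str:
        verdict = "OK"
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                verdict = "ANOMALY"
        # Welford update: the model adapts to every sample, anomalous or not.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return verdict


detector = GaussianDetector()
verdicts: List[str] = [detector.observe(v) for v in [10, 11, 9, 10, 12, 10, 50, 11]]
```

Letting anomalous samples update the model is a design choice: it keeps the model adaptive but also widens the notion of "normal" after a spike, which a production detector might want to dampen.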
Log-Frequency Novelty Detector
Phase 1: LEARN!
Phase 2: DETECT!
[Diagram labels: Events, Time window, Frequency_1, Frequency_2, …, Frequency_i+1, …, Frequency_n, History, OK, NOK]
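The two phases can be sketched as follows, under assumptions that the slide does not confirm: the learn phase records how often each log type occurs per time window, and the detect phase marks a window NOK when a count falls outside the range seen in the history or when a log type was never seen before. `LogFrequencyDetector` and its min/max rule are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


class LogFrequencyDetector:
    def __init__(self) -> None:
        self.history: Dict[str, List[int]] = {}  # log type -> count per learned window
        self.windows_seen = 0

    def learn(self, window: Iterable[str]) -> None:
        """Phase 1: record per-window frequencies of every log type."""
        counts = Counter(window)
        for kind in set(self.history) | set(counts):
            # A type first seen now had frequency 0 in all earlier windows.
            past = self.history.setdefault(kind, [0] * self.windows_seen)
            past.append(counts.get(kind, 0))
        self.windows_seen += 1

    def detect(self, window: Iterable[str]) -> str:
        """Phase 2: compare a new window against the learned history."""
        counts = Counter(window)
        for kind in set(self.history) | set(counts):
            past = self.history.get(kind)
            if past is None:
                return "NOK"  # log type never seen during learning
            if not min(past) <= counts.get(kind, 0) <= max(past):
                return "NOK"  # frequency outside the learned range
        return "OK"


detector = LogFrequencyDetector()
for learned in (["login", "login", "error"], ["login", "error"], ["login", "login", "login"]):
    detector.learn(learned)

normal = detector.detect(["login", "login"])
error_burst = detector.detect(["error"] * 5 + ["login"])
new_type = detector.detect(["login", "reboot"])
```

A min/max range is the crudest possible model of "normal" frequency; it is used here only to keep the learn and detect phases easy to follow.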
Multi-Variable Detector
[Diagram labels: t0, t0, t0; hk, hi, hm; … if …; .keyBy(host); -slave]
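The `.keyBy(host)` label suggests that events are partitioned by host so that each host gets its own state. The sketch below mimics that idea in plain Python rather than using Flink's API; `RunningMean` and `keyed_running_mean` are hypothetical names, and a running mean stands in for whatever per-host model the detector keeps.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


class RunningMean:
    """Per-key state: incremental mean of the values seen for one host."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean


def keyed_running_mean(events: Iterable[Tuple[str, float]]) -> Iterator[Tuple[str, float]]:
    """Route each (host, value) event to the state that belongs to its host."""
    state: Dict[str, RunningMean] = defaultdict(RunningMean)
    for host, value in events:
        yield host, state[host].update(value)


events = [("hk", 1.0), ("hi", 10.0), ("hk", 3.0), ("hm", 5.0), ("hi", 20.0)]
per_host: List[Tuple[str, float]] = list(keyed_running_mean(events))
```

Keeping state per key means the hosts never influence each other's model, which is also what allows a keyed stream to be processed in parallel.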
Improved Data Pipeline
[Diagram labels: Raw data, Statistical analysis, Enriched data, Raw data, Bayesian novelty detector, Anomalies, Dashboard]
3rd Client meeting
“I want an advanced real-time analytics system to
monitor my cloud infrastructure.”
… By your most precious client
Great! I can now spot when and where changes
occur … I’ll buy it!
Summary
› Tools, abstractions and APIs unifying stream/batch
› Consistency, resiliency, fault-tolerance
› Event time handling
› Kappa architecture simplifies Big Data
– One stack, many pipelines (batch/stream)
– Flexible/extensible architecture
› Machine learning can be applied on unbounded data sets
– Treated as a complex transformation
– Some caveats
[Diagram labels: Stream, Batch, Καρρα]
Please, feel free to contact us if you have
suggestions/comments/questions
ignacio.mulas.viela@ericsson.com / @immulvi
nicolas.seyvet@ericsson.com / @NicolasSeyvet
Thank you!

Editor's Notes

  1. 2015-09-30
  2. Ericsson is a telecommunications equipment supplier. Ever-growing volumes of data, shorter time constraints and increasing needs for accuracy are defining the new analytics environment. In the telecom industry, traditional user and network data coexist with machine-to-machine (M2M) traffic, media data, social activities, etc. In terms of volumes, this can be referred to as an “explosion” of data. This is a great business opportunity for telco operators and a key angle to take full advantage of current infrastructure investments (4G, LTE). Add some animations with trucks and sensors, etc. Ericsson is moving to the cloud and running Virtual Network Functions, which are basically cloud-based telco applications for the core network.
  3. There are OSS and monitoring systems on the market, but how could we do this better? Build a data pipeline to stream events and perform real-time analytics in order to eventually do some machine learning.
  4. The story is that, in general, there are two kinds of data sets. Either it is a bunch of data, i.e. there is a beginning and an end to it, or it is infinite. https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101: The term “streaming” is used today to mean a variety of different things (and for simplicity, I’ve been using it somewhat loosely up until now), which can lead to misunderstandings about what streaming really is, or what streaming systems are actually capable of. As such, I would prefer to define the term somewhat precisely. The crux of the problem is that many things that ought to be described by what they are (e.g., unbounded data processing, approximate results, etc.), have come to be described colloquially by how they historically have been accomplished (i.e., via streaming execution engines). This lack of precision in terminology clouds what streaming really means, and in some cases, burdens streaming systems themselves with the implication that their capabilities are limited to characteristics frequently described as “streaming,” such as approximate or speculative results. Given that well-designed streaming systems are just as capable (technically mo
The principles: Bounding unbounded data with windows. We use the term unbounded data for an infinite, ever-growing data stream, and the term bounded data for a data stream that happens to have a beginning and an end (data ingestion stops after a while). It is clear that the notion of an unbounded data stream includes (is a superset of) the notion of a bounded data set. Streaming applications create bounded data from unbounded data using windows, i.e., creating bounds using some characteristic of the data, most prominently based on timestamps of events. For example, one can choose to create a window of all records that belong to the same session (a session being defined as a period of activity followed by a period of inactivity).
The simplest form of a window is (when we know that the input is bounded) to include all the data in one window. Let’s call this a “global window”. This way, we have created a streaming program that does “batch processing”.
  5. Early streaming systems suffered from efficiency problems (record-by-record event processing and ACK). This led to the belief that a streaming layer can only complement a batch system, or that hybrids of streaming and batching (micro-batching) are required for efficiency. Lambda advocates using a batch system for the heavy lifting, augmenting it with a streaming system that catches up with data ingestion, producing early but incomplete (approximate) results. Then logic tries to merge the produced results for serving: low-latency, inaccurate results (either because of the use of an approximation algorithm, or because the streaming system itself does not provide correctness), and some time later a batch system rolls along and provides you with correct output. Well known. Merge -> synchronization problems. The Lambda architecture had well-known disadvantages, in particular that the merging process was often painful, as was the fact that two separate codebases that express the same logic need to be maintained.
  6. Later, Jay Kreps advocated that only one system, the stream processor, should be used for the entirety of data transformations, drastically simplifying the whole architecture: https://www.oreilly.com/ideas/questioning-the-lambda-architecture Given that well-designed streaming systems are just as capable (technically more so) of producing correct, consistent, repeatable results as any existing batch engine, I prefer to isolate the term streaming to a very specific meaning: a type of data processing engine that is designed with infinite data sets in mind.
  7. Early streaming systems suffered from efficiency problems due to design choices that sacrificed throughput, in particular record-by-record event processing and acknowledgement. Spark lineage in batch vs. check-pointing. Something that is easy to do with batch is much harder with streams. Tools for reasoning about time. Mostly correct is not good enough. Required for exactly-once processing, which is required for repeatable results; cannot replace batch otherwise. Correctness: this gets you parity with batch. http://data-artisans.com/why-apache-beam/ Taking a cue from this foundational work, we rewrote Flink’s DataStream API in Flink 0.10 to incorporate many of the concepts described in the Dataflow paper, moving away from the old Flink 0.9 DataStream API. We retained this API with Flink 1.0 and made it stable and backwards compatible. As you can see from these tables, Flink is the runner which currently fulfills those requirements. With Flink, Beam becomes a truly compelling platform for the industry.
  8. Lambda -> batch + stream processing. Kappa -> everything is a stream. Immutable data sources -> deterministic results and the possibility to generate different views (see Martin Kleppmann). You store the logs, so you store the events and the sequence of things. Whenever the pipeline evolves, you can replay the sequence of events to restore the state of the computations and restart. Single analytics: you have transformations and operators to perform analytics. Add some content.
  9. Make it better, change image. Can take a subset. The stream is the main abstraction towards the data set. A list of ordered events, it is a single object with a representation and a set of operators.
  10. Put distributed data source with Logstash.
  11. Frequency, gradients, median, std dev. To do this we needed to take into consideration aspects like time, since this is an unbounded data set.
  12. How to deal with time? Batch slices data sets into bounded data sets, then computes. But how to deal with late events, events that might never arrive, latency in the network leading to a distortion between expected time and actual arrival time? New data will arrive, old data may be retracted or updated. Any system should be able to cope with these facts on its own, with completeness being a convenient optimization rather than a semantic necessity. Varying event time skew, meaning it is not possible to expect most of the data for a given event time X within some constant e of time Y (Y = X + e).
  13. Single variable.