From Rivulets to Rivers:
Elastic Stream Processing in Heron
Bill Graham, Twitter - @billgraham
Ashvin Agrawal, Microsoft
Avrilia Floratou, Microsoft
Prediction is very difficult, especially if it’s about the future.
- Niels Bohr
We cannot direct the wind, but we can adjust the sails.
- Dolly Parton
Outline
• Heron Overview
• Elastic Scaling Opportunities
• Current Implementation
• Work in Progress – Auto-scaling
Heron
A realtime, distributed, fault-tolerant stream processing engine.
About Heron
• Developed by Twitter in 2014
• Open sourced in May 2016
• Storm API compatible
• Isolation at all levels:
  - Topology
  - Container
  - Task (process-based)
• At-least-once and at-most-once semantics
• Backpressure
• Low resource overhead (< 10%)
Logical Topology
[Figure: logical topology with Spout 1, Spout 2 and Bolt 1 to Bolt 5]
Physical Execution
[Figure: physical execution of Spout 1, Spout 2 and Bolt 1 to Bolt 5]
Packing Plan
• How to distribute instances onto containers?
• IPacking.pack()
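
The packing question above can be illustrated with a minimal sketch that spreads instances over containers in round-robin order. The function name, its arguments and the container numbering are hypothetical choices made for illustration; this is not the signature of IPacking.pack() and not necessarily the strategy Heron applies.

```python
from collections import defaultdict


def pack_round_robin(parallelism, num_containers):
    """Assign every component instance to a container in round-robin order.

    parallelism: dict mapping component name -> number of instances.
    num_containers: number of containers, numbered from 1 (an assumption).
    Returns a dict mapping container id -> list of (component, task index).
    """
    plan = defaultdict(list)
    slot = 0
    for component, count in parallelism.items():
        for index in range(count):
            container_id = 1 + slot % num_containers
            plan[container_id].append((component, index))
            slot += 1
    return dict(plan)


# Hypothetical topology: one spout with 2 instances, one bolt with 4.
plan = pack_round_robin({"spout1": 2, "bolt1": 4}, num_containers=3)
for container_id in sorted(plan):
    print(container_id, plan[container_id])
```

A real packing algorithm would also account for the CPU and memory each instance requests; the sketch only balances instance counts.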
Topology Submission
[Figure: topology submission. Components shown: Heron Client, Heron Scheduler, Topology Master, and Containers 0 to 3, with Stream Managers and spout/bolt instances (S1, S2, B2 to B6). Step labels: heron submit, Packing Plan, Containers Allocated, Processes Initialize, Stream Manager Registers, Instances Register, Data Flows.]
Data Rate Variations
Parallelism Challenges
• Anticipating component parallelism is difficult
• Changing parallelism is costly - O(hour)
  - code change, review, merge, build, kill, submit
• Tuning for load spikes or valleys is manual - O(day)
• Under-provisioning leads to backpressure, which leads to support costs
• Over-provisioning is the norm
Over-provisioning
[Figure: CPU Used versus CPU Requested; labeled values 25% and 40%]
Elastic Scaling Opportunity
• Reduce administration cost
• Reduce support cost
• Reduce hardware cost
• Provide better SLA
Ordinary Topology Management Process
[Figure: process diagram distinguishing User Tasks from Heron System Tasks. Steps shown: Kill Topology, Submit Topology, Create Packing, Acquire Resources, Monitor / Estimate, Build State, Start Topology, Install Topology. Annotations: "Time Consuming Tasks", "Releases Resources".]
Low-cost Topology “update”
[Figure: topology update illustration; extracted labels: 32 2 34 4]
Optimized Topology Scale-up Process
[Figure: process diagram distinguishing User Tasks from Heron System Tasks. Steps shown: Kill Topology, Submit Topology, Create Packing, Acquire Resources, Monitor / Estimate, Build State, Start Topology, Install Topology, Update Topology, Pause Topology, Un-Pause Topology, Add / Reduce Resources, Prepare Components.]
heron “update” …
Minimizes Disruption
Aggressively Prunes Containers
Aims to Maintain Uniform Component Distribution
$ heron update my_cluster/user/dev MyTopology \
    --component-parallelism=bolt1:20 \
    --component-parallelism=bolt2:40
Available in 0.14.5
Execution Time O(mins)
Customizable Through IRepacking.repack()
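
The three goals listed for heron update (minimal disruption, aggressive container pruning, uniform component distribution) can be sketched as a small repacking routine. Everything here is a simplified assumption for illustration: the plan representation, the function name and the tie-breaking rules are hypothetical and do not reproduce IRepacking.repack().

```python
def repack(plan, changes):
    """Adjust component parallelism while leaving untouched instances in place.

    plan: dict mapping container id -> list of component names, one per instance.
    changes: dict mapping component name -> desired parallelism.
    Returns a new plan; the input plan is not modified.
    """
    new_plan = {cid: list(instances) for cid, instances in plan.items()}
    if not new_plan:
        new_plan[1] = []  # assumption: start with a single container
    for component, wanted in changes.items():
        current = sum(inst.count(component) for inst in new_plan.values())
        # Scale down: take instances from the emptiest container first,
        # so that containers become empty and can be pruned.
        while current > wanted:
            holders = [cid for cid, inst in new_plan.items() if component in inst]
            victim = min(holders, key=lambda cid: (len(new_plan[cid]), cid))
            new_plan[victim].remove(component)
            current -= 1
        # Scale up: add to the container holding the fewest instances of this
        # component, which keeps the component evenly spread.
        while current < wanted:
            dest = min(
                new_plan,
                key=lambda cid: (new_plan[cid].count(component), len(new_plan[cid]), cid),
            )
            new_plan[dest].append(component)
            current += 1
    # Prune containers that no longer host any instance.
    return {cid: inst for cid, inst in new_plan.items() if inst}


before = {1: ["spout1", "bolt1"], 2: ["spout1", "bolt1"], 3: ["bolt1"]}
print(repack(before, {"bolt1": 1}))
```

The sketch ignores container capacity and never allocates new containers, both of which a production repacking algorithm would need to handle.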
Current Limitations
• Automated state transition not yet supported
  - Component scaling event notification: IUpdatable.update()
  - Example: KafkaSpout queue partition mappings
• Fields grouping routing might change
  - Workaround: pause the topology for longer than the cache flush interval before scaling
• Algorithmic Auto-Scaling
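
The routing limitation above can be seen in a short sketch. It assumes, purely for illustration, that a fields grouping selects the target instance as a hash of the key modulo the component parallelism; the hash function and the key names are hypothetical and not taken from Heron.

```python
import zlib


def route(key, parallelism):
    """Pick a target instance for a key with a hash-modulo rule (an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


# Hypothetical keys, routed before and after scaling a bolt from 20 to 40 instances.
keys = ["user-%d" % i for i in range(1000)]
moved = sum(1 for key in keys if route(key, 20) != route(key, 40))
print("%d of %d keys change their target instance" % (moved, len(keys)))
```

Any state a bolt instance has cached for a key that moves ends up on the wrong instance after scaling, which is why the slide suggests pausing the topology until caches have been flushed.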
Algorithmic Auto-Scaling …
[Figure: process diagram distinguishing User Tasks from Heron System Tasks. Steps shown: Submit Topology, Create Packing, Acquire Resources, Monitor / Estimate, Build State, Start Topology, Install Topology, Update Topology, Pause Topology, Un-Pause Topology, Add / Reduce Resources, Prepare Components.]
Auto-Scaling
Heron should automatically identify variations in the incoming load and react to them.
Dhalion periodically observes the state of the topology and determines whether resources should be scaled up or down.
Heron uses Dhalion to adjust to external shocks.
Dhalion is a framework that provides self-regulating capabilities to Heron and will be open-sourced in the near future.
Using Dhalion to Auto-Scale
[Figure: Dhalion policy phases. Metrics feed Symptom Detection, Symptoms feed Diagnosis Generation, and the Diagnosis drives Resolution through Resolver Invocation.]
Symptom Detection: Pending Packets Detector, Backpressure Detector, Processing Rate Skew Detector
Diagnosis Generation: Resource Underprovisioning Diagnoser, Resource Overprovisioning Diagnoser, Data Skew Diagnoser, Slow Instances Diagnoser
Resolution: Bolt Scale Up Resolver, Bolt Scale Down Resolver, Data Skew Resolver, Restart Instances Resolver
as needed while still keeping the topology in a steady state where backpressure is not observed
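
The three phases named on this slide (symptom detection, diagnosis generation, resolution) can be sketched as one pass of a policy loop. All metric names, thresholds and rules below are hypothetical and chosen only to show the control flow; they are not Dhalion's actual detectors, diagnosers or resolvers.

```python
def detect_symptoms(metrics):
    """Symptom detection: turn raw metrics into a set of symptom names."""
    symptoms = set()
    if metrics["backpressure_ms"] > 0:
        symptoms.add("backpressure")
    if metrics["pending_packets"] > 1000:  # hypothetical threshold
        symptoms.add("growing_pending_packets")
    elif metrics["pending_packets"] == 0:
        symptoms.add("no_pending_packets")
    return symptoms


def diagnose(symptoms):
    """Diagnosis generation: map symptoms to at most one diagnosis."""
    if {"backpressure", "growing_pending_packets"} <= symptoms:
        return "resource_underprovisioning"
    if "backpressure" not in symptoms and "no_pending_packets" in symptoms:
        return "resource_overprovisioning"
    return None


# Resolution: each diagnosis invokes one resolver that proposes a new parallelism.
RESOLVERS = {
    "resource_underprovisioning": lambda parallelism: parallelism + 1,
    "resource_overprovisioning": lambda parallelism: max(1, parallelism - 1),
}


def run_policy_once(metrics, parallelism):
    """One observation cycle: metrics -> symptoms -> diagnosis -> resolver."""
    diagnosis = diagnose(detect_symptoms(metrics))
    if diagnosis is None:
        return parallelism  # healthy: leave the topology unchanged
    return RESOLVERS[diagnosis](parallelism)


print(run_policy_once({"backpressure_ms": 500, "pending_packets": 5000}, 4))
```

Running the cycle periodically, as the previous slide describes, lets the policy step toward a configuration where no symptom is detected.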
Initial Results
[Figure: normalized throughput (0.00 to 1.40) versus time (0 to 120 minutes) for the Spout, Splitter Bolt and Counter Bolt, with Scale Down and Scale Up events and regions labeled S1, S2 and S3]
Dhalion is able to adjust the topology resources on-the-fly when workload spikes occur.
Our policy eventually reaches a healthy state where backpressure is not observed and the overall throughput is maximized.
Future Plans
• Use Dhalion to enforce throughput and latency SLOs and to auto-tune Heron topologies.
• Open-source Dhalion and the auto-scaling policy as part of Heron.
• Combine scaling with stateful stream processing.
Get Involved
http://github.com/twitter/heron
http://heronstreaming.io
@heronstreaming
Up Next
Anomaly detection in real-time data streams using Heron
Arun Kejariwal, Machine Zone
Karthik Ramasamy, Twitter
Questions?

Editor's notes

  1. Modifying an existing packing plan can be more complex than creating one from scratch