Streamlio's Karthik Ramasamy takes a look at how the Apache Heron streaming platform uses built-in intelligence to automatically regulate data flow and ensure resiliency.
2. What is self-regulating?
Self-regulation in a real-time system refers to the system's ability to
adapt as its environmental conditions change, without
constant hands-on control by a human operator, and to continue producing results.
3. Why?
Loss of revenue: Impact of downtime during popular events such as the Super Bowl, the Oscars, etc.
SLA violations: Impact of not honoring an SLA, leading to penalty payments
Quality of life: Engineer and SRE burnout from attending to incidents
Increased productivity: With reduced incidents, engineers can focus on actual development
9. Heron Groupings
01 Shuffle Grouping
Random distribution of tuples
02 Fields Grouping
Group tuples by a field or multiple fields
03 All Grouping
Replicates tuples to all tasks
04 Global Grouping
Send the entire stream to one task
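The four groupings can be read as routing functions that map a tuple to the downstream task(s) that receive it. The sketch below is plain Python, not Heron's API: the function names, the dict-based tuple, the CRC32 hash, and the choice of task 0 for global grouping are all assumptions made for illustration.

```python
import random
from zlib import crc32


def shuffle_grouping(tup, num_tasks, rng=random):
    """Shuffle: pick one downstream task at random."""
    return [rng.randrange(num_tasks)]


def fields_grouping(tup, num_tasks, fields):
    """Fields: hash the chosen fields so equal keys always reach the same task."""
    key = "|".join(str(tup[f]) for f in fields)
    # crc32 is used because it is stable across runs (assumption of this sketch).
    return [crc32(key.encode("utf-8")) % num_tasks]


def all_grouping(tup, num_tasks):
    """All: replicate the tuple to every downstream task."""
    return list(range(num_tasks))


def global_grouping(tup, num_tasks):
    """Global: send the whole stream to a single task (task 0 chosen arbitrarily here)."""
    return [0]


if __name__ == "__main__":
    t = {"word": "heron", "count": 1}
    print("shuffle:", shuffle_grouping(t, 4))
    print("fields: ", fields_grouping(t, 4, ["word"]))
    print("all:    ", all_grouping(t, 4))
    print("global: ", global_grouping(t, 4))
```

Fields grouping is the one that matters for stateful bolts such as a word counter: because the task index depends only on the key, every occurrence of a word lands on the task that holds its running count.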
10. Writing Heron Topologies
Procedural - Low Level API
Directly write your spouts and bolts
Functional - Mid Level API
Use of maps, flat maps, transforms, windows
Declarative - SQL (coming)
Use of a declarative language: specify what you
want, and the system will figure it out.
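To give a feel for the functional style (maps, flat maps, windows), here is a small word-count pipeline written with plain Python generators. It does not use Heron's functional API; `flat_map`, `tumbling_windows`, and `word_counts` are hypothetical helpers, and the windows are simple count-based tumbling windows chosen for brevity.

```python
from collections import Counter
from itertools import islice


def flat_map(fn, stream):
    """Apply fn to each item and flatten the resulting iterables into one stream."""
    for item in stream:
        yield from fn(item)


def tumbling_windows(stream, size):
    """Group the stream into consecutive, non-overlapping windows of `size` items."""
    it = iter(stream)
    while True:
        window = list(islice(it, size))
        if not window:
            return
        yield window


def word_counts(sentences, window_size):
    """sentences -> words (flat map) -> lowercase (map) -> per-window counts."""
    words = flat_map(str.split, sentences)
    lowered = map(str.lower, words)
    for window in tumbling_windows(lowered, window_size):
        yield Counter(window)


if __name__ == "__main__":
    for counts in word_counts(["Heron is fast", "heron is resilient"], 3):
        print(dict(counts))
```

The pipeline states what happens to the data at each step, while the procedural style would have the author write the equivalent spout and bolt classes by hand.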
30. Auto Piloting Heron
Maintenance of SLOs in the face of unpredictable load variations and hardware or software performance degradation
Manual, time-consuming and error-prone task of tuning various system knobs to achieve SLOs
Auto Piloting Streaming Systems
31. Auto Piloting Streaming Systems
Self tuning
Several tuning knobs
Time-consuming tuning phase
The system should take as input an SLO and automatically configure the knobs.
Self stabilizing
Stream jobs are long running
Load variations are common
The system should react to external shocks and automatically reconfigure itself.
Self healing
System performance affected by hardware or software delivering degraded quality of service
The system should identify internal faults and attempt to recover from them.
32. Enter Dhalion
Dhalion is a policy-based framework integrated into Heron.
Dhalion periodically executes well-specified policies that optimize execution based on some objective.
We created policies that dynamically provision resources in the presence of load variations and auto-tune streaming applications so that a throughput SLO is met.
33. Dhalion Policy Phases
[Figure: Metrics feed Symptom Detectors 1 to N (Symptom Detection), which emit Symptoms 1 to N. Diagnosers 1 to M (Diagnosis Generation) turn the symptoms into Diagnoses 1 to M. Resolution consists of Resolver Selection among Resolvers 1 to M, followed by Resolver Invocation.]
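The phases named on this slide (symptom detection, diagnosis generation, resolver selection and invocation) can be sketched as a small pipeline. This is an illustrative outline, not Dhalion's actual classes or interfaces: the `Symptom` and `Diagnosis` types, the `backpressure_ms` metric name, and the single detector, diagnoser, and resolver are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# component name -> metric name -> value (assumed layout for this sketch)
Metrics = Dict[str, Dict[str, float]]


@dataclass
class Symptom:
    name: str
    component: str


@dataclass
class Diagnosis:
    cause: str
    component: str


def backpressure_detector(metrics: Metrics) -> List[Symptom]:
    """Symptom detection: flag components that report any backpressure time."""
    return [Symptom("backpressure", c)
            for c, m in metrics.items() if m.get("backpressure_ms", 0.0) > 0.0]


def underprovisioned_diagnoser(symptoms: List[Symptom]) -> Optional[Diagnosis]:
    """Diagnosis generation: explain backpressure as too few instances."""
    for s in symptoms:
        if s.name == "backpressure":
            return Diagnosis("underprovisioned", s.component)
    return None


def scale_up_resolver(diagnosis: Diagnosis) -> str:
    """Resolver invocation: here it only describes the action it would take."""
    return f"scale up {diagnosis.component}"


def run_policy(metrics: Metrics,
               detectors: List[Callable[[Metrics], List[Symptom]]],
               diagnosers: List[Callable[[List[Symptom]], Optional[Diagnosis]]],
               resolvers: Dict[str, Callable[[Diagnosis], str]]) -> Optional[str]:
    symptoms = [s for detect in detectors for s in detect(metrics)]
    diagnoses = [d for d in (diagnose(symptoms) for diagnose in diagnosers) if d]
    for diagnosis in diagnoses:
        resolver = resolvers.get(diagnosis.cause)  # resolver selection
        if resolver is not None:
            return resolver(diagnosis)             # resolver invocation
    return None


if __name__ == "__main__":
    metrics = {"splitter": {"backpressure_ms": 120.0},
               "counter": {"backpressure_ms": 0.0}}
    print(run_policy(metrics, [backpressure_detector],
                     [underprovisioned_diagnoser],
                     {"underprovisioned": scale_up_resolver}))
```

Keeping the three phases as separate pluggable lists mirrors the diagram's N detectors, M diagnosers, and M resolvers: a new policy can reuse existing detectors and add only a new diagnoser or resolver.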
34. Incorporating Dhalion into Heron
[Figure: architecture diagram showing Heron instances (S1, B2, B3, B4), Stream Managers, Metrics Managers, the Topology Master, and the Health Manager with its Action Log and Action Blacklist.]
The Health Manager periodically
executes Dhalion policies that
maintain the health of the topology.
The Action Log maintains a list of
actions taken by the policy and the
corresponding diagnosis.
The Action Blacklist contains a list
of diagnosis descriptions and
corresponding actions taken that
did not produce the expected
outcome.
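One way to picture how the Action Log and Action Blacklist interact is the sketch below. The class and method names are hypothetical, and representing a diagnosis and an action as plain strings is an assumption; the slide does not specify how Heron stores either structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class HealthManagerState:
    # Every (diagnosis, action) pair the policy has acted on, in order.
    action_log: List[Tuple[str, str]] = field(default_factory=list)
    # Pairs whose action did not produce the expected outcome.
    action_blacklist: Set[Tuple[str, str]] = field(default_factory=set)

    def allowed(self, diagnosis: str, action: str) -> bool:
        return (diagnosis, action) not in self.action_blacklist

    def record(self, diagnosis: str, action: str) -> None:
        self.action_log.append((diagnosis, action))

    def evaluate(self, diagnosis: str, action: str, improved: bool) -> None:
        """Blacklist the pair if the action did not help."""
        if not improved:
            self.action_blacklist.add((diagnosis, action))


def choose_action(state: HealthManagerState, diagnosis: str,
                  candidates: List[str]) -> Optional[str]:
    """Pick the first candidate action that is not blacklisted for this diagnosis."""
    for action in candidates:
        if state.allowed(diagnosis, action):
            state.record(diagnosis, action)
            return action
    return None


if __name__ == "__main__":
    state = HealthManagerState()
    candidates = ["restart instance", "scale up"]
    first = choose_action(state, "slow instance", candidates)
    state.evaluate("slow instance", first, improved=False)
    print(choose_action(state, "slow instance", candidates))
    print(state.action_log)
```

The blacklist keeps the policy from repeating an action that already failed for the same diagnosis, while the log preserves the full history for later inspection.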
41. Experimental Setup
Topology: Spout -> (Shuffle Grouping) -> Splitter Bolt -> (Fields Grouping) -> Counter Bolt
Hardware and Software Configuration
Microsoft HDInsight
Intel Xeon E5-2673 CPU @ 2.40 GHz
28 GB of Memory
Evaluation Metrics
Throughput of Spouts (number of tuples emitted over 1 min)
Throughput of Bolts (number of tuples emitted over 1 min)
Number of Heron Instances provisioned
42. Dynamic Resource Provisioning
[Figure: Normalized throughput of the Spout, Splitter Bolt, and Counter Bolt versus time (in minutes, 0 to 120), with Scale Down and Scale Up events and points S1, S2, and S3 marked.]
The Dynamic Resource
Provisioning Policy is able to
adjust the topology
resources on-the-fly when
workload spikes occur.
The policy can correctly detect
and resolve bottlenecks even
on multi-stage topologies
where backpressure is
gradually propagated from one
stage of the topology to
another.
43. Dynamic Resource Provisioning
[Figure: Number of Splitter Bolt and Counter Bolt instances (0 to 15) versus time (in minutes, 0 to 120).]
Heron Instances are
gradually scaled up and
down according to the input
load