Enterprises are increasingly building applications to take advantage of real-time workloads, and Kubernetes is fast becoming the de-facto scheduler and orchestrating system for distributed systems. In this session, Karthik Ramasamy will show how easy it is to deploy Apache Pulsar for queuing and streaming and Twitter Heron for stream processing using Kubernetes and show how end-to-end data pipelines are built. Pulsar and Heron are at the core of Streamlio's end-to-end real-time platform, which is ideally suited to enterprises building next generation real time pipelines
3. 3
Increasingly Connected World
Internet of Things
30 B connected devices by 2020
Health Care
153 Exabytes (2013) -> 2314 Exabytes (2020)
Machine Data
40% of digital universe by 2020
Connected Vehicles
Data transferred per vehicle per month
4 MB -> 5 GB
Digital Assistants (Predictive Analytics)
$2B (2012) -> $6.5B (2019) [1]
Siri/Cortana/Google Now
Augmented/Virtual Reality
$150B by 2020 [2]
Oculus/HoloLens/Magic Leap
Ñ
!+
>
5. 5
The New Era: Streaming Data/Fast Data
Events are analyzed and processed in real-me as they arrive
Decisions are mely, contextual and based on fresh data
Decision latency is eliminated
Data in moon
Modern Data Pipelines Streaming Data Pipelines
22. 22
Broker Failure Recovery
• Topic is reassigned to
an available broker
based on load
• Can reconstruct the
previous state
consistently
• No data needs to be
copied
• Failover handled
transparently by client
library
23. 23
Bookie Failure Recovery
• After a write failure,
BookKeeper will
immediately switch write to
a new bookie, within the
same segment.
24. 24
Bookie Failure Recovery
• In background, starts a
many-to-many recovery
process to regain the
configured replication
factor
25. 25
Pulsar Architecture
Geo Replica=on
GEO REPLICATION
Asynchronous replication
Integrated in the broker message flow
Simple configuration to add/remove regions
Topic (T1) Topic (T1)
Topic (T1)
Subscripon
(S1)
Subscripon
(S1)
Producer
(P1)
Consumer
(C1)
Producer
(P3)
Producer
(P2)
Consumer
(C2)
Data Center A Data Center B
Data Center C
30. 30
Writing Heron Topologies
Procedural - Low Level API
Directly write your spouts and bolts
Functional - Mid Level API
Use of maps, flat maps, transform, windows
Declarative - SQL (in the works)
Use of declarave language - specify what you
want, system will figure it out.
,
%
31. 31
Topology Execution
Topology Master
ZK
Cluster
Stream
Manager
I1 I2 I3 I4
Stream
Manager
I1 I2 I3 I4
Logical Plan,
Physical Plan and
Execution State
Sync Physical Plan
DATA CONTAINER DATA CONTAINER
Metrics
Manager
Metrics
Manager
Metrics
Manager
Health
Manager
MASTER
CONTAINER
35. 35
Heron Happy Facts :)
! No more pages during midnight for Heron team
" Very rare incidents for Heron customer teams
# Easy to debug during incident for quick turn around
$ Reduced resource ulizaon saving cost
39. 39
Apache BookKeeper - Stream Storage
A storage for log streams
Replicated, durable storage of log streams
Provide fast tailing/streaming facility
Opmized for immutable data
Low-latency durability
Simple repeatable read consistency
High write and read availability