Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processing Revolutionizing Big Data"

Stream Processing
Revolutionizing Big Data
Srikanth Satya
April 2018

pravega.io
Data-Intensive Apps Need Disruptive Technologies
The Unbundled Database vision sounds awesome!
• Loosely coupled data derivations and transformations
• Update derived state by observing data changes
• Observe changes in derived state, all the way to the edge
• Integrity and correctness: end-to-end IDs, idempotence, data consistency, and exactly-once semantics
BUT realizing it requires disruptive systems capabilities
• Shared, durable, consistent, unbounded distributed log storage
• Ability to dynamically scale both the log(s) and downstream processors in coordination with data arrival volume
• Ability to deliver timely and accurate results by processing the log continuously, even with late-arriving or out-of-order data
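The idea of updating derived state by observing a log, with end-to-end IDs providing idempotence, can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: the `Event` fields, the `DerivedBalances` class, and the in-memory list standing in for a durable log are assumptions, not part of any system named in the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: str  # end-to-end ID assigned by the producer (hypothetical field)
    account: str
    amount: int


@dataclass
class DerivedBalances:
    """Derived state that is rebuilt purely by observing an append-only log."""
    balances: dict = field(default_factory=dict)
    seen_ids: set = field(default_factory=set)

    def apply(self, event: Event) -> bool:
        # Idempotence: a redelivered event is recognized by its ID and skipped.
        if event.event_id in self.seen_ids:
            return False
        self.seen_ids.add(event.event_id)
        self.balances[event.account] = self.balances.get(event.account, 0) + event.amount
        return True


# An in-memory list stands in for the shared, durable log (assumption).
log = [
    Event("e1", "alice", 10),
    Event("e2", "bob", 5),
    Event("e1", "alice", 10),  # duplicate delivery of e1
    Event("e3", "alice", -3),
]

state = DerivedBalances()
for ev in log:
    state.apply(ev)

print(state.balances)  # {'alice': 7, 'bob': 5}
```

Because the duplicate `e1` is ignored, replaying the same log any number of times yields the same derived state, which is the property that makes replay-based recovery safe in this sketch.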

Data-Intensive Apps Are Disruptive
We passionately believe in these principles.
As the industry leaders in storage, we’re developing a new, open storage primitive enabling all of us to realize the full potential of this powerful vision.

Introducing Pravega Stream Storage

A new storage abstraction – a stream – for continuous and infinite data
• Named, durable, append-only, infinite sequence of bytes
• With low-latency appends to and reads from the tail of the sequence
• With high-throughput reads for older portions of the sequence
Coordinated scaling of stream storage and stream processing
• Stream writes partitioned by app-defined routing key
• Stream reads independently and automatically partitioned by arrival rate SLO
• Scaling protocol to allow stream processors to scale in lockstep with storage
Enabling system-wide exactly once processing across multiple apps
• Streams are ordered and strongly consistent
• Chain independent streaming apps via streams
• Stream transactions integrate with checkpoint schemes such as the one used in Flink
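Partitioning writes by an app-defined routing key, and scaling by repartitioning, can be sketched as follows. This is a toy model under stated assumptions: the key space `[0, 1)`, the SHA-256 hash, and the names `key_position`, `SegmentMap`, and `split` are all hypothetical, and the sketch ignores how a real system would preserve per-key ordering across a split.

```python
import hashlib
from bisect import bisect_right


def key_position(routing_key: str) -> float:
    """Hash a routing key to a position in [0, 1) (assumed key space)."""
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    # Keep 53 bits so the quotient is exactly representable and strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


class SegmentMap:
    """Maps key-space ranges to segment indexes (hypothetical helper)."""

    def __init__(self, boundaries):
        # Sorted lower bounds of each segment's key range; the first must be 0.0.
        self.boundaries = sorted(boundaries)

    def segment_for(self, routing_key: str) -> int:
        # The owning segment is the last one whose lower bound is <= the position.
        return bisect_right(self.boundaries, key_position(routing_key)) - 1

    def split(self, index: int) -> None:
        """Scale up by splitting one segment's key range in half."""
        lo = self.boundaries[index]
        hi = self.boundaries[index + 1] if index + 1 < len(self.boundaries) else 1.0
        self.boundaries.insert(index + 1, (lo + hi) / 2)


segments = SegmentMap([0.0, 0.5])
first = segments.segment_for("sensor-42")
# The same routing key always lands in the same segment, which is what
# allows per-key ordering while the stream as a whole is partitioned.
assert first == segments.segment_for("sensor-42")

segments.split(0)  # load on segment 0 grew, so its range is split
print(segments.boundaries)  # [0.0, 0.25, 0.5]
```

The design point the sketch tries to show is that writers only choose a routing key; which segment owns that key is decided by the storage layer and can change as load changes.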

Revisiting the Disruptive Capabilities
Required Systems Capabilities
• Shared, durable, consistent, unbounded distributed log storage
• Dynamically scale logs in coordination with downstream processors
• Deliver accurate results processing continuously even with late-arriving or out-of-order data
Enabling Pravega Features
• Durable, append-only byte streams
• Consistent tail and replay reads
• Unlimited retention, storage efficiency
• Auto-scaling
• Independently scale readers/writers
• Transactions and exactly once
• Event time and processing time
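The distinction between event time and processing time, and its role in handling late or out-of-order data, can be illustrated with a small windowing sketch. The window size, the allowed lateness, and the simple "highest event time seen so far" watermark are assumptions chosen for illustration; real stream processors use more elaborate watermarking.

```python
from collections import defaultdict

WINDOW = 10           # window size in event-time seconds (assumed)
ALLOWED_LATENESS = 5  # how long a window stays open past its end (assumed)


def window_start(event_time: int) -> int:
    return event_time - event_time % WINDOW


def count_by_window(events):
    """Count events per event-time window.

    `events` is an iterable of (event_time, value) pairs listed in arrival
    (processing-time) order, which may differ from event-time order.
    """
    counts = defaultdict(int)
    dropped = []
    watermark = 0  # simplistic watermark: highest event time seen so far
    for event_time, value in events:
        watermark = max(watermark, event_time)
        window_end = window_start(event_time) + WINDOW
        if window_end + ALLOWED_LATENESS <= watermark:
            dropped.append((event_time, value))  # arrived too late for its window
            continue
        counts[window_start(event_time)] += 1
    return dict(counts), dropped


arrivals = [(1, "a"), (12, "b"), (8, "c"), (25, "d"), (3, "e")]
counts, dropped = count_by_window(arrivals)
print(counts)   # {0: 2, 10: 1, 20: 1}
print(dropped)  # [(3, 'e')]
```

The event with time 8 arrives after the event with time 12 but is still counted in window 0, because that window is within its allowed lateness. The event with time 3 arrives after the watermark has reached 25 and is dropped.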

The Streaming Revolution
Enabling continuous pipelines with consistent replay, composability, elasticity, and exactly-once processing
[Diagram labels: Ingest Buffer & Pub/Sub; Streaming Search; Streaming Analytics; State Synchronizer; Pravega Stream Store; Cloud-Scale Storage]

Pravega for Ingest Buffer and Pub/Sub
Ingest Buffer, Distributed Ledger or Messaging using Pravega Event Client
[Diagram labels: Writers; Stream (01110110 01101001); Reader Groups; Consumers]
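One way to picture a reader group is as a set of readers that divide a stream's segments among themselves so that each segment is read by exactly one member. The sketch below is an assumption-laden illustration of that idea using round-robin assignment; `assign_segments` is a hypothetical helper and does not reflect how the Pravega client actually balances segments.

```python
def assign_segments(segments, readers):
    """Spread segments across readers so each segment has exactly one owner."""
    if not readers:
        raise ValueError("a reader group needs at least one reader")
    assignment = {reader: [] for reader in readers}
    for i, segment in enumerate(sorted(segments)):
        assignment[readers[i % len(readers)]].append(segment)
    return assignment


# Five segments shared by a group of two readers.
print(assign_segments([0, 1, 2, 3, 4], ["r1", "r2"]))
# {'r1': [0, 2, 4], 'r2': [1, 3]}

# A third reader joins the group, so the work is redistributed.
print(assign_segments([0, 1, 2, 3, 4], ["r1", "r2", "r3"]))
# {'r1': [0, 3], 'r2': [1, 4], 'r3': [2]}
```

Since every segment has a single owner at a time, events with the same routing key are consumed in order by one reader, while the group as a whole reads segments in parallel.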

Pravega for Application State Synchronization
Distributed State via State Synchronizer Client
[Diagram labels: “Shared State” Stream (01110110 01101001); App Process #1 through App Process #n, each with a State Synchronizer and a Stream Client]
• Shared Properties
• Shared scalar data
• Shared K/V data
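Sharing state through a stream can be sketched as optimistic concurrency over an append-only log of updates: each process applies updates it has not yet seen, and a write succeeds only if the writer was up to date. The classes and method names below (`SharedStateStream`, `Synchronizer`, `conditional_append`, `fetch_updates`, `update`) are hypothetical and in-memory; they are not the Pravega State Synchronizer client API.

```python
class SharedStateStream:
    """In-memory stand-in for a "shared state" stream of (key, value) updates."""

    def __init__(self):
        self._updates = []

    @property
    def revision(self) -> int:
        return len(self._updates)

    def conditional_append(self, expected_revision: int, update) -> bool:
        # The append succeeds only if the caller has seen every prior update.
        if expected_revision != self.revision:
            return False
        self._updates.append(update)
        return True

    def read_from(self, revision: int):
        return self._updates[revision:]


class Synchronizer:
    """Keeps one process's local copy of shared K/V data in sync with the stream."""

    def __init__(self, stream: SharedStateStream):
        self.stream = stream
        self.local = {}
        self.revision = 0

    def fetch_updates(self) -> None:
        for key, value in self.stream.read_from(self.revision):
            self.local[key] = value
        self.revision = self.stream.revision

    def update(self, key, fn):
        # Optimistic loop: refresh, compute, try to append, retry on conflict.
        while True:
            self.fetch_updates()
            new_value = fn(self.local.get(key))
            if self.stream.conditional_append(self.revision, (key, new_value)):
                self.fetch_updates()
                return new_value


stream = SharedStateStream()
p1, p2 = Synchronizer(stream), Synchronizer(stream)

p1.update("counter", lambda v: (v or 0) + 1)
p2.update("counter", lambda v: (v or 0) + 1)
p1.fetch_updates()
print(p1.local, p2.local)  # {'counter': 2} {'counter': 2}

# A writer holding a stale revision is rejected instead of overwriting state.
print(stream.conditional_append(0, ("counter", 99)))  # False
```

In this sketch the stream is the single source of truth, so every process that replays it from the beginning converges on the same shared properties, scalars, or K/V data.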

Pravega + Flink = Pure Streaming End-to-End
Dynamically Scale Storage + Compute Based On Data Arrival Volume
Protocol coordination between streaming storage and streaming engine to systematically scale up and down the number of segments and Flink workers based on load variance over time
Utilize transactional writes to extend Exactly Once processing semantics across multiple, chained apps
Writers scale based on app configuration; stream storage elastically and independently scales based on aggregate incoming volume of data
[Diagram labels: Social, IoT, … Writers; “Raw Stream” (Segments); Streaming App (Workers); Sink; “Cooked Stream” (Segments); 2nd Streaming App (Workers); Sink]
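The way transactional writes can extend exactly-once semantics to a chained app can be sketched as follows: output is buffered in a transaction and becomes visible in the downstream stream only when a checkpoint completes, so a failure between checkpoints leaves no partial output behind. All names here (`Stream`, `Transaction`, `TransactionalSink`, `on_checkpoint_complete`, `on_failure`) are hypothetical. This is a simplification that collapses the steps a real checkpoint-integrated sink would need, and it is not the Pravega or Flink connector API.

```python
class Stream:
    """Stand-in for the "cooked" stream; only committed events are visible."""

    def __init__(self):
        self.events = []


class Transaction:
    def __init__(self, stream: Stream):
        self.stream = stream
        self.buffer = []
        self.state = "open"

    def write(self, event) -> None:
        if self.state != "open":
            raise RuntimeError("transaction is " + self.state)
        self.buffer.append(event)

    def commit(self) -> None:
        if self.state == "open":
            self.stream.events.extend(self.buffer)  # all events become visible together
            self.state = "committed"

    def abort(self) -> None:
        if self.state == "open":
            self.buffer.clear()  # nothing from this transaction is ever visible
            self.state = "aborted"


class TransactionalSink:
    """Ties one transaction to each checkpoint interval (simplified)."""

    def __init__(self, stream: Stream):
        self.stream = stream
        self.txn = Transaction(stream)

    def emit(self, event) -> None:
        self.txn.write(event)

    def on_checkpoint_complete(self) -> None:
        self.txn.commit()
        self.txn = Transaction(self.stream)

    def on_failure(self) -> None:
        self.txn.abort()
        self.txn = Transaction(self.stream)


cooked = Stream()
sink = TransactionalSink(cooked)

sink.emit("a")
sink.emit("b")
sink.on_checkpoint_complete()

sink.emit("c")
sink.on_failure()  # crash before the next checkpoint: "c" is discarded
sink.emit("c")     # the engine replays from the last checkpoint
sink.on_checkpoint_complete()

print(cooked.events)  # ['a', 'b', 'c']
```

A second streaming app reading the cooked stream sees each event once, even though "c" was produced twice by the first app.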

Search Reimagined for a Streaming World
Advantages of This Approach
• Seamlessly integrate search into streaming pipelines: continuous indexing + continuous query
• Dynamically scale search based on data volume arrival rate and query SLA
• Eliminate redundant storage across input streams and search
[Diagram labels: Input Streams; Pravega Search (Continuous Indexing, Continuous Query); Result Streams; … stream pipeline …]
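Continuous indexing combined with continuous query can be pictured as an index that is updated as each event arrives, plus standing queries whose matches are emitted to a result stream. The sketch below is a toy illustration under that assumption; `StreamingSearch` and its methods are hypothetical and say nothing about how Pravega Search is actually built.

```python
from collections import defaultdict


class StreamingSearch:
    def __init__(self):
        self.index = defaultdict(set)  # term -> ids of documents containing it
        self.standing = {}             # query name -> term to watch for
        self.results = []              # stands in for a result stream

    def register_query(self, name: str, term: str) -> None:
        self.standing[name] = term.lower()

    def ingest(self, doc_id: int, text: str) -> None:
        terms = set(text.lower().split())
        # Continuous indexing: the index is updated as each event arrives.
        for term in terms:
            self.index[term].add(doc_id)
        # Continuous query: standing queries are evaluated against the new event.
        for name, term in self.standing.items():
            if term in terms:
                self.results.append((name, doc_id))

    def search(self, term: str):
        # Ad hoc query over everything indexed so far.
        return sorted(self.index.get(term.lower(), set()))


search = StreamingSearch()
search.register_query("errors", "error")

search.ingest(1, "disk error on node7")
search.ingest(2, "all good")
search.ingest(3, "Error again")

print(search.results)          # [('errors', 1), ('errors', 3)]
print(search.search("error"))  # [1, 3]
```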

• Pravega: an open source project with an open community
• Software includes infinite byte stream primitive, event abstraction, ingest buffer, and pub/sub services
• Flink integration for scale, elasticity, and system-wide exactly once
• Join the community at pravega.io
