Stream processing combined with consistent, durable, reliable stream storage is pushing the revolution in Big Data processing a step further. This modern paradigm is enabling a new generation of data middleware that delivers on the streaming promise of a simplified and unified programming model. From data ingest, transformation, and messaging to search, time series, and more, a robust streaming data ecosystem means we’ll all be able to more quickly build applications that solve problems we could not solve before.
2. pravega.io
Data-Intensive Apps Need Disruptive Technologies
The Unbundled Database vision sounds awesome!
• Loosely coupled data derivations and transformations
• Update derived state by observing data changes
• Observe changes in derived state – all the way to the edge
• Integrity and correctness: end-to-end IDs, idempotence, data consistency, and exactly-once semantics
BUT realizing it requires disruptive systems capabilities
• Shared, durable, consistent, unbounded distributed log storage
• Ability to dynamically scale both the log(s) and downstream processors in coordination with data arrival volume
• Ability to deliver timely and accurate results by processing the log continuously, even with late-arriving or out-of-order data
Data-Intensive Apps Are Disruptive
We passionately believe in these principles. As the industry leaders in storage, we’re developing a new, open storage primitive enabling all of us to realize the full potential of this powerful vision.
Introducing Pravega Stream Storage
A new storage abstraction – a stream – for continuous and infinite data
• Named, durable, append-only, infinite sequence of bytes
• With low-latency appends to and reads from the tail of the sequence
• With high-throughput reads for older portions of the sequence
Coordinated scaling of stream storage and stream processing
• Stream writes partitioned by app-defined routing key
• Stream reads independently and automatically partitioned by arrival rate SLO
• Scaling protocol to allow stream processors to scale in lockstep with storage
Enabling system-wide exactly-once processing across multiple apps
• Streams are ordered and strongly consistent
• Chain independent streaming apps via streams
• Stream transactions integrate with checkpoint schemes such as the one used in Flink
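To make "stream writes partitioned by app-defined routing key" concrete, here is a minimal Python sketch. It is not the Pravega client API: the `Stream` class, `key_position`, and the choice of hashing keys into equal ranges of [0, 1) are all assumptions made for illustration. It only shows why events that share a routing key stay in order: they always land in the same append-only segment.

```python
import hashlib
from bisect import bisect_right


def key_position(routing_key: str) -> float:
    """Map a routing key to a stable position in [0, 1) (hypothetical scheme)."""
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2**32


class Stream:
    """Toy stream: each segment owns one contiguous range of the key space."""

    def __init__(self, num_segments: int):
        # boundaries[i] is the lower bound of the key range owned by segment i.
        self.boundaries = [i / num_segments for i in range(num_segments)]
        self.segments = [[] for _ in range(num_segments)]

    def segment_for(self, routing_key: str) -> int:
        """Return the index of the segment whose range contains the key."""
        return bisect_right(self.boundaries, key_position(routing_key)) - 1

    def append(self, routing_key: str, event: str) -> int:
        """Append an event to the segment chosen by its routing key."""
        index = self.segment_for(routing_key)
        self.segments[index].append((routing_key, event))
        return index


if __name__ == "__main__":
    stream = Stream(num_segments=4)
    for reading in ("t=1", "t=2", "t=3"):
        stream.append("sensor-17", reading)
    print(stream.segments[stream.segment_for("sensor-17")])
```

Because the key space is split into ranges, a design like this could scale by splitting or merging ranges as load changes, while each key still maps to exactly one segment at any point in time.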
Revisiting the Disruptive Capabilities
Required Systems Capabilities
• Shared, durable, consistent, unbounded distributed log storage
• Dynamically scale logs in coordination with downstream processors
• Deliver accurate results by processing continuously, even with late-arriving or out-of-order data
Enabling Pravega Features
• Durable, append-only byte streams
• Consistent tail and replay reads
• Unlimited retention, storage efficiency
• Auto-scaling
• Independently scale readers/writers
• Transactions and exactly once
• Event time and processing time
The Streaming Revolution
Enabling continuous pipelines with consistent replay, composability, elasticity, and exactly once
[Diagram labels: Ingest Buffer & Pub/Sub; Streaming Search; Streaming Analytics; State Synchronizer; Pravega Stream Store; Cloud-Scale Storage]
Pravega for Ingest Buffer and Pub/Sub
Ingest Buffer, Distributed Ledger or Messaging using Pravega Event Client
[Diagram labels: Writers; Stream; Reader Groups; Consumers]
Pravega for Application State Synchronization
Distributed State via State Synchronizer Client
[Diagram labels: “Shared State” Stream; App Process #1 through App Process #n, each with a State Synchronizer and a Stream Client]
• Shared Properties
• Shared scalar data
• Shared K/V data
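One way to picture several app processes sharing state through a stream is optimistic concurrency over an append-only log of updates. The Python sketch below is an illustration under that assumption, not the Pravega State Synchronizer client: `SharedStateStream`, `append_if_at`, and `StateSynchronizer.update` are hypothetical names. Each process replays the log into a local K/V copy and appends a new update only if nobody else has appended since its last read.

```python
class SharedStateStream:
    """Toy append-only log of (key, value) updates shared by all processes."""

    def __init__(self):
        self.updates = []

    def append_if_at(self, expected_length, update):
        """Append only if the log has not grown since the caller last read it."""
        if len(self.updates) != expected_length:
            return False
        self.updates.append(update)
        return True

    def read_from(self, offset):
        """Return all updates at or after the given offset."""
        return self.updates[offset:]


class StateSynchronizer:
    """Keeps a local K/V copy consistent with the shared log (hypothetical)."""

    def __init__(self, stream):
        self.stream = stream
        self.state = {}
        self.applied = 0  # number of log entries already applied locally

    def fetch_updates(self):
        """Apply any log entries this process has not seen yet."""
        for key, value in self.stream.read_from(self.applied):
            self.state[key] = value
            self.applied += 1

    def update(self, compute):
        """compute(state) -> (key, value); retry until the conditional append wins."""
        while True:
            self.fetch_updates()
            key, value = compute(self.state)
            if self.stream.append_if_at(self.applied, (key, value)):
                self.fetch_updates()
                return


if __name__ == "__main__":
    shared = SharedStateStream()
    process_1, process_n = StateSynchronizer(shared), StateSynchronizer(shared)
    increment = lambda state: ("count", state.get("count", 0) + 1)
    process_1.update(increment)
    process_n.update(increment)
    process_1.fetch_updates()
    print(process_1.state, process_n.state)
```

Since every process applies the same updates in the same log order, the local copies converge without a separate coordination service.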
Pravega + Flink = Pure Streaming End-to-End
Dynamically Scale Storage + Compute Based On Data Arrival Volume
Protocol coordination between streaming storage and streaming engine to systematically scale up and down the number of segments and Flink workers based on load variance over time
Utilize transactional writes to extend Exactly Once processing semantics across multiple, chained apps
Writers scale based on app configuration; stream storage elastically and independently scales based on aggregate incoming volume of data
[Diagram labels: Social, IoT, … Writers; “Raw Stream” segments; Streaming App workers; “Cooked Stream” segments; 2nd Streaming App workers; Sink]
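The idea of using transactional writes to extend exactly-once semantics across chained apps can be sketched as follows. This is a toy model in Python, not the Pravega or Flink API: `TransactionalStream`, `run_app`, and the `fail_at` crash simulation are hypothetical. The assumption illustrated is that output written between two checkpoints sits in an open transaction and becomes visible only when the checkpoint completes, so a crash discards partial output and replay from the last checkpoint produces no duplicates.

```python
class TransactionalStream:
    """Toy output stream: events become visible only when their transaction commits."""

    def __init__(self):
        self.committed = []
        self._open = {}
        self._next_id = 0

    def begin(self):
        """Open a new transaction and return its id."""
        txn_id = self._next_id
        self._next_id += 1
        self._open[txn_id] = []
        return txn_id

    def write(self, txn_id, event):
        """Buffer an event inside an open transaction."""
        self._open[txn_id].append(event)

    def commit(self, txn_id):
        """Atomically publish every event buffered in the transaction."""
        self.committed.extend(self._open.pop(txn_id))

    def abort(self, txn_id):
        """Discard every event buffered in the transaction."""
        self._open.pop(txn_id)


def run_app(source, sink, checkpoint_every, fail_at=None):
    """Uppercase each input event; commit output together with each checkpoint."""
    checkpointed_offset = 0
    offset = 0
    txn = sink.begin()
    while offset < len(source):
        if fail_at is not None and offset == fail_at:
            # Simulated crash: drop uncommitted output, rewind to the checkpoint.
            sink.abort(txn)
            offset = checkpointed_offset
            txn = sink.begin()
            fail_at = None
            continue
        sink.write(txn, source[offset].upper())
        offset += 1
        if offset % checkpoint_every == 0:
            # Checkpoint: publish the output and record the input position.
            sink.commit(txn)
            checkpointed_offset = offset
            txn = sink.begin()
    sink.commit(txn)
    return sink.committed


if __name__ == "__main__":
    raw = ["a", "b", "c", "d", "e"]
    cooked = run_app(raw, TransactionalStream(), checkpoint_every=2, fail_at=3)
    print(cooked)
```

A second app reading only the committed events of the first app's output stream never sees output from an aborted transaction, which is what lets the guarantee carry across a chain of apps.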
Search Reimagined for a Streaming World
Advantages of This Approach
• Seamlessly integrate search into streaming pipelines: continuous indexing + continuous query
• Dynamically scale search based on data volume arrival rate and query SLA
• Eliminate redundant storage across input streams and search
[Diagram labels: Input Streams; Pravega Search (Continuous Indexing, Continuous Query); Result Streams; stream pipelines]
Pravega: an open source project with an open community
Software includes infinite byte stream primitive, event abstraction, ingest buffer, and pub/sub services
Flink integration for scale, elasticity, and system-wide exactly once
Join the community at pravega.io