More complex streaming applications generally need to store some state of the running computations in a fault-tolerant manner. This talk discusses the concept of operator state and compares state management in current stream processing frameworks such as Apache Flink Streaming, Apache Spark Streaming, Apache Storm and Apache Samza.
We will go over the recent changes in Flink Streaming that introduce a unique set of tools for managing state in a scalable, fault-tolerant way, backed by a lightweight asynchronous checkpointing algorithm.
Talk presented at the Apache Flink Bay Area Meetup group on 08/26/15
2. This talk
§ Stateful processing by example
§ Definition and challenges
§ State in current open-source systems
§ State in Apache Flink
§ Closing
Apache Flink Meetup @ MapR, 2015-08-27
3. Stateful processing by example
§ Window aggregations
• Total number of customers in the last 10 minutes
• State: Current aggregate
§ Machine learning
• Fitting trends to the evolving stream
• State: Model
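The window aggregation example can be made concrete with a small Java sketch. It is only an illustration under stated assumptions: events arrive in timestamp order, and the class `SlidingWindowCounter` with its method `add` is hypothetical, not taken from any of the frameworks discussed. The state here is the buffer of timestamps still inside the window, from which the current aggregate (the count) is derived.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper for illustration; not part of any framework API.
final class SlidingWindowCounter {
    private final long windowMillis;
    // State: timestamps of the events that are still inside the window.
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowCounter(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        this.windowMillis = windowMillis;
    }

    /** Records an event and returns the count over the window ending at its timestamp. */
    int add(long eventTimeMillis) {
        timestamps.addLast(eventTimeMillis);
        // Evict events that are at least one full window older than the new event.
        while (timestamps.peekFirst() <= eventTimeMillis - windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size();
    }
}
```

If this operator fails and restarts with an empty buffer, the next counts are too low, which is why the state has to be stored in a fault-tolerant manner.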
4. Stateful processing by example
§ Pattern recognition
• Detect suspicious financial activity
• State: Matched prefix
§ Stream-stream joins
• Match ad views and impressions
• State: Elements in the window
5. Stateful operators
§ All these examples use a common processing pattern
§ Stateful operator (in essence):
$f : (\mathit{in}, \mathit{state}) \longrightarrow (\mathit{out}, \mathit{state}')$
§ State hangs around and can be read and modified as the stream evolves
§ Goal: Get as close as possible to this general model while maintaining scalability and fault tolerance
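The operator pattern on this slide can be written down directly. The following is a minimal Java sketch, not any framework's API: the names `StatefulOperatorSketch`, `Step`, `run` and `runningSum` are made up for illustration. Each call to the function receives the current input and state and returns the output together with the updated state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical types for illustration; not part of any framework API.
final class StatefulOperatorSketch {

    /** Result of one application of f: the emitted output and the updated state. */
    static final class Step<O, S> {
        final O out;
        final S state;
        Step(O out, S state) { this.out = out; this.state = state; }
    }

    /** Feeds the stream through f, threading the state from one element to the next. */
    static <I, O, S> List<O> run(List<I> stream, S initial,
                                 BiFunction<I, S, Step<O, S>> f) {
        List<O> outputs = new ArrayList<>();
        S state = initial;
        for (I in : stream) {
            Step<O, S> step = f.apply(in, state);
            outputs.add(step.out);
            state = step.state;
        }
        return outputs;
    }

    /** Example f: a running sum, where the state is the sum seen so far. */
    static List<Integer> runningSum(List<Integer> stream) {
        return StatefulOperatorSketch.<Integer, Integer, Integer>run(
            stream, 0, (in, sum) -> new Step<>(sum + in, sum + in));
    }
}
```

In this sketch the state lives in a local variable of a single thread. The systems compared in the rest of the talk differ in how they partition, persist and recover that variable.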
6. State-of-the-art systems
§ Most systems allow developers to implement stateful programs
§ The trick is to limit the scope of $f$ (state access) while maintaining expressivity
§ Issues to tackle:
• Expressivity
• Exactly-once semantics
• Scalability to large inputs
• Scalability to large states
7. Apache Storm (Trident)
§ States available only in Trident API
§ Dedicated operators for state updates and queries
§ State access methods
• stateQuery(…)
• partitionPersist(…)
• persistentAggregate(…)
§ It’s very difficult to implement transactional states
Exactly-once guarantee
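One way to see why transactional states are hard to implement is to look at what a single update must do. The sketch below is a simplified plain-Java model, not the Trident API: the class `TransactionalCountState` and its methods are hypothetical. It assumes that batches are applied in order and that a replayed batch carries the same transaction id and the same contents as the original. The stored value keeps the id of the batch that last wrote it, so a replay after a failure can be recognized and skipped.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of a transactional count state; hypothetical class, not the Trident API.
final class TransactionalCountState {

    /** Stored value together with the id of the batch that last updated it. */
    private static final class Versioned {
        final long txId;
        final long count;
        Versioned(long txId, long count) { this.txId = txId; this.count = count; }
    }

    private final Map<String, Versioned> store = new HashMap<>();

    /** Applies one batch's increment for a key; a replay of the same batch is ignored. */
    void applyBatch(long txId, String key, long increment) {
        Versioned current = store.get(key);
        if (current != null && current.txId == txId) {
            return; // this batch was already applied before the failure
        }
        long base = (current == null) ? 0L : current.count;
        store.put(key, new Versioned(txId, base + increment));
    }

    /** Returns the current count for a key, or 0 if the key was never updated. */
    long get(String key) {
        Versioned v = store.get(key);
        return v == null ? 0L : v.count;
    }
}
```

Every state type needs its own version of this bookkeeping, and the value and the transaction id have to be written atomically, which is where most of the difficulty lies.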
9. Apache Spark Streaming
§ Stateless runtime by design
• No continuous operators
• UDFs are assumed to be stateless
§ State can be generated as a stream of RDDs: updateStateByKey(…)
$f : (\mathrm{Seq}[\mathit{in}_k], \mathit{state}_k) \longrightarrow \mathit{state}'_k$
§ $f$ is scoped to a specific key
§ Exactly-once semantics
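The per-key update function can be modelled without Spark to show what "state as a stream" means. This is a plain-Java sketch under simplifying assumptions, not the Spark API: the class `MicroBatchState` and its methods are hypothetical, and the update function is only invoked for keys that appear in the current batch. Each micro-batch produces a new state map and leaves the previous one untouched, mirroring the immutability of RDDs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Plain-Java model of per-key state updates across micro-batches; not the Spark API.
final class MicroBatchState {

    /** Builds the next state map from the previous one and one batch grouped by key. */
    static <K, V, S> Map<K, S> update(Map<K, S> previous, Map<K, List<V>> batch,
                                      BiFunction<List<V>, S, S> f) {
        Map<K, S> next = new HashMap<>(previous); // the previous map is never modified
        for (Map.Entry<K, List<V>> e : batch.entrySet()) {
            // f sees only the values and the old state of this one key.
            next.put(e.getKey(), f.apply(e.getValue(), previous.get(e.getKey())));
        }
        return next;
    }

    /** Word-count update: add this batch's counts to the previous total (null means no state yet). */
    static Integer sumCounts(List<Integer> values, Integer state) {
        int sum = (state == null) ? 0 : state;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }
}
```

Because every batch yields a complete new state, recovery can fall back on recomputation, but each update pays for producing a new copy of the state.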
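10. Spark Streaming Word Count
import org.apache.spark.HashPartitioner

// Per-key update: add this batch's counts to the previous total
val updateFunc = (values: Seq[Int], state: Option[Int]) => {
  val currentCount = values.sum
  val previousCount = state.getOrElse(0)
  Some(currentCount + previousCount)
}
// Adapter to the iterator-based overload of updateStateByKey
val newUpdateFunc = (iterator: Iterator[(String, Seq[Int], Option[Int])]) =>
  iterator.flatMap(t => updateFunc(t._2, t._3).map(s => (t._1, s)))
// wordDstream, ssc and initialRDD are defined elsewhere in the program
val stateDstream = wordDstream.updateStateByKey[Int](
  newUpdateFunc,
  new HashPartitioner(ssc.sparkContext.defaultParallelism),
  true, // rememberPartitioner
  initialRDD)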
11. Apache Samza
§ Stateful dataflow operators (any task can hold state)
§ State changes are stored as a log by Kafka
§ Custom storage engines can be plugged in to the log
§ $f$ is scoped to a specific task
§ At-least-once processing semantics
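The idea of storing state changes as a log can be illustrated with an in-memory model. This is a hedged sketch, not the Samza API: `ChangelogStore` is a hypothetical class, and the `List` it appends to stands in for what would be a Kafka topic in a real deployment. Every write goes to the local store and to the log, and a restarted task rebuilds its local store by replaying the log.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; a real deployment would keep the log in a Kafka topic.
final class ChangelogStore {
    private final Map<String, Integer> local = new HashMap<>();
    private final List<Map.Entry<String, Integer>> changelog;

    /** Opens the store and restores the local state by replaying the existing log. */
    ChangelogStore(List<Map.Entry<String, Integer>> changelog) {
        this.changelog = changelog;
        for (Map.Entry<String, Integer> change : changelog) {
            local.put(change.getKey(), change.getValue());
        }
    }

    /** Updates the local store and appends the change to the log. */
    void put(String key, int value) {
        local.put(key, value);
        changelog.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
    }

    /** Reads from the local store only; the log is not consulted on reads. */
    Integer get(String key) {
        return local.get(key);
    }
}
```

Reads stay local and fast, while durability comes from the log. The sketch does not tie the log position to the input position, so input that is reprocessed after a failure updates the state again, in line with the at-least-once semantics noted above.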
12. Samza Word Count
public class WordCounter implements StreamTask, InitableTask {
    // Some omitted details…
    private KeyValueStore<String, Integer> store;

    @Override
    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator) {
        // Get the current count
        String word = (String) envelope.getKey();
        Integer count = store.get(word);
        if (count == null) count = 0;
        // Increment, store and send
        count += 1;
        store.put(word, count);
        collector.send(
            new OutgoingMessageEnvelope(OUTPUT_STREAM, word, count));
    }
}
13. What can we say so far?
§ Trident
+ Consistent state accessible from outside
– Only works well with idempotent states
– States are not part of the operators
§ Spark
+ Integrates well with the system guarantees
– Limited expressivity
– Immutability increases update complexity
§ Samza
+ Efficient log-based state updates
+ States are well integrated with the operators
– Lack of exactly-once semantics
– State access is not fully transparent
14. State in Apache Flink
§ Take what’s good, make it work + add some more
§ Clean and powerful abstractions
• Local (Task) state
• Partitioned (Key) state
§ Proper API integration
• Java: OperatorState interface
• Scala: mapWithState, flatMapWithState…
§ Exactly-once semantics by checkpointing
16. Local State
§ Task scoped state access
§ Can be used to implement custom access patterns
§ Typical usage:
• Source operators (offset)
• Machine learning models
• Use cyclic flows to simulate global state access
17. Local State Example (Java)
public class MySource extends RichParallelSourceFunction {
    // Omitted details
    private OperatorState<Long> offset;

    @Override
    public void run(SourceContext ctx) {
        Object checkpointLock = ctx.getCheckpointLock();
        isRunning = true;
        while (isRunning) {
            // Update the state and emit under the checkpoint lock
            synchronized (checkpointLock) {
                offset.update(offset.value() + 1);
                // ctx.collect(next);
            }
        }
    }
}
18. Partitioned State
§ Key scoped state access
§ Highly scalable
§ Allows for incremental backup/restore
§ Typical usage:
• Any per-key operation
• Grouped aggregations
• Window buffers
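Key-scoped access is what makes this kind of state easy to scale, and a small model shows why. The following Java sketch is an illustration only, not the Flink API: `KeyedStateOperator`, `process`, `stateFor` and `partitionFor` are hypothetical names. The user function only ever sees the state of the key of the current element, so keys can be spread over parallel instances by hashing without the instances having to coordinate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical model of key-scoped state; not the Flink API.
final class KeyedStateOperator<I, K, S> {
    private final Function<I, K> keySelector;
    private final BiFunction<I, S, S> update; // receives null when the key has no state yet
    private final Map<K, S> stateByKey = new HashMap<>();

    KeyedStateOperator(Function<I, K> keySelector, BiFunction<I, S, S> update) {
        this.keySelector = keySelector;
        this.update = update;
    }

    /** Updates and returns the state of the current element's key; other keys are not visible. */
    S process(I in) {
        K key = keySelector.apply(in);
        S next = update.apply(in, stateByKey.get(key));
        stateByKey.put(key, next);
        return next;
    }

    /** Returns the state stored for a key, or null if the key was never seen. */
    S stateFor(K key) {
        return stateByKey.get(key);
    }

    /** Chooses the parallel instance that owns a key (simple hash partitioning). */
    static int partitionFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }
}
```

Since each entry of the map is independent of the others, a backup only needs to contain the keys that changed since the previous one, which is one way to read the incremental backup/restore bullet.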
19. Partitioned State Example (Scala)
// Compute the current average of each city's temperature
temps.keyBy("city").mapWithState {
  (in: Temp, state: Option[(Double, Long)]) => {
    // State per city: (sum of temperatures, number of readings)
    val current = state.getOrElse((0.0, 0L))
    val updated = (current._1 + in.temp, current._2 + 1)
    val avg = Temp(in.city, updated._1 / updated._2)
    (avg, Some(updated))
  }
}

case class Temp(city: String, temp: Double)
20. Exactly-once semantics
§ Based on consistent global snapshots
§ Algorithm designed for stateful dataflows
Detailed mechanism
21. Exactly-once semantics
§ Low runtime overhead
§ Checkpointing logic is separated from application logic
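How a consistent snapshot leads to exactly-once state updates can be shown with a single-task model. This Java sketch is a deliberate simplification and not Flink's algorithm: `CheckpointedCounter` and its methods are hypothetical, the input is assumed to be replayable, and the sketch does not model how snapshots are coordinated across parallel tasks. The key point is that the source position and the operator state are captured at the same point in the stream and restored together.

```java
import java.util.List;

// Hypothetical single-task model of snapshot-based recovery; not Flink's distributed algorithm.
final class CheckpointedCounter {

    /** A consistent snapshot: source position and operator state taken at the same point. */
    static final class Snapshot {
        final int offset;
        final long sum;
        Snapshot(int offset, long sum) { this.offset = offset; this.sum = sum; }
    }

    private int offset = 0; // source state: index of the next element to read
    private long sum = 0L;  // operator state: running sum of the elements read so far

    /** Processes up to n further elements of the replayable input. */
    void process(List<Integer> input, int n) {
        for (int i = 0; i < n && offset < input.size(); i++) {
            sum += input.get(offset);
            offset++;
        }
    }

    /** Captures the source position and the operator state together. */
    Snapshot checkpoint() {
        return new Snapshot(offset, sum);
    }

    /** Rolls both the source position and the operator state back to the snapshot. */
    void restore(Snapshot snapshot) {
        offset = snapshot.offset;
        sum = snapshot.sum;
    }

    long sum() {
        return sum;
    }
}
```

After a failure, restoring the snapshot and replaying from the stored offset means every element is reflected in the state exactly once, even though some elements are processed twice. The application code in `process` contains no recovery logic, which matches the separation of checkpointing from application logic.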
Blogpost on streaming fault-tolerance
22. Summary
§ State is essential to many applications
§ Fault-tolerant streaming state is a hard problem
§ There is a trade-off between expressivity and scalability/fault tolerance
§ Flink tries to hit the sweet spot by…
• Providing very flexible abstractions
• Keeping good scalability and exactly-once semantics