20. Delivery Guarantees
• Automatic offset checkpointing and recovery
– Supports at least once
– Exactly once for connectors that support it (e.g. HDFS)
– At most once: simply swap the order of write & commit (commit offsets first, then write)
– On restart: task checks offsets & rewinds
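A minimal sketch of how the order of write and commit determines the guarantee. This is a toy in-memory simulation, not Kafka Connect code: `run_task`, `deliver`, `crash_at` and `dedupe` are hypothetical names made up for illustration. The `dedupe` flag stands in for one possible way a sink could reach exactly once (remembering which offsets it already holds); how a real connector such as HDFS does this is not covered by the slides.

```python
def run_task(log, write, committed, mode, crash_at=None):
    """Process `log` starting at the committed offset; return the new committed offset.

    mode "at_least_once": write the record, then commit its offset.
    mode "at_most_once":  commit the offset, then write the record.
    crash_at: offset at which a crash is simulated between the two steps.
    """
    offset = committed  # on (re)start, rewind to the last committed offset
    while offset < len(log):
        if mode == "at_least_once":
            write(offset, log[offset])
            if offset == crash_at:
                return committed          # crashed before the commit
            committed = offset + 1
        else:
            committed = offset + 1
            if offset == crash_at:
                return committed          # crashed before the write
            write(offset, log[offset])
        offset += 1
    return committed


def deliver(log, mode, crash_at, dedupe=False):
    """Run a task that crashes once, restart it, and return what the sink received."""
    delivered = []
    seen = set()

    def write(offset, record):
        # Hypothetical offset-aware sink: skips offsets it has already stored.
        if dedupe and offset in seen:
            return
        seen.add(offset)
        delivered.append(record)

    committed = run_task(log, write, 0, mode, crash_at)  # first run crashes
    run_task(log, write, committed, mode)                # restart and resume
    return delivered


log = ["a", "b", "c"]
print(deliver(log, "at_least_once", crash_at=1))               # ['a', 'b', 'b', 'c']
print(deliver(log, "at_most_once", crash_at=1))                # ['a', 'c']
print(deliver(log, "at_least_once", crash_at=1, dedupe=True))  # ['a', 'b', 'c']
```

Write-then-commit replays record "b" after the crash (a duplicate), commit-then-write loses it, and an offset-aware sink turns the replay into a no-op.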
21. Spark Streaming
• Use Direct Kafka streams (1.3+)
– Better integration, more efficient, better semantics
• Spark Kafka Writer
– At least once
– Kafka community is working on improved producer semantics
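A rough sketch of the idea behind direct Kafka streams, as it is usually described: instead of long-running receivers, each batch is defined by an explicit offset range per Kafka partition, so a failed batch can be recomputed from the same range. This is plain Python, not the Spark API; `plan_batch`, `last_processed`, `latest` and `max_per_partition` are hypothetical names chosen for illustration.

```python
def plan_batch(last_processed, latest, max_per_partition=None):
    """Return {partition: (from_offset, until_offset)} for the next batch.

    last_processed: offset each partition has been processed up to (exclusive end).
    latest: newest available offset per partition.
    max_per_partition: optional cap on records taken per partition in one batch.
    """
    ranges = {}
    for partition, until in latest.items():
        start = last_processed.get(partition, 0)
        if max_per_partition is not None:
            until = min(until, start + max_per_partition)
        ranges[partition] = (start, until)
    return ranges


# Partition 0 has 4 new records; partition 1 has none.
print(plan_batch({0: 5, 1: 2}, {0: 9, 1: 2}))                       # {0: (5, 9), 1: (2, 2)}
print(plan_batch({0: 5, 1: 2}, {0: 9, 1: 2}, max_per_partition=3))  # {0: (5, 8), 1: (2, 2)}
```

Because a batch is fully described by its offset ranges, replaying it reads exactly the same records, which is what makes stronger end-to-end semantics possible.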
22. Spark Streaming & Kafka Connect
• Increase # of systems Spark Streaming works with, indirectly
• Reduce friction to adopt Spark Streaming
• Reduce need for Spark-specific connectors
• All by leveraging Kafka as the de facto streaming data storage
23. Kafka Connect Summary
• Designed for large-scale stream or batch data integration
• Community-supported and certified way of using Kafka
• Coming soon: a large repository of open source connectors
• Easy data pipelines when combined with Spark & Spark Streaming
24. THANK YOU.
Follow me on Twitter: @ewencp
Try it out: http://confluent.io/download
More like this, but in blog form: http://confluent.io/blog