In a recent survey of over 2,400 developers, 90% reported having at least some real-time functionality in their systems. Enterprises are realizing that the ability to extract value from streaming data in near real time is the new competitive advantage.
Two technologies, Akka Streams and Kafka Streams, have emerged as popular tools to use with Apache Kafka for addressing the shared requirements of availability, scalability, and resilience for both streaming microservices and Fast Data. So which one should you use for which use cases?
11. Logs, not queues!
• Unlike queues, consumers don’t delete entries; Kafka manages their lifecycles
• M producers, N consumers, who start reading where they want (a sketch follows)
[Diagram: Topic B, Partition 1 drawn as an append-only log with offsets 0 through 15, earliest to latest. Producers 1 and 2 write at the head, while Consumers 1, 2, and 3 read independently at offsets 14, 10, and 6.]
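To make “start reading where they want” concrete, here is a minimal consumer sketch in Scala against the plain Kafka client API; the topic, group id, and offset are hypothetical, and poll(long) is the deck-era signature:

import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("group.id", "consumer-3") // hypothetical group
props.put("key.deserializer",
  "org.apache.kafka.common.serialization.ByteArrayDeserializer")
props.put("value.deserializer",
  "org.apache.kafka.common.serialization.ByteArrayDeserializer")

val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
val partition = new TopicPartition("topic-b", 1)
consumer.assign(java.util.Arrays.asList(partition))
consumer.seek(partition, 6) // start at offset 6; the log itself is untouched
consumer.poll(1000).asScala.foreach(r =>
  println(s"offset ${r.offset}: ${r.value.length} bytes"))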
13. Kafka for Connectivity
Before: [Diagram: producer services (Service 1, Service 2, Service 3, log and other files, internet services) wired point-to-point to consumer services: N * M links.]
After: [Diagram: the same producers and consumers all connect through Kafka instead: N + M links.]
For example, with 5 producers and 4 consumers, point-to-point wiring needs 5 * 4 = 20 links, while Kafka needs only 5 + 4 = 9.
14. Kafka for Connectivity
After: [Diagram: all producers and consumers connected through Kafka: N + M links.]
• Simplify dependencies
• Resilient against data loss
• M producers, N consumers
• Simplicity of one “API” for communication
21. A Spectrum of Microservices
[Diagram: a spectrum running from events (left) to records (right). On the left, event-driven μ-services: an API Gateway fronting Browse, REST, Account, Orders, Shopping Cart, and Inventory services. On the right, “record-centric” μ-services: storage, Data, Model Training, Model Serving, and Other Logic.]
22. A Spectrum of Microservices
[Diagram: the event-driven side: API Gateway, Browse, REST, Account, Orders, Shopping Cart, Inventory.]
• Each item has an identity
• Process each item uniquely
• Think sessions and state machines (a minimal sketch follows)
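As a rough illustration of “sessions and state machines” (not from the deck; all names are hypothetical), one actor per session can hold the session’s state directly, switching behavior as events arrive:

import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical session events.
case object AddItem
case object Checkout

// One actor per shopping-cart session; context.become switches state.
class CartSession extends Actor {
  def shopping(items: Int): Receive = {
    case AddItem  => context.become(shopping(items + 1))
    case Checkout => println(s"checked out $items items"); context.stop(self)
  }
  def receive = shopping(0)
}

object SessionDemo extends App {
  val system = ActorSystem("sessions")
  val cart = system.actorOf(Props[CartSession], "cart-42")
  cart ! AddItem
  cart ! AddItem
  cart ! Checkout
}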
24. A Spectrum of Microservices
[Diagram: the “record-centric” side: storage, Data, Model Training, Model Serving, Other Logic.]
• Anonymous items
• Process en masse
• Think SQL queries
26. A Spectrum of Microservices
[Diagram: the event-driven side of the spectrum.]
Akka emerged from the left-hand side of the spectrum, the world of highly Reactive microservices. Akka Streams pushes to the right, toward more data-centric processing.
27. A Spectrum of Microservices
[Diagram: the “record-centric” side of the spectrum.]
Kafka Streams emerged from the right-hand side. It pushes to the left, supporting many event-processing scenarios.
29. Kafka Streams
• Important stream-processing semantics, e.g., distinguishing between event time and processing time
[Diagram: a timeline in minutes (0, 1, 2, 3, …). Events occur at Servers 1 and 2 around minute 0 but propagate to the Analysis service over later minutes, so Analysis must accumulate arriving events per server: collect data, then process.]
30. Kafka Streams
• Important stream-processing semantics, e.g., windowing support (e.g., group by within a window)
[Diagram: the same timeline; events from Servers 1 and 2 are accumulated into time windows at the Analysis service before processing.]
See my O’Reilly report for details. (A windowed-count sketch follows.)
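As a rough illustration of windowing (not from the deck; the topic name is hypothetical, serde configuration is elided, and the exact TimeWindows factory methods vary by Kafka version), the Streams DSL counts per key within one-minute event-time windows like this:

import java.time.Duration
import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.kstream.TimeWindows

val builder = new StreamsBuilder
val clicksPerMinute =
  builder.stream[String, String]("clicks") // assumes default serdes are configured
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofMinutes(1))) // windows use record (event) timestamps
    .count() // KTable[Windowed[String], java.lang.Long]: clicks per key per minute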
31. Kafka Streams
• Exactly once, vs.:
  • At most once: “fire and forget”
  • At least once: keep trying until acknowledged (but you have to handle de-duplication)
• Note: it’s really effectively once (a one-line config sketch follows)
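In Kafka Streams this guarantee is a configuration setting; a minimal sketch, with a hypothetical application id and broker address:

import java.util.Properties
import org.apache.kafka.streams.StreamsConfig

val streamsConfiguration = new Properties()
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "model-serving")
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
// “Effectively once”: turn on Kafka’s transactional processing guarantee.
streamsConfiguration.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,
  StreamsConfig.EXACTLY_ONCE)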
32. Kafka Streams
• KStream
  • Per-record transformations, one-to-one mapping
• KTable
  • Last value per key
  • Useful for holding state values
[Diagram: the Topic B, Partition 1 log from slide 11 again; Kafka topics back both abstractions, here materialized as KTable A and KTable B.]
(A short sketch contrasting the two follows.)
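A minimal sketch of the distinction (not from the deck; topic names are hypothetical and serde configuration is elided): a KStream delivers every record, while a KTable is a changelog view keeping only the latest value per key, so joining the two enriches each event with current state.

import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.kstream.{KStream, KTable, ValueJoiner}

val builder = new StreamsBuilder

// Every click event flows through: one record in, one record out.
val clicks: KStream[String, String] = builder.stream[String, String]("user-clicks")

// Only the most recent profile per user key is retained.
val profiles: KTable[String, String] = builder.table[String, String]("user-profiles")

// Enrich each click with the user’s current profile (a stateful join).
val joiner = new ValueJoiner[String, String, String] {
  def apply(click: String, profile: String) = s"$click by $profile"
}
val enriched: KStream[String, String] = clicks.join(profiles, joiner)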
33. Kafka Streams
• Read from and write to Kafka topics and in-memory state
• Kafka Connect is used for other sources and sinks
• Load balance and scale using topic partitioning
34. Kafka Streams
• Java API
• Scala API: written by Lightbend
• SQL!!
36. Kafka Streams Example
We’ll use the new Scala Kafka Streams API:
• Developed by Debasish Ghosh, Boris Lublinsky, Sean Glover, et al. from Lightbend
• See also the following, if you know about queryable state: https://github.com/lightbend/kafka-streams-query
38. Kafka Streams Example
[Diagram: Model Training publishes Model Params; Model Serving consumes Raw Data and Model Params and produces Scored Records.]

val builder = new StreamsBuilderS // New Scala Wrapper API.
val data = builder.stream[Array[Byte], Array[Byte]](rawDataTopic)
val model = builder.stream[Array[Byte], Array[Byte]](modelTopic)

val modelProcessor = new ModelProcessor
val scorer = new Scorer(modelProcessor) // scorer.score(record) used below

model.mapValues(bytes => Model.parseBytes(bytes)) // array => record
  .filter((key, model) => model.valid) // Successful?
  .mapValues(model => ModelImpl.findModel(model))
  .process(() => modelProcessor, …) // Set up actual model

data.mapValues(bytes => DataRecord.parseBytes(bytes))
  .filter((key, record) => record.valid)
  .mapValues(record => new ScoredRecord(scorer.score(record), record))
  .to(scoredRecordsTopic)

val streams = new KafkaStreams(builder.build, streamsConfiguration)
streams.start()
sys.addShutdownHook(streams.close())
48. What’s Missing?
Kafka Streams is a powerful library, but you’ll need to provide the rest of the microservice support through other means. (Some of it is provided if you run the support services for the SQL interface.)
You would embed your Kafka Streams code in microservices written with more comprehensive toolkits, such as the Lightbend Reactive Platform!
We’ll return to this point…
53. Akka Streams
• Part of the Akka ecosystem: Akka Actors, Akka Cluster, Akka HTTP, Akka Persistence, …
• Alpakka: a rich connector library, like Camel, but it implements Reactive Streams
• Very low overhead and latency
54. Akka Streams
• The “gist”:

val source: Source[Int, NotUsed] = Source(1 to 10)
val factorials = source.scan(BigInt(1))((total, next) => total * next)
factorials.runWith(Sink.foreach(println))
55. Akka Streams
• The “gist”: the same three lines form a “graph”: Source → Flow → Sink. (A version with an explicit Flow follows.)
[Diagram: the example above annotated as a graph: Source (the range), Flow (the scan), Sink (the foreach).]
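To make the graph shape explicit, here is a self-contained variant (not from the deck) that names all three stages, using the deck-era ActorMaterializer API:

import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

implicit val system = ActorSystem("gist")
implicit val materializer = ActorMaterializer()

val source: Source[Int, NotUsed] = Source(1 to 10)
val factorialFlow: Flow[Int, BigInt, NotUsed] =
  Flow[Int].scan(BigInt(1))((total, next) => total * next)
val sink: Sink[BigInt, _] = Sink.foreach(println)

source.via(factorialFlow).to(sink).run() // materialize and run the graph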
56. Akka Streams Example
[Diagram: an Akka Cluster hosting Model Training, Model Serving, and Other Logic. Alpakka connectors consume the Raw Data and Model Params topics and produce Scored Records and Final Records; Other Logic writes to storage.]
57. [Diagram: Model Serving inside the Akka Cluster, consuming the Raw Data and Model Params topics via Alpakka.]

implicit val system = ActorSystem("ModelServing")
implicit val materializer = ActorMaterializer()
implicit val executionContext = system.dispatcher

val modelProcessor = new ModelProcessor // Same as KS example
val scorer = new Scorer(modelProcessor) // Same as KS example
val modelScoringStage = new ModelScoringStage(scorer) // AS custom “stage”

val dataStream: Source[Record, Consumer.Control] =
  Consumer.atMostOnceSource(dataConsumerSettings,
      Subscriptions.topics(rawDataTopic))
    .map(input => DataRecord.parseBytes(input.value()))
    .collect { case Success(data) => data }

val modelStream: Source[ModelImpl, Consumer.Control] =
  Consumer.atMostOnceSource(modelConsumerSettings,
      Subscriptions.topics(modelTopic))
    .map(input => Model.parseBytes(input.value()))
    .collect { case Success(mod) => mod }
    .map(model => ModelImpl.findModel(model))
63. The custom ModelScoringStage used above:

case class ModelScoringStage(scorer: …) extends
    GraphStageWithMaterializedValue[…, …] {

  val dataRecordIn = Inlet[Record]("dataRecordIn")
  val modelRecordIn = Inlet[ModelImpl]("modelRecordIn")
  val scoringResultOut = Outlet[ScoredRecord]("scoringOut")
  …
  setHandler(dataRecordIn, new InHandler {
    override def onPush(): Unit = {
      val record = grab(dataRecordIn)
      val newRecord = new ScoredRecord(scorer.score(record), record)
      push(scoringResultOut, newRecord) // emit the scored record
      pull(dataRecordIn) // and ask for the next one
    }
  })
  …
}
65. Install new model parameters as they arrive, then run both streams:

val modelStream: Source[ModelImpl, Consumer.Control] =
  Consumer.atMostOnceSource(modelConsumerSettings,
      Subscriptions.topics(modelTopic))
    .map(input => Model.parseBytes(input.value()))
    .collect { case Success(mod) => mod }
    .map(model => ModelImpl.findModel(model))
    .collect { case Success(modImpl) => modImpl }
    .map { modImpl => modelProcessor.setModel(modImpl); modImpl } // install as a side effect

modelStream.to(Sink.ignore).run() // No “sinking” required; just run

dataStream
  .viaMat(modelScoringStage)(Keep.right)
  .map(result => new ProducerRecord[Array[Byte], ScoredRecord](
    scoredRecordsTopic, result))
  .runWith(Producer.plainSink(producerSettings))
68. • Scale scoring with workers and routers, across a cluster (a sketch follows)
• Persist actor state with Akka Persistence
• Connect to almost anything with Alpakka
• StreamRefs for distributed streams (new!)
[Diagram: inside the Akka Cluster, a Router fans Model Serving work out to Worker actors; Stateful Logic persists actor state to storage via Akka Persistence; Alpakka connects the Raw Data, Model Params, and Final Records topics.]
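A minimal sketch of the router-and-workers idea (not from the deck; all names are hypothetical), using a local round-robin pool; cluster-aware routers extend the same pattern across nodes:

import akka.actor.{Actor, ActorSystem, Props}
import akka.routing.RoundRobinPool

// Hypothetical worker: scores one record per message.
class ScoringWorker extends Actor {
  def receive = {
    case record: String =>
      println(s"${self.path.name} scored: $record") // scorer.score(record) would go here
  }
}

object RouterDemo extends App {
  val system = ActorSystem("ModelServing")
  val router = system.actorOf(RoundRobinPool(4).props(Props[ScoringWorker]), "scorers")
  (1 to 10).foreach(i => router ! s"record-$i") // work is spread across the pool
}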
69. Go Direct or Through Kafka?
[Diagram: Model Serving and Other Logic connected directly inside one Akka Cluster, vs. connected through a Scored Records topic via Alpakka.]
Direct:
• Extremely low latency
• Minimal I/O and memory overhead; no marshaling overhead
• Hard to scale and evolve independently
Through Kafka:
• Higher latency (network, queue depth)
• Higher I/O and processing (marshaling) overhead
• Easy independent scalability and evolution
70. Go Direct or Through Kafka?
[Diagram: the same direct vs. through-Kafka comparison.]
Direct:
• Reactive Streams back pressure
• “Direct” coupling between sender and receiver, but actually uses a URL abstraction
Through Kafka:
• Very deep buffer (a partition is limited only by disk size)
• Strong decoupling: M producers, N consumers, completely disconnected (a toy sketch follows)
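As a toy sketch of the trade-off (not from the deck; the stages are stand-ins), the direct choice fuses both stages into one back-pressured stream, while the Kafka choice would split them into separate services joined by a topic:

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

implicit val system = ActorSystem("direct")
implicit val materializer = ActorMaterializer()

val modelServing = Flow[Int].map(_ * 2) // stand-in for scoring
val otherLogic = Flow[Int].map(_ + 1) // stand-in for downstream logic

// Direct: one materialized graph; Reactive Streams back pressure flows
// end to end, but both stages must scale and deploy together.
Source(1 to 100).via(modelServing).via(otherLogic).runWith(Sink.ignore)

// Through Kafka, each stage would instead be its own service, producing
// to and consuming from a Scored Records topic: higher latency and
// marshaling cost, but each side scales and evolves independently.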