
Stream Application Development with Apache Kafka

1,444 views

Published on

Talk at the "Apache Kafka Bay Area" Meetup at Smyte, San Francisco, CA, on 11/10/16.

https://www.meetup.com/http-kafka-apache-org/events/234812858/

Published in: Software


  1. Stream Application Development with Apache® Kafka™. Matthias J. Sax | Software Engineer | matthias@confluent.io | @MatthiasJSax
  2. Apache Kafka is ... a distributed streaming platform: producers, consumers, connectors, processing.
  3. Confluent is ... a company founded by the original creators of Apache Kafka, and a distributed streaming platform: built on Apache Kafka, available as Confluent Open Source and Confluent Enterprise. All components but Kafka are optional to run Confluent; mix and match them as required.
  4. Kafka Streams is ... the easiest way to process data in Kafka (as of v0.10). • Easy-to-use library • Real stream processing / record-by-record / ms latency • DSL • Focus on applications • No cluster / "cluster to-go" • Expressive • Single-record transformations • Aggregations / joins • Time, windowing, out-of-order data • Stream-table duality • Tightly integrated with Kafka • Fault-tolerant • Scalable (s/m/l/xl), elastic • Encryption, authentication, authorization • Stateful • Backed by Kafka • Queryable / "DB to-go" • Data reprocessing • Application "reset button"
  5. Before Kafka Streams: do-it-yourself stream processing with plain consumer/producer clients. • Hard to get right / lots of "glue code" • Fault-tolerance / scalability ... ???
  6. Before Kafka Streams
  7. Before Kafka Streams: do-it-yourself stream processing with plain consumer/producer clients. • Hard to get right / lots of "glue code" • Fault-tolerance / scalability ... ??? Using a framework (and others...) • Requires a cluster • Bare metal: hard to manage • YARN / Mesos • Test locally, deploy remotely • "Can you please deploy my code?" • Jar and dependency hell • How does your application interact with your stream processing job?
  8. Before Kafka Streams
  9. Build apps, not clusters!
  10. Easy to use! $ java -cp MyApp.jar io.confluent.MyApp
  11. Easy to integrate!
  12. Queryable / "DB to-go"
  13. How to install Kafka Streams? Not at all. It's a library. <dependency> <groupId>org.apache.kafka</groupId> <artifactId>kafka-streams</artifactId> <version>0.10.1.0</version> </dependency>
  14. How do I deploy my app? Whatever works for you. It's just an app like any other!
  15. If it's just a regular application ... • How does it scale? • How can it be fault-tolerant? • How does it handle distributed state? Off-load the hard problems to the brokers. • Kafka is a streaming platform: no need to reinvent the wheel • Exploit consumer groups and the group management protocol
  16. Scaling
  17. Scaling
  18. Scaling: easy to scale, and it's elastic! "cluster to-go"
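The scaling slides rest on Kafka's partitioning: records with the same key always land in the same partition, and each partition is processed by exactly one app instance, so adding instances (up to the partition count) adds capacity. A minimal plain-Java sketch of that idea (simplified: Kafka's real DefaultPartitioner hashes the serialized key with murmur2, `String.hashCode()` only stands in for it here):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionScalingSketch {
    // Simplified stand-in for Kafka's partitioner (the real DefaultPartitioner
    // hashes the serialized key with murmur2): same key -> same partition.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Group records by partition; each partition is owned by exactly one
    // consumer in the group, which is what makes scaling out work.
    static Map<Integer, List<String>> assign(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> byPartition = new HashMap<>();
        for (String key : keys) {
            byPartition.computeIfAbsent(partitionFor(key, numPartitions),
                    p -> new ArrayList<>()).add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // both "alice" records land in the same partition, so one instance sees all of them
        System.out.println(assign(Arrays.asList("alice", "bob", "alice"), 4));
    }
}
```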
  19. Fault-tolerance: consumer group rebalance
  20. Distributed state: state stores
  21. Distributed state: state stores
  22. Distributed state
  23. Yes, it's complicated ... API, coding | org. processes | operations | security | architecture | ... | Reality™
  24. But ... API, coding | org. processes | operations | security | architecture | ... | Reality™ (split between "you" and "Kafka core / Kafka Streams")
  25. KStream/KTable. • KStream: record stream; each record describes an event in the real world; example: click stream. • KTable: changelog stream; each record describes a change to a previous record; example: position report stream. • In Kafka Streams, a KTable holds a materialized view of the latest update per key as internal state.
  26. KTable: user profile/location information. Changelog stream: (alice, paris), (bob, zurich), (alice, berlin). The KTable state evolves with each record: {alice: paris}, then {alice: paris, bob: zurich}, then {alice: berlin, bob: zurich}.
  27. KTable (count moves). Record stream: (alice, paris), (bob, zurich), (alice, berlin). After each record, count() updates the KTable state: {alice: 0}, then {alice: 0, bob: 0}, then {alice: 1, bob: 0}. Changelog stream (output): (alice, 0), (bob, 0), (alice, 1).
  28. KTable (cont.) • Internal state: continuously updated materialized view of the latest status. • Downstream result ("output"): changelog stream describing every update to the materialized view; it's the changelog! KStream stream = ...; KTable table = stream.aggregate(...);
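The upsert semantics from slides 25-28 can be illustrated without Kafka at all: conceptually, a KTable's materialized view is a map holding the latest value per key, built by replaying the changelog. A plain-Java sketch (the class and method names are illustrative, not Kafka API):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KTableSemanticsSketch {
    // Replay a changelog of (key, value) pairs; the materialized view keeps
    // only the latest value per key -- exactly the KTable upsert semantics.
    static Map<String, String> materialize(List<String[]> changelog) {
        Map<String, String> state = new LinkedHashMap<>();
        for (String[] update : changelog) {
            state.put(update[0], update[1]); // upsert: latest value wins
        }
        return state;
    }

    public static void main(String[] args) {
        // the example from slide 26: alice moves from paris to berlin
        List<String[]> changelog = Arrays.asList(
                new String[]{"alice", "paris"},
                new String[]{"bob", "zurich"},
                new String[]{"alice", "berlin"});
        System.out.println(materialize(changelog)); // {alice=berlin, bob=zurich}
    }
}
```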
  29. KStream/KTable
  30. Time and Windows. • Time semantics: event time (default, create time), (broker) ingestion time, processing time, or customized. • (Hopping) time windows: overlapping or non-overlapping (tumbling); for aggregations. • Sliding windows: for KStream-KStream joins. KStream stream = ...; KTable table = stream.aggregate(TimeWindows.of(10 * 1000), ...);
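For hopping windows, each record falls into up to size/advance overlapping windows. A plain-Java sketch of the window assignment implied by `TimeWindows.of(size).advanceBy(advance)` (an assumption-laden simplification, not Kafka's actual code: window starts are aligned to multiples of the advance and never negative):

```java
import java.util.ArrayList;
import java.util.List;

public class HoppingWindowsSketch {
    // All hopping windows (identified by start timestamp) that contain the
    // record timestamp ts. A window [start, start + sizeMs) contains ts iff
    // start <= ts < start + sizeMs, with start a non-negative multiple of advanceMs.
    static List<Long> windowStartsFor(long ts, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // earliest window that still contains ts, floored to a multiple of the advance
        long first = Math.max(0L, ts - sizeMs + advanceMs) / advanceMs * advanceMs;
        for (long start = first; start <= ts; start += advanceMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // 5-minute windows advancing by 1 minute: a record at t=130s falls into
        // the windows starting at 0s, 60s, and 120s
        System.out.println(windowStartsFor(130_000L, 5 * 60 * 1000L, 60 * 1000L));
    }
}
```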
  31. KTable Semantics. • Non-windowed: state is kept forever; out-of-order/late-arriving records are handled in a straightforward way; a KTable aggregation can be viewed as a landmark window (i.e., window size == infinite); output is a changelog stream. • Windowed: windows (i.e., state) are kept "forever" (well, there is a configurable retention time); out-of-order/late-arriving records are handled in a straightforward way; output is a changelog stream. • No watermarks required. • Early updates/results.
  32. Show Code!
  33. Page Views per Region: a stream-table join of the click stream (<userId:page>) with the profile changelog (<userId:region>, the current user info), producing page-view counts per region (<region:page>).
  34. Page Views per Region
      final KStreamBuilder builder = new KStreamBuilder();
      // read the record stream from topic "PageViews" and the changelog stream from topic "UserProfiles"
      final KStream<String, String> views = builder.stream("PageViews"); // <userId : page>
      final KTable<String, String> userProfiles = builder.table("UserProfiles", "UserProfilesStore"); // <userId : region>
  35. Page Views per Region
      final KStreamBuilder builder = new KStreamBuilder();
      // read the record stream from topic "PageViews" and the changelog stream from topic "UserProfiles"
      final KStream<String, String> views = builder.stream("PageViews"); // <userId : page>
      final KTable<String, String> userProfiles = builder.table("UserProfiles", "UserProfilesStore"); // <userId : region>
      // enrich page views with the user's region -- stream-table join
      final KStream<String, String> viewsWithRegionKey = views
          .leftJoin(userProfiles, (page, userRegion) -> page + "," + userRegion)
          // and set "region" as the new key
          .map((userId, pageAndRegion) ->
              new KeyValue<>(pageAndRegion.split(",")[1], pageAndRegion.split(",")[0]));
  36. Page Views per Region
      final KStreamBuilder builder = new KStreamBuilder();
      // read the record stream from topic "PageViews" and the changelog stream from topic "UserProfiles"
      final KStream<String, String> views = builder.stream("PageViews"); // <userId : page>
      final KTable<String, String> userProfiles = builder.table("UserProfiles", "UserProfilesStore"); // <userId : region>
      // enrich page views with the user's region -- stream-table join AND set "region" as the new key
      final KStream<String, String> viewsWithRegionKey = views.leftJoin(userProfiles, ...).map(...); // <region : page>
      // count views by region, using hopping windows of size 5 minutes that advance every 1 minute
      final KTable<Windowed<String>, Long> viewsPerRegion = viewsWithRegionKey
          .groupByKey() // redistribute data
          .count(TimeWindows.of(5 * 60 * 1000L).advanceBy(60 * 1000L), "GeoPageViewsStore");
  37. Page Views per Region
      final KStreamBuilder builder = new KStreamBuilder();
      // read the record stream from topic "PageViews" and the changelog stream from topic "UserProfiles"
      final KStream<String, String> views = builder.stream("PageViews"); // <userId : page>
      final KTable<String, String> userProfiles = builder.table("UserProfiles", "UserProfilesStore"); // <userId : region>
      // enrich page views with the user's region -- stream-table join AND set "region" as the new key
      final KStream<String, String> viewsWithRegionKey = views.leftJoin(userProfiles, ...).map(...); // <region : page>
      // count views by region, using hopping windows of size 5 minutes that advance every 1 minute
      final KTable<Windowed<String>, Long> viewsByRegion = viewsWithRegionKey.groupByKey().count(TimeWindows.of(...)..., ...);
      // write the result
      viewsByRegion
          .toStream((windowedRegion, count) -> windowedRegion.toString()) // prepare the result
          .to(stringSerde, longSerde, "PageViewsByRegion"); // write to topic "PageViewsByRegion"
  38. Page Views per Region
      final KStreamBuilder builder = new KStreamBuilder();
      // read the record stream from topic "PageViews" and the changelog stream from topic "UserProfiles"
      final KStream<String, String> views = builder.stream("PageViews"); // <userId : page>
      final KTable<String, String> userProfiles = builder.table("UserProfiles", "UserProfilesStore"); // <userId : region>
      // enrich page views with the user's region -- stream-table join AND set "region" as the new key
      final KStream<String, String> viewsWithRegionKey = views.leftJoin(userProfiles, ...).map(...); // <region : page>
      // count views by region, using hopping windows of size 5 minutes that advance every 1 minute
      final KTable<Windowed<String>, Long> viewsByRegion = viewsWithRegionKey.groupByKey().count(TimeWindows.of(...)..., ...);
      // write the result to topic "PageViewsByRegion"
      viewsByRegion.toStream(...).to(..., "PageViewsByRegion");
      // start the application
      final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration); // streamsConfiguration omitted for brevity
      streams.start();
      // stop the application
      streams.close();
      /* https://github.com/confluentinc/examples/blob/3.1.x/kafka-streams/src/main/java/io/confluent/examples/streams/PageViewRegionExample.java */
  39. Interactive Queries. • A KTable is a changelog stream with a materialized internal view (state). • A KStream-KTable join can do lookups into the materialized view. • What if the application could do lookups, too? Yes, it can! "DB to-go" https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
  40. Interactive Queries: new API to access the local state stores of an app instance, e.g. (alice, 2), (bob, 5), (charlie, 3).
  41. Interactive Queries: new API to discover running app instances, e.g. "host1:4460", "host3:4777", "host5:5307".
  42. Interactive Queries: you provide the inter-app communication (RPC layer).
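The three slides above combine into one pattern: an instance answers queries for keys it owns from its local store and forwards the rest to the owning instance. A plain-Java sketch of that pattern, with maps standing in for the state store, the instance metadata, and the RPC layer (the real API exposes the local store via KafkaStreams#store and instance discovery via the metadata API; the RPC layer between instances is application code you write):

```java
import java.util.HashMap;
import java.util.Map;

public class InteractiveQueriesSketch {
    final String self;
    final Map<String, Long> localStore = new HashMap<>();   // stand-in for a local state store
    final Map<String, String> keyToHost = new HashMap<>();  // stand-in for instance-discovery metadata
    final Map<String, InteractiveQueriesSketch> cluster;    // stand-in for the RPC layer

    InteractiveQueriesSketch(String self, Map<String, InteractiveQueriesSketch> cluster) {
        this.self = self;
        this.cluster = cluster;
        cluster.put(self, this);
    }

    // Answer locally if this instance owns the key, otherwise "RPC" to the owner.
    Long query(String key) {
        String owner = keyToHost.get(key);
        if (self.equals(owner)) {
            return localStore.get(key); // local hit
        }
        return cluster.get(owner).localStore.get(key); // forwarded lookup
    }

    public static void main(String[] args) {
        Map<String, InteractiveQueriesSketch> cluster = new HashMap<>();
        InteractiveQueriesSketch a = new InteractiveQueriesSketch("host1:4460", cluster);
        InteractiveQueriesSketch b = new InteractiveQueriesSketch("host3:4777", cluster);
        a.localStore.put("alice", 2L);
        b.localStore.put("bob", 5L);
        for (InteractiveQueriesSketch instance : cluster.values()) { // metadata is known cluster-wide
            instance.keyToHost.put("alice", "host1:4460");
            instance.keyToHost.put("bob", "host3:4777");
        }
        System.out.println(b.query("alice")); // 2: forwarded from host3 to host1
    }
}
```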
  43. Wrapping Up • Kafka Streams is available in Apache Kafka 0.10 and Confluent Platform 3.1 • http://kafka.apache.org/ • http://www.confluent.io/download (OS + enterprise versions, tar/zip/deb/rpm) • Kafka Streams demos at https://github.com/confluentinc/examples • Java 7, Java 8+ with lambdas, and Scala • WordCount, joins, Avro integration, Top-N computation, windowing, interactive queries • Apache Kafka documentation: http://kafka.apache.org/documentation.html • Confluent documentation: http://docs.confluent.io/current/streams/ • Quickstart, concepts, architecture, developer guide, FAQ • Join our bi-weekly Confluent Developer Roundtable sessions on Kafka Streams • Contact me at matthias@confluent.io for details
  44. Thank you! We are hiring!
