
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS

Querying streaming data with SQL yields actionable insights continuously and at the point of impact, which offers several benefits over querying data in a traditional database. However, although moving to a stream-based paradigm is desirable for many use cases, stream processing systems and traditional databases are fundamentally different: in a database, the data is (more or less) fixed and queries are executed ad hoc, whereas in a stream processing system, the queries are fixed and the data flows through the system in real time. Modeling and querying streaming data therefore requires different primitives.
In this session, we will introduce basic stream processing concepts and discuss strategies commonly used to address the challenges of querying streaming data. We will discuss different time semantics and processing guarantees, and explain how to deal with reordered and late-arriving events. Finally, we will compare how different streaming use cases can be implemented on AWS with Amazon Kinesis Data Analytics and Apache Flink.

Published in: Technology


  1. Deep Dive into Concepts and Tools for Analyzing Streaming Data. Dr. Steffen Hausmann, Sr. Solutions Architect, Amazon Web Services. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. Data Originates in Real Time (image: "Creek 1" by mountainamoeba / CC BY 2.0)
  3. Analytics Is Done in Batches (image: "Königsee" by andresumida / CC BY 2.0)
  4. Insights Are Perishable (image: "Chillis" by Lucas Cobb / CC BY 2.0)
  5. Analyzing Streaming Data on AWS
  6. Challenges of Stream Processing (image: "Lines" by FollowYour Nose / CC BY 2.0)
  7. Comparing Streams and Relations
       Relation: R ⊆ Id × Color
       Stream: S ⊆ Id × Color × Time
       [Figure: stream elements along a time axis ending at "now"]
  8. Querying Streams and Relations
       Relation: fixed data and ad-hoc queries
       Stream: fixed queries and continuously ingested data
  9. Challenges of Querying Infinite Streams
       SELECT * FROM S WHERE color = 'black'
       SELECT * FROM S JOIN S'
       SELECT color, COUNT(1) FROM S GROUP BY color
       ... NOT EXISTS (SELECT * FROM S WHERE color = 'red')
  11. Analyzing Streaming Data on AWS
       Amazon Kinesis Analytics (SQL):
         • Runs standard SQL queries on top of streaming data
         • Fully managed and scales automatically
         • Pay only for the resources your queries consume
       Apache Flink:
         • Open-source stream processing framework
         • Included in Amazon Elastic MapReduce (EMR)
         • Flexible APIs for Java and Scala, plus SQL and CEP support
  12. Evaluating Queries over Streams (image: "Windows" by Brad Greenlee / CC BY 2.0)
  13. Evaluating Non-monotonic Operators: Tumbling Windows (SQL)
       SELECT STREAM color, COUNT(1)
       FROM ...
       GROUP BY STEP(rowtime BY INTERVAL '10' SECOND), color;
       [Figure: events t1, t3, t5, t6, t9 on a timeline divided into 10-second windows]
  14. Evaluating Non-monotonic Operators: Sliding Windows (SQL)
       SELECT STREAM color, COUNT(1) OVER w
       FROM ...
       GROUP BY color
       WINDOW w AS (RANGE INTERVAL '10' SECOND PRECEDING);
       [Figure: events t1, t3, t5, t6, t9 on a timeline]
  15. Evaluating Non-monotonic Operators: Session Windows
       stream
         .keyBy(<key selector>)
         .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
         .<windowed transformation>(<window function>);
       [Figure: events t1, t3, t5, t6, t8, t9 on a timeline, with sessions separated by a session gap]
  16. Evaluating Unbounded Queries (SQL)
       Unbounded join:
       SELECT STREAM *
       FROM S AS s JOIN S' AS t
       ON s.color = t.color
       Join bounded by a window:
       SELECT STREAM *
       FROM S OVER w AS s JOIN S' OVER w AS t
       ON s.color = t.color
       WINDOW w AS (RANGE INTERVAL '10' SECOND PRECEDING);
       [Figure: events t1 to t9 spread across the two streams S and S']
  17. Different Time Semantics
  18. Maintaining Order of Events
       [Figure: the events t1, t3, t7, t8, t11 shown on an event-time axis and on a processing-time axis]
  19. Maintaining Order of Events: Using processing-time-based windows
       [Figure: events arriving in the order t1, t3, t8, t7, t11 on a processing-time axis; result tables with columns "processing time" and "count" for the windows starting at 0 and 10]
  20. Maintaining Order of Events: Using multiple time windows (SQL)
       SELECT STREAM
         STEP(rowtime BY INTERVAL '10' SECOND) AS processing_time,
         STEP(event_time BY INTERVAL '10' SECOND) AS event_time,
         color,
         COUNT(1)
       FROM ...
       GROUP BY processing_time, event_time, color;
  21. Maintaining Order of Events: Using multiple time windows
       [Figure: events arriving in the order t1, t3, t8, t7, t11 on a processing-time axis; result tables with columns "processing time", "event time", and "count"]
  22. Maintaining Order of Events: Using event time and watermarks
       [Figure: events arriving in the order t1, t3, t8, t7, t11 on a processing-time axis, with watermarks 10 and 20; result tables with columns "event time" and "count"]
  23. Adding Watermarks to a Stream
       - Periodic watermarks
       - Assuming ascending timestamps
       - Punctuated watermarks
       stream.assignTimestampsAndWatermarks(
         new AscendingTimestampExtractor<MyEvent>() {
           @Override
           public long extractAscendingTimestamp(MyEvent element) {
             return element.getCreationTime();
           }
         });
  24. Watermarks and Allowed Lateness
       stream
         .keyBy(<key selector>)
         .window(<window assigner>)
         .allowedLateness(<time>)
         .sideOutputLateData(lateOutputTag)
       [Figure: events t3, t1, t8, t4, t5 on a processing-time axis]
  25. Different Processing Semantics (image: "Kaseki 2010" by Dominic Alves / CC BY 2.0)
  26. Consuming Data from a Stream
       [Figure: a consumer reading from the stream and writing to an output sink]
  27. Different Processing Semantics: At-Most-Once Semantics
       [Figure: consumer, output sink, and offset store, with stream positions 561 and 1105]
  28. Different Processing Semantics: At-Least-Once Semantics
       [Figure: consumer, output sink, and offset store, with stream positions 561 and 0]
  29. Different Processing Semantics: Exactly-Once Semantics
       Message deduplication:
         • At-least-once event delivery plus message deduplication
         • Keep a transaction log of processed messages
         • On failure, replay events and remove duplicated events for every operator
       Distributed snapshots:
         • State for each operator is periodically checkpointed
         • On failure, rewind operator to the previous consistent state
  30. Go Build!
  31. Please complete the session survey in the summit mobile app.
  32. Thank you!
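
The tumbling-window query on slide 13 groups events by color within fixed, non-overlapping 10-second intervals. The following is a minimal plain-Java sketch of that idea, not Kinesis Analytics or Flink code. The class `TumblingWindowSketch`, the `Event` type, and the sample data are hypothetical names chosen for illustration, and timestamps are assumed to be whole seconds.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-JDK illustration of a tumbling-window count (hypothetical names throughout). */
final class TumblingWindowSketch {

    /** Hypothetical event: a color and a timestamp in seconds. */
    static final class Event {
        final String color;
        final long timeSeconds;

        Event(String color, long timeSeconds) {
            this.color = color;
            this.timeSeconds = timeSeconds;
        }
    }

    /** Returns window start -> (color -> count); every event falls into exactly one window. */
    static Map<Long, Map<String, Integer>> countPerWindow(List<Event> events, long windowSeconds) {
        Map<Long, Map<String, Integer>> counts = new TreeMap<>();
        for (Event e : events) {
            // Round the timestamp down to the start of the window that contains it.
            long windowStart = Math.floorDiv(e.timeSeconds, windowSeconds) * windowSeconds;
            counts.computeIfAbsent(windowStart, k -> new TreeMap<>())
                  .merge(e.color, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("black", 1), new Event("red", 3), new Event("black", 5),
                new Event("black", 6), new Event("red", 9), new Event("red", 12));
        // Prints {0={black=3, red=2}, 10={red=1}}
        System.out.println(countPerWindow(events, 10));
    }
}
```

Unlike a streaming engine, this sketch sees the complete input at once; a real system has to decide when a window is complete, which is what the slides on time semantics and watermarks address.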
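
The session windows on slide 15 group events that are separated by less than a session gap. A rough plain-Java sketch for a single key follows. `SessionWindowSketch` and its rule that a gap strictly larger than `maxGap` starts a new session are assumptions made for illustration; they are not taken from Flink's `EventTimeSessionWindows` implementation, whose boundary handling may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-JDK illustration of session windows for one key (hypothetical names throughout). */
final class SessionWindowSketch {

    /**
     * Splits ascending timestamps into sessions. A new session starts whenever the
     * distance to the previous event is strictly larger than maxGap (an assumed rule).
     */
    static List<List<Long>> sessions(List<Long> sortedTimestamps, long maxGap) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long t : sortedTimestamps) {
            if (!current.isEmpty() && t - current.get(current.size() - 1) > maxGap) {
                result.add(current);          // close the running session
                current = new ArrayList<>();  // and open a new one
            }
            current.add(t);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [[1, 3, 5, 6], [28, 29]]
        System.out.println(sessions(List.of(1L, 3L, 5L, 6L, 28L, 29L), 10));
    }
}
```

Because a session's length depends on the data, the window boundaries are not known in advance, which distinguishes session windows from the tumbling and sliding windows on the preceding slides.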
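
Slides 22 to 24 combine event-time windows with watermarks, allowed lateness, and a side output for late data. The sketch below simulates that interplay in plain Java for a single key. It is a simplification under stated assumptions, not Flink's actual trigger logic: `WatermarkSketch` and all of its members are hypothetical, a window fires once the watermark reaches its end, events arriving within the allowed lateness re-emit an updated count, and anything later goes to `lateEvents`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Simplified single-key model of event-time windows with watermarks (hypothetical names). */
final class WatermarkSketch {
    private final long windowSize;
    private final long allowedLateness;
    private final Map<Long, Integer> counts = new TreeMap<>(); // window start -> count
    private final Set<Long> fired = new HashSet<>();           // windows already emitted
    private long watermark = Long.MIN_VALUE;

    final List<String> results = new ArrayList<>();
    final List<Long> lateEvents = new ArrayList<>();           // stands in for a side output

    WatermarkSketch(long windowSize, long allowedLateness) {
        this.windowSize = windowSize;
        this.allowedLateness = allowedLateness;
    }

    void onEvent(long eventTime) {
        long start = Math.floorDiv(eventTime, windowSize) * windowSize;
        long end = start + windowSize;
        if (end + allowedLateness <= watermark) {
            lateEvents.add(eventTime);        // too late: window state is already gone
            return;
        }
        int count = counts.merge(start, 1, Integer::sum);
        if (fired.contains(start)) {
            // Late but within the allowed lateness: emit a corrected result.
            results.add("window " + start + " updated count=" + count);
        }
    }

    void onWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
            if (e.getKey() + windowSize <= watermark && fired.add(e.getKey())) {
                results.add("window " + e.getKey() + " count=" + e.getValue());
            }
        }
        // Drop state for windows whose allowed lateness has expired.
        counts.keySet().removeIf(start -> start + windowSize + allowedLateness <= watermark);
        fired.removeIf(start -> start + windowSize + allowedLateness <= watermark);
    }

    public static void main(String[] args) {
        WatermarkSketch w = new WatermarkSketch(10, 5);
        w.onEvent(1); w.onEvent(3); w.onEvent(8);
        w.onWatermark(10);
        w.onEvent(7);   // arrives after the watermark, within the allowed lateness
        w.onEvent(11);
        w.onWatermark(20);
        w.onEvent(4);   // arrives after the allowed lateness has expired
        // Prints [window 0 count=3, window 0 updated count=4, window 10 count=1]
        System.out.println(w.results);
        // Prints [4]
        System.out.println(w.lateEvents);
    }
}
```

The allowed lateness trades state size for completeness: a larger value keeps window state around longer but lets more out-of-order events correct earlier results.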
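
Slides 28 and 29 describe at-least-once delivery and how message deduplication turns it into exactly-once results. The following plain-Java sketch simulates a consumer that processes messages first and commits its offset afterwards, crashes before a commit, and replays from the last committed offset. Everything here is hypothetical and chosen for illustration: the class `DeliverySemanticsSketch`, the in-memory log, the commit interval, and the use of stream positions as message identifiers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simulated consumer with offset commits and optional deduplication (hypothetical names). */
final class DeliverySemanticsSketch {

    /** Sink that can remember processed positions in order to drop replayed messages. */
    static final class Sink {
        final List<String> written = new ArrayList<>();
        private final Set<Integer> seen = new HashSet<>(); // log of processed positions
        private final boolean deduplicate;

        Sink(boolean deduplicate) {
            this.deduplicate = deduplicate;
        }

        void write(int position, String message) {
            if (deduplicate && !seen.add(position)) {
                return; // already processed before the failure
            }
            written.add(message);
        }
    }

    /**
     * Processes log[from, stopBefore) and returns the last committed offset.
     * Offsets are committed only after every commitInterval processed messages,
     * so work done since the last commit is repeated after a restart.
     */
    static int consume(List<String> log, int from, int stopBefore, int commitInterval, Sink sink) {
        int committed = from;
        for (int pos = from; pos < stopBefore; pos++) {
            sink.write(pos, log.get(pos));               // process first ...
            if ((pos + 1 - from) % commitInterval == 0) {
                committed = pos + 1;                     // ... commit afterwards
            }
        }
        return committed;
    }

    /** Runs the consumer, simulates a crash after three messages, then restarts it. */
    static List<String> runWithCrash(boolean deduplicate) {
        List<String> log = List.of("a", "b", "c", "d", "e");
        Sink sink = new Sink(deduplicate);
        int committed = consume(log, 0, 3, 2, sink);     // crash: "c" written, offset 2 committed
        consume(log, committed, log.size(), 2, sink);    // restart replays "c"
        return sink.written;
    }

    public static void main(String[] args) {
        System.out.println(runWithCrash(false)); // prints [a, b, c, c, d, e]
        System.out.println(runWithCrash(true));  // prints [a, b, c, d, e]
    }
}
```

Committing the offset before processing instead would give at-most-once behavior: after the same crash, the message in flight would be skipped rather than repeated.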
