
Scaling stream data pipelines with Pravega and Apache Flink

Extracting insights from continuously generated data requires a stream processor with powerful data analytics features, such as Apache Flink. A stream data pipeline built with Flink typically includes a storage component to ingest and serve the data. Pravega is a stream store that ingests and stores stream data permanently, making the data available for tail, catch-up, and historical reads. One important challenge for such stream data pipelines is coping with variations in the workload: daily cycles and seasonal spikes may require the provisioning of the application to adapt accordingly. Pravega has a feature called stream scaling, which allows the capacity offered for ingesting the events of a stream to grow and shrink over time according to the workload. This feature is useful when the application downstream is able to accommodate such changes and scale its own provisioning accordingly. In this presentation, we introduce stream scaling in Pravega and show how Flink jobs leverage this feature to rescale stateful jobs according to variations in the workload.

Published in: Technology

Scaling stream data pipelines with Pravega and Apache Flink

  1. 1. Scaling stream data pipelines Flavio Junqueira, Pravega - Dell EMC Till Rohrmann, Data Artisans
  2. 2. Motivation Flink Forward - San Francisco, 2018 2
  3. 3. Flink Forward - San Francisco, 2018 Social networks Online shopping Streams ahoy! Stream of user events • Status updates • Online transactions 3
  4. 4. Flink Forward - San Francisco, 2018 Social networks Online shopping Server monitoring Stream of user events • Status updates • Online transactions Stream of server events • CPU, memory, disk utilization Streams ahoy! 4
  5. 5. Flink Forward - San Francisco, 2018 Social networks Online shopping Server monitoring Sensors (IoT) Stream of user events • Status updates • Online transactions Stream of server events • CPU, memory, disk utilization Stream of sensor events • Temperature samples • Samples from radar and image sensors in cars Streams ahoy! 5
  6. 6. Workload cycles and seasonal spikes Flink Forward - San Francisco, 2018 6 Daily cycles: NYC Yellow Taxi Trip Records, March 2015, http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml Seasonal spikes: https://www.slideshare.net/iwmw/building-highly-scalable-web-applications/7-Seasonal_Spikes
  7. 7. Workload cycles and spikes Flink Forward - San Francisco, 2018 7 [Charts: seasonal spikes (monthly), daily cycles (hourly), weekly cycles, unplanned spikes]
  8. 8. Overprovisioning… what if we don’t want to overprovision? Flink Forward - San Francisco, 2018 8
  9. 9. Event processing Flink Forward - San Francisco, 2018 9 Processor 1 Source Source emits 2 events/second Processor processes 3 events/second Append-only Log Colors represent event keys
  10. 10. Event processing Flink Forward - San Francisco, 2018 10 Source Processor processes 3 events/second Processor 1 Source emits 2 events/second Append-only Log Colors represent event keys
  11. 11. Event processing Flink Forward - San Francisco, 2018 11 Source ✓ Source rate increases ✓ New rate: 4 events/second ✓ Processor still processes 3 events/second ✓ Can’t keep up with the source rate Processor 1 Append-only Log Colors represent event keys
  12. 12. Event processing Flink Forward - San Francisco, 2018 12 Source ✓ Source rate increases ✓ New rate: 4 events/second Processor 1 Append-only Log Processor 2 ✓ Add a second processor ✓ Each processor processes 3 events/second ✓ Can keep up with the rate Colors represent event keys
  13. 13. Event processing Flink Forward - San Francisco, 2018 13 Source ✓ Source rate increases ✓ New rate: 4 events/second Processor 1 Append-only Log Processor 2 ✓ Add a second processor ✓ Each processor processes 3 events/second ✓ Can keep up with the rate Problem: Per-key order
  14. 14. Event processing Flink Forward - San Francisco, 2018 14 Source Processor 1 Processor 2 ✓ Source rate increases ✓ New rate: 4 events/second ✓ Add a second processor ✓ Each processor processes 3 events/second ✓ Can keep up with the rate Split the input and add processors Append-only Log
  15. 15. Event processing Flink Forward - San Francisco, 2018 15 Source Processor 1 Processor 2 ✓ Source rate increases ✓ New rate: 4 events/second ✓ Add a second processor ✓ Each processor processes 3 events/second ✓ Can keep up with the rate Split the input and add processors Append-only Log Problem: Per-key order
  16. 16. Event processing Flink Forward - San Francisco, 2018 16 Source Processor 1 Processor 2 ✓ Source rate increases ✓ New rate: 4 events/second ✓ Add a second processor ✓ Each processor processes 3 events/second ✓ Can keep up with the rate Split the input and add processors Processor 2 only starts once earlier events have been processed
  17. 17. Flink Forward - San Francisco, 2018 17 What about the order of events? What happens if the rate increases again? What if it drops?
  18. 18. Scaling in Pravega Flink Forward - San Francisco, 2018 18
  19. 19. Pravega • Storing data streams • Young project, under active development • Open source http://pravega.io http://github.com/pravega/pravega 19Flink Forward - San Francisco, 2018
  20. 20. Flink Forward - San Francisco, 2018 Time: Present, Recent Past, Distant Past Anatomy of a stream 20
  21. 21. Flink Forward - San Francisco, 2018 Messaging, Pub-sub, Bulk store Time: Present, Recent Past, Distant Past Anatomy of a stream 21
  22. 22. Flink Forward - San Francisco, 2018 Time: Present, Recent Past, Distant Past Anatomy of a stream 22 Pravega
  23. 23. Flink Forward - San Francisco, 2018 Time: Present, Recent Past, Distant Past Anatomy of a stream Unbounded amount of data Ingestion rate might vary 23 Pravega
  24. 24. Pravega aims to be a stream store able to: • Store stream data permanently • Preserve order • Accommodate unbounded streams • Adapt to varying workloads automatically • Provide low latency from append to read Flink Forward - San Francisco, 2018 24
  25. 25. Pravega and Streams ….. 01110110 01100001 01101100 ….. 01001010 01101111 01101001 Pravega 01000110 01110110 Append Read 01000110 01110110 Flink Forward - San Francisco, 2018 Ingest stream data Process stream data 25
  26. 26. Pravega and Streams 01000110 01110110 Append Read Flink Forward - San Francisco, 2018 26 Event writer Event writer Event reader Event reader Group • Load balance • Grow and shrink Pravega Ingest stream data Process stream data
  27. 27. Segments in Pravega Flink Forward - San Francisco, 2018 01000111 01110110 11000110 01000111 01110110 11000110 Pravega Stream: composition of segments Segment: • Stream unit • Append only • Sequence of bytes 27
  28. 28. Parallelism Flink Forward - San Francisco, 2018 28
  29. 29. Segments in Pravega Pravega 01000110 01110110 Segments Append Read 01000110 01110110 01101111 01101001 01101001 01101111 Segments • Segments are sequences of bytes • Use routing keys to determine segment Flink Forward - San Francisco, 2018 〈key, 01101001 〉 Routing key ….. 01110110 01100001 01101100 ….. 01001010 01101111 01101001 29
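To make the routing-key idea concrete, here is a minimal writer sketch, assuming the Pravega Java client roughly as it existed around the time of this talk; the controller URI, scope, and stream name are placeholders. Events written with the same routing key are appended to the same segment, which is what preserves per-key order as the stream later splits or merges.

import io.pravega.client.ClientFactory;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.impl.JavaSerializer;
import java.net.URI;

public class RoutingKeyWriterSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder controller endpoint, scope, and stream name.
        URI controller = URI.create("tcp://localhost:9090");
        try (ClientFactory clientFactory = ClientFactory.withScope("examples", controller);
             EventStreamWriter<String> writer = clientFactory.createEventWriter(
                     "sensor-stream", new JavaSerializer<>(), EventWriterConfig.builder().build())) {
            // Events with the same routing key land in the same segment,
            // so their relative order is preserved.
            writer.writeEvent("sensor-42", "temperature=21.5").get();
            writer.writeEvent("sensor-42", "temperature=21.7").get();
        }
    }
}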
  30. 30. Segments can be sealed Flink Forward - San Francisco, 2018 30
  31. 31. Segments in Pravega ….. 01110110 01100001 01101100 ….. 01101001 01110110 01001010 Pravega 01000110 01110110 Segments Append Read 01000110 01110110 01101111 01101001 01101001 01101111 Segments Once sealed, a segment can’t be appended to any longer. Flink Forward - San Francisco, 2018 E.g., ad clicks 31
  32. 32. How is sealing segments useful? Flink Forward - San Francisco, 2018 32
  33. 33. Segments in Pravega Pravega 01000110 Segments Segments 01101111 01000110 01000110 01000110 01101111 01101111 01101111 01101111 01000110 01000110 0110111101101111 01000110 01101111 Stream Compose to form a stream Flink Forward - San Francisco, 2018 33
  34. 34. Segments in Pravega 01000110 Segments Segments 01101111 01000110 01000110 01000110 01101111 01101111 01101111 01101111 01000110 01000110 0110111101101111 01000110 01101111 Stream Compose to form a stream • Each segment can live on a different server • Not limited to the capacity of a single server • Unbounded streams Flink Forward - San Francisco, 2018 00101111 01101001 34 Pravega
  35. 35. Segments in Pravega 01000110 Segments Segments 01101111 01000110 01000110 01000110 01101111 01101111 01101111 01101111 01000110 01000110 01101111 01000110 01101111 Stream Compose to form a stream 01101111 Flink Forward - San Francisco, 2018 35 Pravega
  36. 36. Stream scaling Flink Forward - San Francisco, 2018 36
  37. 37. 01000110 Scaling a stream ….. 01110110 01100001 01101100 01000110 • Stream has one segment 1 ….. 01110110 01100001 01101100 • Seal current segment • Create new ones 2 01000110 01000110 • Say input load has increased • Need more parallelism • Auto or manual scaling Flink Forward - San Francisco, 2018 37
  38. 38. Routing key space 0.0 1.0 Time Split Split Merge 0.5 0.75 Segment 1 Segment 2 Segment 3 Segment 4 Segment 5 Segment 6 t0 t1 t2 Flink Forward - San Francisco, 2018 38
  39. 39. Routing key space 0.0 1.0 Time 0.5 0.75 Segment 1 Segment 2 Segment 3 Segment 4 Segment 5 Segment 6 t0 t1 t2 Key ranges are not statically assigned to segments Flink Forward - San Francisco, 2018 39 Split Split Merge
  40. 40. Flink Forward - San Francisco, 2018 40
  41. 41. Daily Cycles Peak rate is 10x higher than lowest rate 4:00 AM 9:00 AM NYC Yellow Taxi Trip Records, March 2015 http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
  42. 42. Pravega Auto Scaling Merge Split
  43. 43. Source: Virtual cluster - Nautilus Platform Flink Forward - San Francisco, 2018 43
  44. 44. Source: Virtual cluster - Nautilus Platform Scale up Scale down Flink Forward - San Francisco, 2018 44
  45. 45. How do I control scaling? Flink Forward - San Francisco, 2018 45
  46. 46. Scaling policies • Configured on a per-stream basis • Specifies a policy for the stream • Policies: Fixed (the set of segments is fixed); Bytes per second (scales up and down according to the volume of data, given a target data rate); Events per second (scales up and down according to the volume of events, given a target event rate) Flink Forward - San Francisco, 2018 46
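As an illustration of configuring such a policy, the sketch below creates a stream with an event-rate scaling policy via the Pravega admin client. The controller URI, scope, stream name, and the rate/segment numbers are placeholder values, and the exact StreamConfiguration builder fields can vary across Pravega versions.

import io.pravega.client.admin.StreamManager;
import io.pravega.client.stream.ScalingPolicy;
import io.pravega.client.stream.StreamConfiguration;
import java.net.URI;

public class ScalingPolicySketch {
    public static void main(String[] args) {
        try (StreamManager streamManager = StreamManager.create(URI.create("tcp://localhost:9090"))) {
            streamManager.createScope("examples");
            // Event-rate policy: target of 100 events/s per segment,
            // split into 2 segments on scale-up, never fewer than 3 segments.
            StreamConfiguration config = StreamConfiguration.builder()
                    .scalingPolicy(ScalingPolicy.byEventRate(100, 2, 3))
                    .build();
            streamManager.createStream("examples", "sensor-stream", config);
        }
    }
}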
  47. 47. Auto-Scaling: Triggering a scaling event • Based on byte and event rates • Target rate T per segment • Rates reported every 2 minutes: 2-min rate (2M), 5-min rate (5M), 10-min rate (10M), 20-min rate (20M) • Scaling up when 2M > 5 x T, or 5M > 2 x T, or 10M > T • Scaling down when 2M, 5M, and 10M are all < T and 20M < T / 2 • Scale-up example (T = 50): with 2M at 60, the 10-min rate climbs from 46 to 52 across reports and exceeds T, triggering a split • Scale-down example (T = 50): 2M, 5M, and 10M stay at 20 while the 20-min rate falls from 27 to 24, below T / 2 = 25, triggering a merge Flink Forward - San Francisco, 2018 47
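These thresholds can be restated as a small predicate. The sketch below only restates the slide's rules for illustration; it is not Pravega's actual controller code. T is the per-segment target rate.

public class ScaleDecisionSketch {
    // T is the configured target rate per segment; rates are per-segment averages.
    static boolean shouldScaleUp(double rate2m, double rate5m, double rate10m, double target) {
        return rate2m > 5 * target || rate5m > 2 * target || rate10m > target;
    }

    static boolean shouldScaleDown(double rate2m, double rate5m, double rate10m,
                                   double rate20m, double target) {
        return rate2m < target && rate5m < target && rate10m < target
                && rate20m < target / 2;
    }

    public static void main(String[] args) {
        double target = 50;
        System.out.println(shouldScaleUp(60, 60, 52, target));        // true: 10-min rate exceeds T
        System.out.println(shouldScaleDown(20, 20, 20, 24, target));  // true: 20-min rate below T/2
    }
}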
  48. 48. Read order Flink Forward - San Francisco, 2018 48
  49. 49. Reader groups + Scaling Pravega Segment 2 Segment 1 Reader Reader 1 Pravega Segment 2 Segment 1 Reader Reader 2 Segment 3 Segment 4 Scale up! Flink Forward - San Francisco, 2018 49
  50. 50. Reader groups + Scaling Pravega Segment 2 Segment 1 Reader Reader 3 Segment 3 Segment 4 • Hit end of segment • Get successors • Update reader group state Pravega Reader Reader 4 Segment 4 Segment 2 Segment 3 Pravega Reader {3} Reader {2, 4} 5 Segment 4 Segment 2 Segment 3 Flink Forward - San Francisco, 2018 50
  51. 51. Building pipelines – Scaling downstream Flink Forward - San Francisco, 2018 51
  52. 52. Scaling pipelines Flink Forward - San Francisco, 2018 52 Source Stage 1 Stage 2 All stages can handle the load induced by the source
  53. 53. Scaling pipelines Flink Forward - San Francisco, 2018 53 Big source Scaled Stage 1 Stage 2 Load coming from the source increases Stage 1 scales and adapts to the load change Stage 2 can’t cope with the load change
  54. 54. Scaling signals Flink Forward - San Francisco, 2018 54 Pravega App Big source • Pravega won’t scale the application
  55. 55. Scaling signals Flink Forward - San Francisco, 2018 55 Pravega App Big source • Pravega won’t scale the application downstream • … but it can signal • E.g., more segments • E.g., number of unread bytes is growing Signals from Pravega
  56. 56. Reader group notifier • Listener API • Register a listener to react to changes • E.g., changes to the number of segments Flink Forward - San Francisco, 2018 56
ReaderGroupManager groupManager =
    new ReaderGroupManagerImpl(SCOPE, controller, clientFactory, connectionFactory);
ReaderGroup readerGroup = groupManager.createReaderGroup(GROUP_NAME,
    ReaderGroupConfig.builder().build(), Collections.singleton(STREAM));
readerGroup.getSegmentNotifier(executor).registerListener(segmentNotification -> {
    int numOfReaders = segmentNotification.getNumOfReaders();
    int segments = segmentNotification.getNumOfSegments();
    if (numOfReaders < segments) {
        // Scale up the number of readers based on application capacity
    } else {
        // More readers available: time to shut down some
    }
});
  57. 57. Reader group: listener and metrics • Listener API • Register a listener to react to changes • E.g., changes to the number of segments • Metrics • Reports specific values of interest • E.g., number of unread bytes in a stream Flink Forward - San Francisco, 2018 57
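As a sketch of how an application might consume such a metric, the fragment below compares the reader-group backlog against an application-defined threshold. It assumes a ReaderGroup#getMetrics() accessor exposing unreadBytes(), matching the metric described on the slide; treat the exact method names as assumptions to verify against the client version in use.

import io.pravega.client.stream.ReaderGroup;

public class BacklogSignalSketch {
    // Returns true when the unread backlog exceeds an application-defined threshold
    // (the threshold and the metrics accessor are assumptions for illustration).
    static boolean fallingBehind(ReaderGroup readerGroup, long thresholdBytes) {
        long backlogBytes = readerGroup.getMetrics().unreadBytes(); // assumed metrics accessor
        return backlogBytes > thresholdBytes;
    }
}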
  58. 58. Consuming Pravega streams with Apache Flink Flink Forward - San Francisco, 2018 58
  59. 59. How to read Pravega streams with Flink? Flink Forward - San Francisco, 2018 59 Task Manager ReaderPravega Stream • FlinkPravegaReader • ReaderGroup • Assignment of segments • Rebalance • Key to automatic scaling Task Manager Task Manager Task Manager Reader https://github.com/pravega/flink-connectors
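For reference, a Flink job reading from Pravega might look like the sketch below. It assumes the builder-style API of the flink-connectors project (PravegaConfig, FlinkPravegaReader.builder()); package and method names can differ between connector releases, so treat them as assumptions to check against the version in use.

import io.pravega.client.stream.impl.JavaSerializer;
import io.pravega.connectors.flink.FlinkPravegaReader;
import io.pravega.connectors.flink.PravegaConfig;
import io.pravega.connectors.flink.serialization.PravegaDeserializationSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import java.net.URI;

public class FlinkPravegaSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder controller endpoint, scope, and stream name.
        PravegaConfig pravegaConfig = PravegaConfig.fromDefaults()
                .withControllerURI(URI.create("tcp://localhost:9090"))
                .withDefaultScope("examples");

        FlinkPravegaReader<String> source = FlinkPravegaReader.<String>builder()
                .withPravegaConfig(pravegaConfig)
                .forStream("sensor-stream")
                .withDeserializationSchema(new PravegaDeserializationSchema<>(String.class, new JavaSerializer<>()))
                .build();

        // The reader group behind the source assigns segments to the parallel subtasks.
        DataStream<String> events = env.addSource(source).name("pravega-source");
        events.print();
        env.execute("Read from Pravega");
    }
}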
  60. 60. How to react to segment changes? Flink Forward - San Francisco, 2018 60 Pravega Stream Task Manager Job Manager Reader Rescaling Policy Task Manager Task Manager Reader (1) Register segment listener (2) Segment change notification (3) Rescale job (4) Take savepoint (5) Redeploy & resume tasks
  61. 61. Scaling signals 61 • Latency • Throughput • Resource utilization • Connector signals Flink Forward - San Francisco, 2018
  62. 62. Rescaling Flink applications Flink Forward - San Francisco, 2018 62
  63. 63. Scaling stateless jobs 63 Scale Up Scale Down Source Mapper Sink • Scale up: Deploy new tasks • Scale down: Cancel running tasks Flink Forward - San Francisco, 2018
  64. 64. Scaling stateful jobs 64 ? • Problem: Which state to assign to new task? Flink Forward - San Francisco, 2018
  65. 65. Different state types in Flink Flink Forward - San Francisco, 2018 65
  66. 66. Keyed vs. operator state 66 • State bound to a key • E.g. Keyed UDF and window state • State bound to a subtask • E.g. Source state Keyed Operator Flink Forward - San Francisco, 2018
  67. 67. Repartitioning keyed state • Similar to consistent hashing • Split key space into key groups • Assign key groups to tasks 67 Key space Key group #1 Key group #2 Key group #3 Key group #4 Flink Forward - San Francisco, 2018
  68. 68. Repartitioning keyed state contd. • Rescaling changes key group assignment • Maximum parallelism defined by #key groups 68Flink Forward - San Francisco, 2018
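The assignment can be illustrated with two small functions: a key hashes to a fixed key group out of maxParallelism groups, and the key group maps to a subtask index for the current parallelism. The sketch below only illustrates the idea and is not Flink's internal code (Flink additionally applies murmur hashing on top of hashCode()).

public class KeyGroupSketch {
    // A key always hashes to the same key group (out of maxParallelism groups).
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return (key.hashCode() & 0x7fffffff) % maxParallelism;
    }

    // The key group is mapped to a subtask index for the current parallelism.
    static int operatorIndexForKeyGroup(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        int keyGroup = assignToKeyGroup("sensor-42", maxParallelism);
        // Rescaling only changes the key-group-to-subtask mapping, not the key's group.
        System.out.println("parallelism 2 -> subtask " + operatorIndexForKeyGroup(keyGroup, maxParallelism, 2));
        System.out.println("parallelism 4 -> subtask " + operatorIndexForKeyGroup(keyGroup, maxParallelism, 4));
    }
}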
  69. 69. Repartitioning operator state • Breaking operator state up into finer granularity • State has to contain multiple entries • Automatic repartitioning wrt granularity 69 #1 #2 #3 Flink Forward - San Francisco, 2018
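A common way to hold such repartitionable operator state is Flink's CheckpointedFunction interface with a ListState: each list entry can be redistributed independently when the operator is rescaled. The sketch below is a minimal buffering sink along those lines; the class name is made up for illustration and the flush logic is omitted.

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import java.util.ArrayList;
import java.util.List;

// Buffers elements in operator state; because the state is a ListState with
// multiple entries, Flink can redistribute the entries across subtasks on rescaling.
public class BufferingSinkSketch implements SinkFunction<String>, CheckpointedFunction {
    private transient ListState<String> checkpointedBuffer;
    private final List<String> buffer = new ArrayList<>();

    @Override
    public void invoke(String value, Context context) {
        buffer.add(value); // flush logic omitted in this sketch
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        checkpointedBuffer.clear();
        for (String element : buffer) {
            checkpointedBuffer.add(element);
        }
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        checkpointedBuffer = context.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("buffered-elements", String.class));
        for (String element : checkpointedBuffer.get()) {
            buffer.add(element); // entries may come from a different subtask after rescaling
        }
    }
}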
  70. 70. Acquiring New Resources – Resource Elasticity Flink Forward - San Francisco, 2018 70
  71. 71. Flink’s Revamped Distributed Architecture Flink Forward - San Francisco, 2018 71 • Motivation • Resource elasticity • Support for different deployments • REST interface for client-cluster communication • Introduce generic building blocks • Compose blocks for different scenarios
  72. 72. The Building Blocks 72 ResourceManager • ClusterManager-specific • May live across jobs • Manages available Containers/TaskManagers • Used to acquire / release resources JobManager • Single job only, started per job • Thinks in terms of "task slots" • Deploys and monitors job/task execution TaskManager • Registers at ResourceManager • Gets tasks from one or more JobManagers Dispatcher • Lives across jobs • Touch-point for job submissions • Spawns JobManagers Flink Forward - San Francisco, 2018
  73. 73. The Building Blocks 73 Client Dispatcher JobManager ResourceManager TaskManager (1) Submit Job (2) Start JobManager (3) Request slots (4) Start TaskManagers (5) Register (6) Offer slots (7) Deploy Tasks Flink Forward - San Francisco, 2018
  74. 74. Building Flink-on-YARN 74 YARN Cluster Client YARN ResourceManager Application Master Flink-YARN ResourceManager JobManager TaskManager TaskManager TaskManager (1) Submit YARN App. (JobGraph / JARs) (2) Spawn Application Master (3) Request slots (4) Start TaskManagers (5) Register (6) Deploy Tasks
  75. 75. Does It Actually Work? Flink Forward - San Francisco, 2018 75
  76. 76. Demo Topology 76 Pravega Source Sink FILE.out • Executed on YARN to support dynamic resource allocation [Chart: event rate over time] Flink Forward - San Francisco, 2018
  77. 77. Wrap Up Flink Forward - San Francisco, 2018 77
  78. 78. Wrap up • Pravega • Stream store • Scalable ingestion of continuously generated data • Stream scaling • Apache Flink • Stateful job scaling • Full resource elasticity • Operator rescaling policies work in progress • Pravega + Apache Flink • End-to-end scalable data pipelines Flink Forward - San Francisco, 2018 78
  79. 79. Flink Forward - San Francisco, 2018 79 Questions? http://pravega.io http://github.com/pravega/pravega http://flink.apache.org http://github.com/pravega/flink-connectors https://github.com/tillrohrmann/flink/tree/rescalingPolicy E-mail: fpj@apache.org, trohrmann@apache.org Twitter: @fpjunqueira, @stsffap
