2018-01 Seattle Apache Flink Meetup, Talk 1 - Apache Flink at Alibaba

These slides contain talk #1 from the first Seattle Apache Flink meetup, which had the following talks.

Date: Jan 17th, 2018, Wednesday
Location: Bellevue, WA

OPENING REMARKS (~5min)

TALK #1 (~45min)
Haitao Wang, Senior Staff Engineer at Alibaba, will give a presentation on large-scale stream processing with Flink and Flink SQL at Alibaba, including several internal use cases.

TALK #2 (~30min)
Bowen Li will talk about details of future meetup planning and logistics. He will also present how OfferUp, the largest mobile marketplace in the U.S., does large-scale stream processing with Flink to better serve local buyers and sellers, and what they have contributed to Flink's DataStream APIs, state backends, metrics system, and connectors.
See separately uploaded Slideshare: https://www.slideshare.net/dataArtisans/201801-seattle-apache-flink-meetup-at-offerup-opening-remarks-and-talk-2

We may also talk about what's new in Flink 1.4, how users can leverage these new features, what Flink 1.5 will look like, and what users' vision for Flink is.

SPONSOR: OfferUp

Attendees included: Alibaba Group, OfferUp, Uber, Amazon Web Services, Google, Microsoft, Zions Bank, Gridpoint, Dell/EMC, NeoPrime, Nordstrom, Snowflake, Tableau, Oracle, Expedia, Grab, Snapchat, and many others.

Published in: Technology

  1. Flink at Alibaba, Haitao Wang
  2. About me • 2017.3 – present: Alibaba Compute Platform • 2012 – 2017.3: Microsoft China (Spark as a Service, with a focus on streaming; Skype for Business data) • 2000 – 2012: Microsoft US (SQL Server engine; Speech API & Speech Server)
  3. About Alibaba
  4. About Alibaba Data: EBs total, PBs every day, 100Ms events/sec, Ts events/day
  5. Why Streaming? Exactly-once, lots of events, sub-second latency, highly available. Diagram labels: Web Tier, DB Tier, MQ, DataHub, Data Pipeline, HBase, Dashboard
  6. Challenges of Data Infra at Alibaba: lots of jobs, tons of data, exactly-once, complex logic, thousands of machines, strict SLA, low latency, high throughput
  7. Introducing Alibaba Blink: Apache Flink + Alibaba’s Improvements = Alibaba Blink
  8. Blink numbers: Apache Flink + Alibaba’s Improvements = Alibaba Blink. Unprecedented scale on 2017-11-11: 472M events per second, 10s of milliseconds latency, accurate results
  9. Improvements in Blink Runtime: Async IO, Increment CP, Process & Deployment, Metric
  10. Why SQL? Declarative, optimizable, understandable, stable, unify
  11. Dynamic Table. Diagram labels: Stream Data, Batch Data, Dynamic Table, Continuous Query, Blink, Stream Job, Batch Job
  12. SQL Improvements: UDF/UDTF/UDAF; Stream JOIN, etc.; Retraction; Window AGG; DML: INSERT etc.; DDL
  13. 2016: Scalability & Reliability. 2017: Productivity
  14. Use Case: Real-time A/B Test (Analytics). Diagram labels: Online Logs, Impression, Click, Transaction, Parser (×3), Filter (×3), Join, UDF, Agg, Druid
  15. Use Case: Search Index Build & Update. Diagram labels: MySQL Cluster, MySQL IC1, MySQL IC2, MySQL IC3, MySQL UIC1, MySQL UIC2, MySQL UIC3, HBase Cluster, HBase IC, HBase UIC, HBase Result, Sync, Join, Export, Engine search
  16. Use Case: Online Machine Learning. Diagram labels: Online Logs, HDFS, Feature, Feature Compute, Online Learning, Model, Export, Engine online
  17. Flink Advantages • Streaming semantics: event time / processing time; watermark, window, trigger; temporal join, retraction, … • Fault tolerant stateful processing at scale: managed state; exactly once; distributed checkpoints; queryable state • Rich programming models and APIs: Process Functions, DataStream / DataSet, Table API / SQL; streaming, batch, event driven apps, CEP, ML, graph
  18. Blink Focuses • Scale: single job 10K+ parallelism, 10s TB state size • Performance, costs and SLA: efficient resource utilization (FLIP-6); runtime improvements; metrics & monitoring • Productivity: SQL; platform (develop, debug, deploy, monitor, migration); connectors
  19. Blink Ecosystem in Alibaba. Diagram labels: Alibaba Apps, Search, Recommendation, Ads, BI, Lots more, StreamCompute Platform, Machine Learning Platform, Blink, SQL & Table API, DataStream API, DataSet API, Runtime Engine, Cluster Resource Management, Storage
  20. Blink & Flink. Diagram labels: Apache Flink, Alibaba Blink, Flink Client, Submit Job, YARN (Resource Management), Resource Manager, YARN App Master, Launch AM, Request TM, Launch TM, Blink Runtime, Job Manager, Task Manager, Tasks, Task Scheduling, Checkpoint Coordination, RocksDB State Backend, Checkpoint Incrementally, HDFS (Persistent Storage), Connectors, Read/Write, Alibaba Data Lake, Web Monitor, Debug, Metrics, Metric Reporter, Alibaba Monitor System
  21. Examples of recent Blink runtime optimizations • Credit-based network stack • Dynamic load balancing • Improved checkpointing for large-scale jobs • Some of this work has been contributed to Flink and will be released in Flink 1.5
  22. Some New Flink Features • Flink 1.3: incremental checkpoints; fine-grained recovery • Flink 1.4: queryable state improvements; Table API & SQL enhancements • Flink 1.5+: FLIP-6; network stack improvements; state replication; eager state declaration; state evolution
  23. Evolution of Large State Handling & Recovery
  24. Full Checkpoints. State @t1: A B C D; @t2: A F C D E; @t3: G H C D I E. Checkpoint 1 stores A B C D; Checkpoint 2 stores A F C D E; Checkpoint 3 stores G H C D I E
  25. Incremental Checkpoints. State @t1: A B C D; @t2: A F C D E; @t3: G H C D I E. Checkpoint 1 stores A B C D; Checkpoint 2 stores E F; Checkpoint 3 stores G H I
  26. Incremental Checkpoints. Diagram labels: Checkpoint 1, Checkpoint 2, Checkpoint 3, Checkpoint 4; Storage: Chunk 1, Chunk 2, Chunk 3, Chunk 4; labels C1, C3, C1, C1, C2, C4, C3
  27. Network Stack Improvements • Removal of redundant copy operations • Event driven network transfer • Removal of artificial latency source • Introduction of flow control • Better control of back pressure
  28. State Replication • Decouple state from Tasks • Faster recovery in case of Task failure • Replicate state between Task Managers • Faster failure recovery in case of machine failure • High throughput queryable state
  29. Thank You! We are hiring…
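
Slides 24 and 25 compare full and incremental checkpoints on the same three state snapshots. As a minimal sketch of that comparison (an illustration only, not Flink's or Blink's implementation), the Python below treats each letter as one chunk of state and computes what each checkpoint would have to upload. The names `SNAPSHOTS`, `full_checkpoints`, `incremental_checkpoints`, and `uploaded_chunk_count` are hypothetical, and modeling state as a set of lettered chunks is an assumption taken from the diagram.

```python
from typing import Dict, List, Set

# State snapshots as drawn on the checkpoint slides: each letter stands for
# one chunk of state that exists when the checkpoint is taken (assumption).
SNAPSHOTS: Dict[str, List[str]] = {
    "checkpoint-1": ["A", "B", "C", "D"],            # state @t1
    "checkpoint-2": ["A", "F", "C", "D", "E"],       # state @t2
    "checkpoint-3": ["G", "H", "C", "D", "I", "E"],  # state @t3
}


def full_checkpoints(snapshots: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """A full checkpoint uploads every chunk of the current state."""
    return {name: sorted(chunks) for name, chunks in snapshots.items()}


def incremental_checkpoints(snapshots: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """An incremental checkpoint uploads only chunks that were not part of
    the previous snapshot; unchanged chunks are reused from storage."""
    uploads: Dict[str, List[str]] = {}
    previous: Set[str] = set()
    for name, chunks in snapshots.items():  # dicts keep insertion order
        current = set(chunks)
        uploads[name] = sorted(current - previous)
        previous = current
    return uploads


def uploaded_chunk_count(uploads: Dict[str, List[str]]) -> int:
    """Total number of chunks written to storage across all checkpoints."""
    return sum(len(chunks) for chunks in uploads.values())


if __name__ == "__main__":
    full = full_checkpoints(SNAPSHOTS)
    incremental = incremental_checkpoints(SNAPSHOTS)
    print(incremental)
    # {'checkpoint-1': ['A', 'B', 'C', 'D'], 'checkpoint-2': ['E', 'F'],
    #  'checkpoint-3': ['G', 'H', 'I']}
    print(uploaded_chunk_count(full), uploaded_chunk_count(incremental))  # 15 9
```

On the slide data, the incremental variant uploads 9 chunks where full checkpoints upload 15, and the per-checkpoint uploads match slide 25 (A B C D, then E F, then G H I). The trade-off in this model is that restoring checkpoint 3 still needs chunks C and D, which were uploaded by checkpoint 1, so a stored chunk can only be deleted once no retained checkpoint refers to it.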

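Slides 21 and 27 list a credit-based network stack and flow control among the runtime improvements without describing the mechanism. The toy model below sketches the general idea of credit-based flow control under simple assumptions: a receiver announces its free buffers as credit, and the sender transmits only while it holds credit, so the receiver's queue cannot outgrow its buffer pool. The classes `Receiver` and `Sender` and the function `simulate` are hypothetical names; nothing here reflects Flink's or Blink's actual network code.

```python
from collections import deque
from typing import Iterable, List, Tuple


class Receiver:
    """Owns a fixed pool of buffers and hands them out as credit."""

    def __init__(self, buffers: int) -> None:
        self.free_buffers = buffers
        self.queue: deque = deque()

    def grant_credit(self) -> int:
        # Announce all currently free buffers; they are now reserved.
        credit, self.free_buffers = self.free_buffers, 0
        return credit

    def receive(self, record) -> None:
        self.queue.append(record)

    def consume(self, limit: int) -> int:
        # Process up to `limit` queued records and free their buffers.
        done = 0
        while self.queue and done < limit:
            self.queue.popleft()
            done += 1
        self.free_buffers += done
        return done


class Sender:
    """Sends records only while it holds credit from the receiver."""

    def __init__(self, records: Iterable) -> None:
        self.backlog: deque = deque(records)
        self.credit = 0

    def send(self, receiver: Receiver) -> int:
        sent = 0
        while self.backlog and self.credit > 0:
            receiver.receive(self.backlog.popleft())
            self.credit -= 1
            sent += 1
        return sent


def simulate(records: Iterable, buffers: int, consume_per_round: int) -> Tuple[List[int], int]:
    """Return (records sent per round, largest receiver queue observed)."""
    if buffers < 1 or consume_per_round < 1:
        raise ValueError("buffers and consume_per_round must be at least 1")
    sender, receiver = Sender(records), Receiver(buffers)
    sent_per_round: List[int] = []
    max_queued = 0
    while sender.backlog:
        sender.credit += receiver.grant_credit()
        sent_per_round.append(sender.send(receiver))
        max_queued = max(max_queued, len(receiver.queue))
        receiver.consume(consume_per_round)
    return sent_per_round, max_queued


if __name__ == "__main__":
    print(simulate(range(5), buffers=2, consume_per_round=1))  # ([2, 1, 1, 1], 2)
```

In this run the slow receiver (one record per round) throttles the sender to one record per round after the initial burst, and the queue never exceeds the two available buffers. That is the back-pressure behavior the flow-control bullets point at, expressed in the simplest form.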