2018-01 Seattle Apache Flink Meetup, Talk 1 - Apache Flink at Alibaba

These slides contain talk #1 from the first Seattle Apache Flink meetup, which had the following talks.

Date: Jan 17th, 2018, Wednesday
Location: Bellevue, WA

OPENING REMARKS (~5min)

TALK #1 (~45min)
Haitao Wang, Senior Staff Engineer at Alibaba, will give a presentation on large-scale stream processing with Flink and Flink SQL at Alibaba, including several internal use cases.

TALK #2 (~30min)
Bowen Li will talk about details of future meetup planning and logistics. He will also present how OfferUp, the largest mobile marketplace in the U.S., does large-scale stream processing with Flink to better serve local buyers and sellers, and what they have contributed to Flink's DataStream APIs, state backends, metrics system, and connectors.
See the separately uploaded SlideShare deck: https://www.slideshare.net/dataArtisans/201801-seattle-apache-flink-meetup-at-offerup-opening-remarks-and-talk-2

We may also discuss what's new in Flink 1.4 and how users can take advantage of the new features, what Flink 1.5 may look like, and users' vision for Flink.

SPONSOR: OfferUp

Attendees included: Alibaba Group, OfferUp, Uber, Amazon Web Services, Google, Microsoft, Zions Bank, Gridpoint, Dell/EMC, NeoPrime, Nordstrom, Snowflake, Tableau, Oracle, Expedia, Grab, Snapchat, and many others.

  1. Flink at Alibaba, Haitao Wang
  2. About me • March 2017 – present: Alibaba Compute Platform • 2012 – March 2017: Microsoft China (Spark as a Service, with a focus on streaming; Skype for Business data) • 2000 – 2012: Microsoft US (SQL Server engine; Speech API & Speech Server)
  3. About Alibaba
  4. About Alibaba Data: EBs of data in total • PBs every day • 100Ms of events per second • Ts (trillions) of events per day
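As a rough consistency check on these orders of magnitude, assuming the lower end of "100Ms" (10^8 events per second) is sustained for a full day:

```latex
10^{8}\,\frac{\text{events}}{\text{s}} \times 86\,400\,\frac{\text{s}}{\text{day}} = 8.64 \times 10^{12}\,\frac{\text{events}}{\text{day}}
```

So a rate in the hundreds of millions of events per second does land in the trillions of events per day.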
  5. Why Streaming? Diagram components: Web Tier, DB Tier, MQ, DataHub, Data Pipeline, HBase, Dashboard. Requirements: lots of events • sub-second latency • exactly-once • highly available
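To make the "exactly-once" requirement concrete, here is a toy model of exactly-once *state* semantics through checkpoint and replay. It assumes a replayable source addressed by offsets (as a message queue provides) and an operator whose state is snapshotted together with the source offset; the function `run` and its parameters are hypothetical and this is a simplified illustration, not Flink's checkpointing code.

```python
from collections import Counter


def run(events, checkpoint_every, fail_at=None):
    """Count events per key, checkpointing (source offset, state) together.

    If `fail_at` is given, simulate one crash when the source reaches that
    offset: recovery restores the last checkpoint and replays from its offset.
    """
    state = Counter()
    offset = 0
    checkpoint = (0, Counter())  # (source offset, snapshot of operator state)
    crashed = False
    while offset < len(events):
        if fail_at is not None and not crashed and offset == fail_at:
            crashed = True
            # Roll back BOTH the source position and the state to the snapshot
            offset, snapshot = checkpoint
            state = Counter(snapshot)
            continue
        state[events[offset]] += 1
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, Counter(state))  # store a copy, not a reference
    return state
```

With `events = list("abcabcaab")`, `run(events, 3, fail_at=7)` returns the same counts as a failure-free run. If recovery rewound only the source and kept the live state, the event at offset 6 would be counted twice, which is at-least-once rather than exactly-once.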
  6. Challenges of Data Infra at Alibaba: lots of jobs • tons of data • complex logic • exactly-once • thousands of machines • strict SLA • low latency • high throughput
  7. Introducing Alibaba Blink: Apache Flink + Alibaba's Improvements = Alibaba Blink
  8. Blink numbers (Apache Flink + Alibaba's Improvements = Alibaba Blink): unprecedented scale on 2017-11-11 • 472M events per second • tens of milliseconds of latency • accurate results
  9. Improvements in Blink Runtime: Async IO • Incremental checkpoints • Process & Deployment • Metrics
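The idea behind Async IO can be sketched with Python's `asyncio`. Instead of blocking on one external lookup per record, the operator keeps up to `capacity` requests in flight and emits results either in input order or as they complete. `lookup` is a hypothetical stand-in for an external store, and this is an illustration of the pattern, not Blink's implementation.

```python
import asyncio


async def lookup(key):
    """Hypothetical external lookup (e.g. a key-value store), simulated with uneven delays."""
    await asyncio.sleep(0.01 * (3 - key % 3))
    return key, key * 10


async def enrich(keys, capacity, ordered=True):
    """Run up to `capacity` lookups concurrently instead of one blocking call per record."""
    sem = asyncio.Semaphore(capacity)

    async def bounded(key):
        async with sem:  # at most `capacity` requests in flight
            return await lookup(key)

    tasks = [asyncio.create_task(bounded(k)) for k in keys]
    if ordered:
        # Emit results in input order, regardless of completion order
        return [await t for t in tasks]
    # Emit each result as soon as it completes
    return [await f for f in asyncio.as_completed(tasks)]
```

For example, `asyncio.run(enrich(list(range(6)), capacity=2))` yields `(k, k * 10)` pairs in input order. Flink's DataStream API offers the same two variants as `AsyncDataStream.orderedWait` and `AsyncDataStream.unorderedWait`; the unordered mode trades output order for lower latency.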
  10. Why SQL? Declarative • Optimizable • Understandable • Stable • Unify
  11. Dynamic Table. Diagram: Stream Data / Batch Data → Dynamic Table → Continuous Query → Dynamic Table → Stream Data / Batch Data; Blink runs the query as either a Stream Job or a Batch Job
  12. SQL Improvements: UDF/UDTF/UDAF • Stream JOIN, etc. • Retraction • Window AGG • DML: INSERT, etc. • DDL
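A minimal sketch of how a continuous query over a dynamic table can emit retractions, assuming a hypothetical query `SELECT user, COUNT(*) FROM clicks GROUP BY user` over an append-only stream. Each input row updates the result table, and the update is emitted as a changelog: `('-', row)` retracts a previously emitted row and `('+', row)` inserts the new one. The function names are illustrative only.

```python
from collections import defaultdict


def continuous_count(stream):
    """Continuously evaluate COUNT(*) GROUP BY user, emitting a retract/insert changelog."""
    counts = defaultdict(int)
    for user in stream:
        old = counts[user]
        counts[user] = old + 1
        if old > 0:
            yield ("-", (user, old))   # retract the outdated result row
        yield ("+", (user, old + 1))   # insert the updated result row


def materialize(changelog):
    """Apply a changelog to rebuild the current contents of the result table."""
    table = set()
    for op, row in changelog:
        if op == "+":
            table.add(row)
        else:
            table.remove(row)
    return table
```

Retraction matters when a downstream operator consumes this updating result: without the `('-', ('mary', 1))` message, a second aggregation (for example, counting users per click count) would keep the stale row and double-count.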
  13. 2016: Scalability & Reliability • 2017: Productivity
  14. Use Case: Real-time A/B Test (Analytics). Diagram: Online Logs (Impression, Click, Transaction) → Parser → Filter (one per stream) → Join → UDF → Agg → Druid
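A toy, batch-style analogue of the Parser → Filter → Join → Agg stages, computing click-through rate per experiment variant. The log format `event_type,request_id,variant` and all names are hypothetical, and a real streaming job would join within a time window rather than over the full input as done here.

```python
from collections import Counter


def parse(line):
    """Parse a hypothetical log line 'event_type,request_id,variant'; None if malformed."""
    parts = line.strip().split(",")
    if len(parts) != 3 or not all(parts):
        return None
    return tuple(parts)


def ab_ctr(impression_lines, click_lines):
    """Parser -> Filter -> Join -> Agg: click-through rate per experiment variant."""
    impressions = [r for r in map(parse, impression_lines) if r and r[0] == "impression"]
    clicked_ids = {r[1] for r in map(parse, click_lines) if r and r[0] == "click"}
    shown = Counter(variant for _, _, variant in impressions)
    # Join on request_id: an impression counts as clicked if a click shares its id
    clicked = Counter(variant for _, rid, variant in impressions if rid in clicked_ids)
    return {v: clicked[v] / shown[v] for v in shown}
```

Malformed lines are dropped by the filter stage, and clicks with no matching impression (for example, `r9` below) simply do not join.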
  15. Use Case: Search Index Build & Update. Diagram: MySQL Cluster (MySQL IC1–IC3, MySQL UIC1–UIC3) → Sync → HBase Cluster (HBase IC, HBase UIC) → Join → HBase Result → Export → search Engine
  16. Use Case: Online Machine Learning. Diagram components: Online Logs, HDFS, Feature Compute, Feature, Online Learning, Model, Export, Engine (online)
  17. Flink Advantages • Streaming semantics: event time / processing time; watermark, window, trigger; temporal join, retraction, … • Fault-tolerant stateful processing at scale: managed state; exactly once; distributed checkpoints; queryable state • Rich programming models and APIs: Process Functions, DataStream / DataSet, Table API / SQL; streaming, batch, event-driven apps, CEP, ML, graph
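The interplay of event time, watermarks, and windows can be sketched as follows. This assumes a watermark that trails the largest event timestamp seen by a fixed `max_delay` (a bounded-out-of-orderness policy) and tumbling windows that fire once the watermark reaches their end. It is a simplified model; Flink's exact firing boundary and late-data handling differ in detail.

```python
from collections import defaultdict


def tumbling_window_counts(timestamps, window_size, max_delay):
    """Count events per event-time tumbling window, firing windows on the watermark.

    `timestamps` are event times in arrival order (possibly out of order).
    The watermark trails the largest timestamp seen so far by `max_delay`;
    window [start, start + window_size) fires once the watermark reaches its end.
    Events whose window has already fired are late and dropped.
    """
    pending = defaultdict(int)  # window start -> count
    fired = []
    watermark = float("-inf")
    for ts in timestamps:
        start = ts - ts % window_size
        if start + window_size <= watermark:
            continue  # late event: its window already fired
        pending[start] += 1
        watermark = max(watermark, ts - max_delay)
        for s in sorted(w for w in pending if w + window_size <= watermark):
            fired.append((s, pending.pop(s)))
    for s in sorted(pending):  # end of input acts as a final, infinite watermark
        fired.append((s, pending.pop(s)))
    return fired
```

With `[1, 3, 12, 2, 25, 11, 4]`, a window size of 10, and a delay of 5, the out-of-order event `2` still lands in window 0, while `11` and `4` arrive after the watermark has passed 20 and are dropped as late. A larger `max_delay` tolerates more disorder at the cost of higher latency.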
  18. Blink Focuses • Scale: single job with 10K+ parallelism and tens of TB of state • Performance, costs and SLA: efficient resource utilization (FLIP-6), runtime improvements, metrics & monitoring • Productivity: SQL; platform (develop, debug, deploy, monitor, migration); connectors
  19. Blink Ecosystem in Alibaba. Diagram layers: Alibaba Apps (Search, Recommendation, Ads, BI, lots more) • StreamCompute Platform and Machine Learning Platform • Blink (SQL & Table API, DataStream API, DataSet API, Runtime Engine) • Cluster Resource Management and Storage
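  20. Blink & Flink. Architecture diagram: Blink Runtime on YARN (Resource Management) and HDFS (Persistent Storage). The Flink Client submits the job; the YARN App Master (AM) is launched and hosts the Resource Manager and the Job Manager (task scheduling, checkpoint coordination); the Resource Manager requests and launches Task Managers (TMs), which run Tasks with a RocksDB State Backend, read/write the Alibaba Data Lake through connectors, and checkpoint incrementally. A Web Monitor supports debugging, and a Metric Reporter feeds the Alibaba Monitor System. The diagram distinguishes Apache Flink components from Alibaba Blink additions.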
  21. Examples of recent Blink runtime optimizations • Credit-based network stack • Dynamic load balancing • Improved checkpointing for large-scale jobs • Some of this work has been contributed to Flink and will be released in Flink 1.5
  22. Some New Flink Features • Flink 1.3: incremental checkpoints, fine-grained recovery • Flink 1.4: queryable state improvements, Table API & SQL enhancements • Flink 1.5+: FLIP-6, network stack improvements, state replication, eager state declaration, state evolution
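One way to picture fine-grained recovery, assuming a region-based model in which tasks connected by pipelined data exchanges must restart together while independent regions keep running. The union-find grouping below and all task names are illustrative; Flink's actual failover strategies involve more detail, such as blocking exchanges and result partitions.

```python
def failover_regions(tasks, pipelined_edges):
    """Group tasks into regions connected by pipelined (non-blocking) data exchanges."""
    parent = {t: t for t in tasks}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving keeps trees shallow
            t = parent[t]
        return t

    for a, b in pipelined_edges:
        parent[find(a)] = find(b)  # merge the two regions
    regions = {}
    for t in tasks:
        regions.setdefault(find(t), set()).add(t)
    return list(regions.values())


def tasks_to_restart(tasks, pipelined_edges, failed_task):
    """Restart only the region containing the failed task, not the whole job."""
    for region in failover_regions(tasks, pipelined_edges):
        if failed_task in region:
            return region
    raise KeyError(failed_task)
```

For a job with two independent source → map → sink pipelines, a failure in `map2` restarts only `src2`, `map2`, and `sink2`. A full restart would also have rolled back the healthy first pipeline.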
  23. Evolution of Large State Handling & Recovery
  24. Full Checkpoints. Diagram: the state is A B C D at @t1, A F C D E at @t2, and G H C D I E at @t3; Checkpoints 1, 2, and 3 each store a complete copy (A B C D; A F C D E; G H C D I E)
  25. Incremental Checkpoints. Diagram: same state evolution (A B C D @t1, A F C D E @t2, G H C D I E @t3), but Checkpoint 1 stores A B C D, Checkpoint 2 stores only the new entries E F, and Checkpoint 3 stores only G H I
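The difference between the two diagrams can be reproduced with the same A–I state sets. In this sketch each incremental checkpoint records both added and removed entries (the diagram shows only additions; recording removals is an assumption needed for a correct restore). Flink's RocksDB backend works at the level of newly created SST files rather than individual entries, so this is a model of the idea only.

```python
def full_checkpoints(states):
    """A full checkpoint copies the entire state every time."""
    return [set(s) for s in states]


def incremental_checkpoints(states):
    """An incremental checkpoint records only what changed since the previous one."""
    deltas, previous = [], set()
    for s in states:
        deltas.append((s - previous, previous - s))  # (added, removed)
        previous = s
    return deltas


def restore(deltas, index):
    """Rebuild the state of checkpoint `index` (0-based) by replaying deltas in order."""
    state = set()
    for added, removed in deltas[: index + 1]:
        state = (state - removed) | added
    return state
```

With `states = [set("ABCD"), set("AFCDE"), set("GHCDIE")]`, the full checkpoints hold 4 + 5 + 6 = 15 entries, while the incremental ones add only 4 + 2 + 3 = 9. The trade-off is that restoring checkpoint 3 needs the earlier deltas, so older data must be retained as long as a newer checkpoint depends on it.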
  26. Incremental Checkpoints. Diagram: Checkpoints 1–4 reference state chunks (Chunk 1–4, i.e. C1–C4) kept in shared storage; a chunk such as C1 is referenced by several checkpoints instead of being copied into each
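Because several checkpoints can point at the same stored chunk, a chunk may only be deleted once no retained checkpoint references it. A minimal reference-counting sketch of that bookkeeping follows. The class and its methods are hypothetical, and the chunk assignment in the example is illustrative rather than read off the slide.

```python
class SharedChunkRegistry:
    """Reference-count stored chunks shared by several retained checkpoints."""

    def __init__(self):
        self.refcount = {}     # chunk id -> number of checkpoints referencing it
        self.checkpoints = {}  # checkpoint id -> set of chunk ids it references

    def register(self, checkpoint_id, chunks):
        chunk_set = set(chunks)  # count each chunk once per checkpoint
        self.checkpoints[checkpoint_id] = chunk_set
        for c in chunk_set:
            self.refcount[c] = self.refcount.get(c, 0) + 1

    def discard(self, checkpoint_id):
        """Drop a checkpoint and return the chunks no retained checkpoint still needs."""
        unused = []
        for c in self.checkpoints.pop(checkpoint_id):
            self.refcount[c] -= 1
            if self.refcount[c] == 0:
                del self.refcount[c]
                unused.append(c)
        return sorted(unused)
```

If checkpoints 1, 2, and 3 reference `{C1}`, `{C1, C2}`, and `{C1, C3}`, discarding checkpoint 1 frees nothing. Discarding checkpoint 2 frees only `C2`, and `C1` survives until the last checkpoint that uses it is gone.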
  27. Network Stack Improvements • Removal of redundant copy operations • Event-driven network transfer • Removal of artificial latency sources • Introduction of flow control • Better control of back pressure
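The "credit-based network stack" and "flow control" items can be illustrated with a single-channel toy. The receiver announces credits equal to its free buffers, and the sender never sends more buffers than it holds credit for. A slow consumer therefore makes data wait in the sender's backlog for that channel instead of filling a shared connection. The classes below are a hypothetical sketch of the general technique, not Flink's network code.

```python
from collections import deque


class Receiver:
    def __init__(self, buffers):
        self.free = buffers  # number of free input buffers
        self.queue = deque()

    def grant_credit(self):
        """Announce all currently free buffers as credit to the sender."""
        credit, self.free = self.free, 0
        return credit

    def accept(self, buf):
        self.queue.append(buf)

    def consume(self):
        """Process one buffer and recycle it, so it can be announced as new credit."""
        buf = self.queue.popleft()
        self.free += 1
        return buf


class Sender:
    def __init__(self):
        self.credit = 0
        self.backlog = deque()

    def send_available(self, receiver):
        """Send only as many buffers as the receiver has granted credit for."""
        sent = 0
        while self.credit > 0 and self.backlog:
            receiver.accept(self.backlog.popleft())
            self.credit -= 1
            sent += 1
        return sent
```

With five buffers queued and a receiver that has two free buffers, the sender ships two and then stops. Only after the receiver consumes one buffer and grants a new credit does the next buffer go out, so the receiver's queue never exceeds its buffer count.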
  28. State Replication • Decouple state from Tasks • Faster recovery in case of Task failure • Replicate state between Task Managers • Faster failure recovery in case of machine failure • High-throughput queryable state
  29. Thank You! We are hiring…