Stream Processing with Apache Flink
Maximilian Michels
Flink PMC member
mxm@apache.org
@stadtlegende
Big Data Warsaw

An introduction to stream processing with Apache Flink.

  1. 1. Stream Processing with
 Apache Flink Maximilian Michels Flink PMC member mxm@apache.org @stadtlegende
  2. 2. The Agenda ▪ What is Apache Flink? ▪ Streaming 101 ▪ The Flink Engine ▪ A Quick Look at the API
  3. 3. Apache Flink ▪ A distributed open-source data analysis framework ▪ True streaming at its core ▪ Streaming & Batch API [Diagram labels: Event logs; Historic data; Kafka, RabbitMQ, ...; HDFS, JDBC, ...; ETL, Graphs, Machine Learning, Relational, …; Low latency, windowing, aggregations, ...]
  4. 4. Organizations at Flink Forward [logo slide]
  5. 5. Featured in [logo slide]
  6. 6. Flink Community Top 5 Apache Big Data project in the Apache Software Foundation 500+ messages/month on the mailing list 8400+ commits 1500+ pull requests merged 950+ stars 510+ forks
  7. 7. Use Cases for Flink
  8. 8. Use Case: Log File Analysis ▪ Load log files from a distributed file system ▪ Process them, sessionize according to the user id ▪ Write a view to the database or dump more data for further processing [Diagram labels: Process, Analyze, Aggregate]
  9. 9. Use Case: Tweet Impressions ▪ Continuous stream of Tweets (each with a timestamp): Max, Marie, Jonas, and Tim are tweeting. ▪ How do we measure the importance of Tweets? • Total number of views • Views within a time period ▪ We need to process and aggregate Tweets!
  10. 10. Use Case: Tweet Impressions [Diagram labels: Max, Marie, Jonas, and Tim are tweeting; Impression Events; Aggregation of Impressions; Output; Impressions for the last minute, last hour, last day] More at: http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
  11. 11. Streaming 101
  12. 12. Why Stream Processing? ▪ Most problems are streaming in nature ▪ Stream processing gives lower latency ▪ Data volumes more easily tamed ▪ More predictable resource consumption [Diagram labels: Event stream, batch (solved), event based]
  13. 13. Challenges in Streaming ▪ Latency ▪ Throughput ▪ Fault-Tolerance ▪ Correctness ▪ Elements may be out-of-order ▪ Elements may be processed more than once
  14. 14. Windows ▪ A grouping of records according to time, count, or session, e.g. • Count: The last 100 records • Session: All records for user X • Time: All records of the last 2 minutes
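The count-based case ("the last 100 records") can be illustrated without any framework. The following is a minimal sketch in plain Java, not Flink code; the class name LastNWindow and its methods are hypothetical and only show the eviction idea behind a count window.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: keeps the most recent `capacity` records (a count window). */
final class LastNWindow<T> {
    private final int capacity;
    private final Deque<T> records = new ArrayDeque<>();

    LastNWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds a record and evicts the oldest one once the window is full. */
    void add(T record) {
        if (records.size() == capacity) {
            records.removeFirst();
        }
        records.addLast(record);
    }

    /** Returns a copy of the current window contents, oldest first. */
    List<T> contents() {
        return new ArrayList<>(records);
    }
}
```

With a capacity of 100, calling add for every incoming record keeps exactly the last 100 records available for aggregation.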
  15. 15. Event Time ▪ Processing time: when data is processed ▪ Ingestion time: when data is loaded ▪ Event time: when data is generated ▪ Almost always, the three are different ▪ Event time makes it possible to process out-of-order elements, or to replay elements, in the order in which they occurred
  16. 16. Event Time & Watermarks ▪ An element arrives: How do we know what time it is? ▪ Processing time: take the hardware clock ▪ Event time: Watermarks ▪ Watermarks are timestamps ▪ Once a watermark has been seen, no further elements with an event time older than its timestamp are expected to arrive
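One common way to derive watermarks is to assume a bound on how far elements can arrive out of order. The sketch below is plain Java and not Flink's API; the class WatermarkTracker, the parameter maxOutOfOrderness, and the convention that an element at or before the watermark counts as late are all assumptions made for illustration.

```java
/** Hypothetical bounded-out-of-orderness watermark tracker (not Flink's API). */
final class WatermarkTracker {
    private final long maxOutOfOrderness;
    private long maxTimestampSeen = Long.MIN_VALUE;

    WatermarkTracker(long maxOutOfOrderness) {
        if (maxOutOfOrderness < 0) {
            throw new IllegalArgumentException("maxOutOfOrderness must not be negative");
        }
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    /** Current watermark: highest event time seen so far minus the allowed delay. */
    long currentWatermark() {
        return maxTimestampSeen == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxTimestampSeen - maxOutOfOrderness;
    }

    /**
     * Registers an element's event time.
     * Returns true if the element is late, i.e. its event time is at or before
     * the watermark that was current when it arrived (assumed convention).
     */
    boolean observe(long eventTime) {
        boolean late = eventTime <= currentWatermark();
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTime);
        return late;
    }
}
```

With maxOutOfOrderness set to 2, observing event time 5 puts the watermark at 3; a following element with event time 4 is still on time, while one with event time 3 is reported as late.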
  17. 17-35. Event Time & Watermarks [Animation, 19 frames. Labels: Watermark, Event Time, window operator. Values shown: 0, 1, 2.]
  36. 36-70. Tumbling Windows of 4 Seconds [Animation, 35 frames. Element timestamps shown: 1, 2, 3, 4, 1, 2, 4, 5, 9, 9, 0, 20, 20, 22, 21, 23, 26, 32, 33, 21, 26, 35. Windows shown: 0-3, 4-7, 8-11, 20-23, 24-27, 32-35.]
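The window labels in the animation (0-3, 4-7, 8-11, 20-23, 24-27, 32-35) follow from aligning each timestamp to a multiple of the window size. The following plain-Java sketch mimics that behaviour under stated assumptions: the class TumblingWindowOperator is hypothetical, a window fires once the watermark reaches its last timestamp, and late elements get no special treatment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical event-time tumbling window buffer (not Flink's implementation). */
final class TumblingWindowOperator {
    private final long size;
    // Buffered element timestamps per window, keyed by window start.
    private final TreeMap<Long, List<Long>> buffers = new TreeMap<>();

    TumblingWindowOperator(long size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.size = size;
    }

    /** Start of the window [start, start + size) that contains the timestamp. */
    static long windowStart(long timestamp, long size) {
        return timestamp - Math.floorMod(timestamp, size);
    }

    /** Buffers an element in its window. Late elements are not filtered here. */
    void onElement(long timestamp) {
        buffers.computeIfAbsent(windowStart(timestamp, size), k -> new ArrayList<>())
               .add(timestamp);
    }

    /**
     * Fires and removes every window whose last timestamp (start + size - 1)
     * is at or before the watermark. Returns the fired windows by start time.
     */
    Map<Long, List<Long>> onWatermark(long watermark) {
        Map<Long, List<Long>> fired = new TreeMap<>();
        while (!buffers.isEmpty() && buffers.firstKey() + size - 1 <= watermark) {
            Map.Entry<Long, List<Long>> entry = buffers.pollFirstEntry();
            fired.put(entry.getKey(), entry.getValue());
        }
        return fired;
    }
}
```

Feeding the timestamps 1, 2, 3, 4, 1, 2 and then a watermark of 3 fires the window starting at 0 with the elements 1, 2, 3, 1, 2, while the element 4 stays buffered in the window starting at 4.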
  71. 71. The Flink Engine
  72. 72. From Program to Execution
 case class Path (from: Long, to: Long)
 val tc = edges.iterate(10) {
   paths: DataSet[Path] =>
     val next = paths
       .join(edges)
       .where("to")
       .equalTo("from") {
         (path, edge) => Path(path.from, edge.to)
       }
       .union(paths)
       .distinct()
     next
 }
 [Diagram: Program → Dataflow Graph → deploy operators, track intermediate results. Pre-flight (Client): Cost-based optimizer, Type extraction stack. Master: Task scheduling, Recovery metadata. Workers: Memory manager, Out-of-core algorithms, Batch & Streaming, State & Checkpoints. Example plan labels: DataSource orders.tbl, Filter, Map, DataSource lineitem.tbl, Join (Hybrid Hash, buildHT, probe), hash-part [0], hash-part [0], GroupRed, sort, forward]
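To make the dataflow of the transitive-closure program above easier to follow, here is a single-machine sketch of the same join, union, and distinct steps in plain Java. It is not Flink code: the class TransitiveClosure and its nested Path type are hypothetical, the join is a naive nested loop, and the early exit at a fixpoint is an addition that the bulk iteration on the slide does not have.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical in-memory version of the iterate/join/union/distinct program. */
final class TransitiveClosure {

    static final class Path {
        final long from;
        final long to;

        Path(long from, long to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Path)) {
                return false;
            }
            Path other = (Path) o;
            return from == other.from && to == other.to;
        }

        @Override
        public int hashCode() {
            return Objects.hash(from, to);
        }
    }

    /** Extends paths by one edge per iteration, for at most maxIterations rounds. */
    static Set<Path> closure(Set<Path> edges, int maxIterations) {
        Set<Path> paths = new HashSet<>(edges);
        for (int i = 0; i < maxIterations; i++) {
            // Starting from the old paths gives union(paths); the set gives distinct().
            Set<Path> next = new HashSet<>(paths);
            for (Path path : paths) {
                for (Path edge : edges) {
                    if (path.to == edge.from) {          // where("to").equalTo("from")
                        next.add(new Path(path.from, edge.to));
                    }
                }
            }
            if (next.size() == paths.size()) {
                break;                                   // assumption: stop early at a fixpoint
            }
            paths = next;
        }
        return paths;
    }
}
```

For the edges 1→2, 2→3, 3→4, the first round adds 1→3 and 2→4, the second adds 1→4, and the third finds nothing new, so the loop stops with six paths.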
  73. 73. Flink Applications ▪ Streaming topologies ▪ Heavy Batch jobs ▪ Machine Learning at scale ▪ Graph processing at scale
  74. 74. E.g.: Non-Native Iterations [Diagram labels: Client; Step, Step, Step, Step, Step]
 for (int i = 0; i < maxIterations; i++) {
     // Execute MapReduce job
 }
  75. 75. Iterative Processing in Flink ▪ Built-in iterations and delta iterations ▪ Executes machine learning and graph algorithms efficiently
  76. 76. E.g.: Non-Native Streaming [Diagram labels: discretize stream; Job, Job, Job, Job]
 while (true) {
     // get next few records
     // issue batch job
 }
  77. 77. Pipelining Basic building block to “keep data moving” • Low latency • Operators push data forward • Data shipping as buffers, not tuple-wise • Natural handling of
  78. 78-84. Flink Engine 1. Execute everything as streams 2. Iterative (cyclic) dataflows 3. Mutable state in operators (State + Computation) 4. Operate on managed memory 5. Special code paths for batch 6. HA mode – no single point of failure 7. Checkpointing of operator state
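Points 3 and 7 (mutable operator state and checkpointing it) can be pictured with a toy operator. The sketch below is plain Java and deliberately ignores how Flink coordinates snapshots across a distributed job; the class CountingOperator and its snapshot and restore methods are hypothetical names used only to show state being copied and later reinstated.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stateful operator with a copy-based snapshot (not Flink's mechanism). */
final class CountingOperator {
    // Mutable state that lives next to the computation: one counter per key.
    private Map<String, Long> counts = new HashMap<>();

    /** Processes one element by incrementing the counter of its key. */
    void process(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    long count(String key) {
        return counts.getOrDefault(key, 0L);
    }

    /** Takes a checkpoint: an independent copy of the current state. */
    Map<String, Long> snapshot() {
        return new HashMap<>(counts);
    }

    /** Recovers from a checkpoint by replacing the state with a copy of it. */
    void restore(Map<String, Long> checkpoint) {
        counts = new HashMap<>(checkpoint);
    }
}
```

After a failure, restoring the last snapshot and replaying the elements that arrived after it brings the counters back to the values they had before the failure, provided the source can replay those elements.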
  85. 85-86. Flink Eco System [Diagram labels: Gelly, Table, ML, SAMOA, Hadoop M/R, Dataflow, Dataflow, MRQL, Table, Cascading, Storm, Zeppelin; DataSet (Java/Scala/Python), DataStream; Streaming dataflow runtime; Local, Cluster, Yarn; HDFS, HBase, Kafka, RabbitMQ, Flume, HCatalog, JDBC]
  87. 87. A Quick Look at the DataStream API
  88. 88. API Structure
 // Create Environment
 StreamExecutionEnvironment env =
     StreamExecutionEnvironment.getExecutionEnvironment();
 // Add Source
 DataStream<Type> source = env.addSource(…);
 // Perform transformations (a keyed window needs an aggregation such as reduce)
 DataStream<Type2> trans = source.map(…).keyBy("field").timeWindow(...).reduce(…);
 // Add Sink
 trans.addSink(…);
 // Execute!
 env.execute();
  89. 89. Hourly Impressions
 // read from Kafka Tweet Impressions topic
 DataStream<Tweet> tweets =
     env.addSource(new FlinkKafkaConsumer<>(...));
 // sum up the impressions per tweet for each hour
 DataStream<Tweet> summaryStream = tweets
     .filter(tweet -> tweet.tweetId != null)
     .keyBy(tweet -> tweet.tweetId)
     .window(TumblingTimeWindows.of(Time.hours(1)))
     .sum("impressions");
 // output to Kafka
 summaryStream.addSink(new FlinkKafkaProducer<Tweet>(...));
 class Tweet {
     String tweetId;
     String userId;
     String text;
     long impressions;
 }
  90. 90. Up-to-date Daily Impressions
 // read from Kafka Tweet Impressions topic
 DataStream<Tweet> tweets =
     env.addSource(new FlinkKafkaConsumer<>(...));
 // sum up the impressions per tweet over the last day, updated every minute
 DataStream<Tweet> summaryStream = tweets
     .filter(tweet -> tweet.tweetId != null)
     .keyBy(tweet -> tweet.tweetId)
     .window(SlidingTimeWindows.of(Time.days(1), Time.minutes(1)))
     .sum("impressions");
 // output to database or Kafka
 summaryStream.addSink(new FlinkKafkaProducer<Tweet>(...));
 class Tweet {
     String tweetId;
     String userId;
     String text;
     long impressions;
 }
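A sliding window of one day that advances every minute means each element belongs to many overlapping windows. The plain-Java sketch below computes which windows contain a given timestamp; it is an illustration rather than Flink's assigner, and the class SlidingWindows with its method windowStarts is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the sliding windows an element falls into. */
final class SlidingWindows {

    /**
     * Returns the start of every window [start, start + size) that contains
     * the timestamp, latest window first. Windows start at multiples of slide.
     */
    static List<Long> windowStarts(long timestamp, long size, long slide) {
        if (size <= 0 || slide <= 0) {
            throw new IllegalArgumentException("size and slide must be positive");
        }
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the timestamp.
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        // Walk back one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }
}
```

With a size of one day and a slide of one minute, every element is counted in 1440 windows, which is why the daily total in the slide above can be refreshed once per minute.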
  91. 91. Hourly Impression Summary
 DataStream<Summary> summaryStream = tweets
     .keyBy(tweet -> tweet.tweetId)
     .window(TumblingTimeWindows.of(Time.hours(1)))
     .apply(new WindowFunction<Tweet, Summary, String, TimeWindow>() {
         public void apply(String tweetId, TimeWindow window,
                           Iterable<Tweet> impressions,
                           Collector<Summary> out) {
             // count the impression events of this tweet in the window
             long count = 0;
             Tweet tweet = null;
             for (Tweet val : impressions) {
                 tweet = val;
                 count++;
             }
             // output summary
             out.collect(new Summary(tweet, count,
                 window.getStart(),
                 window.getEnd()));
         }
     });
 class Tweet {
     String tweetId;
     String userId;
     String text;
 }
 class Summary {
     Tweet tweet;
     long impressions;
     long beginTime;
     long endTime;
 }
  92. 92. Closing
  93. 93. Apache Flink ▪ A powerful framework with a stream processor at its core ▪ Features • True Streaming with great Batch support • Easy to use APIs, library ecosystem • Fault-tolerant and Consistent • Low latency - High throughput • Growing community
  94. 94. I ♥ Flink, do you? ▪ More information on flink.apache.org ▪ Flink Training at data-artisans.com ▪ Subscribe to the mailing lists ▪ Follow @ApacheFlink ▪ Next: 1.0.0 release ▪ Soon: Stream SQL, Mesos, Dynamic scaling
  95. 95. Thank you for your attention!
