Apache Flink

Apache Flink Introduction
By: Ahmed Nader

Agenda
• What’s Apache Flink?
• Deeper into Flink
• Quick Start and Configuration
• Get your hands dirty
• Tips and some useful links
• References

What’s Apache Flink?
• Open-source platform for distributed stream and batch processing.
• Large-scale data processing engine.
• A true streaming engine: it does not cut the stream into batches.
• Flink has two APIs: DataStream and DataSet.

DataStream API
• Represents a continuous stream of data of a certain type.
• Operations are applied to each element of the stream or to windows of elements.
Source → DataStream → Operation → DataStream → Sink

DataStream API
• Example: live stock feed.
[Figure: a stream of stock events (Apple 235, Google 516, Microsoft 124) feeds several operations: alert if Microsoft > 120; sum every 10 seconds, then alert if sum > 10000; write each event to a database.]
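The stock feed example can be sketched in plain Java to show what each operation computes. This is not Flink code: it runs over an in-memory list instead of a live stream, and the names `StockFeedSketch`, `Tick`, and `timestampMs` are hypothetical, introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StockFeedSketch {

    /** One stock event. timestampMs is a hypothetical field, added so events can be windowed. */
    static final class Tick {
        final String symbol;
        final int price;
        final long timestampMs;

        Tick(String symbol, int price, long timestampMs) {
            this.symbol = symbol;
            this.price = price;
            this.timestampMs = timestampMs;
        }
    }

    /** "Alert if Microsoft > 120": a check applied to each element. */
    static List<String> microsoftAlerts(List<Tick> feed) {
        List<String> alerts = new ArrayList<>();
        for (Tick t : feed) {
            if (t.symbol.equals("Microsoft") && t.price > 120) {
                alerts.add("ALERT: Microsoft at " + t.price);
            }
        }
        return alerts;
    }

    /** "Sum every 10 seconds": sums prices per fixed, non-overlapping window. */
    static Map<Long, Integer> sumPerWindow(List<Tick> feed, long windowMs) {
        Map<Long, Integer> sums = new TreeMap<>();
        for (Tick t : feed) {
            long window = t.timestampMs / windowMs; // index of the window this event falls into
            sums.merge(window, t.price, Integer::sum);
        }
        return sums;
    }

    /** "Alert if sum > 10000": a check applied to each window's result. */
    static List<Long> windowsOverLimit(Map<Long, Integer> sums, int limit) {
        List<Long> over = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : sums.entrySet()) {
            if (e.getValue() > limit) {
                over.add(e.getKey());
            }
        }
        return over;
    }

    public static void main(String[] args) {
        List<Tick> feed = Arrays.asList(
                new Tick("Apple", 235, 0L),
                new Tick("Google", 516, 1000L),
                new Tick("Microsoft", 124, 2000L));
        System.out.println(microsoftAlerts(feed));
        System.out.println(sumPerWindow(feed, 10_000L));
    }
}
```

In a real Flink job the per-element check and the windowed sum would be separate operators on the same stream; here they are separate methods over the same list.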

DataSet API
• Used for batch processing.
• Batch is treated as a special case of stream processing: finite data sources are just streams that happen to end.
• Offers a dedicated API, with machine learning and graph processing libraries.
Source → DataSet → Operation → DataSet → Sink

DataSet API
• Example: Map/Reduce paradigm.
[Figure: input elements pass through a Map stage and then a Reduce stage.]
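As a rough illustration of the Map/Reduce paradigm, here is a minimal word-count sketch in plain Java. Word count is an assumed example (the slide does not say what is being counted), and the code does not use the Flink DataSet API; `MapReduceSketch` and its method names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapReduceSketch {

    /** Map step: emit a (word, 1) pair for every word in one line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Reduce step: sum the values of all pairs that share a key. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    /** Runs the map step on every line, then reduces all emitted pairs. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```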

Analyzing the Flink stack
• A streaming dataflow runtime that interprets every program as a dataflow graph.
• Libraries on top of the DataStream and DataSet APIs, such as:
  • Table: enables SQL-like queries.
  • Gelly: graph processing for transforming and traversing graphs in a distributed fashion.
  • ML: offers a handful of machine learning algorithms, but is still quite basic.
  • CEP: detects complex events in a data stream, which helps you get hold of what is really important in your data.

Deeper into Flink
Data Sources:
• From an input file
• From a socket
• From a collection

Deeper into Flink
Data Sinks:
• Write to a CSV file
• Write to a socket
• Print on the terminal

Deeper into Flink
• Data transformations (for the DataStream API):
  • Map: takes one element and produces one element.
  • FlatMap: takes one element and produces zero or more elements.
  • Filter: evaluates a boolean function for each element and retains those for which it returns true.
  • KeyBy: partitions a stream into disjoint partitions, each containing elements with the same key.
  • Window: groups stream events according to some characteristic, e.g. data that arrived in the last 5 seconds.
  • Union, Join, Split, Select…
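The semantics of the first four transformations can be tried out with the JDK's `java.util.stream` API, which has similarly named operations. This is only an analogy on finite lists, not Flink code: `TransformSketch` and its methods are hypothetical, and `Collectors.groupingBy` stands in for KeyBy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class TransformSketch {

    /** Map: exactly one output per input (here: word -> its length). */
    static List<Integer> lengths(List<String> words) {
        return words.stream().map(String::length).collect(Collectors.toList());
    }

    /** FlatMap: zero or more outputs per input (here: line -> its words). */
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")).filter(w -> !w.isEmpty()))
                .collect(Collectors.toList());
    }

    /** Filter: keep only the elements for which the predicate returns true. */
    static List<Integer> evens(List<Integer> numbers) {
        return numbers.stream().filter(n -> n % 2 == 0).collect(Collectors.toList());
    }

    /** KeyBy analogue: split elements into disjoint groups that share a key.
     *  Assumes every word is non-empty. */
    static Map<String, List<String>> byFirstLetter(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w.substring(0, 1), TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(lengths(Arrays.asList("flink", "api")));
        System.out.println(words(Arrays.asList("to be", "", "or not")));
        System.out.println(evens(Arrays.asList(1, 2, 3, 4)));
        System.out.println(byFirstLetter(Arrays.asList("apple", "banana", "avocado")));
    }
}
```

Note that the empty line in the FlatMap example produces no output at all, which is the "zero or more" case.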

Deeper into Flink
• Interesting use cases:
  • Processing a Twitter feed, for example to collect statistics on that feed.
    see: http://blog.brakmic.com/stream-processing-with-apache-flink/
  • Identifying popular locations where people arrive by taxi, by applying filter and map functions to a DataStream of taxi ride records and then finding the most popular places over, for example, the last 15 minutes.
    see: https://www.mapr.com/blog/essential-guide-streaming-first-processing-apache-flink
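The taxi use case boils down to filter, map, and a count over a 15-minute window. A minimal plain-Java sketch of that logic follows; it is not taken from the linked article and does not use Flink. The `Ride` record, its fields, and the `minArrivals` threshold are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TaxiSketch {

    static final long WINDOW_MS = 15 * 60 * 1000L; // the "last 15 minutes"

    /** Hypothetical ride record; the field names are illustrative only. */
    static final class Ride {
        final String area;      // where the ride ended
        final long arrivalMs;   // when it ended
        final boolean arrived;  // false for rides that have not ended yet

        Ride(String area, long arrivalMs, boolean arrived) {
            this.area = area;
            this.arrivalMs = arrivalMs;
            this.arrived = arrived;
        }
    }

    /** Areas with at least minArrivals arrivals in the 15 minutes up to nowMs, in alphabetical order. */
    static List<String> popularAreas(List<Ride> rides, long nowMs, int minArrivals) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Ride r : rides) {
            boolean inWindow = r.arrivalMs > nowMs - WINDOW_MS && r.arrivalMs <= nowMs;
            if (r.arrived && inWindow) {               // filter: completed rides inside the window
                counts.merge(r.area, 1, Integer::sum); // map each ride to its area, then count
            }
        }
        List<String> popular = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minArrivals) {
                popular.add(e.getKey());
            }
        }
        return popular;
    }
}
```

A streaming engine would keep these counts up to date as rides arrive, instead of rescanning the whole list on every call as this sketch does.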

Setup
• Prerequisites:
  • Java 7.x or higher.
  • Maven 3.0.4 or higher.
• Start a new Flink project using Maven by running the following command in the terminal:
    mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java -DarchetypeVersion=1.0.1
OR
• Add Flink to an existing project:
  see: https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html

Get your hands dirty:
Execution:
• Local/debugging
• Cluster
  • Command Line Interface
  • Web interface
See: https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.htm

Tips and some useful links:
• Subscribe to the mailing list by sending an empty email to user-subscribe@flink.apache.org.
• Clone the Flink project on GitHub for more examples.
• There is a free training course by DataArtisans:
  see: http://dataartisans.github.io/flink-training/index.html
• Some other useful links:
  • http://www.slideshare.net/sbaltagi/stepbystep-introduction-to-apache-flink
  • https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.html
  • https://ci.apache.org/projects/flink/flink-docs-release-

References
• http://blog.brakmic.com/stream-processing-with-apache-flink/
• http://www.slideshare.net/sbaltagi/stepbystep-introduction-to-apache-flink
• https://www.mapr.com/blog/essential-guide-streaming-first-processing-apache-flink
• https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.html
• http://dataartisans.github.io/flink-training/index.html
• https://ci.apache.org/projects/flink/flink-docs-release-
