Big Data series :
Apache Flink
Jérôme Blachon
Laurent Tardif
Stéphane Thiers
June 2015: JUG Grenoble
September 2015: JUG Lausanne
Who we are
Jérôme Blachon Laurent Tardif Stéphane Thiers
A bit of history
The stack
Flink
Demo
How it works
Strengths
Roadmap
Tonight's agenda
History
BigData success story
● Map/Reduce (OSDI '04) => Hadoop 1
– Small recoverable tasks
– Sequential code
● Dryad (EuroSys '07) => TEZ
– Map/Reduce extended to DAG
– Backtracking recovery
● RDDs (HotCloud '10, NSDI '12) => Spark
– Functional implementation of Dryad recovery
● PACTs (SoCC '10, VLDB '12) => Flink
– Cyclic graph (and incremental construction)
– Query processing runtime embedded in DAG engine
Stonebraker / Cetintemel / Zdonik, 2005
Criteria for stream processing (1/2)
● Keep data moving
– Low latency on critical path
● Query on stream
– High level language
● Handle stream imperfection
– Timeout (ex: avg of last 25 securities)
– Out of order (must leave window open)
● Generate predictable outcomes
– Time ordered
Criteria for stream processing (2/2)
● Integrate stored / streaming data
– Uniform language for both stored and streamed data
– Combine streamed and stored data
● Data safety / availability
– Resistant to failure
● Partition and scale automatically
● Process and respond instantaneously
– 100 000 msg / s
Big data stack
The stack
Data processing engine
User requirement
App and resource management
Storage / stream
Ecosystem
Applications
Data processing engines
App and resource management
Storage / stream
Another view
http://practicalanalytics.wordpress.com
Demo
Word count
The hello world
// read a text file (or in-memory data) and produce a DataSet of lines
DataSet<String> text = getTextDataSet(env);
DataSet<Tuple2<String, Integer>> counts =
    // split the lines into pairs (2-tuples) containing: (word, 1)
    text.flatMap(new Tokenizer())
    // group by tuple field "0" and sum up tuple field "1"
    .groupBy(0)
    .sum(1);
Word count
Input:
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
Flatmap(tokenizer): (to,1) (be,1) (or,1) ...
groupby: (to,1) (to,1) | (be,1) (be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
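The slides do not show the Tokenizer class. As a rough illustration of what the flatMap / groupBy / sum chain computes, here is a minimal sketch in plain Java. It does not use the Flink API; the class name WordCountSketch and the tokenizing rule (lower-case, split on non-word characters) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java imitation of the slide's pipeline; it does not use the Flink API.
final class WordCountSketch {

    // Plays the role of the slide's Tokenizer (assumed behaviour): lower-case
    // the line, split on non-word characters, and drop empty tokens.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // flatMap -> groupBy(word) -> sum(count), collapsed into one map update.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : tokenize(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "To be, or not to be,--that is the question:--",
            "Whether 'tis nobler in the mind to suffer");
        count(lines).forEach((word, n) -> System.out.println("(" + word + "," + n + ")"));
    }
}
```

In Flink the grouping and summing run in parallel across the cluster; the single map here only shows the result each key ends up with.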
Data in memory
public static final String[] WORDS = new String[] {
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
"Or to take arms against a sea of troubles,",
"And by opposing end them?--To die,--to sleep,--",
"No more; and by a sleep to say we end",
"The heartache, and the thousand natural shocks",
"That flesh is heir to,--'tis a consummation",
"Devoutly to be wish'd. To die,--to sleep;--",
….
File
private static DataSet<String> getTextDataSet(ExecutionEnvironment env) {
return env.readTextFile(textPath);
}
With POJO
public static class Word {
    // fields
    private String word;
    private Integer frequency;
    // constructors
    public Word() { }
    public Word(String word, int i) {
        this.word = word;
        this.frequency = i;
    }
    // getters setters
    // to String
    @Override
    public String toString() {
        return "Word="+word+" freq="+frequency;
    }
}
Pojo
Input:
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
Flatmap(tokenizer): Word 1 {to,1}, Word 2 {be,1}, Word 3 {or,1}, ...
groupby: Word 1 {to,1}, Word 5 {to,1} | Word 2 {be,1}, Word 6 {be,1} | Word 3 {or,1}
sum: Word 7 {to,2}, Word 8 {be,2}, Word 9 {or,1}
JDBC
Source table "hamlet":
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
Rows read: ("To be, or not to be,--that is the question:--") ("Whether 'tis nobler in the mind to suffer")
Map + Flatmap(tokenizer): (to,1) (be,1) (or,1) ...
groupby: (to,1) (to,1) | (be,1) (be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
Stream
Input already received:
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
Flatmap(tokenizer): (to,1) (be,1) (or,1) ...
groupby: (to,1) (to,1) | (be,1) (be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
A new line then arrives on the stream:
"Or to take arms against a sea of troubles,"
Flatmap(tokenizer) emits (or,1)
groupby adds it to the existing group: (or,1) (or,1)
sum updates the result: (or,2)
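The step-by-step update shown for the stream can be imitated with a running count per word. This is a sketch in plain Java, not Flink's streaming API; the class RunningCountSketch and its method names are hypothetical, and keeping the state in a HashMap is an assumption of the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java imitation of a keyed running sum; it does not use the Flink API.
final class RunningCountSketch {
    // Current count per word, updated as each element arrives.
    private final Map<String, Integer> state = new HashMap<>();

    // Process one incoming word and return the updated "(word,count)" record.
    String onWord(String word) {
        int updated = state.merge(word, 1, Integer::sum);
        return "(" + word + "," + updated + ")";
    }

    // Feed a whole line, one word at a time, and collect what would be emitted.
    List<String> onLine(String line) {
        List<String> emitted = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                emitted.add(onWord(token));
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        RunningCountSketch counter = new RunningCountSketch();
        System.out.println(counter.onLine("To be, or not to be"));
        System.out.println(counter.onLine("Or to take arms against a sea of troubles,"));
    }
}
```

Each arrival produces an updated record immediately, which is the difference from the batch version: there is no point at which the input is "complete".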
Multiple
With a single input:
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
Flatmap(tokenizer): (to,1) (be,1) (or,1) ......
groupby, then sum: (to,2) (be,2) (or,1)
With three parallel inputs, each holding the same lines:
Flatmap(tokenizer), once per input: (to,1) (be,1) (or,1) ......
Groupby + sum over all inputs: (to,6) (be,6) (or,3) ......
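The parallel case can be sketched the same way: each input is tokenized on its own, then one grouping and summing step combines everything. This is plain Java rather than the Flink API, and ParallelCountSketch with its two methods is a hypothetical name chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of parallel flatMap followed by one groupBy + sum.
final class ParallelCountSketch {

    // Flatmap(tokenizer) for one partition: emit every word of every line.
    static List<String> tokenizePartition(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    words.add(token);
                }
            }
        }
        return words;
    }

    // Groupby + sum over the words emitted by all partitions.
    static Map<String, Integer> groupAndSum(List<List<String>> emittedPerPartition) {
        Map<String, Integer> totals = new TreeMap<>();
        for (List<String> words : emittedPerPartition) {
            for (String word : words) {
                totals.merge(word, 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("To be, or not to be,--that is the question:--");
        List<String> emitted = tokenizePartition(lines);
        // Three partitions holding the same line, as in the slide.
        System.out.println(groupAndSum(Arrays.asList(emitted, emitted, emitted)));
    }
}
```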
Demo
Input (product, price, date):
Produit1 , 14 , 1/6/2015
Produit2 , 13.5 , 1/6/2015
Produit3 , 24 , 1/6/2015
Produit1 , 14 , 30/5/2015
Produit2 , 13 , 30/5/2015
Produit3 , 24 , 30/5/2015
Produit4 , 124 , 30/5/2015
Grouped per product:
Produit1 :
14 , 1/6/2015
14 , 30/5/2015
13 , 29/5/2015
Produit2 :
13.5 , 1/6/2015
13 , 30/5/2015
13 , 29/5/2015
Output:
Produit1
Average price (over 7 days) : 14
Average price (over 30 days) : 14
Average price (over 365 days) : 13.5
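The demo code itself is not in the slides. A minimal plain-Java sketch of the per-product average over a trailing number of days could look like this; PriceWindowSketch, Price and average are hypothetical names, and treating the window as inclusive of the reference date is an assumption.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of "average price over the last N days" for one product.
final class PriceWindowSketch {

    // One observation of the demo's input: (product, price, date).
    static final class Price {
        final String product;
        final double value;
        final LocalDate date;

        Price(String product, double value, LocalDate date) {
            this.product = product;
            this.value = value;
            this.date = date;
        }
    }

    // Average price of one product over the `days` days ending at `asOf`
    // (inclusive). Returns NaN when no observation falls in the window.
    static double average(List<Price> prices, String product, LocalDate asOf, int days) {
        LocalDate start = asOf.minusDays(days - 1);
        double sum = 0;
        int n = 0;
        for (Price p : prices) {
            boolean inWindow = !p.date.isBefore(start) && !p.date.isAfter(asOf);
            if (p.product.equals(product) && inWindow) {
                sum += p.value;
                n++;
            }
        }
        return n == 0 ? Double.NaN : sum / n;
    }

    public static void main(String[] args) {
        List<Price> prices = Arrays.asList(
            new Price("Produit1", 14, LocalDate.of(2015, 6, 1)),
            new Price("Produit1", 14, LocalDate.of(2015, 5, 30)),
            new Price("Produit1", 13, LocalDate.of(2015, 5, 29)));
        LocalDate today = LocalDate.of(2015, 6, 1);
        for (int days : new int[] {7, 30, 365}) {
            System.out.println(days + " days: " + average(prices, "Produit1", today, days));
        }
    }
}
```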
Demo 2 : twitter
Input:
twit, Flink is…, 1/6/2015
twit, #Flink, 1/6/2015
twit, #Flink, 1/6/2015
twit, #Flink, 30/5/2015
twit, #Flink, 30/5/2015
twit, #Flink, 30/5/2015
twit, #Flink, 30/5/2015
Grouped per writer:
@writer1:
#Flink, 1/6/2015
#Flink, 30/5/2015
#Flink, 29/5/2015
@writer3:
#Flink, 1/6/2015
#Flink, 30/5/2015
#Flink, 29/5/2015
Output: tag cloud
Jira
stackoverflow
Demo 3 : scala shell
… Word count demo from the Flink Scala shell ...
Demo 4 : ML demo
Classifier (SVM) from MLLib
– Scala only
Learn + Predict
Some basics (covered by demo)
type, streaming, loop,….
Tuples with primitive types
DataSet<Tuple2<String, Integer>> wordCounts = env.fromElements(
new Tuple2<String, Integer>("hello", 1),
new Tuple2<String, Integer>("world", 2));
Pojo (constructor + get/set)
public class WordWithCount {
    public String word;
    public int count;
    public WordWithCount() {}
    public WordWithCount(String word, int count) {
        this.word = word;
        this.count = count;
    }
}
Hadoop org.apache.hadoop.io.Writable interface
Data
//local file system
DataSet<String> localLines =
env.readTextFile("file:///path/to/my/textfile");
// read text file from a HDFS running at nnHost:nnPort
DataSet<String> hdfsLines =
env.readTextFile("hdfs://nnHost:nnPort/path/to/my/textfile");
// read a CSV file with three fields
DataSet<Tuple3<Integer, String, Double>> csvInput =
    env.readCsvFile("hdfs:///the/CSV/file")
       .types(Integer.class, String.class, Double.class);
// create a set from some given elements
DataSet<String> value = env.fromElements("Foo", "bar", "foobar", "fubar");
Data sources : File based
// Read data from a relational database using the JDBC input format
DataSet<Tuple2<String, Integer>> dbData =
env.createInput( // create and configure input format
JDBCInputFormat.buildJDBCInputFormat()
.setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
.setDBUrl("jdbc:derby:memory:persons")
.setQuery("select name, age from persons")
.finish(),
// specify type information for DataSet
new TupleTypeInfo(Tuple2.class, STRING_TYPE_INFO,
INT_TYPE_INFO) );
Data sources
// text data
DataSet<String> textData = // [...]
// write DataSet to a file on the local file system
textData.writeAsText("file:///my/result/on/localFS");
// write DataSet to a file on a HDFS with a namenode running at nnHost:nnPort
textData.writeAsText("hdfs://nnHost:nnPort/my/result/on/localFS");
// write DataSet to a file and overwrite the file if it exists
textData.writeAsText("file:///my/result/on/localFS", WriteMode.OVERWRITE);
// tuples as lines with pipe as the separator "a|b|c"
DataSet<Tuple3<String, Integer, Double>> values = // [...]
values.writeAsCsv("file:///path/to/the/result/file", "\n", "|");
Data Sinks
Variable and storage
DataSet<Tuple...> large = env.readCsvFile(...);
DataSet<Tuple...> medium = env.readCsvFile(...);
DataSet<Tuple...> small = env.readCsvFile(...);
DataSet<Tuple...> largeAndMedium = large.join(medium)
    .where(3).equalTo(1)
    .with(new JoinFunction() { ... });
DataSet<Tuple...> largeMediumAndSmall = small.join(largeAndMedium)
    .where(0).equalTo(2)
    .with(new JoinFunction() { ... });
DataSet<Tuple...> result = largeMediumAndSmall.groupBy(3).aggregate(MAX, 2);
DataSet<Tuple...> otherResult = largeAndMedium.groupBy(3).aggregate(MAX, 2);
DataSet<Tuple...> oneMoreResult = large.groupBy(3).aggregate(MAX, 2);
Map
Filter
Reduce
Join
Cross
Union
First-n
….
Lazy Evaluation
Operators
Datastream
continuous, parallel, immutable stream of data
Socket stream (twitter, …)
Message Queue connector (RabbitMQ)
FileStream
Streaming
Iterative
● Algorithms that need iterations
● Clustering (K-Means, Canopy, …)
● Gradient descent (e.g., Logistic Regression, Matrix Factorization)
● Graph Algorithms (e.g., PageRank, Line-Rank, components, paths, reachability, centrality)
● Graph communities / dense sub-components
● Inference (belief propagation)
● …
Loop makes multiple passes over the data
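To make "multiple passes over the data" concrete, here is a small plain-Java sketch of one of the graph algorithms named above, connected components by label propagation. It is not Flink's iteration API; ComponentsSketch is a hypothetical name, and representing the graph as a list of int[2] edges is an assumption of the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of an iterative algorithm: connected components.
final class ComponentsSketch {

    // Label propagation: every vertex starts with its own id as label, and each
    // pass lowers the labels at both ends of an edge to the smaller of the two.
    // The loop stops after the first pass that changes nothing.
    static Map<Integer, Integer> components(List<int[]> edges) {
        Map<Integer, Integer> label = new TreeMap<>();
        for (int[] e : edges) {
            label.put(e[0], e[0]);
            label.put(e[1], e[1]);
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int[] e : edges) {
                int smallest = Math.min(label.get(e[0]), label.get(e[1]));
                if (label.get(e[0]) != smallest) {
                    label.put(e[0], smallest);
                    changed = true;
                }
                if (label.get(e[1]) != smallest) {
                    label.put(e[1], smallest);
                    changed = true;
                }
            }
        }
        return label;
    }

    public static void main(String[] args) {
        List<int[]> edges = Arrays.asList(
            new int[] {1, 2}, new int[] {2, 3}, new int[] {4, 5});
        // Vertices with the same label belong to the same component.
        System.out.println(components(edges));
    }
}
```

The number of passes depends on the data, which is why such algorithms need a loop construct rather than a fixed chain of operators.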
Windowing
.window(Count.of(4)).every(Count.of(2))
Window policies: Count, Time, ….
Stream contents as elements arrive:
(to,2) (be,2) ……
(to,2) (be,2) (or,1) (lord,2) ……
(to,2) (be,2) (or,1) (lord,2) (my,2) (king,1) ……
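One way to read .window(Count.of(4)).every(Count.of(2)) is "keep the last 4 elements, and evaluate every 2 arrivals". The plain-Java sketch below simulates that reading; it does not call Flink, CountWindowSketch is a hypothetical name, and emitting partially filled windows at the start is an assumption of the sketch rather than a statement about Flink's behaviour.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Plain-Java simulation of a sliding count window.
final class CountWindowSketch {

    // Keep the last `size` elements and emit a snapshot of them every `slide`
    // arrivals. Early snapshots may hold fewer than `size` elements.
    static <T> List<List<T>> slidingWindows(List<T> stream, int size, int slide) {
        Deque<T> window = new ArrayDeque<>();
        List<List<T>> emitted = new ArrayList<>();
        int sinceLastEmit = 0;
        for (T element : stream) {
            window.addLast(element);
            if (window.size() > size) {
                window.removeFirst();
            }
            sinceLastEmit++;
            if (sinceLastEmit == slide) {
                emitted.add(new ArrayList<>(window));
                sinceLastEmit = 0;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> stream = Arrays.asList(
            "(to,2)", "(be,2)", "(or,1)", "(lord,2)", "(my,2)", "(king,1)");
        for (List<String> window : slidingWindows(stream, 4, 2)) {
            System.out.println(window);
        }
    }
}
```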
Go inside Flink
© 2015 Persistent Systems Ltd
How it works: the naive idea
Code => Flink Job Manager => Execution Plan
Data => Execution Plan => Results
Execution plan
We have resources, let's optimize it!
Code => Flink Job Manager => Execution Plan
The plan runs in parallel, each instance with its own Data => Result (four instances shown)
Distributed Runtime
● Master (Job Manager) handles job submission, scheduling, and metadata
● Workers (Task Managers) execute operations
● Data can be streamed between nodes
● All operators start in-memory and gradually go out-of-core
How the magic happens
- Flink Runtime
- Flink Optimizer
Flink Optimizer
● The optimizer is the component that selects an execution plan for a Common API program
● Think of an AI system manipulating your program for you
● But don't be scared: it works
– Relational databases have been doing this for decades; Flink ports the technology to API-based systems
Program lifecycle
val source1 = …
val source2 = …
val maxed = source1
  .map(v => (v._1, v._2, math.max(v._1, v._2)))
val filtered = source2
  .filter(v => (v._1 > 4))
val result = maxed
  .join(filtered).where(0).equalTo(0)
  .filter(_1 > 3)
  .groupBy(0)
  .reduceGroup {……}
Forwarded fields
@ForwardedFields("f0->f2")
public class MyMap implements MapFunction<Tuple2<…>, Tuple3<…>> {
@Override public Tuple3<…> map(Tuple2<…> val) {
return new Tuple3<…>("foo", val.f1 / 2, val.f0);} }
Some fancy stuff to help Flink
Partitioning
Partitioning controls how the individual data points of a stream are distributed, and hence ordered, among the parallel instances of the transformation operators.
Several partitioning types are supported in Flink Streaming, for example:
Forward (default): directs the output data to the next operator on the same machine (if possible), avoiding expensive network I/O.
Shuffle: randomly partitions the output data stream to the next operator using a uniform distribution.
Rebalance: directs the output data stream to the next operator in a round-robin fashion.
Broadcast: sends the output data stream to all parallel instances of the next operator. Usage: dataStream.broadcast()
Some fancy stuff to help Flink
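The four partitioning strategies differ only in how they pick the downstream instance for a record. The plain-Java model below illustrates that choice; it is not Flink code, PartitioningSketch and its members are hypothetical names, and modelling "same machine" as "same index" for forward partitioning is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Plain-Java model of the four strategies; it does not use the Flink API.
// Downstream parallel instances are identified by an index 0..parallelism-1.
final class PartitioningSketch {

    // Forward: hand the record to the downstream instance with the sender's
    // own index, modelling "same machine, no network hop" (an assumption).
    static int forward(int senderIndex) {
        return senderIndex;
    }

    // Shuffle: pick one downstream instance uniformly at random.
    static int shuffle(Random random, int parallelism) {
        return random.nextInt(parallelism);
    }

    // Rebalance: cycle through the downstream instances in round-robin order.
    static final class RoundRobin {
        private final int parallelism;
        private int next = 0;

        RoundRobin(int parallelism) {
            this.parallelism = parallelism;
        }

        int nextTarget() {
            int target = next;
            next = (next + 1) % parallelism;
            return target;
        }
    }

    // Broadcast: every downstream instance receives a copy of the record.
    static List<Integer> broadcast(int parallelism) {
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            targets.add(i);
        }
        return targets;
    }

    public static void main(String[] args) {
        RoundRobin rebalance = new RoundRobin(3);
        for (int i = 0; i < 4; i++) {
            System.out.println("rebalance -> " + rebalance.nextTarget());
        }
        System.out.println("broadcast -> " + broadcast(3));
    }
}
```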
Performance
● More info soon
● Demo on 100,000 products / 3 years of prices => ~ 20 minutes
● On a "small cluster" of 3 nodes: 4 processors, 8 GB of RAM, virtualized
Limits
API still moving
Diagnosis is hard: Flink, Hadoop, network, OS, JVM …
Heap usage is (too?) high
API & Big Data ecosystem
The growing Flink stack
Scala API, Java API, Python API (upcoming), Graph API, Apache MRQL
Common API
Flink Optimizer, Flink Stream Builder
Flink Local Runtime, Apache Tez
Embedded environment (Java collections), Local Environment (for debugging), Remote environment (Regular cluster execution)
Single node execution, Standalone or YARN cluster
Data storage: HDFS, Files, S3, JDBC, Redis, RabbitMQ, Kafka, Azure tables, …
Roadmap
Flink Roadmap
Currently being discussed by the Flink community
Flink has a major release every 3 months, and one or more bug-fixing
releases between major releases
Caveat: rough roadmap, depends on volunteer work, outcome of
community discussion, and Apache open source processes
Roadmap for 2015 (highlights)
Columns: Q1, Q2, Q3. Each row lists its items in the order they appear across the columns.
APIs: Logical Query integration; Additional operators; Interactive programs; Interactive Scala shell; SQL-on-Flink
Optimizer: Semantic annotations; HCatalog integration; Optimizer hints
Runtime: Dual engine (blocking & pipelining); Fine-grained fault tolerance; Dynamic memory allocation
Streaming: Better memory management; More operators in API; At-least-once processing guarantees; Unify batch and streaming; Exactly-once processing guarantees
ML library: First version; Additional algorithms; Mahout integration
Graph library: First version
Integration: Tez, Samoa; Mahout
Integration with other projects
Machine Learning
– Samoa (incubating): distributed streaming machine learning (ML) framework
Apache Tez (runs complex directed acyclic graphs of tasks for processing data; simplifies Pig and Hive task definition)
Storage
– Tachyon (a memory-centric distributed storage system)
Mahout (Data analytics)
– H2O (distributed scalable machine learning system)
Apache Hive (high-level language for data processing)
● Expected Q3/Q4 2015
Apache Zeppelin (inc.): a web-based notebook that enables interactive data analytics.
And many more…
Runtime: even better performance and robustness
Using off-heap memory, dynamic memory allocation
Improvements to the Flink optimizer
Integration with HCatalog, better statistics
Runtime optimization
Streaming graph and ML pipeline libraries
Summary and conclusion
Flink is optimized for cyclic or iterative processes by using iterative transformations on collections.
Flink streaming processes data streams as true streams, i.e., data elements are immediately "pipelined" through a streaming program as soon as they arrive. This makes flexible window operations on streams possible.
Built-in optimizer
Flink in one slide
flink.apache.org
http://flink-forward.org/ : 15 oct : Berlin

PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Empfohlen (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Flink4 jug

  • 1. Big Data series: Apache Flink. Jérôme Blachon, Laurent Tardif, Stéphane Thiers. June 2015: JUG Grenoble. September 2015: JUG Lausanne.
  • 2. Who we are: Jérôme Blachon, Laurent Tardif, Stéphane Thiers
  • 3. Tonight's agenda: a bit of history, the stack, Flink, demo, how it works, the pluses, roadmap
  • 5. Big Data success story
      Map/Reduce (OSDI '04) -> Hadoop 1: small recoverable tasks, sequential code
      Dryad (EuroSys '07) -> Tez: Map/Reduce extended to DAG, backtracking recovery
      RDDs (HotCloud '10, NSDI '12) -> Spark: functional implementation of Dryad recovery
      PACTs (SoCC '10, VLDB '12) -> Flink: cyclic graph (and incremental construction), query processing runtime embedded in the DAG engine
      Stonebraker / Cetintemel / Zdonik, 2005
  • 6. Criteria for stream processing (1/2)
      ● Keep data moving: low latency on the critical path
      ● Query on stream: high-level language
      ● Handle stream imperfection: timeout (e.g. avg of last 25 securities), out of order (must leave window open)
      ● Generate predictable outcomes: time ordered
  • 7. Criteria for stream processing (2/2)
      ● Integrate stored / streaming data: uniform language for both stored and streamed data, combine streamed and stored data
      ● Data safety / availability: resistant to failure
      ● Partition and scale automatically
      ● Process and respond instantaneously: 100 000 msg / s
  • 9. The stack: user requirement; data processing engine; app and resource management; storage / stream
  • 10. Eco system Applications Data processing engines App and resource management Storage/Stream
  • 12. Demo
  • 13. Word count The hello world // read test file or in Memory, and generate a set of String DataSet<String> text = getTextDataSet(env); DataSet<Tuple2<String, Integer>> counts = // split up the lines in pairs (2-tuples) containing: (word,1) text.flatMap(new Tokenizer()) // group by the tuple field "0" and sum up tuple field "1“ .groupBy(0) .sum(1);
  • 14. Word count (data flow): input lines "To be, or not to be,--that is the question:--", "Whether 'tis nobler in the mind to suffer", "The slings and arrows of outrageous fortune" -> flatMap(Tokenizer) -> (to,1) (be,1) (or,1) (to,1) (be,1) … -> groupBy -> sum -> (to,2) (be,2) (or,1)
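The flatMap -> groupBy -> sum flow of the word count can be sketched without any Flink dependency, using plain Java streams. This is only an illustration of the data flow: `WordCountSketch`, `tokenize` and `count` are hypothetical names, and the lower-casing and splitting rule is an assumption, since the slides do not show the Tokenizer.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Stands in for the slide's Tokenizer (assumed behavior): lower-case, split on non-word characters.
    static String[] tokenize(String line) {
        return line.toLowerCase().split("\\W+");
    }

    // flatMap (line -> words), then group by word and sum one per occurrence.
    static Map<String, Integer> count(String... lines) {
        return Arrays.stream(lines)
                .flatMap(line -> Arrays.stream(tokenize(line)))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(
                        word -> word, TreeMap::new, Collectors.summingInt(word -> 1)));
    }

    public static void main(String[] args) {
        System.out.println(count("To be, or not to be,--that is the question:--"));
    }
}
```

In Flink the same three steps run in parallel across the cluster; here they run in a single JVM, which is enough to see what each operator contributes.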
  • 15. Data in memory
      public static final String[] WORDS = new String[] {
          "To be, or not to be,--that is the question:--",
          "Whether 'tis nobler in the mind to suffer",
          "The slings and arrows of outrageous fortune",
          "Or to take arms against a sea of troubles,",
          "And by opposing end them?--To die,--to sleep,--",
          "No more; and by a sleep to say we end",
          "The heartache, and the thousand natural shocks",
          "That flesh is heir to,--'tis a consummation",
          "Devoutly to be wish'd. To die,--to sleep;--",
          ….
  • 16. File private static DataSet<String> getTextDataSet(ExecutionEnvironment env) { return env.readTextFile(textPath); }
  • 17. With POJO
      public static class Word {
          // fields
          private String word;
          private Integer frequency;
          // constructors
          public Word() { }
          public Word(String word, int i) {
              this.word = word;
              this.frequency = i;
          }
          // getters setters
          // to String
          @Override
          public String toString() {
              return "Word=" + word + " freq=" + frequency;
          }
  • 18. POJO (data flow): the same three input lines -> flatMap(Tokenizer) -> Word objects {to,1} {be,1} {or,1} {to,1} {be,1} … -> groupBy -> sum -> Word objects {to,2} {be,2} {or,1}
  • 19. JDBC (data flow): rows read from a source labeled "hamlet", e.g. ("To be, or not to be,--that is the question:--"), ("Whether 'tis nobler in the mind to suffer") -> map + flatMap(Tokenizer) -> (to,1) (be,1) (or,1) (to,1) (be,1) … -> groupBy -> sum -> (to,2) (be,2) (or,1)
  • 20. Stream: the same pipeline (flatMap(Tokenizer) -> groupBy -> sum) applied to a stream; with the first three lines processed, the running counts shown are (to,2) (be,2) (or,1)
  • 21. Stream: a new line arrives: "Or to take arms against a sea of troubles,"
  • 22. Stream: the new line enters the pipeline
  • 23. Stream: the tokenizer emits (or,1) for the new line
  • 24. Stream: the new (or,1) is grouped with the existing (or,1)
  • 25. Stream: the sum is updated to (or,2)
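The stream slides show counts being updated as each new line arrives, instead of being computed once over a finished data set. A minimal plain-Java sketch of that idea follows; it uses no Flink API, `StreamingWordCountSketch` and `onLine` are hypothetical names, and the tokenizing rule is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamingWordCountSketch {

    // Running state: the current count per word, kept between arrivals.
    private final Map<String, Integer> counts = new HashMap<>();

    // Processes one incoming line and returns the updated (word,count) pairs it caused.
    List<String> onLine(String line) {
        List<String> updates = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue;
            }
            int updated = counts.merge(word, 1, Integer::sum);
            updates.add("(" + word + "," + updated + ")");
        }
        return updates;
    }

    public static void main(String[] args) {
        StreamingWordCountSketch job = new StreamingWordCountSketch();
        System.out.println(job.onLine("To be, or not to be,--that is the question:--"));
        System.out.println(job.onLine("Or to take arms against a sea of troubles,"));
    }
}
```

The second call reports an updated count for "or" without re-reading the first line, which is the behavior the animation illustrates.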
  • 26. Multiple: several input partitions, each holding lines such as "To be, or not to be,--that is the question:--", are processed in parallel by the same pipeline: flatMap(Tokenizer) -> groupBy -> sum
  • 27. Multiple: each parallel instance produces partial counts such as (to,2) (be,2) (or,1); a final groupBy + sum merges them into (to,6) (be,6) (or,3)
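The final groupBy + sum on slide 27 adds up the partial counts coming from each parallel instance. A small sketch of that merge step in plain Java (no Flink API; `MergePartialCounts` and `merge` are hypothetical names, and three identical partial results are assumed, as in the slide):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergePartialCounts {

    // Second groupBy + sum: add up the partial counts produced by each parallel instance.
    static Map<String, Integer> merge(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> onePartial = Map.of("to", 2, "be", 2, "or", 1);
        // Three instances with the same partial result, as on the slide.
        System.out.println(merge(List.of(onePartial, onePartial, onePartial)));
    }
}
```

Because addition is associative and commutative, the partial results can be merged in any order, which is what makes this step safe to parallelize.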
  • 28. Demo: input records (product, price, date), e.g. Product1, 14, 1/6/2015; Product2, 13.5, 1/6/2015; Product3, 24, 1/6/2015; Product1, 14, 30/5/2015; Product2, 13, 30/5/2015; Product3, 24, 30/5/2015; Product4, 124, 30/5/2015. Output per product: the price history (Product1: 14 on 1/6/2015, 14 on 30/5/2015, 13 on 29/5/2015; Product2: 13.5 on 1/6/2015, 13 on 30/5/2015, 13 on 29/5/2015) and average prices, e.g. Product1: average price over 7 days: 14, over 30 days: 14, over 365 days: 13.5
  • 29. Demo 2: Twitter. Input: tweets with text and date, e.g. (tweet, Flink is…, 1/6/2015), (tweet, #Flink, 1/6/2015), (tweet, #Flink, 30/5/2015). Output: a tag cloud, and the tags per author by date, e.g. @writer1: #Flink on 1/6/2015, 30/5/2015, 29/5/2015; @writer3: #Flink on 1/6/2015, 30/5/2015, 29/5/2015. Also shown: Jira, Stack Overflow
  • 30. Demo 3 : scala shell … Word count demo from flink scalashell ...
  • 31. Demo 4 : ML demo Classifier (SVM) from MLLib – Scala only Learn + Predict
  • 32. Some basics (covered by demo) type, streaming, loop,….
  • 33. Data
      Tuples with primitive types:
      DataSet<Tuple2<String, Integer>> wordCounts = env.fromElements(
          new Tuple2<String, Integer>("hello", 1),
          new Tuple2<String, Integer>("world", 2));
      POJO (constructor + get/set):
      public class WordWithCount {
          public String word;
          public int count;
          public WordWithCount() {}
          public WordWithCount(String word, int count) {
              this.word = word;
              this.count = count;
          }
      }
      Hadoop: the org.apache.hadoop.io.Writable interface
  • 34. Data sources: file based
      // local file system
      DataSet<String> localLines = env.readTextFile("file:///path/to/my/textfile");
      // read text file from a HDFS running at nnHost:nnPort
      DataSet<String> hdfsLines = env.readTextFile("hdfs://nnHost:nnPort/path/to/my/textfile");
      // read a CSV file with three fields
      DataSet<Tuple3<Integer, String, Double>> csvInput =
          env.readCsvFile("hdfs:///the/CSV/file")
             .types(Integer.class, String.class, Double.class);
      // create a set from some given elements
      DataSet<String> value = env.fromElements("Foo", "bar", "foobar", "fubar");
  • 35. Data sources
      // Read data from a relational database using the JDBC input format
      DataSet<Tuple2<String, Integer>> dbData = env.createInput(
          // create and configure input format
          JDBCInputFormat.buildJDBCInputFormat()
              .setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
              .setDBUrl("jdbc:derby:memory:persons")
              .setQuery("select name, age from persons")
              .finish(),
          // specify type information for DataSet
          new TupleTypeInfo(Tuple2.class, STRING_TYPE_INFO, INT_TYPE_INFO)
      );
  • 36. Data sinks
      // text data
      DataSet<String> textData = // [...]
      // write DataSet to a file on the local file system
      textData.writeAsText("file:///my/result/on/localFS");
      // write DataSet to a file on a HDFS with a namenode running at nnHost:nnPort
      textData.writeAsText("hdfs://nnHost:nnPort/my/result/on/localFS");
      // write DataSet to a file and overwrite the file if it exists
      textData.writeAsText("file:///my/result/on/localFS", WriteMode.OVERWRITE);
      // tuples as lines with pipe as the separator "a|b|c"
      DataSet<Tuple3<String, Integer, Double>> values = // [...]
      values.writeAsCsv("file:///path/to/the/result/file", "\n", "|");
  • 37. Variable and storage
      DataSet<Tuple...> large = env.readCsvFile(...);
      DataSet<Tuple...> medium = env.readCsvFile(...);
      DataSet<Tuple...> small = env.readCsvFile(...);
      DataSet<Tuple...> largeAndMedium = large.join(medium)
          .where(3).equalTo(1)
          .with(new JoinFunction() { ... });
      DataSet<Tuple...> largeMediumAndSmall = small.join(largeAndMedium)
          .where(0).equalTo(2)
          .with(new JoinFunction() { ... });
      DataSet<Tuple...> result = largeMediumAndSmall.groupBy(3).aggregate(MAX, 2);
      DataSet<Tuple...> otherResult = largeAndMedium.groupBy(3).aggregate(MAX, 2);
      DataSet<Tuple...> oneMoreResult = large.groupBy(3).aggregate(MAX, 2);
  • 39. Streaming: a DataStream is a continuous, parallel, immutable stream of data. Sources: socket stream (Twitter, …), message queue connector (RabbitMQ), file stream
  • 40. Iterative: algorithms that need iterations
      ● Clustering (K-Means, Canopy, …)
      ● Gradient descent (e.g., Logistic Regression, Matrix Factorization)
      ● Graph algorithms (e.g., PageRank, Line-Rank, components, paths, reachability, centrality)
      ● Graph communities / dense sub-components
      ● Inference (belief propagation)
      ● …
      A loop makes multiple passes over the data
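To make "a loop makes multiple passes over the data" concrete, here is a minimal one-dimensional K-Means in plain Java. It is a sketch of the general technique, not of Flink's iteration operators; the class and method names, the sample points and the starting centroids are all assumptions chosen for illustration.

```java
import java.util.Arrays;

public class KMeans1D {

    // Repeats assignment + update passes over the data until the centroids stop moving.
    static double[] cluster(double[] points, double[] initial, int maxIterations) {
        double[] centroids = initial.clone();
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            double[] sum = new double[centroids.length];
            int[] size = new int[centroids.length];
            // One full pass over the data: assign each point to its nearest centroid.
            for (double p : points) {
                int nearest = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[nearest])) {
                        nearest = c;
                    }
                }
                sum[nearest] += p;
                size[nearest]++;
            }
            // Update step: move each centroid to the mean of its points (empty clusters stay put).
            double[] next = centroids.clone();
            for (int c = 0; c < centroids.length; c++) {
                if (size[c] > 0) {
                    next[c] = sum[c] / size[c];
                }
            }
            if (Arrays.equals(next, centroids)) {
                break; // converged: another pass would change nothing
            }
            centroids = next;
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[] result = cluster(new double[] {1, 2, 3, 10, 11, 12}, new double[] {0, 5}, 100);
        System.out.println(Arrays.toString(result));
    }
}
```

Each iteration reads every point again, which is why an engine that keeps the data in place between passes helps this class of algorithms.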
  • 42. Windowing: stream (to,2) (be,2) (or,1) (lord,2) …; .window(Count.of(4)).every(Count.of(2)); window policies: count, time, …
  • 43. Windowing: stream (to,2) (be,2) (or,1) (lord,2) (my,2) (king,1) …; .window(Count.of(4)).every(Count.of(2)); window policies: count, time, …
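The expression .window(Count.of(4)).every(Count.of(2)) describes a window over the last 4 elements that is evaluated every 2 elements. The sketch below reproduces that policy on an in-memory list in plain Java. It does not use the Flink API, and it assumes that a window is also emitted while fewer than 4 elements have arrived, which the slides do not specify; `CountWindowSketch` and `slidingCountWindows` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class CountWindowSketch {

    // Emits the last `size` elements (fewer while the stream is still short) every `slide` elements.
    static <T> List<List<T>> slidingCountWindows(List<T> stream, int size, int slide) {
        List<List<T>> windows = new ArrayList<>();
        for (int seen = slide; seen <= stream.size(); seen += slide) {
            int from = Math.max(0, seen - size);
            windows.add(new ArrayList<>(stream.subList(from, seen)));
        }
        return windows;
    }

    public static void main(String[] args) {
        List<String> stream =
                List.of("(to,2)", "(be,2)", "(or,1)", "(lord,2)", "(my,2)", "(king,1)");
        for (List<String> window : slidingCountWindows(stream, 4, 2)) {
            System.out.println(window);
        }
    }
}
```

With the six elements of slide 43, the third window starts at (or,1) and ends at (king,1): consecutive windows overlap by two elements.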
  • 46. How it works, the naive idea: Code -> Flink (Job Manager, execution plan); Data -> Results
  • 48. We have resources, let's optimize! Code -> Flink (Job Manager, execution plan); several Data -> Result pairs are processed in parallel
  • 49. Distributed runtime: the master (Job Manager) handles job submission, scheduling, and metadata. Workers (Task Managers) execute operations. Data can be streamed between nodes. All operators start in memory and gradually go out-of-core.
  • 50. How the magic happens: Flink runtime, Flink optimizer
  • 51. Flink Optimizer: the optimizer is the component that selects an execution plan for a Common API program. Think of an AI system manipulating your program for you. But don't be scared, it works: relational databases have been doing this for decades, and Flink ports the technology to API-based systems.
  • 52. Program lifecycle
      val source1 = …
      val source2 = …
      val maxed = source1
        .map(v => (v._1, v._2, math.max(v._1, v._2)))
      val filtered = source2
        .filter(v => (v._1 > 4))
      val result = maxed
        .join(filtered).where(0).equalTo(0)
        .filter(_1 > 3)
        .groupBy(0)
        .reduceGroup {……}
  • 53. Some fancy stuff to help the optimizer: forwarded fields
      @ForwardedFields("f0->f2")
      public class MyMap implements MapFunction<Tuple2<…>, Tuple3<…>> {
          @Override
          public Tuple3<…> map(Tuple2<…> val) {
              return new Tuple3<…>("foo", val.f1 / 2, val.f0);
          }
      }
  • 54. Some fancy stuff to help the optimizer: partitioning. Partitioning controls how the individual data points of a stream are distributed among the parallel instances of the transformation operators. Flink Streaming supports several partitioning types, for example:
      Forward (default): directs the output data to the next operator on the same machine (if possible), avoiding expensive network I/O.
      Shuffle: randomly partitions the output data stream to the next operator, using a uniform distribution.
      Rebalance: directs the output data stream to the next operator in a round-robin fashion.
      Broadcast: sends the output data stream to all parallel instances of the next operator. Usage: dataStream.broadcast()
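The four partitioning types differ only in which parallel instance(s) of the next operator receive a record. The following plain-Java sketch expresses each one as a function returning target instance indexes. It illustrates the strategies and is not Flink's implementation; all names are hypothetical, and forward is assumed to map a sender to the receiver with the same index.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.IntStream;

public class PartitioningSketch {

    // Forward: stay on the same instance index (assumes equal parallelism on both sides).
    static int[] forward(int senderIndex) {
        return new int[] {senderIndex};
    }

    // Shuffle: pick one target instance uniformly at random.
    static int[] shuffle(Random random, int parallelism) {
        return new int[] {random.nextInt(parallelism)};
    }

    // Rebalance: round-robin over the target instances, driven by a record counter.
    static int[] rebalance(long recordNumber, int parallelism) {
        return new int[] {(int) (recordNumber % parallelism)};
    }

    // Broadcast: every target instance receives the record.
    static int[] broadcast(int parallelism) {
        return IntStream.range(0, parallelism).toArray();
    }

    public static void main(String[] args) {
        int parallelism = 4;
        Random random = new Random();
        for (long record = 0; record < 6; record++) {
            System.out.println("record " + record
                    + " forward=" + Arrays.toString(forward(1))
                    + " shuffle=" + Arrays.toString(shuffle(random, parallelism))
                    + " rebalance=" + Arrays.toString(rebalance(record, parallelism))
                    + " broadcast=" + Arrays.toString(broadcast(parallelism)));
        }
    }
}
```

Forward is the only strategy that can avoid the network entirely, which is why the slide presents it as the default.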
  • 56. Performance
      ● More info soon
      ● Demo on 100,000 products / 3 years of prices => ~ 20 minutes
      ● On a "small cluster" of 3 nodes: 4 procs, 8 GB of RAM, virtualized
  • 58. Limitations: the API is still moving; diagnosis is hard (Flink, Hadoop, network, OS, JVM, …); heap usage is (too?) high
  • 59. API & Big Data eco system
  • 60. The growing Flink stack
      Flink Optimizer, Flink Stream Builder
      Common API: Scala API, Java API, Python API (upcoming), Graph API, Apache MRQL
      Flink Local Runtime, Apache Tez
      Embedded environment (Java collections), local environment (for debugging), remote environment (regular cluster execution)
      Single node execution, standalone or YARN cluster
      Data storage: HDFS, files, S3, JDBC, Redis, RabbitMQ, Kafka, Azure tables, …
  • 62. Flink roadmap: currently being discussed by the Flink community. Flink has a major release every 3 months, and one or more bug-fixing releases between major releases. Caveat: this is a rough roadmap that depends on volunteer work, the outcome of community discussion, and Apache open source processes.
  • 63. Roadmap for 2015 (highlights), by quarter (Q1, Q2, Q3), items in slide order
      APIs: logical query integration; additional operators; interactive programs; interactive Scala shell; SQL-on-Flink
      Optimizer: semantic annotations; HCatalog integration; optimizer hints
      Runtime: dual engine (blocking & pipelining); fine-grained fault tolerance; dynamic memory allocation
      Streaming: better memory management; more operators in API; at-least-once processing guarantees; unify batch and streaming; exactly-once processing guarantees
      ML library: first version; additional algorithms; Mahout integration
      Graph library: first version
      Integration: Tez, Samoa; Mahout
  • 64. Integration with other projects
      Machine learning: Samoa (incubating), a distributed streaming machine learning (ML) framework
      Apache Tez: runs complex directed acyclic graphs of tasks for processing data; simplifies Pig and Hive task definition
      Storage: Tachyon, a memory-centric distributed storage system
      Mahout (data analytics) and H2O (distributed scalable machine learning system)
      Apache Hive: high-level language for data processing (expected Q3/Q4 2015)
      Apache Zeppelin (incubating): a web-based notebook that enables interactive data analytics
  • 65. And many more…
      Runtime: even better performance and robustness, using off-heap memory and dynamic memory allocation
      Improvements to the Flink optimizer: integration with HCatalog, better statistics, runtime optimization
      Streaming graph and ML pipeline libraries
  • 67. Flink in one slide: Flink is optimized for cyclic or iterative processes by using iterative transformations on collections. Flink streaming processes data streams as true streams, i.e., data elements are immediately "pipelined" through a streaming program as soon as they arrive. This makes it possible to perform flexible window operations on streams. Flink also has a built-in optimizer.