5. BigData success story
● Map/Reduce (OSDI'04) → Hadoop 1
  Small recoverable tasks; sequential code
● Dryad (EuroSys'07) → Tez
  Map/Reduce extended to DAG; backtracking recovery
● RDDs (HotCloud'10, NSDI'12) → Spark
  Functional implementation of Dryad recovery
● PACTs (SoCC'10, VLDB'12) → Flink
  Cyclic graph (and incremental construction); query processing runtime embedded in DAG engine
● Stonebraker / Cetintemel / Zdonik, 2005
6. Criteria for stream processing (1/2)
● Keep data moving
  ● Low latency on critical path
● Query on stream
  ● High level language
● Handle stream imperfection
  ● Timeout (ex: avg of last 25 securities)
  ● Out of order (must leave window open)
● Generate predictable outcomes
  ● Time ordered
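The "out of order (must leave window open)" criterion can be illustrated with a minimal plain-Java sketch. It is not a Flink API: the class name, the tumbling-window layout, and the `allowedLateness` parameter are assumptions made only for this illustration. A window is emitted only once the newest event time seen has passed the window's end plus the allowed lateness, so a late element can still be counted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch (not a Flink API): tumbling event-time windows that stay open for late data. */
final class LateTolerantWindowAverage {
    private final long windowSize;       // window length, in event-time units
    private final long allowedLateness;  // how long a window stays open after its end
    private final TreeMap<Long, List<Double>> open = new TreeMap<>(); // window start -> buffered values
    private long maxEventTime = Long.MIN_VALUE;

    LateTolerantWindowAverage(long windowSize, long allowedLateness) {
        this.windowSize = windowSize;
        this.allowedLateness = allowedLateness;
    }

    /** Adds one event and returns "windowStart=average" for every window that can now be closed. */
    List<String> onEvent(long eventTime, double value) {
        List<String> closed = new ArrayList<>();
        long start = Math.floorDiv(eventTime, windowSize) * windowSize;
        if (maxEventTime != Long.MIN_VALUE && start + windowSize + allowedLateness <= maxEventTime) {
            return closed; // too late: this window was already closed, so the event is dropped
        }
        open.computeIfAbsent(start, k -> new ArrayList<>()).add(value);
        maxEventTime = Math.max(maxEventTime, eventTime);

        // Close every window whose end plus the allowed lateness has been passed.
        while (!open.isEmpty() && open.firstKey() + windowSize + allowedLateness <= maxEventTime) {
            Map.Entry<Long, List<Double>> window = open.pollFirstEntry();
            double avg = window.getValue().stream()
                    .mapToDouble(Double::doubleValue).average().orElse(0.0);
            closed.add(window.getKey() + "=" + avg);
        }
        return closed;
    }
}
```

With a window size of 10 and a lateness of 5, an event stamped 4 that arrives after an event stamped 12 is still included in the first window's average. Leaving the window open trades latency for completeness.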
7. Criteria for stream processing (2/2)
● Integrate stored / streaming data
  ● Uniform language for both stored and streamed data
  ● Combine streamed and stored data
● Data safety / availability
  ● Resistant to failure
● Partition and scale automatically
● Process and respond instantaneously
  ● 100 000 msg / s
9. The stack
Data processing engine
User requirement
App and resource management
Storage / stream
13. Word count
The hello world
// read a text file, or in-memory data, and produce a DataSet of String lines
DataSet<String> text = getTextDataSet(env);
DataSet<Tuple2<String, Integer>> counts =
    // split up the lines into pairs (2-tuples) containing: (word, 1)
    text.flatMap(new Tokenizer())
    // group by the tuple field "0" and sum up tuple field "1"
    .groupBy(0)
    .sum(1);
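For readers without a Flink setup, the same flatMap, groupBy, sum pipeline can be approximated with the Java standard library. This is only an analogue, not Flink code. The `tokenize` helper is a hypothetical stand-in for the slide's `Tokenizer`, whose implementation is not shown, and its splitting rule is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Plain-Java analogue of the flatMap -> groupBy -> sum pipeline (standard library only, no Flink). */
final class WordCountSketch {

    /** Hypothetical stand-in for the slide's Tokenizer: lower-cases a line and splits on non-word characters. */
    static List<String> tokenize(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> tokenize(line).stream())   // flatMap: one line -> many words
                .collect(Collectors.groupingBy(              // groupBy: key on the word (field 0)
                        word -> word,
                        TreeMap::new,
                        Collectors.summingInt(word -> 1)));  // sum: add up the 1s (field 1)
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("To be, or not to be,--that is the question:--")));
    }
}
```

The three stream stages line up one-to-one with the three operators on the slide, which is the point of the comparison. Distribution and optimization are what Flink adds on top.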
14. Word count
[Diagram: word count data flow]
Input: "To be, or not to be,--that is the question:--", "Whether 'tis nobler in the mind to suffer", "The slings and arrows of outrageous fortune"
flatMap(tokenizer): (to,1) (be,1) (or,1) ...
groupBy: (to,1) (to,1) | (be,1) (be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
15. Data in memory
public static final String[] WORDS = new String[] {
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
"Or to take arms against a sea of troubles,",
"And by opposing end them?--To die,--to sleep,--",
"No more; and by a sleep to say we end",
"The heartache, and the thousand natural shocks",
"That flesh is heir to,--'tis a consummation",
"Devoutly to be wish'd. To die,--to sleep;--",
….
17. With POJO
public static class Word {
    // fields
    private String word;
    private Integer frequency;

    // constructors
    public Word() { }

    public Word(String word, int i) {
        this.word = word;
        this.frequency = i;
    }

    // getters setters

    // to String
    @Override
    public String toString() {
        return "Word=" + word + " freq=" + frequency;
    }
}
18. Pojo
[Diagram: word count data flow with Word POJOs instead of tuples]
Input: "To be, or not to be,--that is the question:--", "Whether 'tis nobler in the mind to suffer", "The slings and arrows of outrageous fortune"
flatMap(tokenizer): Word 1 {to,1}, Word 2 {be,1}, Word 3 {or,1} ...
groupBy: Word 1 {to,1}, Word 5 {to,1} | Word 2 {be,1}, Word 6 {be,1} | Word 3 {or,1}
sum: Word 7 {to,2}, Word 8 {be,2}, Word 9 {or,1}
19. JDBC
[Diagram: word count reading its input from a JDBC source]
Table hamlet: "To be, or not to be,--that is the question:--", "Whether 'tis nobler in the mind to suffer"
Rows read as 1-tuples: ("To be, or not to be,--that is the question:--"), ("Whether 'tis nobler in the mind to suffer")
map + flatMap(tokenizer): (to,1) (be,1) (or,1) ...
groupBy: (to,1) (to,1) | (be,1) (be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
20. Stream
[Diagram: the word count pipeline running on a stream]
Source: "To be, or not to be,--that is the question:--", "Whether 'tis nobler in the mind to suffer", "The slings and arrows of outrageous fortune"
flatMap(tokenizer): (to,1) (be,1) (or,1) ...
groupBy: (to,1) (to,1) | (be,1) (be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
21. Stream
[Diagram, next step: a new element arrives on the stream]
New element: "Or to take arms against a sea of troubles,"
sum so far: (to,2) (be,2) (or,1)
22. Stream
[Diagram, next step: "Or to take arms against a sea of troubles," enters flatMap(tokenizer), which emits a new (or,1)]
23. Stream
[Diagram, next step: the new (or,1) moves from flatMap(tokenizer) toward groupBy]
24. Stream
[Diagram, next step: groupBy now holds (or,1) (or,1) next to (to,1) (to,1) and (be,1) (be,1)]
25. Stream
[Diagram, final step: sum updates its result to (to,2) (be,2) (or,2)]
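The stream slides show counts being updated element by element instead of once at the end of a batch. A minimal plain-Java sketch of that behavior follows. It is not Flink code: the class name and the regular-expression tokenizer are assumptions. Each incoming line immediately produces updated (word,count) pairs from state kept per key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch (no Flink): a running word count that reacts to each element as it arrives. */
final class RunningWordCount {
    private final Map<String, Integer> counts = new HashMap<>(); // state kept per key (word)

    /** Processes one line on arrival and returns the updated (word,count) pairs it produces. */
    List<String> onLine(String line) {
        List<String> updates = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            int updated = counts.merge(token, 1, Integer::sum); // groupBy + sum on the running state
            updates.add("(" + token + "," + updated + ")");
        }
        return updates;
    }
}
```

Feeding "To be, or not to be" and then "Or to take arms" makes the second call emit (or,2) first, which mirrors the last frame of the animation.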
26. Multiple
[Diagram: several input partitions, each holding the lines "To be, or not to be,--that is the question:--", "Whether 'tis nobler in the mind to suffer", "The slings and arrows of outrageous fortune"]
One partition shown in detail:
flatMap(tokenizer): (to,1) (be,1) (or,1) ...
groupBy: (to,1) (to,1) | (be,1) (be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
27. Multiple
[Diagram: several parallel instances, each reading its own partition of the same lines]
Per instance, flatMap(tokenizer) → groupBy → sum: (to,1) (be,1) (or,1) ... → (to,2) (be,2) ...
Final groupBy + sum over the outputs of all instances: (to,6) (be,6) (or,3) ...
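The two-level aggregation in this diagram (count inside each partition, then a final groupBy + sum) can be sketched in plain Java. This is an illustration with the standard library only, not Flink code. The class and method names are hypothetical, and each partition is reduced to a single line to keep the numbers easy to check.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch (no Flink): count per partition, then merge the partial counts. */
final class MergePartialCounts {

    /** Per-partition step: tokenize every line and count the words locally. */
    static Map<String, Integer> countPartition(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String token : line.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Final "groupBy + sum": add up the partial counts of every partition, key by key. */
    static Map<String, Integer> merge(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> partial = countPartition(Arrays.asList("To be, or not to be"));
        System.out.println(merge(Arrays.asList(partial, partial, partial)));
    }
}
```

Three identical partitions that each yield (to,2) (be,2) (or,1) merge into (to,6) (be,6) (or,3), the totals shown on the slide. Because addition is associative, the merge can run in any order.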
33. Data
Tuples with primitive types
DataSet<Tuple2<String, Integer>> wordCounts = env.fromElements(
    new Tuple2<String, Integer>("hello", 1),
    new Tuple2<String, Integer>("world", 2));
POJO (constructor + get/set)
public class WordWithCount {
    public String word;
    public int count;
    public WordWithCount() {}
    public WordWithCount(String word, int count) {
        this.word = word;
        this.count = count;
    }
}
Hadoop org.apache.hadoop.io.Writable interface
34. Data sources: file based
// read a text file from the local file system
DataSet<String> localLines =
    env.readTextFile("file:///path/to/my/textfile");
// read a text file from an HDFS running at nnHost:nnPort
DataSet<String> hdfsLines =
    env.readTextFile("hdfs://nnHost:nnPort/path/to/my/textfile");
// read a CSV file with three fields
DataSet<Tuple3<Integer, String, Double>> csvInput =
    env.readCsvFile("hdfs:///the/CSV/file")
        .types(Integer.class, String.class, Double.class);
// create a set from some given elements
DataSet<String> value = env.fromElements("Foo", "bar", "foobar", "fubar");
35. Data sources
// Read data from a relational database using the JDBC input format
DataSet<Tuple2<String, Integer>> dbData =
    env.createInput(
        // create and configure input format
        JDBCInputFormat.buildJDBCInputFormat()
            .setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
            .setDBUrl("jdbc:derby:memory:persons")
            .setQuery("select name, age from persons")
            .finish(),
        // specify type information for DataSet
        new TupleTypeInfo(Tuple2.class, STRING_TYPE_INFO, INT_TYPE_INFO));
36. Data Sinks
// text data
DataSet<String> textData = // [...]
// write DataSet to a file on the local file system
textData.writeAsText("file:///my/result/on/localFS");
// write DataSet to a file on an HDFS with a namenode running at nnHost:nnPort
textData.writeAsText("hdfs://nnHost:nnPort/my/result/on/localFS");
// write DataSet to a file and overwrite the file if it exists
textData.writeAsText("file:///my/result/on/localFS", WriteMode.OVERWRITE);
// tuples as lines with pipe as the separator "a|b|c"
DataSet<Tuple3<String, Integer, Double>> values = // [...]
values.writeAsCsv("file:///path/to/the/result/file", "\n", "|");
37. Variable and storage
DataSet<Tuple...> large = env.readCsvFile(...);
DataSet<Tuple...> medium = env.readCsvFile(...);
DataSet<Tuple...> small = env.readCsvFile(...);
DataSet<Tuple...> largeAndMedium = large.join(medium)
    .where(3).equalTo(1)
    .with(new JoinFunction() { ... });
DataSet<Tuple...> largeMediumAndSmall = small.join(largeAndMedium)
    .where(0).equalTo(2)
    .with(new JoinFunction() { ... });
DataSet<Tuple...> result = largeMediumAndSmall.groupBy(3).aggregate(MAX, 2);
DataSet<Tuple...> otherResult = largeAndMedium.groupBy(3).aggregate(MAX, 2);
DataSet<Tuple...> oneMoreResult = large.groupBy(3).aggregate(MAX, 2);
48. We have resources, let's optimize it!
[Diagram: Code is submitted to the Flink Job Manager, which produces an Execution Plan; the plan is run over several Data / Result pairs in parallel]
49. Distributed Runtime
● Master (Job Manager) handles job submission, scheduling, and metadata
● Workers (Task Managers) execute operations
● Data can be streamed between nodes
● All operators start in-memory and gradually go out-of-core
50. How the magic happens
- Flink Runtime
- Flink Optimizer
51. Flink Optimizer
● The optimizer is the component that selects an execution plan for a Common API program
● Think of an AI system manipulating your program for you
● But don't be scared: it works
  • Relational databases have been doing this for decades; Flink ports the technology to API-based systems
53. Forwarded fields
(Some fancy stuff to help the optimizer)
@ForwardedFields("f0->f2")
public class MyMap implements MapFunction<Tuple2<…>, Tuple3<…>> {
    @Override
    public Tuple3<…> map(Tuple2<…> val) {
        return new Tuple3<…>("foo", val.f1 / 2, val.f0);
    }
}
54. Partitioning
(Some fancy stuff to help the optimizer)
Partitioning controls how individual data points of a stream are distributed among the parallel instances of the transformation operators, which also determines their ordering.
Flink Streaming supports several partitioning types. Examples:
● Forward (default): directs the output data to the next operator on the same machine (if possible), avoiding expensive network I/O
● Shuffle: randomly partitions the output data stream to the next operator using a uniform distribution
● Rebalance: directs the output data stream to the next operator in a round-robin fashion
● Broadcast: sends the output data stream to all parallel instances of the next operator. Usage: dataStream.broadcast()
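To make the four strategies concrete, here is a small plain-Java sketch that models each one as a rule for choosing which downstream instance(s) receive a record. It is not Flink's implementation: the class, the enum, and the `senderIndex` parameter are hypothetical, and "forward" is simplified to "same index as the sender".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Plain-Java sketch (not Flink's implementation): partitioning as a choice of target instances. */
final class PartitionerSketch {
    enum Strategy { FORWARD, SHUFFLE, REBALANCE, BROADCAST }

    private final Strategy strategy;
    private final int parallelism;   // number of parallel instances of the next operator
    private final Random random;
    private int next = 0;            // round-robin cursor used by REBALANCE

    PartitionerSketch(Strategy strategy, int parallelism, long seed) {
        this.strategy = strategy;
        this.parallelism = parallelism;
        this.random = new Random(seed);
    }

    /** Returns the downstream instance indices that receive the next record sent by instance senderIndex. */
    List<Integer> targets(int senderIndex) {
        List<Integer> out = new ArrayList<>();
        switch (strategy) {
            case FORWARD:    // stay local: same index as the sender (simplifying assumption)
                out.add(senderIndex % parallelism);
                break;
            case SHUFFLE:    // uniformly random target
                out.add(random.nextInt(parallelism));
                break;
            case REBALANCE:  // round-robin over all targets
                out.add(next);
                next = (next + 1) % parallelism;
                break;
            case BROADCAST:  // every target gets a copy
                for (int i = 0; i < parallelism; i++) {
                    out.add(i);
                }
                break;
        }
        return out;
    }
}
```

With a parallelism of 3, REBALANCE yields targets 0, 1, 2, 0, ... in turn, while BROADCAST returns all three indices for every record. That is why broadcast costs the most network traffic and forward the least.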
56. Performance
● More info soon
● Demo on 100,000 products / 3 years of prices => ~20 minutes
● On a "small cluster" of 3 nodes: 4 processors, 8 GB of RAM, virtualized
60. The growing Flink stack
[Diagram: layers of the stack, top to bottom]
● APIs: Scala API, Java API, Python API (upcoming), Graph API, Apache MRQL
● Common API
● Flink Optimizer, Flink Stream Builder
● Flink Local Runtime
● Execution environments:
  ● Embedded environment (Java collections)
  ● Local environment (for debugging)
  ● Remote environment (regular cluster execution)
  ● Apache Tez
● Deployment: single node execution; standalone or YARN cluster
● Data storage: HDFS, files, S3, JDBC, Redis, RabbitMQ, Kafka, Azure tables, …
62. Flink Roadmap
● Currently being discussed by the Flink community
● Flink has a major release every 3 months, and one or more bug-fixing releases between major releases
● Caveat: rough roadmap, depends on volunteer work, outcome of community discussion, and Apache open source processes
63. Roadmap for 2015 (highlights)
Timeline: Q1, Q2, Q3 (items in each row are listed in the order they appear on the timeline)
● APIs: logical query integration; additional operators; interactive programs; interactive Scala shell; SQL-on-Flink
● Optimizer: semantic annotations; HCatalog integration; optimizer hints
● Runtime: dual engine (blocking & pipelining); fine-grained fault tolerance; dynamic memory allocation
● Streaming: better memory management; more operators in API; at-least-once processing guarantees; unify batch and streaming; exactly-once processing guarantees
● ML library: first version; additional algorithms; Mahout integration
● Graph library: first version
● Integration: Tez, Samoa; Mahout
64. Integration with other projects
Machine Learning
– Samoa (incubating): distributed streaming machine learning (ML) framework
Storage
– Tachyon (a memory-centric distributed storage system)
Mahout (data analytics)
– H2O (distributed scalable machine learning system)
Apache Tez (runs complex directed acyclic graphs of tasks for processing data; simplifies Pig and Hive task definition)
Apache Hive (high-level language for data processing)
● Expected Q3/Q4 2015
Apache Zeppelin (incubating): a web-based notebook that enables interactive data analytics.
65. And many more…
Runtime: even better performance and robustness
Using off-heap memory, dynamic memory allocation
Improvements to the Flink optimizer
Integration with HCatalog, better statistics
Runtime optimization
Streaming graph and ML pipeline libraries
67. Flink in one slide
● Flink is optimized for cyclic or iterative processes by using iterative transformations on collections.
● Flink streaming processes data streams as true streams, i.e., data elements are immediately "pipelined" through a streaming program as soon as they arrive. This makes flexible window operations on streams possible.
● Built-in optimizer