Shooting the Rapids:
Getting the Best from Java 8
Streams
Kirk Pepperdine @kcpeppe
Maurice Naftalin @mauricenaftalin
Devoxx Belgium, Nov. 2015
About Kirk
• Specialises in performance tuning
• speaks frequently about performance
• author of performance tuning workshop
• Co-founder
• performance diagnostic tooling
• Java Champion (since 2006)
About Maurice
• Co-author, Author
• Java Champion
• JavaOne Rock Star
Subjects Covered in this Talk
• Background – lambdas and streams
• Performance of our example
• Effect of parallelizing
• Splitting input data efficiently
• When to go parallel
• Parallel streams in the real world
Benchmark Alert
What is a Lambda?
Predicate<Matcher> matches = new Predicate<Matcher>() {
    @Override
    public boolean test(Matcher matcher) {
        return matcher.find();
    }
};
Predicate<Matcher> matches = matcher -> matcher.find();
A lambda is a function from arguments to result
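As a minimal sketch, the same predicate can be written three ways and all three should behave identically. The class name LambdaForms and the sample log line are illustrative assumptions, not part of the talk.

```java
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LambdaForms {
    // Anonymous inner class: the pre-Java 8 form shown on the slide.
    static final Predicate<Matcher> ANONYMOUS = new Predicate<Matcher>() {
        @Override
        public boolean test(Matcher matcher) {
            return matcher.find();
        }
    };

    // Lambda: argument to the left of the arrow, result to the right.
    static final Predicate<Matcher> LAMBDA = matcher -> matcher.find();

    // Method reference: shorthand for a lambda that only calls one method.
    static final Predicate<Matcher> METHOD_REF = Matcher::find;

    public static void main(String[] args) {
        Pattern p = Pattern.compile("Application time: (\\d+\\.\\d+)");
        String line = "2.869: Application time: 1.0001540 seconds";
        // A fresh Matcher per call, because find() advances the matcher's state.
        System.out.println(ANONYMOUS.test(p.matcher(line)));
        System.out.println(LAMBDA.test(p.matcher(line)));
        System.out.println(METHOD_REF.test(p.matcher(line)));
    }
}
```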
Example: Processing GC Logfile
⋮
2.869: Application time: 1.0001540 seconds
5.342: Application time: 0.0801231 seconds
8.382: Application time: 1.1013574 seconds
⋮
DoubleSummaryStatistics
{count=3, sum=2.181635, min=0.080123, average=0.727212, max=1.101357}
Old School Code
// logFileReader is assumed to be a BufferedReader over the GC log
DoubleSummaryStatistics summary = new DoubleSummaryStatistics();
Pattern stoppedTimePattern =
    Pattern.compile("Application time: (\\d+\\.\\d+)");

String logRecord;
while ((logRecord = logFileReader.readLine()) != null) {
    Matcher matcher = stoppedTimePattern.matcher(logRecord);
    if (matcher.find()) {
        double value = Double.parseDouble(matcher.group(1));
        summary.accept(value);
    }
}
Let’s look at the features in this code
• Data Source: logFileReader.readLine()
• Map to Matcher: stoppedTimePattern.matcher(logRecord)
• Filter: matcher.find()
• Map to Double: Double.parseDouble(matcher.group(1))
• Collect Results (Reduce): summary.accept(value)
Java 8 Streams
• A sequence of values, “in motion”
• source and intermediate operations set the stream up lazily
• a terminal operation “pulls” values eagerly down the stream
collection.stream()
.intermediateOp
⋮
.intermediateOp
.terminalOp
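The lazy set-up can be observed directly. In this small sketch (the class LazyStreams and its helper are hypothetical names), peek records every element that actually flows through the pipeline, and nothing is recorded until a terminal operation runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyStreams {
    // Builds a pipeline but does not run it; `seen` records elements as they flow.
    static Stream<String> buildPipeline(List<String> seen) {
        return Arrays.asList("a", "b", "c").stream()
                .peek(seen::add)              // intermediate: nothing runs yet
                .map(String::toUpperCase);    // intermediate: still nothing runs
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        Stream<String> pipeline = buildPipeline(seen);
        System.out.println("before terminal op: " + seen);
        // The terminal operation pulls the values through the pipeline.
        List<String> result = pipeline.collect(Collectors.toList());
        System.out.println("after terminal op:  " + seen + " -> " + result);
    }
}
```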
Stream Sources
• New method Collection.stream()
• Many other sources:
• Arrays.stream(Object[])
• Stream.of(Object...)
• Stream.iterate(Object,UnaryOperator)
• Files.lines()
• BufferedReader.lines()
• Random.ints()
• JarFile.stream()
• …
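A few of these sources in use, as a sketch; the class name StreamSources and the sample values are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamSources {
    static List<String> fromArray() {
        return Arrays.stream(new String[] {"x", "y", "z"}).collect(Collectors.toList());
    }

    static List<String> fromValues() {
        return Stream.of("x", "y").collect(Collectors.toList());
    }

    // Stream.iterate is infinite, so it needs a limit before collecting.
    static List<Integer> fromIterate() {
        return Stream.iterate(1, n -> n * 2).limit(5).collect(Collectors.toList());
    }

    // BufferedReader.lines() streams the reader's content line by line.
    static List<String> fromReader() {
        BufferedReader reader = new BufferedReader(new StringReader("one\ntwo\n"));
        return reader.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fromArray());
        System.out.println(fromValues());
        System.out.println(fromIterate());
        System.out.println(fromReader());
    }
}
```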
Imperative to Stream
DoubleSummaryStatistics statistics =
    Files.lines(new File("gc.log").toPath())
        .map(stoppedTimePattern::matcher)
        .filter(Matcher::find)
        .map(matcher -> matcher.group(1))
        .mapToDouble(Double::parseDouble)
        .summaryStatistics();
• Stream Source: Files.lines(...)
• Intermediate Operations: map, filter, map, mapToDouble
• Method References: stoppedTimePattern::matcher, Matcher::find, Double::parseDouble
• Terminal Operation: summaryStatistics()
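A self-contained version of this pipeline, as a sketch. The class name GcLogStats and the in-memory sample lines are assumptions made so that it runs without a gc.log file.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class GcLogStats {
    static final Pattern STOPPED_TIME =
            Pattern.compile("Application time: (\\d+\\.\\d+)");

    // Same shape as the slide's pipeline, but fed by any Stream of log lines.
    static DoubleSummaryStatistics summarize(Stream<String> lines) {
        return lines
                .map(STOPPED_TIME::matcher)         // line -> Matcher
                .filter(Matcher::find)              // keep only matching records
                .map(matcher -> matcher.group(1))   // Matcher -> captured number
                .mapToDouble(Double::parseDouble)   // String -> double
                .summaryStatistics();               // terminal operation
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "2.869: Application time: 1.0001540 seconds",
                "5.342: Application time: 0.0801231 seconds",
                "7.000: some other log record",
                "8.382: Application time: 1.1013574 seconds");
        System.out.println(summarize(sample.stream()));
    }
}
```

When reading a real file, the stream returned by Files.lines holds an open file handle, so it is usually wrapped in try-with-resources.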
Visualising Sequential Streams
[Diagram: “Values in Motion”. Values x0 to x3 flow one at a time from the Source through Map and Filter (intermediate operations) to the Reduction (terminal operation); the filter passes some values (✔) and rejects others (❌).]
How Does It Perform?
Old School: 13.3 secs
Sequential: 13.8 secs
- Should be the same workload
- Stream code is cleaner, easier to read
24M line file, MacBook Pro, Haswell i7, 4 cores, hyperthreaded, Java 9.0
Can We Do Better?
• We might be able to if the workload is parallelizable
• split stream into many segments
• process each segment
• combine results
• Requirements exactly match Fork/Join workflow
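The split, process, combine workflow can be written directly against Fork/Join. This is only an illustrative sketch: SumTask and its THRESHOLD are hypothetical, and the threshold is not a tuned value.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Double> {
    static final int THRESHOLD = 1_000;   // illustrative cut-off for going sequential
    private final double[] data;
    private final int lo;
    private final int hi;

    SumTask(double[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Double compute() {
        if (hi - lo <= THRESHOLD) {
            // Small enough: process this segment sequentially.
            double sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                          // split: left half runs asynchronously
        double rightSum = right.compute();    // process: right half in this thread
        return left.join() + rightSum;        // combine the two partial results
    }

    static double sum(double[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        double[] ones = new double[10_000];
        java.util.Arrays.fill(ones, 1.0);
        System.out.println(sum(ones));
    }
}
```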
Visualizing Parallel Streams
[Diagram: the values x0 to x3 are divided among parallel pipelines; each pipeline maps and filters its own values (✔ passes, ❌ rejects) before the results are combined.]
Splitting Stream Sources
• Stream source is a Spliterator
• can both iterate over data and – where possible – split it
Splitting the Data
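A single split can be tried by hand. The sketch below (SplitDemo is a hypothetical name) copies the input into an ArrayList, calls trySplit once, and lists the elements each part covers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class SplitDemo {
    // Splits the list's spliterator once and returns the elements covered by each part.
    static List<List<String>> splitOnce(List<String> source) {
        Spliterator<String> rest = new ArrayList<>(source).spliterator();
        Spliterator<String> prefix = rest.trySplit();   // null if the source cannot split
        List<String> first = new ArrayList<>();
        List<String> second = new ArrayList<>();
        if (prefix != null) {
            prefix.forEachRemaining(first::add);        // the split-off part
        }
        rest.forEachRemaining(second::add);             // what the original still covers
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        System.out.println(splitOnce(Arrays.asList("x0", "x1", "x2", "x3")));
    }
}
```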
Parallel Streams
DoubleSummaryStatistics statistics =
    Files.lines(new File("gc.log").toPath())
        .parallel()
        .map(stoppedTimePattern::matcher)
        .filter(Matcher::find)
        .map(matcher -> matcher.group(1))
        .mapToDouble(Double::parseDouble)
        .summaryStatistics();
About Fork/Join
• Introduced in Java 7
• draws from a common pool of ForkJoinWorkerThread
• default pool size == HW cores – 1
• assumes workload will be CPU bound
• On its own, not an easy coding idiom
• parallel streams provide an abstraction layer
• Spliterator defines how to split stream
• framework code submits sub-tasks to the common Fork/Join pool
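The size of the common pool on a given machine can be checked with a couple of calls. CommonPoolInfo is a hypothetical helper; the reported value may differ from cores minus one if the java.util.concurrent.ForkJoinPool.common.parallelism system property is set.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolInfo {
    static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Target parallelism of the shared pool that parallel streams submit to.
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("hardware threads:        " + cores());
        System.out.println("common pool parallelism: " + commonPoolParallelism());
    }
}
```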
How Does That Perform?
Old School: 13.3 secs
Sequential: 13.8 secs
Parallel: 9.5 secs
- 1.45x faster
- but not 8x faster (????)
24M lines, 2.8GHz 8-core i7, 16GB, OS X, Java 9.0
In Fact!!!!
• Different benchmarks yield a mixed bag of results
• some were better
• some were the same
• some were worse!
Open Questions
• Under what conditions are things better
• or worse
• When should we parallelize
• and when is serial better
Answer depends upon where the bottleneck is
Where is Our Bottleneck?
• I/O operations
• not a surprise, we’re reading from a file
• Java 9 uses FileChannelLinesSpliterator
• 2x better than Java 8’s implementation
76.0% 0 + 5941 sun.nio.ch.FileDispatcherImpl.pread0
Poorly Splitting Sources
• Some sources split worse than others
• LinkedList vs ArrayList
• Streaming I/O is problematic
• more threads == more pressure on contended resource
• thrashing and other ill effects
• Workload size doesn’t cover the overheads
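How evenly a source splits can be inspected with one trySplit call. In this sketch (SplitBalance is a hypothetical name) an ArrayList divides into two equal halves, while the sizes printed for a LinkedList depend on the JDK's batching strategy and are typically far less even.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;

class SplitBalance {
    // Returns {size of the split-off part, size left behind} after one trySplit.
    static long[] sizesAfterOneSplit(Collection<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit();
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            arrayList.add(i);
        }
        long[] a = sizesAfterOneSplit(arrayList);
        long[] l = sizesAfterOneSplit(new LinkedList<>(arrayList));
        System.out.println("ArrayList:  " + a[0] + " / " + a[1]);
        System.out.println("LinkedList: " + l[0] + " / " + l[1]);
    }
}
```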
Streaming I/O Bottleneck
[Diagram: values x0 to x3 arriving from a streaming I/O source, with filter results ✔ and ❌.]
LineSpliterator
[Diagram: a MappedByteBuffer holding the log lines (2.869: …, 5.342: …, 8.382: …, 9.337: …), each ending in \n. The spliterator's coverage is divided near its midpoint (mid) at a line break, and the split-off part becomes the new spliterator's coverage.]
Included in JDK9 as FileChannelLinesSpliterator
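The core idea, choosing a split position that falls on a line boundary near the middle of the buffer, might look like the sketch below. This is not the JDK's implementation: LineSplit and splitPoint are hypothetical names, and a plain ByteBuffer stands in for the MappedByteBuffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class LineSplit {
    // Returns the index just after the first '\n' at or beyond the midpoint of
    // [lo, hi), or -1 if no such index leaves a non-empty second part.
    static int splitPoint(ByteBuffer buf, int lo, int hi) {
        int mid = (lo + hi) >>> 1;
        for (int i = mid; i + 1 < hi; i++) {
            if (buf.get(i) == '\n') {
                return i + 1;   // the second part starts at the next line
            }
        }
        return -1;              // only one line remains: do not split
    }

    public static void main(String[] args) {
        byte[] log = "2.869: a\n5.342: b\n8.382: c\n".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.wrap(log);
        int split = splitPoint(buf, 0, log.length);
        System.out.println("split at byte " + split);
    }
}
```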
In-memory Comparison
• Read GC log into an ArrayList prior to processing
Old School: 9.4 secs
Sequential: 9.9 secs
Parallel: 2.7 secs
- 4.25x faster
- better but still not 8x faster
24M lines, 2.8GHz 8 core i7, 16GB, OS X, JDK 9.0
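A sketch of the in-memory variant: the lines are held in an ArrayList first, then processed sequentially or in parallel. InMemoryRun and its synthetic log lines are assumptions; with a real file the list could be loaded with Files.readAllLines.

```java
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class InMemoryRun {
    static final Pattern STOPPED_TIME =
            Pattern.compile("Application time: (\\d+\\.\\d+)");

    // Synthetic stand-in for a preloaded GC log.
    static List<String> syntheticLog(int records) {
        List<String> lines = new ArrayList<>(records);
        for (int i = 0; i < records; i++) {
            lines.add(i + ".000: Application time: 0.5000000 seconds");
        }
        return lines;
    }

    static DoubleSummaryStatistics summarize(List<String> lines, boolean parallel) {
        // An ArrayList splits cheaply and evenly, unlike a streaming file source.
        Stream<String> stream = parallel ? lines.parallelStream() : lines.stream();
        return stream
                .map(STOPPED_TIME::matcher)
                .filter(Matcher::find)
                .map(matcher -> matcher.group(1))
                .mapToDouble(Double::parseDouble)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        List<String> log = syntheticLog(100_000);
        System.out.println("sequential: " + summarize(log, false));
        System.out.println("parallel:   " + summarize(log, true));
    }
}
```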
Justifying the Overhead
CPNQ performance model:
C - number of submitters
P - number of CPUs
N - number of elements
Q - cost of the operation
cost of intermediate operations is N * Q
overhead of setting up F/J framework is ~100µs
Amortizing Setup Costs
• N*Q needs to be large
• Q can often only be estimated
• N may only be known at run time
• Rule of thumb, N > 10,000
• P is the number of processors
• P == number of cores for CPU bound
• P < number of cores otherwise
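A rough back-of-the-envelope calculation shows why N*Q has to be large. The sketch below assumes an idealized model that is not from the talk: the work divides perfectly across P processors and the only extra cost is the quoted setup overhead of about 100µs. ParallelCostModel and its methods are hypothetical.

```java
class ParallelCostModel {
    static final double SETUP_OVERHEAD_MICROS = 100.0;   // figure quoted for F/J setup

    // Estimated sequential cost: N elements at Q microseconds each.
    static double sequentialCostMicros(long n, double qMicros) {
        return n * qMicros;
    }

    // Share of an idealized parallel run spent on setup rather than useful work.
    static double overheadShare(long n, double qMicros, int p) {
        double parallelWork = sequentialCostMicros(n, qMicros) / p;
        return SETUP_OVERHEAD_MICROS / (SETUP_OVERHEAD_MICROS + parallelWork);
    }

    public static void main(String[] args) {
        // With Q = 1µs and P = 4, compare a small N with the rule-of-thumb N.
        System.out.printf("N=100:    %.1f%% overhead%n", 100 * overheadShare(100, 1.0, 4));
        System.out.printf("N=10,000: %.1f%% overhead%n", 100 * overheadShare(10_000, 1.0, 4));
    }
}
```

Under these assumptions, setup dominates a 100-element run and is only a few percent of a 10,000-element run.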
Other Gotchas
• Frequent hand-offs place pressure on thread schedulers
• effect is magnified when a hypervisor is involved
• estimated 80,000 cycles to hand off data between threads
• you can do a lot of processing in 80,000 cycles
• Too many threads place pressure on thread schedulers
• responsible for other ill effects, such as longer time to safepoint (TTSP)
• too few threads may leave hardware under-utilized
Simulated Server Environment
ExecutorService threadPool = Executors.newFixedThreadPool(10);
threadPool.execute(() -> {
    try {
        long timer = System.currentTimeMillis();
        value = Files.lines(new File("gc.log").toPath()).parallel()
                .map(applicationStoppedTimePattern::matcher)
                .filter(Matcher::find)
                .map(matcher -> matcher.group(2))
                .mapToDouble(Double::parseDouble)
                .summaryStatistics().getSum();
    } catch (Exception ex) {}
});
Work Flow and Results
• First task to arrive will consume all ForkJoinWorkerThread
• downstream tasks wait for a ForkJoinWorkerThread
• downstream tasks start intermixing with initial task
• Initial task collects dead time as it competes for threads
• all other tasks collect dead time as they either
• compete or wait for a ForkJoinWorkerThread
System is stressed beyond capacity
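The set-up can be reproduced in miniature without a log file. In this sketch (SimulatedServer is a hypothetical name and the workload is synthetic) ten request threads each run a parallel stream, so every job competes for the same common Fork/Join pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

class SimulatedServer {
    // Submits `tasks` identical jobs through a fixed thread pool. Each job runs a
    // parallel stream, so all jobs share the common Fork/Join pool's worker threads.
    static List<Long> run(int tasks, int elements) {
        ExecutorService threadPool = Executors.newFixedThreadPool(tasks);
        try {
            Callable<Long> job = () -> IntStream.range(0, elements).parallel()
                    .filter(i -> i % 2 == 0)
                    .count();
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                futures.add(threadPool.submit(job));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : futures) {
                results.add(future.get());   // wait for each job to finish
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            threadPool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        List<Long> results = run(10, 5_000_000);
        System.out.println(results + " in " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

Timing one job alone against ten submitted together gives a feel for the dead time each job collects while waiting for worker threads.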
Intermediate Operation Bottleneck
• Bottleneck is in pattern matching
• but, streaming infrastructure isn’t far behind!
68.6% 1384 + 0 java.util.regex.Pattern$Curly.match
26.6% 521 + 15 java.util.stream.ReferencePipeline$3$1.accept
Tragedy of the Commons
Garrett Hardin, ecologist (1968):
Imagine the grazing of animals on a common ground. Each
flock owner gains if they add to their own flock. But
every animal added to the total degrades the commons a
small amount.
You have a finite amount of hardware
– it might be in your best interest to grab it all
– but if everyone behaves the same way…
Simulated Server Environment
• Submit 10 tasks to Fork-Join (via Executor thread-pool)
• first result comes out in 32 seconds
• compared to 9.5 seconds for individually submitted task
• high system time reflects that the task is I/O bound
In-Memory Variation
• Preload log file
• Submit 10 tasks to Fork-Join (via Executor thread-pool)
• first result comes out in 23 seconds
• compared to 4.5 seconds for individually submitted task
• task is CPU bound
Conclusions
Sequential stream performance comparable to imperative code
Going parallel is worthwhile IF
- task is suitable
- expensive enough to amortize setup costs
- no inter-task communication needed
- data source is suitable
- environment is suitable
Need to monitor JDK to understand bottlenecks
- Fork/Join pool is not well instrumented
Resources
http://gee.cs.oswego.edu/dl/html/StreamParallelGuidance.html

More Related Content

What's hot

Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015Holden Karau
 
Scalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/CascadingScalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/Cascadingjohnynek
 
HBase RowKey design for Akka Persistence
HBase RowKey design for Akka PersistenceHBase RowKey design for Akka Persistence
HBase RowKey design for Akka PersistenceKonrad Malawski
 
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016Holden Karau
 
Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Sunghyouk Bae
 
Weaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
Weaving Dataflows with Silk - ScalaMatsuri 2014, TokyoWeaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
Weaving Dataflows with Silk - ScalaMatsuri 2014, TokyoTaro L. Saito
 
Beyond parallelize and collect - Spark Summit East 2016
Beyond parallelize and collect - Spark Summit East 2016Beyond parallelize and collect - Spark Summit East 2016
Beyond parallelize and collect - Spark Summit East 2016Holden Karau
 
Ge aviation spark application experience porting analytics into py spark ml p...
Ge aviation spark application experience porting analytics into py spark ml p...Ge aviation spark application experience porting analytics into py spark ml p...
Ge aviation spark application experience porting analytics into py spark ml p...Databricks
 
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...Vyacheslav Lapin
 
Reactive Streams / Akka Streams - GeeCON Prague 2014
Reactive Streams / Akka Streams - GeeCON Prague 2014Reactive Streams / Akka Streams - GeeCON Prague 2014
Reactive Streams / Akka Streams - GeeCON Prague 2014Konrad Malawski
 
Algebird : Abstract Algebra for big data analytics. Devoxx 2014
Algebird : Abstract Algebra for big data analytics. Devoxx 2014Algebird : Abstract Algebra for big data analytics. Devoxx 2014
Algebird : Abstract Algebra for big data analytics. Devoxx 2014Samir Bessalah
 
2014 akka-streams-tokyo-japanese
2014 akka-streams-tokyo-japanese2014 akka-streams-tokyo-japanese
2014 akka-streams-tokyo-japaneseKonrad Malawski
 
Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017Sunghyouk Bae
 
Introduction of failsafe
Introduction of failsafeIntroduction of failsafe
Introduction of failsafeSunghyouk Bae
 
Unit testing of spark applications
Unit testing of spark applicationsUnit testing of spark applications
Unit testing of spark applicationsKnoldus Inc.
 
Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Dan Lynn
 
あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法x1 ichi
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012Dan Lynn
 
Distributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache StormDistributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache Stormthe100rabh
 
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016Holden Karau
 

What's hot (20)

Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015
 
Scalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/CascadingScalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/Cascading
 
HBase RowKey design for Akka Persistence
HBase RowKey design for Akka PersistenceHBase RowKey design for Akka Persistence
HBase RowKey design for Akka Persistence
 
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
 
Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018
 
Weaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
Weaving Dataflows with Silk - ScalaMatsuri 2014, TokyoWeaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
Weaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
 
Beyond parallelize and collect - Spark Summit East 2016
Beyond parallelize and collect - Spark Summit East 2016Beyond parallelize and collect - Spark Summit East 2016
Beyond parallelize and collect - Spark Summit East 2016
 
Ge aviation spark application experience porting analytics into py spark ml p...
Ge aviation spark application experience porting analytics into py spark ml p...Ge aviation spark application experience porting analytics into py spark ml p...
Ge aviation spark application experience porting analytics into py spark ml p...
 
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...
 
Reactive Streams / Akka Streams - GeeCON Prague 2014
Reactive Streams / Akka Streams - GeeCON Prague 2014Reactive Streams / Akka Streams - GeeCON Prague 2014
Reactive Streams / Akka Streams - GeeCON Prague 2014
 
Algebird : Abstract Algebra for big data analytics. Devoxx 2014
Algebird : Abstract Algebra for big data analytics. Devoxx 2014Algebird : Abstract Algebra for big data analytics. Devoxx 2014
Algebird : Abstract Algebra for big data analytics. Devoxx 2014
 
2014 akka-streams-tokyo-japanese
2014 akka-streams-tokyo-japanese2014 akka-streams-tokyo-japanese
2014 akka-streams-tokyo-japanese
 
Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017
 
Introduction of failsafe
Introduction of failsafeIntroduction of failsafe
Introduction of failsafe
 
Unit testing of spark applications
Unit testing of spark applicationsUnit testing of spark applications
Unit testing of spark applications
 
Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.
 
あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012
 
Distributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache StormDistributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache Storm
 
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016
 

Viewers also liked

erlang at hover.in , Devcamp Blr 09
erlang at hover.in , Devcamp Blr 09erlang at hover.in , Devcamp Blr 09
erlang at hover.in , Devcamp Blr 09Bhasker Kode
 
Metasepi team meeting #6: "Snatch-driven development"
Metasepi team meeting #6: "Snatch-driven development"Metasepi team meeting #6: "Snatch-driven development"
Metasepi team meeting #6: "Snatch-driven development"Kiwamu Okabe
 
Présentation Kivy (et projets associés) à Pycon-fr 2013
Présentation Kivy (et projets associés) à Pycon-fr 2013Présentation Kivy (et projets associés) à Pycon-fr 2013
Présentation Kivy (et projets associés) à Pycon-fr 2013Gabriel Pettier
 
Managing gang of chaotic developers is complex at Agile Tour Riga 2012
Managing gang of chaotic developers is complex at Agile Tour Riga 2012Managing gang of chaotic developers is complex at Agile Tour Riga 2012
Managing gang of chaotic developers is complex at Agile Tour Riga 2012Piotr Burdylo
 
Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009
Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009
Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009Alexandre Morgaut
 
Vert.x - JDD 2013 (English)
Vert.x - JDD 2013 (English)Vert.x - JDD 2013 (English)
Vert.x - JDD 2013 (English)Bartek Zdanowski
 
Agile Management 2013 - Nie tylko it
Agile Management 2013 - Nie tylko itAgile Management 2013 - Nie tylko it
Agile Management 2013 - Nie tylko itPiotr Burdylo
 
There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...
There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...
There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...thegdb
 
A Scalable I/O Manager for GHC
A Scalable I/O Manager for GHCA Scalable I/O Manager for GHC
A Scalable I/O Manager for GHCJohan Tibell
 
O'Reilly ETech Conference: Laszlo RIA
O'Reilly ETech Conference: Laszlo RIAO'Reilly ETech Conference: Laszlo RIA
O'Reilly ETech Conference: Laszlo RIAOliver Steele
 
Federated CDNs: What every service provider should know
Federated CDNs: What every service provider should knowFederated CDNs: What every service provider should know
Federated CDNs: What every service provider should knowPatrick Hurley
 
Jensimmons html5live-responsivedesign
Jensimmons html5live-responsivedesignJensimmons html5live-responsivedesign
Jensimmons html5live-responsivedesignJen Simmons
 
Mendeley presentation
Mendeley presentationMendeley presentation
Mendeley presentationDiogo Provete
 
Monadologie
MonadologieMonadologie
Monadologieleague
 
Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Romain Francois
 
JavaScript Growing Up
JavaScript Growing UpJavaScript Growing Up
JavaScript Growing UpDavid Padbury
 
Be careful when entering a casino (Agile by Example 2012)
Be careful when entering a casino (Agile by Example 2012)Be careful when entering a casino (Agile by Example 2012)
Be careful when entering a casino (Agile by Example 2012)Piotr Burdylo
 
Sneaking Scala through the Back Door
Sneaking Scala through the Back DoorSneaking Scala through the Back Door
Sneaking Scala through the Back DoorDianne Marsh
 

Viewers also liked (20)

erlang at hover.in , Devcamp Blr 09
erlang at hover.in , Devcamp Blr 09erlang at hover.in , Devcamp Blr 09
erlang at hover.in , Devcamp Blr 09
 
Metasepi team meeting #6: "Snatch-driven development"
Metasepi team meeting #6: "Snatch-driven development"Metasepi team meeting #6: "Snatch-driven development"
Metasepi team meeting #6: "Snatch-driven development"
 
Présentation Kivy (et projets associés) à Pycon-fr 2013
Présentation Kivy (et projets associés) à Pycon-fr 2013Présentation Kivy (et projets associés) à Pycon-fr 2013
Présentation Kivy (et projets associés) à Pycon-fr 2013
 
Managing gang of chaotic developers is complex at Agile Tour Riga 2012
Managing gang of chaotic developers is complex at Agile Tour Riga 2012Managing gang of chaotic developers is complex at Agile Tour Riga 2012
Managing gang of chaotic developers is complex at Agile Tour Riga 2012
 
Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009
Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009
Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009
 
Vert.x - JDD 2013 (English)
Vert.x - JDD 2013 (English)Vert.x - JDD 2013 (English)
Vert.x - JDD 2013 (English)
 
Agile Management 2013 - Nie tylko it
Agile Management 2013 - Nie tylko itAgile Management 2013 - Nie tylko it
Agile Management 2013 - Nie tylko it
 
Laszlo PyCon 2005
Laszlo PyCon 2005Laszlo PyCon 2005
Laszlo PyCon 2005
 
There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...
There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...
There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...
 
A Scalable I/O Manager for GHC
A Scalable I/O Manager for GHCA Scalable I/O Manager for GHC
A Scalable I/O Manager for GHC
 
O'Reilly ETech Conference: Laszlo RIA
O'Reilly ETech Conference: Laszlo RIAO'Reilly ETech Conference: Laszlo RIA
O'Reilly ETech Conference: Laszlo RIA
 
Federated CDNs: What every service provider should know
Federated CDNs: What every service provider should knowFederated CDNs: What every service provider should know
Federated CDNs: What every service provider should know
 
Jensimmons html5live-responsivedesign
Jensimmons html5live-responsivedesignJensimmons html5live-responsivedesign
Jensimmons html5live-responsivedesign
 
Mendeley presentation
Mendeley presentationMendeley presentation
Mendeley presentation
 
Monadologie
MonadologieMonadologie
Monadologie
 
Masters Defense 2013
Masters Defense 2013Masters Defense 2013
Masters Defense 2013
 
Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Rcpp: Seemless R and C++
Rcpp: Seemless R and C++
 
JavaScript Growing Up
JavaScript Growing UpJavaScript Growing Up
JavaScript Growing Up
 
Be careful when entering a casino (Agile by Example 2012)
Be careful when entering a casino (Agile by Example 2012)Be careful when entering a casino (Agile by Example 2012)
Be careful when entering a casino (Agile by Example 2012)
 
Sneaking Scala through the Back Door
Sneaking Scala through the Back DoorSneaking Scala through the Back Door
Sneaking Scala through the Back Door
 

Shooting the Rapids

  • 1. Shooting the Rapids: Getting the Best from Java 8 Streams Kirk Pepperdine @kcpeppe Maurice Naftalin @mauricenaftalin Devoxx Belgium, Nov. 2015
  • 2. About Kirk • Specialises in performance tuning • speaks frequently about performance • author of performance tuning workshop • Co-founder • performance diagnostic tooling • Java Champion (since 2006)
  • 8. Subjects Covered in this Talk • Background – lambdas and streams • Performance of our example • Effect of parallelizing • Splitting input data efficiently • When to go parallel • Parallel streams in the real world
  • 16. What is a Lambda? Predicate<Matcher> matches = new Predicate<Matcher>() {
 @Override
 public boolean test(Matcher matcher) {
 return matcher.find();
 }
 }; Predicate<Matcher> matches = matcher -> matcher.find(); A lambda is a function from arguments to result
  • 18. Example: Processing GC Logfile ⋮ 2.869: Application time: 1.0001540 seconds 5.342: Application time: 0.0801231 seconds 8.382: Application time: 1.1013574 seconds ⋮ DoubleSummaryStatistics {count=3, sum=2.181635, min=0.080123, average=0.727212, max=1.101357}
  • 19. Old School Code
 DoubleSummaryStatistics summary = new DoubleSummaryStatistics();
 Pattern stoppedTimePattern = Pattern.compile("Application time: (\\d+\\.\\d+)");
 while ( ( logRecord = logFileReader.readLine()) != null) {
 Matcher matcher = stoppedTimePattern.matcher(logRecord);
 if ( matcher.find()) {
 double value = Double.parseDouble( matcher.group(1));
 summary.accept( value);
 }
 }
  • 20. Old School Code: Let’s look at the features in this code
  • 21. Data Source: the loop reads records with logFileReader.readLine()
  • 22. Map to Matcher: stoppedTimePattern.matcher(logRecord) turns each record into a Matcher
  • 23. Filter: only records for which matcher.find() is true are kept
  • 24. Map to Double: Double.parseDouble(matcher.group(1)) converts the captured text to a double
  • 25. Collect Results (Reduce): each value is added to the summary
  • 26. Java 8 Streams • A sequence of values, “in motion” • source and intermediate operations set the stream up lazily • a terminal operation “pulls” values eagerly down the stream collection.stream() .intermediateOp ⋮ .intermediateOp .terminalOp
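The lazy set-up described on this slide can be observed directly. The following is a minimal sketch, not taken from the talk: the class name `LazyStreamDemo` and the `seen` list are hypothetical helpers that record when the map stage actually runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

final class LazyStreamDemo {

    // Hypothetical helper: records each element the map stage has processed.
    static final List<String> seen = new ArrayList<>();

    // Builds the pipeline (source + intermediate operation). Nothing is processed here.
    static Stream<Integer> lengths(List<String> words) {
        return words.stream()
                .map(w -> {
                    seen.add(w);
                    return w.length();
                });
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = lengths(Arrays.asList("fork", "join", "spliterator"));
        System.out.println("before terminal operation, seen = " + seen);

        // The terminal operation pulls the values through the pipeline.
        int total = pipeline.mapToInt(Integer::intValue).sum();
        System.out.println("after terminal operation, seen = " + seen + ", total = " + total);
    }
}
```

Until `sum()` is called the `seen` list stays empty, which is what "set the stream up lazily" means in practice.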
  • 27. Stream Sources • New method Collection.stream() • Many other sources: • Arrays.stream(Object[]) • Stream.of(Object...) • Stream.iterate(Object,UnaryOperator) • Files.lines() • BufferedReader.lines() • Random.ints() • JarFile.stream() • …
  • 28. Imperative to Stream DoubleSummaryStatistics statistics =
 Files.lines(new File("gc.log").toPath())
 .map(stoppedTimePattern::matcher)
 .filter(Matcher::find)
 .map(matcher -> matcher.group(1))
 .mapToDouble(Double::parseDouble)
 .summaryStatistics();
  • 29. Stream Source: Files.lines(...) supplies the lines of gc.log
  • 30. Intermediate Operations: the map, filter, map and mapToDouble calls
  • 31. Method References: stoppedTimePattern::matcher, Matcher::find, Double::parseDouble
  • 32. Terminal Operation: summaryStatistics()
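The stream pipeline can be tried without a log file by feeding it the three sample records from the GC logfile slide. This is a sketch under assumptions: the class name `GcLogSummary`, the `summarize` helper and the non-matching filler record are illustrative and not part of the talk.

```java
import java.util.DoubleSummaryStatistics;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

final class GcLogSummary {

    // Pattern from the slides, with the regex escapes written out in full.
    static final Pattern STOPPED_TIME = Pattern.compile("Application time: (\\d+\\.\\d+)");

    // Same pipeline as the slide, but taking any Stream<String> as its source.
    static DoubleSummaryStatistics summarize(Stream<String> lines) {
        return lines
                .map(STOPPED_TIME::matcher)
                .filter(Matcher::find)
                .map(matcher -> matcher.group(1))
                .mapToDouble(Double::parseDouble)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        Stream<String> sample = Stream.of(
                "2.869: Application time: 1.0001540 seconds",
                "4.000: some other record",               // made-up record that the filter drops
                "5.342: Application time: 0.0801231 seconds",
                "8.382: Application time: 1.1013574 seconds");
        System.out.println(summarize(sample));
    }
}
```

Swapping the source for `Files.lines(path)` gives the version shown on the slide; the rest of the pipeline is unchanged.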
  • 33-36. Visualising Sequential Streams [animation: values x0 to x3 move one at a time from the Source through Map and Filter (Intermediate Operations) to the Reduction (Terminal Operation); the filter passes some values (✔) and rejects others (❌)] “Values in Motion”
  • 37. How Does It Perform? • Old School: 13.3 secs • Sequential: 13.8 secs • Should be the same workload • Stream code is cleaner, easier to read (24M line file, MacBook Pro, Haswell i7, 4 cores, hyperthreaded, Java 9.0)
  • 38. Can We Do Better? • We might be able to if the workload is parallelizable • split stream into many segments • process each segment • combine results • Requirements exactly match Fork/Join workflow
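The split / process / combine steps listed above can be written by hand against the Fork/Join framework. The sketch below sums an array rather than parsing a log; the class name `SumTask` and the `THRESHOLD` value are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class SumTask extends RecursiveTask<Double> {

    // Assumed cut-off below which a segment is processed sequentially.
    private static final int THRESHOLD = 1_000;

    private final double[] values;
    private final int from;
    private final int to;

    SumTask(double[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Double compute() {
        if (to - from <= THRESHOLD) {
            // Process one segment.
            double sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        // Split into two segments.
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(values, from, mid);
        SumTask right = new SumTask(values, mid, to);
        left.fork();                       // run the left half asynchronously
        double rightSum = right.compute(); // work on the right half in this thread
        // Combine results.
        return left.join() + rightSum;
    }

    static double sum(double[] values) {
        return ForkJoinPool.commonPool().invoke(new SumTask(values, 0, values.length));
    }

    public static void main(String[] args) {
        double[] data = new double[10_000];
        Arrays.fill(data, 1.0);
        System.out.println(sum(data));
    }
}
```

A parallel stream performs the same three steps internally, which is why its requirements line up with the Fork/Join workflow.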
  • 42. Visualizing Parallel Streams [diagram: the source values are divided between parallel pipelines, each applying the filter (✔ / ❌) to its own share]
  • 43. Splitting Stream Sources • Stream source is a Spliterator • can both iterate over data and – where possible – split it
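Both abilities of a Spliterator, iterating and splitting, are available through the public API. A small sketch follows; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

final class SpliteratorBasics {

    // Splits the source once, then iterates over both parts in encounter order.
    static List<String> splitAndDrain(List<String> source) {
        Spliterator<String> second = source.spliterator();
        // trySplit() hands off a prefix of the data, or returns null if it will not split.
        Spliterator<String> first = second.trySplit();

        List<String> visited = new ArrayList<>();
        if (first != null) {
            first.forEachRemaining(visited::add);
        }
        second.forEachRemaining(visited::add);
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(splitAndDrain(Arrays.asList("a", "b", "c", "d")));
    }
}
```

In a parallel stream the two parts would be handed to different worker threads instead of being drained one after the other.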
  • 53. Parallel Streams DoubleSummaryStatistics statistics =
 Files.lines(new File("gc.log").toPath())
 .parallel()
 .map(stoppedTimePattern::matcher)
 .filter(Matcher::find)
 .map(matcher -> matcher.group(1))
 .mapToDouble(Double::parseDouble)
 .summaryStatistics();
  • 54. About Fork/Join • Introduced in Java 7 • draws from a common pool of ForkJoinWorkerThread • default pool size == HW cores – 1 • assumes workload will be CPU bound • On its own, not an easy coding idiom • parallel streams provide an abstraction layer • Spliterator defines how to split stream • framework code submits sub-tasks to the common Fork/Join pool
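The size of the common pool on a given machine can be checked with two standard calls. This is only a sketch; the class name `CommonPoolInfo` is made up. The reported value may differ from cores minus one if the system property `java.util.concurrent.ForkJoinPool.common.parallelism` has been set.

```java
import java.util.concurrent.ForkJoinPool;

final class CommonPoolInfo {

    // Number of processors the JVM reports as available.
    static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Target parallelism of the common Fork/Join pool used by parallel streams.
    static int commonParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("available processors:    " + cores());
        System.out.println("common pool parallelism: " + commonParallelism());
    }
}
```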
  • 55. How Does That Perform? • Old School: 13.3 secs • Sequential: 13.8 secs • Parallel: 9.5 secs • 1.45x faster • but not 8x faster (????) (24M lines, 2.8GHz 8-core i7, 16GB, OS X, Java 9.0)
  • 56. In Fact!!!! • Different benchmarks yield a mixed bag of results • some were better • some were the same • some were worse!
  • 58. Open Questions • Under what conditions are things better • or worse • When should we parallelize • and when is serial better Answer depends upon where the bottleneck is
  • 59. Where is Our Bottleneck? • I/O operations • not a surprise, we’re reading from a file • Java 9 uses FileChannelLinesSpliterator • 2x better than Java 8’s implementation • Profiler output: 76.0% 0 + 5941 sun.nio.ch.FileDispatcherImpl.pread0
  • 60. Poorly Splitting Sources • Some sources split worse than others • LinkedList vs ArrayList • Streaming I/O is problematic • more threads == more pressure on contended resource • thrashing and other ill effects • Workload size doesn’t cover the overheads
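The LinkedList versus ArrayList point can be explored by asking each list's spliterator to split once and comparing the sizes of the two parts. This is a hedged sketch: the class name `SplitSizes` is hypothetical, and the exact sizes printed depend on the JDK's spliterator implementations, so no particular numbers are claimed here.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;

final class SplitSizes {

    // Returns {size of the part split off, size of the part left behind} after one split.
    static long[] firstSplit(List<Integer> list) {
        Spliterator<Integer> rest = list.spliterator();
        Spliterator<Integer> prefix = rest.trySplit();
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>();
        List<Integer> linkedList = new LinkedList<>();
        for (int i = 0; i < 100_000; i++) {
            arrayList.add(i);
            linkedList.add(i);
        }
        long[] a = firstSplit(arrayList);
        long[] l = firstSplit(linkedList);
        System.out.println("ArrayList  split: " + a[0] + " / " + a[1]);
        System.out.println("LinkedList split: " + l[0] + " / " + l[1]);
    }
}
```

An index-addressable source can usually be cut near the middle, whereas a linked structure has to be walked to produce a split, which tends to give less balanced parts.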
  • 61. Streaming I/O Bottleneck [diagram: values x0 to x3 queued at the stream source]
  • 63-68. LineSpliterator [animation: GC log records (2.869: Application time …, 5.342: …, 8.382: …, 9.337: …), each terminated by a newline, held in a MappedByteBuffer; labels: spliterator coverage, mid, new spliterator coverage] Included in JDK9 as FileChannelLinesSpliterator
  • 69. In-memory Comparison • Read GC log into an ArrayList prior to processing
  • 70. In-memory Comparison • Old School: 9.4 secs • Sequential: 9.9 secs • Parallel: 2.7 secs • 4.25x faster • better but still not 8x faster (24M lines, 2.8GHz 8 core i7, 16GB, OS X, JDK 9.0)
  • 71. Justifying the Overhead • CPNQ performance model: • C - number of submitters • P - number of CPUs • N - number of elements • Q - cost of the operation • cost of intermediate operations is N * Q • overhead of setting up F/J framework is ~100µs
  • 72. Amortizing Setup Costs • N*Q needs to be large • Q can often only be estimated • N may only be known at run time • Rule of thumb, N > 10,000 • P is the number of processors • P == number of cores for CPU bound • P < number of cores otherwise
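Because N may only be known at run time, the decision to go parallel can be deferred until then. The sketch below applies only the N > 10,000 rule of thumb from the slide; the class name `ParallelChoice` and its methods are hypothetical, and a real decision would also need an estimate of Q.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

final class ParallelChoice {

    // Rule-of-thumb element count from the slide; not a measured value.
    static final int MIN_ELEMENTS = 10_000;

    static boolean goesParallel(int n) {
        return n > MIN_ELEMENTS;
    }

    // Sums the list, switching to a parallel stream only for large inputs.
    static long sum(List<Integer> values) {
        Stream<Integer> stream = values.stream();
        if (goesParallel(values.size())) {
            stream = stream.parallel();
        }
        return stream.mapToLong(Integer::longValue).sum();
    }

    public static void main(String[] args) {
        System.out.println(sum(Collections.nCopies(100, 1)));     // stays sequential
        System.out.println(sum(Collections.nCopies(20_000, 1)));  // goes parallel
    }
}
```

Addition is about as cheap as an operation gets, so for this particular Q the threshold is probably too low; the point is only the shape of the run-time check.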
  • 73. Other Gotchas • Frequent hand-offs place pressure on thread schedulers • effect is magnified when a hypervisor is involved • estimated 80,000 cycles to hand off data between threads • you can do a lot of processing in 80,000 cycles • Too many threads place pressure on thread schedulers • responsible for other ill effects (TTSP, time to safepoint) • too few threads may leave hardware under-utilized
  • 74. Simulated Server Environment ExecutorService threadPool = Executors.newFixedThreadPool(10);
 threadPool.execute(() -> {
 try {
 long timer = System.currentTimeMillis();
 value = Files.lines( new File("gc.log").toPath()).parallel()
 .map(applicationStoppedTimePattern::matcher)
 .filter(Matcher::find)
 .map( matcher -> matcher.group(2))
 .mapToDouble(Double::parseDouble)
 .summaryStatistics().getSum();
 } catch (Exception ex) {}
 });
  • 76. Work Flow and Results • First task to arrive will consume all ForkJoinWorkerThread • downstream tasks wait for a ForkJoinWorkerThread • downstream tasks start intermixing with initial task • Initial task collects dead time as it competes for threads • all other tasks collect dead time as they either • compete or wait for a ForkJoinWorkerThread System is stressed beyond capacity
  • 78. Intermediate Operation Bottleneck • Bottleneck is in pattern matching • but, streaming infrastructure isn’t far behind! 68.6% 1384 + 0 java.util.regex.Pattern$Curly.match 26.6% 521 + 15 java.util.stream.ReferencePipeline$3$1.accept
  • 79. Tragedy of the Commons Garrett Hardin, ecologist (1968): Imagine the grazing of animals on a common ground. Each flock owner gains if they add to their own flock. But every animal added to the total degrades the commons a small amount.
  • 80. Tragedy of the Commons
  • 81. Tragedy of the Commons You have a finite amount of hardware – it might be in your best interest to grab it all – but if everyone behaves the same way…
  • 83. Simulated Server Environment • Submit 10 tasks to Fork-Join (via Executor thread-pool) • first result comes out in 32 seconds • compared to 9.5 seconds for individually submitted task • high system time reflects that the task is I/O bound
  • 86. In-Memory Variation • Preload log file • Submit 10 tasks to Fork-Join (via Executor thread-pool) • first result comes out in 23 seconds • compared to 4.5 seconds for individually submitted task • task is CPU bound
  • 87. Conclusions • Sequential stream performance comparable to imperative code • Going parallel is worthwhile IF • task is suitable • expensive enough to amortize setup costs • no inter-task communication needed • data source is suitable • environment is suitable • Need to monitor JDK to understand bottlenecks • Fork/Join pool is not well instrumented