ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
Yu Liu
NII
January 27, 2012
Yu Liu ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework o
System Overview
Impl. of Generate-Test-Aggregate Algorithm
The impl. of generator
generate is polymorphic over the semiring structure (⊕, ⊗).
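$$
\begin{aligned}
\mathit{generate}_{\oplus,\otimes} &:: (A \to S) \to [A] \to S \\
\mathit{generate}_{\oplus,\otimes}\ f\ [\,] &= \mathit{id}_{\otimes} \\
\mathit{generate}_{\oplus,\otimes}\ f\ [x] &= \mathit{id}_{\otimes} \oplus f\ x \\
\mathit{generate}_{\oplus,\otimes}\ f\ (xs \mathbin{+\!\!+} ys) &= \mathit{generate}_{\oplus,\otimes}\ f\ xs \otimes \mathit{generate}_{\oplus,\otimes}\ f\ ys
\end{aligned}
$$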
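A minimal Java sketch of the three generator equations, assuming a hypothetical `Semiring<S>` interface (zero, one, plus, times); it is not the framework's actual `Generator` class. Splitting the input at its midpoint is one valid choice of xs ++ ys, which is what makes the definition amenable to parallel evaluation. With the counting semiring (+, ×, 0, 1) and f x = 1, the generator counts sublists, so a list of n elements yields 2^n.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical semiring (S, plus, times) with identities zero (for plus) and one (for times). */
interface Semiring<S> {
    S zero();
    S one();
    S plus(S a, S b);
    S times(S a, S b);
}

/** Hypothetical sketch of the sublists generator, polymorphic over the semiring. */
final class SublistsGenerator {
    private SublistsGenerator() {}

    static <A, S> S generate(Semiring<S> sr, Function<A, S> f, List<A> xs) {
        if (xs.isEmpty()) {
            return sr.one();                               // generate f [] = id_⊗
        }
        if (xs.size() == 1) {
            return sr.plus(sr.one(), f.apply(xs.get(0)));  // generate f [x] = id_⊗ ⊕ f x
        }
        int mid = xs.size() / 2;                           // view xs as (left ++ right)
        S left = generate(sr, f, xs.subList(0, mid));
        S right = generate(sr, f, xs.subList(mid, xs.size()));
        return sr.times(left, right);                      // combine the halves with ⊗
    }
}

/** Counting semiring (+, ×, 0, 1) on long. */
final class CountSemiring implements Semiring<Long> {
    public Long zero() { return 0L; }
    public Long one() { return 1L; }
    public Long plus(Long a, Long b) { return a + b; }
    public Long times(Long a, Long b) { return a * b; }
}
```

For example, `SublistsGenerator.generate(new CountSemiring(), x -> 1L, List.of(1, 2, 3))` returns 8, the number of sublists of a three-element list.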
Java class Generator is defined as:
public abstract class Generator<In extends Writable, Val extends Writable, Res extends Writable>
        implements MapReducer<In, Val, Res>
Generator
The fields and methods in Generator
G+A: change the generator's semiring
G+T+A: invoke embed() to create a new Generator
Impl. of Generate-Test-Aggregate Algorithm
aggregate is a semiring homomorphism
The semiring homomorphism aggregate maps $(S, \oplus, \otimes)$ to $(S', \oplus', \otimes')$:
$$\mathit{aggregate} : S \to S'$$
By definition, the aggregate function is a semiring homomorphism.
In the implementation, however, the aggregator is itself a semiring (Sebastian's version and my second version).
Implementing a real semiring homomorphism, e.g. from [Int] to Int, is not efficient: it was 20 to 30 percent slower in my first-version implementation.
Aggregator
It has a field: a semiring on the set A.
singleton() maps S → A, where S is the input element type (not [A]).
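A sketch of the Aggregator structure and of G+A, assuming hypothetical names (`Semiring`, `Aggregator`, `MaxPlusSemiring`, `MaxSumAggregator`, `SublistsGeneratorGA`); the framework's real classes carry Hadoop `Writable` type parameters that are omitted here. The aggregator owns a semiring on A plus a `singleton` that lifts one input element of type S into A, and G+A runs the generator's equations in that semiring instead of building the sublists.

```java
import java.util.List;

/** Hypothetical semiring interface on a carrier A. */
interface Semiring<A> {
    A zero();
    A one();
    A plus(A a, A b);
    A times(A a, A b);
}

/** (max, +) semiring on long, with Long.MIN_VALUE standing in for -infinity. */
final class MaxPlusSemiring implements Semiring<Long> {
    public Long zero() { return Long.MIN_VALUE; }   // identity of max
    public Long one() { return 0L; }                // identity of +
    public Long plus(Long a, Long b) { return Math.max(a, b); }
    public Long times(Long a, Long b) {
        // -infinity absorbs: avoid overflow when adding to Long.MIN_VALUE
        return (a == Long.MIN_VALUE || b == Long.MIN_VALUE) ? Long.MIN_VALUE : a + b;
    }
}

/** An aggregator holds a semiring on A and maps a single input element S into A. */
abstract class Aggregator<S, A> {
    final Semiring<A> semiring;

    Aggregator(Semiring<A> semiring) { this.semiring = semiring; }

    /** singleton : S -> A; takes one element, not a list [A]. */
    abstract A singleton(S x);
}

final class MaxSumAggregator extends Aggregator<Long, Long> {
    MaxSumAggregator() { super(new MaxPlusSemiring()); }

    @Override
    Long singleton(Long x) { return x; }
}

/** G+A: evaluate the sublists generator directly in the aggregator's semiring. */
final class SublistsGeneratorGA {
    private SublistsGeneratorGA() {}

    static <S, A> A run(Aggregator<S, A> agg, List<S> xs) {
        Semiring<A> sr = agg.semiring;
        if (xs.isEmpty()) return sr.one();
        if (xs.size() == 1) return sr.plus(sr.one(), agg.singleton(xs.get(0)));
        int mid = xs.size() / 2;
        return sr.times(run(agg, xs.subList(0, mid)), run(agg, xs.subList(mid, xs.size())));
    }
}
```

On `[3, -1, 4, -2]` the max-sum aggregator yields 7 (the sublist `[3, 4]`); on an all-negative list it yields 0, the sum of the empty sublist.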
Impl. of Generate-Test-Aggregate Algorithm
test is an almost list homomorphism.
If hom is a monoid homomorphism from $([A], +\!\!+)$ to $(M, \odot)$ and ok is a Boolean predicate on M, then test can be written as:
$$\mathit{test} = \mathit{filter}\,(\mathit{ok} \circ \mathit{hom})$$
Generator::embed() takes a Generator, an Aggregator, and a test, and builds a new Generator whose semiring is the lifted semiring SM.
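A small sketch of test = filter (ok ∘ hom) for the EvenLengthTest case, assuming hom = length mod 2, so (M, ⊙) is ({0, 1}, addition mod 2). It filters an explicit list of candidates only to make the definition concrete; inside the framework, the filter-embedding step is what avoids materialising the candidates. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Predicate;

/** Hypothetical EvenLengthTest: test = filter (ok ∘ hom) with hom = length mod 2. */
final class EvenLengthTest {
    private EvenLengthTest() {}

    /** The monoid (M, ⊙): M = {0, 1}, ⊙ = addition mod 2, identity 0. */
    static final BinaryOperator<Integer> ODOT = (a, b) -> (a + b) % 2;

    /** hom [] = 0, hom [x] = 1, hom (xs ++ ys) = hom xs ⊙ hom ys. */
    static <A> Integer hom(List<A> xs) {
        Integer m = 0;                         // identity of ⊙
        for (int i = 0; i < xs.size(); i++) {
            m = ODOT.apply(m, 1);              // fold ⊙ over hom [x] = 1 per element
        }
        return m;
    }

    /** ok accepts even lengths. */
    static final Predicate<Integer> OK = m -> m == 0;

    /** test = filter (ok ∘ hom) over an explicit collection of candidate lists. */
    static <A> List<List<A>> test(List<List<A>> candidates) {
        List<List<A>> kept = new ArrayList<>();
        for (List<A> xs : candidates) {
            if (OK.test(hom(xs))) kept.add(xs);
        }
        return kept;
    }
}
```

Because hom respects ++, it can be evaluated piecewise on split input, which is the property the lifted semiring relies on.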
Semiring Fusion
$$\mathit{aggregate} \circ \mathit{generate}_{\uplus,\times_{+\!\!+}}\ (\lambda x \to [x]) = \mathit{generate}_{\oplus,\otimes}\ (\lambda x \to \mathit{aggregate}\ [x])$$
This is done by replacing the generator's free semiring with the aggregator's semiring.
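To illustrate semiring fusion, the sketch below computes the maximum sublist sum two ways, assuming max-sum aggregation: the unfused route enumerates every sublist (the free-semiring result) and aggregates afterwards, while the fused route evaluates the generator's equations directly in the (max, +) semiring with f x = x. The class and method names are hypothetical; the brute force is exponential and serves only as a reference.

```java
import java.util.List;

final class SemiringFusionDemo {
    private SemiringFusionDemo() {}

    /** Unfused: enumerate every sublist via a bitmask, then aggregate with max of sums. Assumes n < 63. */
    static long bruteForceMaxSum(List<Long> xs) {
        int n = xs.size();
        long best = Long.MIN_VALUE;
        for (long mask = 0; mask < (1L << n); mask++) {
            long sum = 0;
            for (int i = 0; i < n; i++) {
                if ((mask & (1L << i)) != 0) sum += xs.get(i);
            }
            best = Math.max(best, sum);          // mask 0 is the empty sublist, sum 0
        }
        return best;
    }

    /** Fused: generate_{max,+} (λx → x), never building any sublist. */
    static long fusedMaxSum(List<Long> xs) {
        if (xs.isEmpty()) return 0L;                          // id_⊗ = 0
        if (xs.size() == 1) return Math.max(0L, xs.get(0));   // id_⊗ ⊕ f x
        int mid = xs.size() / 2;
        return fusedMaxSum(xs.subList(0, mid))                // ⊗ is +
                + fusedMaxSum(xs.subList(mid, xs.size()));
    }
}
```

Both routes return 7 for `[3, -1, 4, -2]`; the fused route does linear work instead of touching all 2^n sublists.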
Filter-Embedding : Lifted Semiring
$$\mathit{aggregate} \circ \mathit{test} \circ \mathit{generate} = \mathit{postprocess}_M\ \mathit{ok} \circ \mathit{aggregate}_M \circ \mathit{generate}$$
Define the lifted semiring and a new generator that uses this lifted semiring as its semiring.
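A minimal sketch of filter embedding for one concrete case, assuming max-sum aggregation and an even-length test (hom = length mod 2, so M = {0, 1}). An element of the lifted semiring is represented as a `long[]` indexed by M: ⊕ acts pointwise, and ⊗ combines entries m1 and m2 into slot m1 ⊙ m2. The names `MaxPlus` and `EvenLengthMaxSum` and all their methods are hypothetical; this is not the framework's `embed()` API.

```java
import java.util.Arrays;
import java.util.List;

/** Max-plus semiring on long: plus = max, times = +, zero = -inf, one = 0. */
final class MaxPlus {
    static final long NEG_INF = Long.MIN_VALUE;
    static long plus(long a, long b) { return Math.max(a, b); }
    static long times(long a, long b) {
        return (a == NEG_INF || b == NEG_INF) ? NEG_INF : a + b;   // -inf absorbs
    }
}

/** Lifted semiring over M = {0, 1} with ⊙ = addition mod 2 (length parity). */
final class EvenLengthMaxSum {
    static final int M = 2;

    static long[] zeroM() {
        long[] r = new long[M];
        Arrays.fill(r, MaxPlus.NEG_INF);
        return r;
    }

    static long[] oneM() {
        long[] r = zeroM();
        r[0] = 0L;                                  // id_⊗ placed at the identity of (M, ⊙)
        return r;
    }

    static long[] plusM(long[] a, long[] b) {
        long[] r = new long[M];
        for (int m = 0; m < M; m++) r[m] = MaxPlus.plus(a[m], b[m]);
        return r;
    }

    static long[] timesM(long[] a, long[] b) {
        long[] r = zeroM();
        for (int m1 = 0; m1 < M; m1++) {
            for (int m2 = 0; m2 < M; m2++) {
                int m = (m1 + m2) % M;              // m1 ⊙ m2
                r[m] = MaxPlus.plus(r[m], MaxPlus.times(a[m1], b[m2]));
            }
        }
        return r;
    }

    /** Lifted f: f x is stored at hom [x] = 1, since a one-element list has odd length. */
    static long[] liftedF(long x) {
        long[] r = zeroM();
        r[1] = x;
        return r;
    }

    /** The sublists generator's equations, evaluated in the lifted semiring. */
    static long[] generate(List<Long> xs) {
        if (xs.isEmpty()) return oneM();
        if (xs.size() == 1) return plusM(oneM(), liftedF(xs.get(0)));
        int mid = xs.size() / 2;
        return timesM(generate(xs.subList(0, mid)), generate(xs.subList(mid, xs.size())));
    }

    static boolean ok(int m) { return m == 0; }     // even length

    /** postprocess_M ok: ⊕ over the entries whose key satisfies ok. */
    static long maxEvenLengthSublistSum(List<Long> xs) {
        long[] table = generate(xs);
        long result = MaxPlus.NEG_INF;
        for (int m = 0; m < M; m++) {
            if (ok(m)) result = MaxPlus.plus(result, table[m]);
        }
        return result;
    }
}
```

For `[5, 2, 1]` the table is `[7, 8]`: the best even-length sublist sums to 7 (`[5, 2]`) and the best odd-length one to 8 (the whole list), so the test changes the answer from the unconstrained 8 to 7.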
Examples
1 Knapsack
2 SublistGenerator
3 SublistGenerator + MaxSumAggregator
4 SublistGenerator + EvenLengthTest + MaxSumAggregator
5 SublistGenerator + EvenSumTest + MaxSumAggregator
6 SublistGenerator + LimitedLengthTest + MaxSumAggregator
7 InitsGenerator
8 InitsGenerator + MaxSumAggregator
9 InitsGenerator + EvenLengthTest + MaxSumAggregator
10 InitsGenerator + EvenSumTest + MaxSumAggregator
11 InitsGenerator + LimitedLengthTest + MaxSumAggregator
12 SegmentsGenerator + MaxSumAggregator ...
13 SublistGenerator + LimitedLengthTest (special cases)
Java Code
Compared to the first version, the interfaces/APIs have fewer type parameters
G, G+A, G+T+A are all runnable programs
Let’s have a look at the actual implementation...
Evaluation on Hadoop (OLD vs NEW)
We test 3.2 GB of data on clusters of {2, 4, 8, 16, 32} nodes and 32 GB of data on clusters of {32, 64} nodes (footnote 1).
                   2 nodes  4 nodes  8 nodes  16 nodes  32 nodes  64 nodes
data size          3.2 GB   3.2 GB   3.2 GB   3.2 GB    32 GB     32 GB
old time (sec.)    1602     882      482      317       961       511
old speedup        –        × 1.82   × 1.83   × 1.52    –         × 1.88
new time (sec.)    –        668      348      –         –         –
new speedup        –        –        × 1.91   –         –         –
New results on the 8-node cluster:
MPS: 241 s
total SUM: 226 s
MEPS:
EVEN-SUM:
Pair-SUM:
Performance is better than before: on 4 nodes the time drops from 882 s to 668 s, and on 8 nodes from 482 s to 348 s.
Footnote 1: Chunk size = 64 MB, duplication = 10
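The speedup rows appear to be the ratio of running times between consecutive cluster sizes on the same data set; a worked check against the old-version figures:

$$
\text{speedup}(n \to 2n) = \frac{T_n}{T_{2n}}, \qquad
\frac{1602}{882} \approx 1.82,\quad
\frac{882}{482} \approx 1.83,\quad
\frac{482}{317} \approx 1.52,\quad
\frac{961}{511} \approx 1.88
$$

The last ratio compares the 32-node and 64-node runs on the 32 GB data set.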
Impl. Issues
Generate + (some special) Test can be implemented and computed,
e.g., sublists + length-limited test
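One way a length-limited test can be pushed into the sublist generator itself, sketched under the assumption that the goal is to output the qualifying sublists (G+T without an aggregator): the enumeration stops extending a sublist once it reaches length k, so rejected candidates are never built. This is an illustrative recursion, not the framework's implementation; `LimitedLengthSublists` and `sublistsUpTo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class LimitedLengthSublists {
    private LimitedLengthSublists() {}

    /** All sublists (order-preserving subsequences) of xs whose length is at most k. */
    static <A> List<List<A>> sublistsUpTo(List<A> xs, int k) {
        List<List<A>> out = new ArrayList<>();
        extend(xs, 0, k, new ArrayList<>(), out);
        return out;
    }

    private static <A> void extend(List<A> xs, int start, int k, List<A> current, List<List<A>> out) {
        out.add(new ArrayList<>(current));        // current already passes the length test
        if (current.size() == k) return;          // prune: every extension would fail the test
        for (int i = start; i < xs.size(); i++) {
            current.add(xs.get(i));               // choose element i next
            extend(xs, i + 1, k, current, out);
            current.remove(current.size() - 1);   // backtrack (remove by index)
        }
    }
}
```

For `[1, 2, 3]` with k = 2 this produces 7 sublists (one empty, three singletons, three pairs) instead of all 8, and the gap grows quickly with the input length.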
Problems
Generic data types (e.g., Pair<K,V>) are not supported by Hadoop, and nested generic data types are not allowed
We have to use run-time class generation to create the generic-data-type classes and load the new classes into the JVM
Correctness checking/verification is weak
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Kürzlich hochgeladen (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop

  • 1. ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop. Yu Liu, NII, January 27, 2012
  • 2. System Overview
  • 3. Impl. of Generate-Test-Aggregate Algorithm
    The generator: generate is polymorphic over semiring structures.
    $\mathit{generate}_{\oplus,\otimes}\, f :: (A \to S) \to [A] \to S$
    $\mathit{generate}_{\oplus,\otimes}\, f\, [\,] = \mathit{id}_{\otimes}$
    $\mathit{generate}_{\oplus,\otimes}\, f\, [x] = \mathit{id}_{\otimes} \oplus f\, x$
    $\mathit{generate}_{\oplus,\otimes}\, f\, (xs \mathbin{+\!\!+} ys) = \mathit{generate}_{\oplus,\otimes}\, f\, xs \otimes \mathit{generate}_{\oplus,\otimes}\, f\, ys$
    The Java class Generator is defined as:
    public abstract class Generator<In extends Writable, Val extends Writable, Res extends Writable> implements MapReducer<In, Val, Res>
  • 4. Generator
    The fields and methods in Generator:
    G+A: change the semiring.
    G+T+A: invoke embed() to generate a new Generator.
  • 5. Impl. of Generate-Test-Aggregate Algorithm
    aggregate is a semiring homomorphism from $(S, \oplus, \otimes)$ to $(S', \oplus', \otimes')$:
    $\mathit{aggregate} : S \to S'$
    By definition, the aggregate function is a semiring homomorphism. However, the implemented Aggregater is a semiring (in Sebastian's version and in my second version).
    A real semiring homomorphism, e.g., from [Int] to Int, is not efficient: it was 20–30 percent slower in my first-version implementation.
  • 6. Aggregater
    It has one field: a semiring on the set A.
    The singleton() function maps S → A, where S is the input data type, not [A].
  • 7. Impl. of Generate-Test-Aggregate Algorithm
    test is an almost list homomorphism. If hom is a monoid homomorphism from $([A], \mathbin{+\!\!+})$ to $(M, \odot)$ and ok is a Boolean function, then test can be written as:
    $\mathit{test} = \mathit{filter}\,(\mathit{ok} \circ \mathit{hom})$
    The Generator::embed() methods take a Generator, an Aggregater and a test, and build a new Generator that has a lifted semiring $S_M$.
  • 8. Semiring Fusion
    $\mathit{aggregate} \circ \mathit{generate}_{\uplus,\times_{+\!\!+}}\, (\lambda x \to [x]) = \mathit{generate}_{\oplus,\otimes}\, (\lambda x \to \mathit{aggregate}\, [x])$
    It is implemented by replacing the generator's free semiring with the Aggregater's semiring.
  • 9. Filter-Embedding: Lifted Semiring
    $\mathit{aggregate} \circ \mathit{test} \circ \mathit{generate} = \mathit{postprocess}_M\, \mathit{ok} \circ \mathit{aggregate}_M \circ \mathit{generate}$
    Define the lifted semiring and a new generator, and use the lifted semiring as the new generator's semiring.
  • 10. Examples
    1. Knapsack
    2. SublistGenerator
    3. SublistGenerator + MaxSumAggregator
    4. SublistGenerator + EvenLengthTest + MaxSumAggregator
    5. SublistGenerator + EvenSumTest + MaxSumAggregator
    6. SublistGenerator + LimitedLengthTest + MaxSumAggregator
    7. InitsGenerator
    8. InitsGenerator + MaxSumAggregator
    9. InitsGenerator + EvenLengthTest + MaxSumAggregator
    10. InitsGenerator + EvenSumTest + MaxSumAggregator
    11. InitsGenerator + LimitedLengthTest + MaxSumAggregator
    12. SegmentsGenerator + MaxSumAggregator
    ...
    13. SublistGenerator + LimitedLengthTest (special cases)
  • 11. Java Codes
    Compared to the first version, the interfaces/APIs have fewer type parameters.
    G, G+A and G+T+A are all runnable programs.
    Let's have a look at the actual implementation...
  • 12. Evaluation on Hadoop (OLD vs NEW)
    We tested 3.2 GB of data on {2, 4, 8, 16, 32}-node clusters and 32 GB of data on {32, 64}-node clusters¹. Each speedup is relative to the next-smaller cluster on the same data size.
    Nodes             2        4        8        16       32       64
    Data              3.2 GB   3.2 GB   3.2 GB   3.2 GB   32 GB    32 GB
    Old time (sec.)   1602     882      482      317      961      511
    Old speedup       –        ×1.82    ×1.83    ×1.52    –        ×1.88
    New time (sec.)   –        668      348      –        –        –
    New speedup       –        –        ×1.91    –        –        –
    New scores on the 8-node cluster: MPS: 241 s total; SUM: 226 s; MEPS: –; EVEN-SUM: –; Pair-SUM: –.
    Performance is better than before.
    ¹ Chunk size = 64 MB, duplication = 10.
  • 13. Impl. Issues
    Generate + (some special) Test can be implemented and computed, e.g., sublists + length-limited test.
  • 14. Problems
    Generic data types (e.g., Pair<K,V>) are not supported by Hadoop; nested generic data types are not allowed.
    We have to use run-time class generation to create generic-data-type classes and reload the new classes into the JVM.
    Correctness checking/verification is weak.
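The generator equations on slide 3 can be sketched in Java as a divide-and-conquer function parameterized by a semiring. This is a minimal illustration under stated assumptions, not the framework's Generator class: Semiring, SublistGenerate, CountSemiring and MaxPlusSemiring are hypothetical names, the input is an in-memory list rather than Hadoop input splits, and xs ++ ys is split at the middle (any split gives the same result because ⊗ is associative).

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical semiring interface (S, oplus, otimes); not the framework's API.
interface Semiring<S> {
    S zero();            // id_oplus
    S one();             // id_otimes
    S plus(S a, S b);    // oplus
    S times(S a, S b);   // otimes
}

// Counting semiring (Long, +, *, 0, 1): generate (\x -> 1) counts the sublists.
final class CountSemiring implements Semiring<Long> {
    public Long zero() { return 0L; }
    public Long one() { return 1L; }
    public Long plus(Long a, Long b) { return a + b; }
    public Long times(Long a, Long b) { return a * b; }
}

// Max-plus semiring (Long, max, +, -inf, 0); Long.MIN_VALUE stands for -inf.
final class MaxPlusSemiring implements Semiring<Long> {
    public Long zero() { return Long.MIN_VALUE; }
    public Long one() { return 0L; }
    public Long plus(Long a, Long b) { return Math.max(a, b); }
    public Long times(Long a, Long b) {
        // -inf absorbs under otimes; this also avoids overflow on Long.MIN_VALUE
        return (a == Long.MIN_VALUE || b == Long.MIN_VALUE) ? Long.MIN_VALUE : a + b;
    }
}

// Hypothetical sketch of generate for the sublists generator.
final class SublistGenerate {
    // One case per equation on slide 3.
    static <A, S> S generate(Semiring<S> sr, Function<A, S> f, List<A> xs) {
        if (xs.isEmpty()) {
            return sr.one();                                // generate f []  = id_otimes
        }
        if (xs.size() == 1) {
            return sr.plus(sr.one(), f.apply(xs.get(0)));   // generate f [x] = id_otimes oplus f x
        }
        int mid = xs.size() / 2;                            // view xs as (left ++ right)
        S left = generate(sr, f, xs.subList(0, mid));
        S right = generate(sr, f, xs.subList(mid, xs.size()));
        return sr.times(left, right);                       // generate f left otimes generate f right
    }

    public static void main(String[] args) {
        List<Long> data = Arrays.asList(3L, -1L, 4L);
        System.out.println(generate(new CountSemiring(), x -> 1L, data));   // 8 sublists
        System.out.println(generate(new MaxPlusSemiring(), x -> x, data));  // 7 = 3 + 4
    }
}
```

Swapping the semiring changes what the same generator computes, in the spirit of "G+A: change the semiring" on slide 4: the counting semiring yields 2^n sublists, while (max, +) yields the maximum sublist sum.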
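Slide 5 states that aggregate is a semiring homomorphism. The check below makes that condition concrete for one case. It assumes, as is usual in the generate-test-and-aggregate literature although the slide does not say so, that the source is the free semiring of bags of lists with bag union ⊎ and cross concatenation ×++, and it takes maximum sum as the target semiring (max, +). FreeSemiringCheck and its methods are hypothetical names, and bags are represented as Java lists of lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative check (names are not the framework's) of the homomorphism laws
// for a max-sum aggregate. Source: bags of lists with bag union and cross
// concatenation. Target: (max, +), with Long.MIN_VALUE standing for -inf.
final class FreeSemiringCheck {
    static final long NEG_INF = Long.MIN_VALUE;

    // Bag union: keep every list from both bags.
    static List<List<Long>> union(List<List<Long>> a, List<List<Long>> b) {
        List<List<Long>> r = new ArrayList<>(a);
        r.addAll(b);
        return r;
    }

    // Cross concatenation: x ++ y for every x in a and every y in b.
    static List<List<Long>> cross(List<List<Long>> a, List<List<Long>> b) {
        List<List<Long>> r = new ArrayList<>();
        for (List<Long> x : a) {
            for (List<Long> y : b) {
                List<Long> xy = new ArrayList<>(x);
                xy.addAll(y);
                r.add(xy);
            }
        }
        return r;
    }

    // Multiplication of (max, +): ordinary addition, with -inf absorbing.
    static long mpTimes(long a, long b) {
        return (a == NEG_INF || b == NEG_INF) ? NEG_INF : a + b;
    }

    // aggregate: the largest list sum in the bag; the empty bag maps to -inf.
    static long aggregate(List<List<Long>> bag) {
        long best = NEG_INF;
        for (List<Long> xs : bag) {
            long sum = 0;
            for (long v : xs) sum += v;
            best = Math.max(best, sum);
        }
        return best;
    }

    // Both operation laws on (a, b), plus the two identity laws.
    static boolean isHomomorphicOn(List<List<Long>> a, List<List<Long>> b) {
        boolean plusLaw = aggregate(union(a, b)) == Math.max(aggregate(a), aggregate(b));
        boolean timesLaw = aggregate(cross(a, b)) == mpTimes(aggregate(a), aggregate(b));
        boolean zeroLaw = aggregate(Collections.<List<Long>>emptyList()) == NEG_INF;       // {} -> -inf
        boolean oneLaw = aggregate(Collections.singletonList(Collections.<Long>emptyList())) == 0L; // {[]} -> 0
        return plusLaw && timesLaw && zeroLaw && oneLaw;
    }

    public static void main(String[] args) {
        List<List<Long>> a = Arrays.asList(Arrays.asList(3L), Arrays.asList(-1L, 2L));
        List<List<Long>> b = Arrays.asList(Collections.<Long>emptyList(), Arrays.asList(4L));
        System.out.println(aggregate(cross(a, b)));  // 7, from the list [3, 4]
        System.out.println(isHomomorphicOn(a, b));   // true
    }
}
```

This aggregate works on a materialized bag, which for the sublists generator has 2^n elements, so the sketch is only meant for checking the laws, not for computing results.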
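Slide 6 describes the Aggregater as a semiring on A together with singleton(): S → A. A hypothetical Java rendering of that shape follows. AggregaterSketch, MaxSumSketch, AggregaterDemo and the local Semiring interface are assumptions, not the framework's classes, and the input type S is taken to be one text line holding one integer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical semiring interface (A, oplus, otimes).
interface Semiring<A> {
    A zero();            // id_oplus
    A one();             // id_otimes
    A plus(A a, A b);    // oplus
    A times(A a, A b);   // otimes
}

// Hypothetical sketch of the Aggregater shape on slide 6:
// a semiring on A plus singleton : S -> A, where S is one input element, not [A].
abstract class AggregaterSketch<S, A> {
    final Semiring<A> semiring;

    AggregaterSketch(Semiring<A> semiring) { this.semiring = semiring; }

    abstract A singleton(S x);
}

// Max-sum over text lines that each hold one integer (S = String, A = Long).
final class MaxSumSketch extends AggregaterSketch<String, Long> {
    MaxSumSketch() {
        super(new Semiring<Long>() {
            public Long zero() { return Long.MIN_VALUE; }           // -inf
            public Long one() { return 0L; }
            public Long plus(Long a, Long b) { return Math.max(a, b); }
            public Long times(Long a, Long b) {                     // + with -inf absorbing
                return (a == Long.MIN_VALUE || b == Long.MIN_VALUE) ? Long.MIN_VALUE : a + b;
            }
        });
    }

    @Override
    Long singleton(String line) { return Long.parseLong(line.trim()); }
}

final class AggregaterDemo {
    // Sublist aggregation in the aggregater's semiring:
    // otimes over all inputs x of (id_otimes oplus singleton x).
    static <S, A> A sublistAggregate(AggregaterSketch<S, A> agg, List<S> input) {
        Semiring<A> sr = agg.semiring;
        A acc = sr.one();
        for (S x : input) {
            acc = sr.times(acc, sr.plus(sr.one(), agg.singleton(x)));
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sublistAggregate(new MaxSumSketch(), Arrays.asList("3", "-1", "4")));  // 7
    }
}
```

Because singleton() consumes one input element rather than a list [A], the aggregater can be applied element by element, which is what sublistAggregate does.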
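The fusion equation on slide 8 can be checked by brute force for the sublists generator and a max-sum aggregate. The left-hand side builds the whole bag of sublists and then aggregates it; the right-hand side runs the generator directly in the (max, +) semiring with f x = x. Class and method names are hypothetical, and the fused side is written as a sequential fold for brevity; because ⊗ is associative, it could equally be split across map tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative names; not part of the framework.
final class SemiringFusionCheck {
    // Left-hand side: materialize the bag of all sublists (free semiring), then aggregate.
    static long unfused(List<Long> xs) {
        List<List<Long>> bag = new ArrayList<>();
        bag.add(new ArrayList<>());                    // generate [] = {[]}
        for (long x : xs) {
            List<List<Long>> next = new ArrayList<>();
            for (List<Long> s : bag) {                 // bag x++ ({[]} U {[x]})
                next.add(s);                           // s ++ []
                List<Long> withX = new ArrayList<>(s);
                withX.add(x);                          // s ++ [x]
                next.add(withX);
            }
            bag = next;
        }
        long best = Long.MIN_VALUE;                    // aggregate: max over the bag of sums
        for (List<Long> s : bag) {
            long sum = 0;
            for (long v : s) sum += v;
            best = Math.max(best, sum);
        }
        return best;
    }

    // Right-hand side: generate directly in (max, +) with f x = aggregate {[x]} = x.
    // No -inf appears, because every factor id_otimes oplus f x = max(0, x) is finite.
    static long fused(List<Long> xs) {
        long acc = 0;                                  // id_otimes of (max, +)
        for (long x : xs) acc += Math.max(0L, x);      // acc otimes (id_otimes oplus f x)
        return acc;
    }

    public static void main(String[] args) {
        List<List<Long>> inputs = Arrays.asList(
                Arrays.asList(3L, -1L, 4L), Arrays.asList(-2L, -5L), Arrays.<Long>asList());
        for (List<Long> xs : inputs) {
            System.out.println(xs + ": " + unfused(xs) + " == " + fused(xs));
        }
    }
}
```

The unfused side does work proportional to 2^n, while the fused side is linear in the input length, which is the point of replacing the free semiring with the aggregate's semiring.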
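Slide 7's test = filter (ok ∘ hom) can be read literally as a filter over the generated sublists. The sketch below assumes hom = length, a monoid homomorphism from ([A], ++) to (Integer, +), and ok = "is even", in the spirit of the EvenLengthTest example listed on slide 10. All names are hypothetical, and enumerating all 2^n sublists is only meant as a reference specification, not as the way the framework computes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical brute-force reading of test = filter (ok . hom).
final class FilterTestSketch {
    // The sublists generator, enumerated explicitly (2^n sublists, order kept).
    static <A> List<List<A>> sublists(List<A> xs) {
        List<List<A>> bag = new ArrayList<>();
        bag.add(new ArrayList<>());
        for (A x : xs) {
            int n = bag.size();
            for (int i = 0; i < n; i++) {          // extend each existing sublist with x
                List<A> withX = new ArrayList<>(bag.get(i));
                withX.add(x);
                bag.add(withX);
            }
        }
        return bag;
    }

    // test = filter (ok . hom)
    static <A, M> List<List<A>> test(Function<List<A>, M> hom, Predicate<M> ok,
                                     List<List<A>> bag) {
        return bag.stream()
                  .filter(xs -> ok.test(hom.apply(xs)))
                  .collect(Collectors.toList());
    }

    // hom = length, a monoid homomorphism ([A], ++) -> (Integer, +); ok = "is even".
    static long evenLengthMaxSum(List<Long> xs) {
        Function<List<Long>, Integer> length = List::size;
        Predicate<Integer> even = n -> n % 2 == 0;
        long best = Long.MIN_VALUE;
        for (List<Long> s : test(length, even, sublists(xs))) {   // aggregate: max sum
            long sum = 0;
            for (long v : s) sum += v;
            best = Math.max(best, sum);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(evenLengthMaxSum(Arrays.asList(1L, 2L, 3L)));  // 5 = 2 + 3
    }
}
```

Length is a valid hom here because length (xs ++ ys) = length xs + length ys and the empty list has length 0, the identity of +.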
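The lifted semiring of slide 9 can be sketched for an even-length test. The sketch assumes M = Z/2 (length parity, so hom = length mod 2 and ok m holds when m = 0). An element of S_M is then a pair indexed by m, ⊕_M is pointwise max, ⊗_M combines entries whose indices add up to m modulo 2, and postprocess_M ok keeps the entry for m = 0. The array encoding and all names are assumptions, not the framework's embed() implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: lifted semiring S_M over (max, +) for M = Z/2
// (parity of the sublist length). An element is a pair indexed by m in {0, 1};
// Long.MIN_VALUE stands for -inf.
final class EvenLengthMaxSum {
    static final long NEG_INF = Long.MIN_VALUE;

    // Multiplication of (max, +): addition with -inf absorbing.
    static long mpTimes(long a, long b) {
        return (a == NEG_INF || b == NEG_INF) ? NEG_INF : a + b;
    }

    // oplus_M: pointwise max.
    static long[] plusM(long[] a, long[] b) {
        return new long[] { Math.max(a[0], b[0]), Math.max(a[1], b[1]) };
    }

    // otimes_M: (a otimes_M b)[m] = max over m1 + m2 = m (mod 2) of a[m1] + b[m2].
    static long[] timesM(long[] a, long[] b) {
        long[] r = { NEG_INF, NEG_INF };
        for (int m1 = 0; m1 < 2; m1++) {
            for (int m2 = 0; m2 < 2; m2++) {
                int m = (m1 + m2) % 2;
                r[m] = Math.max(r[m], mpTimes(a[m1], b[m2]));
            }
        }
        return r;
    }

    // id_otimes of S_M: only the empty list, whose length parity is 0, with sum 0.
    static long[] oneM() {
        return new long[] { 0L, NEG_INF };
    }

    // generate over S_M with f x = aggregate_M {[x]} = { parity 1 -> x }.
    static long[] generate(List<Long> xs, int lo, int hi) {
        if (hi == lo) return oneM();                                              // []
        if (hi - lo == 1) return plusM(oneM(), new long[] { NEG_INF, xs.get(lo) }); // [x]
        int mid = (lo + hi) >>> 1;                                                // left ++ right
        return timesM(generate(xs, lo, mid), generate(xs, mid, hi));
    }

    // postprocess_M ok with ok m = (m == 0): keep the even-length entry.
    static long maxEvenLengthSum(List<Long> xs) {
        return generate(xs, 0, xs.size())[0];
    }

    public static void main(String[] args) {
        System.out.println(maxEvenLengthSum(Arrays.asList(1L, 2L, 3L)));  // 5 = 2 + 3
    }
}
```

For [1, 2, 3] the generated pair is [5, 6]: 5 is the best even-length sum (2 + 3) and 6 the best odd-length sum (1 + 2 + 3). The cost is linear in the input length times |M|², instead of enumerating 2^n sublists.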