2. System Overview
Yu Liu ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework o
3. Impl. of Generate-Test-Aggregate Algorithm
The impl. of generator
generate is polymorphic over semiring structures.
generate_{⊕,⊗} :: (A → S) → [A] → S
generate_{⊕,⊗} f [] = id_⊗
generate_{⊕,⊗} f [x] = id_⊗ ⊕ f x
generate_{⊕,⊗} f (xs ++ ys) = generate_{⊕,⊗} f xs ⊗ generate_{⊕,⊗} f ys
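The three equations above can be sketched directly in Java. This is a minimal illustration, not the framework's actual Generator: the names GenerateSketch, Semiring, COUNT and MAX_PLUS are hypothetical, Hadoop's Writable bound is left out, and the list is simply split in the middle for the (xs ++ ys) case.

```java
import java.util.List;
import java.util.function.Function;

final class GenerateSketch {
    /** Hypothetical semiring interface: only the parts that generate needs. */
    interface Semiring<S> {
        S plus(S a, S b);   // the ⊕ operator
        S times(S a, S b);  // the ⊗ operator
        S one();            // the identity of ⊗
    }

    /** Follows the three equations of generate; any split point works because ⊗ is associative. */
    static <A, S> S generate(Semiring<S> sr, Function<A, S> f, List<A> xs) {
        if (xs.isEmpty()) return sr.one();
        if (xs.size() == 1) return sr.plus(sr.one(), f.apply(xs.get(0)));
        int mid = xs.size() / 2;
        return sr.times(generate(sr, f, xs.subList(0, mid)),
                        generate(sr, f, xs.subList(mid, xs.size())));
    }

    /** (Long, +, *): with f = (x -> 1) it counts the generated candidates. */
    static final Semiring<Long> COUNT = new Semiring<Long>() {
        public Long plus(Long a, Long b) { return a + b; }
        public Long times(Long a, Long b) { return a * b; }
        public Long one() { return 1L; }
    };

    /** (Integer, max, +): with f = (x -> x) it gives the best sum over the candidates. */
    static final Semiring<Integer> MAX_PLUS = new Semiring<Integer>() {
        public Integer plus(Integer a, Integer b) { return Math.max(a, b); }
        public Integer times(Integer a, Integer b) { return a + b; }
        public Integer one() { return 0; }
    };
}
```

Plugging in a different semiring changes what is computed without touching generate, which is the sense in which generate is polymorphic over semirings.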
Java class Generator is defined as:
public abstract class Generator<In extends Writable, Val extends Writable, Res extends Writable>
    implements MapReducer<In, Val, Res>
4. Generator
The fields and methods in Generator
G+A: replace the Generator's semiring with the Aggregater's semiring
G+T+A: invoke embed() to generate a new Generator
5. Impl. of Generate-Test-Aggregate Algorithm
aggregate is a semiring homomorphism
The semiring homomorphism aggregate goes from (S, ⊕, ⊗) to (S', ⊕', ⊗'):
aggregate : S → S'
By definition, the aggregate function is a semiring
homomorphism.
However, the implemented Aggregater is itself a semiring (in Sebastian's version and in my second version).
A real semiring homomorphism, e.g. from [Int] to Int, is not efficient (20-30 percent slower in my first-version implementation).
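To make the homomorphism property concrete, here is a small sketch under stated assumptions: the source semiring is taken to be bags of integer lists (stored as List<List<Integer>>) with bag union and pairwise concatenation, and the target is (max, +) on integers. The class and method names (HomomorphismSketch, union, cross, maxSum) are illustrative and not taken from the framework.

```java
import java.util.ArrayList;
import java.util.List;

final class HomomorphismSketch {
    /** ⊕ on the source side: bag union. */
    static List<List<Integer>> union(List<List<Integer>> a, List<List<Integer>> b) {
        List<List<Integer>> r = new ArrayList<>(a);
        r.addAll(b);
        return r;
    }

    /** ⊗ on the source side: concatenate every list of a with every list of b. */
    static List<List<Integer>> cross(List<List<Integer>> a, List<List<Integer>> b) {
        List<List<Integer>> r = new ArrayList<>();
        for (List<Integer> x : a) {
            for (List<Integer> y : b) {
                List<Integer> xy = new ArrayList<>(x);
                xy.addAll(y);
                r.add(xy);
            }
        }
        return r;
    }

    /**
     * aggregate: the maximum of the sums of the lists in the bag.
     * Integer.MIN_VALUE stands in for minus infinity, so this sketch
     * should only be applied to non-empty bags.
     */
    static int maxSum(List<List<Integer>> bag) {
        int best = Integer.MIN_VALUE;
        for (List<Integer> l : bag) {
            int s = 0;
            for (int v : l) s += v;
            best = Math.max(best, s);
        }
        return best;
    }
}
```

For non-empty bags a and b, the property to check is maxSum(union(a, b)) == Math.max(maxSum(a), maxSum(b)) and maxSum(cross(a, b)) == maxSum(a) + maxSum(b). Note that this version materializes every candidate list before aggregating, which is one plausible reason a literal homomorphism is slower than an aggregator that is itself a semiring.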
6. Aggregater
It has one field: a semiring on the set A.
The singleton() function maps S → A, where S is the input data type, not [A].
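A rough picture of this shape, as a sketch only: the class below is a hypothetical stand-in for the real Aggregater, with no Hadoop types, and the names AggregaterSketch, Semiring, MAX_PLUS and MaxSum are assumptions made for illustration. The input type S is taken to be a text record (String) and the carrier A to be Integer.

```java
final class AggregaterSketch {
    /** Hypothetical semiring interface on the carrier set A. */
    interface Semiring<A> {
        A plus(A a, A b);
        A times(A a, A b);
        A one();
    }

    /** Stand-in for Aggregater: one semiring field plus singleton : S -> A. */
    abstract static class Aggregater<S, A> {
        final Semiring<A> semiring;

        Aggregater(Semiring<A> semiring) { this.semiring = semiring; }

        /** Maps one input record of type S into the carrier A. */
        abstract A singleton(S input);
    }

    /** The (max, +) semiring on integers. */
    static final Semiring<Integer> MAX_PLUS = new Semiring<Integer>() {
        public Integer plus(Integer a, Integer b) { return Math.max(a, b); }
        public Integer times(Integer a, Integer b) { return a + b; }
        public Integer one() { return 0; }
    };

    /** Example instance: text records in, integers in the (max, +) semiring out. */
    static final class MaxSum extends Aggregater<String, Integer> {
        MaxSum() { super(MAX_PLUS); }

        @Override
        Integer singleton(String record) { return Integer.parseInt(record.trim()); }
    }
}
```

Here singleton() consumes a raw input record rather than a one-element list, which matches the remark that S is the input data type and not [A].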
7. Impl. of Generate-Test-Aggregate Algorithm
test is an almost list homomorphism.
If hom is a monoid homomorphism from ([A], ++) to (M, ⊙) and ok is a Boolean function on M, then test can be written as:
test = filter (ok ◦ hom)
The Generator::embed() method takes a Generator, an Aggregater, and a test, and makes a new Generator that has the lifted semiring S_M.
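The form test = filter(ok ◦ hom) can be sketched as follows. This is an unoptimized illustration that filters an explicit list of candidates; the names TestSketch, hom, test, single, op and unit are hypothetical. A monoid homomorphism on lists is determined by the image of a single element, the monoid operation, and its unit, so those three are passed as parameters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Predicate;

final class TestSketch {
    /** hom: folds a list into the monoid M using the image of each element. */
    static <A, M> M hom(List<A> xs, Function<A, M> single, BinaryOperator<M> op, M unit) {
        M acc = unit;
        for (A x : xs) acc = op.apply(acc, single.apply(x));
        return acc;
    }

    /** test = filter (ok . hom): keeps the candidates whose image under hom passes ok. */
    static <A, M> List<List<A>> test(List<List<A>> candidates, Predicate<M> ok,
                                     Function<A, M> single, BinaryOperator<M> op, M unit) {
        List<List<A>> kept = new ArrayList<>();
        for (List<A> c : candidates) {
            if (ok.test(hom(c, single, op, unit))) kept.add(c);
        }
        return kept;
    }
}
```

A length-limited test, for example, uses single = (x -> 1), op = Integer::sum, unit = 0 and ok = (m -> m <= 2), so that hom computes the length of each candidate and only candidates with at most two elements are kept.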
8. Semiring Fusion
aggregate ◦ generate_{⊎,×++} (λx → [x])
= generate_{⊕,⊗} (λx → aggregate [x])
This is done by replacing the generator's free semiring with the Aggregater's semiring.
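Both sides of the fusion equation can be compared on small inputs. The sketch below assumes the free semiring is represented as bags of integer lists (List<List<Integer>>) and uses max-sum as the aggregate; all names (FusionSketch, FREE, MAX_PLUS, unit, unfused, fused) are illustrative rather than the framework's own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class FusionSketch {
    interface Semiring<S> { S plus(S a, S b); S times(S a, S b); S one(); }

    static <A, S> S generate(Semiring<S> sr, Function<A, S> f, List<A> xs) {
        if (xs.isEmpty()) return sr.one();
        if (xs.size() == 1) return sr.plus(sr.one(), f.apply(xs.get(0)));
        int mid = xs.size() / 2;
        return sr.times(generate(sr, f, xs.subList(0, mid)),
                        generate(sr, f, xs.subList(mid, xs.size())));
    }

    /** Free semiring: bag union, pairwise concatenation, and the bag holding the empty list. */
    static final Semiring<List<List<Integer>>> FREE = new Semiring<List<List<Integer>>>() {
        public List<List<Integer>> plus(List<List<Integer>> a, List<List<Integer>> b) {
            List<List<Integer>> r = new ArrayList<>(a);
            r.addAll(b);
            return r;
        }
        public List<List<Integer>> times(List<List<Integer>> a, List<List<Integer>> b) {
            List<List<Integer>> r = new ArrayList<>();
            for (List<Integer> x : a) {
                for (List<Integer> y : b) {
                    List<Integer> xy = new ArrayList<>(x);
                    xy.addAll(y);
                    r.add(xy);
                }
            }
            return r;
        }
        public List<List<Integer>> one() {
            return Collections.singletonList(Collections.<Integer>emptyList());
        }
    };

    static final Semiring<Integer> MAX_PLUS = new Semiring<Integer>() {
        public Integer plus(Integer a, Integer b) { return Math.max(a, b); }
        public Integer times(Integer a, Integer b) { return a + b; }
        public Integer one() { return 0; }
    };

    /** aggregate: maximum list sum in a bag (generate never produces an empty bag). */
    static int aggregate(List<List<Integer>> bag) {
        int best = Integer.MIN_VALUE;
        for (List<Integer> l : bag) {
            int s = 0;
            for (int v : l) s += v;
            best = Math.max(best, s);
        }
        return best;
    }

    /** The bag that contains only the one-element list [x]. */
    static List<List<Integer>> unit(Integer x) {
        return Collections.singletonList(Collections.singletonList(x));
    }

    /** Left-hand side: build all candidates in the free semiring, then aggregate. */
    static int unfused(List<Integer> xs) { return aggregate(generate(FREE, FusionSketch::unit, xs)); }

    /** Right-hand side: run generate directly in the aggregator's semiring. */
    static int fused(List<Integer> xs) { return generate(MAX_PLUS, x -> aggregate(unit(x)), xs); }
}
```

The unfused side builds a bag whose size is exponential in the input length, while the fused side only ever holds one integer per subproblem, which is the practical payoff of swapping the semiring.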
9. Filter-Embedding : Lifted Semiring
aggregate ◦ test ◦ generate = postprocess_M ok ◦ aggregate_M ◦ generate
Define the lifted semiring and a new generator, and use the lifted semiring as the new generator's semiring.
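One way to picture the lifted semiring is as a table indexed by monoid values, with one entry of the base semiring per value. The sketch below is an assumption-laden illustration, not the framework's embed(): the base semiring is fixed to (max, +), the monoid is the list length capped at K + 1, ok accepts lengths of at most K, and the aggregation is already fused into the generator. All names (LiftedSemiringSketch, K, mop, plusM, timesM, generateM, postprocess) are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LiftedSemiringSketch {
    static final int K = 2;  // length limit checked by ok

    /** Monoid operation: add lengths, capped at K + 1 so the table stays small. */
    static int mop(int a, int b) { return Math.min(a + b, K + 1); }

    /** ok: accept candidates whose length is at most K. */
    static boolean ok(int m) { return m <= K; }

    /** Lifted ⊕: entry-wise max of the two tables. */
    static Map<Integer, Integer> plusM(Map<Integer, Integer> a, Map<Integer, Integer> b) {
        Map<Integer, Integer> r = new HashMap<>(a);
        for (Map.Entry<Integer, Integer> e : b.entrySet()) r.merge(e.getKey(), e.getValue(), Integer::max);
        return r;
    }

    /** Lifted ⊗: combine every pair of entries; keys with mop, values with +. */
    static Map<Integer, Integer> timesM(Map<Integer, Integer> a, Map<Integer, Integer> b) {
        Map<Integer, Integer> r = new HashMap<>();
        for (Map.Entry<Integer, Integer> x : a.entrySet()) {
            for (Map.Entry<Integer, Integer> y : b.entrySet()) {
                r.merge(mop(x.getKey(), y.getKey()), x.getValue() + y.getValue(), Integer::max);
            }
        }
        return r;
    }

    /** Identity of the lifted ⊗: the empty list has length 0 and sum 0. */
    static Map<Integer, Integer> one() {
        Map<Integer, Integer> r = new HashMap<>();
        r.put(0, 0);
        return r;
    }

    /** Lifted image of a single element x: length 1, sum x. */
    static Map<Integer, Integer> single(int x) {
        Map<Integer, Integer> r = new HashMap<>();
        r.put(1, x);
        return r;
    }

    /** generate over the lifted semiring, following the same three equations as before. */
    static Map<Integer, Integer> generateM(List<Integer> xs) {
        if (xs.isEmpty()) return one();
        if (xs.size() == 1) return plusM(one(), single(xs.get(0)));
        int mid = xs.size() / 2;
        return timesM(generateM(xs.subList(0, mid)), generateM(xs.subList(mid, xs.size())));
    }

    /** postprocess: max over the entries whose key passes ok (key 0 is always present). */
    static int postprocess(Map<Integer, Integer> table) {
        int best = Integer.MIN_VALUE;
        for (Map.Entry<Integer, Integer> e : table.entrySet()) {
            if (ok(e.getKey())) best = Math.max(best, e.getValue());
        }
        return best;
    }

    /** Best sum over all sublists with at most K elements, without listing the sublists. */
    static int maxSumOfShortSublists(List<Integer> xs) { return postprocess(generateM(xs)); }
}
```

For the input [3, -1, 4, 2] the final table maps length 0 to 0, length 1 to 4, length 2 to 7 and the capped length 3 to 9, so with K = 2 the result is 7. The filter is never applied to individual candidates; it only selects table entries at the end.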
11. Java Codes
Compared to the first version, the interfaces/APIs have fewer type parameters
G, G+A, G+T+A are all runnable programs
Let’s have a look at the actual implementation...
12. Evaluation on Hadoop (OLD vs NEW)
We tested 3.2 GB of data on clusters of {2, 4, 8, 16, 32} nodes and 32 GB of data on clusters of {32, 64} nodes [1].

                   2 nodes  4 nodes  8 nodes  16 nodes  32 nodes  64 nodes
old time (sec.)       1602      882      482       317       961       511
speedup                  –    ×1.82    ×1.83     ×1.52         –     ×1.88
new time (sec.)          –      668      348         –         –         –
speedup                  –        –    ×1.91         –         –         –

New results on the 8-node cluster:
MPS: 241 s total SUM: 226 s MEPS: EVEN-SUM: Pair-SUM:
The new version is faster than the old one (668 s vs. 882 s on 4 nodes, 348 s vs. 482 s on 8 nodes).
[1] Chunk size = 64MB, duplication = 10
13. Impl. Issues
Generate + (some special) Test can be implemented and computed,
e.g. sublists + length-limited test
14. Problems
Generic data types (e.g. Pair<K,V>) are not supported by Hadoop, and nested generic data types are not allowed
We have to use run-time class generation to create the generic-data-type classes and load the new classes into the JVM
Correctness checking/verification is weak