Marimba - A MapReduce-based Programming Model for Self-maintainable Aggregate Views

Master's thesis, TU Kaiserslautern, August 2012

http://code.google.com/p/marimba-framework

  1. A MapReduce-based Programming Model for Self-maintainable Aggregate Views. 2012-08-31. Johannes Schildgen, TU Kaiserslautern, schildgen@cs.uni-kl.de
  2. Motivation
  3. 11 26 14 23 37 39 41 26 19 8 2519 22 15 18 10 16 27 8 9 12 14 15
  4. Kinderriegel: 26, Balisto: 39, Hanuta: 14, Snickers: 19, Ritter Sport: 41, Pickup: 12, …
  5. Kinderriegel: 26, Balisto: 39, Hanuta: 14, Snickers: 19, Ritter Sport: 41, Pickup: 12, …
  6. Kinderriegel: 26, Balisto: 39, Hanuta: 14, Snickers: 19, Ritter Sport: 41, Pickup: 12, … Δ: Balisto: -1
  7. Kinderriegel: 26, Balisto: 39, Hanuta: 14, Snickers: 19, Ritter Sport: 41, Pickup: 12, … Δ: Balisto: -8, Snickers: +24, Ritter Sport: -7
  8. Increment Installation. Kinderriegel: 26, Balisto: 31, Hanuta: 14, Snickers: 43, Ritter Sport: 34, Pickup: 12, … Δ: Balisto: -8, Snickers: +24, Ritter Sport: -7
  9. Overwrite Installation. Old result: Kinderriegel: 26, Balisto: 39, Hanuta: 14, Snickers: 19, Ritter Sport: 41, Pickup: 12, … Δ: Balisto: -8, Snickers: +24, Ritter Sport: -7. New result: Kinderriegel: 26, Balisto: 31, Hanuta: 14, Snickers: 43, Ritter Sport: 34, Pickup: 12
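
The two installation strategies from the motivation slides can be sketched in plain Java, without Hadoop or HBase. This is only an illustration under assumptions: the class and method names (`InstallationSketch`, `incrementInstall`, `overwriteInstall`) are hypothetical and not part of Marimba; the numbers are the ones shown on the slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the two installation strategies.
// An in-memory map stands in for the stored aggregate view.
class InstallationSketch {

    // Increment installation: only the changed counters are touched;
    // each one is adjusted in place by its delta.
    static void incrementInstall(Map<String, Integer> view, Map<String, Integer> delta) {
        for (Map.Entry<String, Integer> e : delta.entrySet()) {
            view.merge(e.getKey(), e.getValue(), Integer::sum);
        }
    }

    // Overwrite installation: the old result is read, combined with the
    // delta, and a complete new result is produced that replaces the old one.
    static Map<String, Integer> overwriteInstall(Map<String, Integer> oldView,
                                                 Map<String, Integer> delta) {
        Map<String, Integer> newView = new LinkedHashMap<>(oldView);
        for (Map.Entry<String, Integer> e : delta.entrySet()) {
            newView.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return newView;
    }

    // Aggregate view as shown on the slides.
    static Map<String, Integer> slideView() {
        Map<String, Integer> view = new LinkedHashMap<>();
        view.put("Kinderriegel", 26);
        view.put("Balisto", 39);
        view.put("Hanuta", 14);
        view.put("Snickers", 19);
        view.put("Ritter Sport", 41);
        view.put("Pickup", 12);
        return view;
    }

    // Delta as shown on the slides.
    static Map<String, Integer> slideDelta() {
        Map<String, Integer> delta = new LinkedHashMap<>();
        delta.put("Balisto", -8);
        delta.put("Snickers", 24);
        delta.put("Ritter Sport", -7);
        return delta;
    }

    public static void main(String[] args) {
        Map<String, Integer> view = slideView();
        incrementInstall(view, slideDelta());
        System.out.println("increment: " + view);
        System.out.println("overwrite: " + overwriteInstall(slideView(), slideDelta()));
    }
}
```

Both strategies end in the same view; they differ in what is written. The increment variant sends only the three changed counters, while the overwrite variant reads the old result and writes every row again.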
  10. Fundamentals & Related Work; The Marimba Framework; Evaluation
  11. public class WordCount extends Configured implements Tool {
        public static class WordCountMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, LongWritable> {
          private Text word = new Text();

          @Override
          public void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
              this.word.set(tokenizer.nextToken());
              …
  12. [Diagram with elements labeled A to F]
  13. > create person, default
      > put person, p27, default:forename, Anton
      > put person, p27, default:surname, Schmidt
      > get person, p27
      COLUMN            CELL
      default:forename  timestamp=1338991436688, value=Anton
      default:surname   timestamp=1338991497408, value=Schmidt
      2 row(s) in 0.0640 seconds
  14. View: banane 3, die 4, iss 1, nimm 1, schale 1, schäle 1, schmeiß 1, weg 1. Δ: kaufe 1, die 0, banane 1, schmeiß -1, schale -1, weg -1
  15. increment(). View: banane 4, die 4, iss 1, kaufe 1, nimm 1, schale 0, schäle 1, schmeiß 0, weg 0. Δ: kaufe 1, die 0, banane 1, schmeiß -1, schale -1, weg -1
  16. overwrite(). View: banane 3, die 4, iss 1, nimm 1, schale 1, schäle 1, schmeiß 1, weg 1. Δ: kaufe 1, die 0, banane 1, schmeiß -1, schale -1, weg -1
  17. overwrite(). View: banane 4, die 4, iss 1, kaufe 1, nimm 1, schale 0, schäle 1, schmeiß 0, weg 0. Δ: kaufe 1, die 0, banane 1, schmeiß -1, schale -1, weg -1
  18. void map(key, value) {
        if (value is inserted) {
          for (word : value.split(" ")) {
            write(word, 1);
          }
        } else if (value is deleted) {
          for (word : value.split(" ")) {
            write(word, -1);
          }
        }
      }
  19. void map(key, value) {
        if (value is inserted) {
          for (word : value.split(" ")) {
            write(word, 1);
          }
        } else if (value is deleted) {
          for (word : value.split(" ")) {
            write(word, -1);
          }
        } else {
          // old result
          write(key, value);
        }
      }
  20. Overwrite Installation
      void reduce(key, values) {
        sum = 0;
        for (value : values) {
          sum += value;
        }
        put = new Put(key);
        put.add("fam", "col", sum);
        context.write(key, put);
      }
  21. Increment Installation
      void reduce(key, values) {
        sum = 0;
        for (value : values) {
          sum += value;
        }
        inc = new Increment(key);
        inc.add("fam", "col", sum);
        context.write(key, inc);
      }
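
The map and reduce pseudocode on the preceding slides can be simulated in a single JVM. The following is a minimal sketch under assumptions, not the Marimba implementation: the class `IncrementalWordCountSketch` and its methods are hypothetical, there is no Hadoop involved, and the sample sentences are an English rendering chosen so that the counts mirror the word-count slides (one sentence deleted, one inserted).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process simulation of the incremental word count.
class IncrementalWordCountSketch {

    // "Map" emits +1 for each word of an inserted line and -1 for each word
    // of a deleted line; "reduce" sums the emitted values per word.
    static Map<String, Integer> delta(List<String> inserted, List<String> deleted) {
        Map<String, Integer> sums = new TreeMap<>();
        emit(inserted, 1, sums);
        emit(deleted, -1, sums);
        return sums;
    }

    private static void emit(List<String> lines, int sign, Map<String, Integer> sums) {
        for (String line : lines) {
            for (String word : line.split(" ")) {
                sums.merge(word, sign, Integer::sum);
            }
        }
    }

    // Installs a delta into a copy of the existing view.
    static Map<String, Integer> install(Map<String, Integer> view, Map<String, Integer> delta) {
        Map<String, Integer> result = new TreeMap<>(view);
        delta.forEach((word, d) -> result.merge(word, d, Integer::sum));
        return result;
    }

    public static void main(String[] args) {
        // Base data (assumed sample input).
        List<String> base = Arrays.asList(
            "take the banana", "peel the banana", "eat the banana", "throw the peel away");
        Map<String, Integer> view = delta(base, Arrays.<String>asList());

        // Change set: one line inserted, one line deleted.
        Map<String, Integer> d = delta(
            Arrays.asList("buy the banana"),
            Arrays.asList("throw the peel away"));

        System.out.println("view:  " + view);
        System.out.println("delta: " + d);
        System.out.println("new:   " + install(view, d));
    }
}
```

Words whose count drops to zero stay in the result with value 0, as on the slides; whether such rows are kept, skipped, or deleted is a separate design decision.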
  22. Formalization
  23. Formalization
  24. Generic Mapper
  25. Generic Reducer
  26. Fundamentals & Related Work; The Marimba Framework; Evaluation
  27. Core functionality: distributed computations with MapReduce. Core functionality: incremental computations. "I care about: IncDec, Overwrite, reading old results, producing Increments, …" "I tell you how to read input data, aggregate, invert and write the output."
  28. public class WordTranslator extends Translator<LongWritable, Text> {
        public void translate(…) { … }
      }

      IncJob job = new IncJobOverwrite(conf);
      job.setTranslatorClass(WordTranslator.class);
      job.setAbelianClass(WordAbelian.class);
  29. public class WordAbelian implements Abelian<WordAbelian> {
        public WordAbelian invert() { … }
        public WordAbelian aggregate(WordAbelian other) { … }
        public WordAbelian neutral() { … }
        public boolean isNeutral() { … }
        public Writable extractKey() { … }
        public void write(…) { … }
        public void readFields(…) { … }
      }
  30. public class WordSerializer implements Serializer<WordAbelian> {
        public Writable serialize(Writable key, WordAbelian v) { … }
        public WordAbelian deserializeHBase(
            byte[] rowId, byte[] colFamily, byte[] qualifier, byte[] value) { … }
      }
  31. How To Write A Marimba-Job: 1. Abelian-Class; 2. Translator-Class; 3. Serializer-Class; 4. Write a Hadoop-Job and use the class IncJob
  32. Implementation: IncJob with setInputTable(…) and setOutputTable(…); subclasses IncJobFullRecomputation, IncJobIncDec, IncJobOverwrite; setResultInputTable(…)
  33. NeutralOutputStrategy (for IncJobOverwrite)
  34. public interface Abelian<T extends Abelian<?>>
          extends WritableComparable<BinaryComparable> {
        T invert();
        T aggregate(T other);
        T neutral();
        boolean isNeutral();
        Writable extractKey();
        void write(…);
        void readFields(…);
      }
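
The name Abelian refers to the algebraic structure the values have to form: aggregation is commutative, there is a neutral element, and every value has an inverse. The sketch below checks these properties for a word-count value. It is an assumption-laden simplification: `SimpleAbelian` is a hypothetical stand-in that drops the Hadoop `WritableComparable` parts (`write`, `readFields`) and uses `String` keys, and `WordCountValue` is an illustrative class, not the thesis code.

```java
// Hypothetical, Hadoop-free simplification of the Abelian interface.
interface SimpleAbelian<T extends SimpleAbelian<T>> {
    T invert();
    T aggregate(T other);
    T neutral();
    boolean isNeutral();
    String extractKey();
}

// Word-count value: the key is the word, the aggregated payload is the count.
final class WordCountValue implements SimpleAbelian<WordCountValue> {
    final String word;
    final long count;

    WordCountValue(String word, long count) {
        this.word = word;
        this.count = count;
    }

    public WordCountValue invert() { return new WordCountValue(word, -count); }

    public WordCountValue aggregate(WordCountValue other) {
        return new WordCountValue(word, count + other.count);
    }

    public WordCountValue neutral() { return new WordCountValue(word, 0); }

    public boolean isNeutral() { return count == 0; }

    public String extractKey() { return word; }
}

class AbelianLawsSketch {
    // A value aggregated with its inverse yields the neutral element.
    static boolean inverseCancels(WordCountValue v) {
        return v.aggregate(v.invert()).isNeutral();
    }

    // Aggregating with the neutral element changes nothing.
    static boolean neutralIsIdentity(WordCountValue v) {
        return v.aggregate(v.neutral()).count == v.count;
    }

    // The order of aggregation does not matter.
    static boolean commutes(WordCountValue a, WordCountValue b) {
        return a.aggregate(b).count == b.aggregate(a).count;
    }

    public static void main(String[] args) {
        WordCountValue a = new WordCountValue("banana", 3);
        WordCountValue b = new WordCountValue("banana", 1);
        System.out.println(inverseCancels(a) && neutralIsIdentity(a) && commutes(a, b));
    }
}
```

These properties are what allow a deletion to be processed as the aggregation of an inverted value, regardless of the order in which values reach the reducer.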
  35. public interface Serializer<T extends Abelian<?>> {
        Writable serialize(T obj);
        T deserializeHBase(
            byte[] rowId, byte[] colFamily, byte[] qualifier, byte[] value);
      }
  36. public abstract class Translator<KEYIN, VALUEIN> {
        public abstract void translate(KEYIN key, VALUEIN value, Context context);
        …
      }

      this.mapContext.write(
          abelianValue.extractKey(),
          this.invertValue ? abelianValue.invert() : abelianValue);
  37. GenericMapper. From InputFormat: Value, one of OverwriteResult → deserialize; InsertedValue → translate; DeletedValue → set invertValue=true, then translate; PreservedValue → ignore
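
The dispatch performed by the generic mapper can be illustrated with a small stand-alone sketch. Everything here is an assumption for illustration: the enum `Kind`, the class `GenericMapperSketch`, and the `word=count` string encoding are hypothetical; the real mapper works on Hadoop types and delegates to the user's Translator and Serializer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the generic mapper's dispatch on the input value kind,
// specialised to word count. Emitted pairs are encoded as "word=count" strings.
class GenericMapperSketch {

    enum Kind { OVERWRITE_RESULT, INSERTED, DELETED, PRESERVED }

    static List<String> map(Kind kind, String value) {
        List<String> out = new ArrayList<>();
        switch (kind) {
            case OVERWRITE_RESULT:
                // Old result: "deserialize" and pass it on unchanged.
                out.add(value);
                break;
            case INSERTED:
                translate(value, false, out);
                break;
            case DELETED:
                // Same translation, but every emitted value is inverted.
                translate(value, true, out);
                break;
            case PRESERVED:
                // Unchanged base data contributes nothing to the delta.
                break;
        }
        return out;
    }

    // User-defined part in the real framework: turns one input line into values.
    private static void translate(String line, boolean invertValue, List<String> out) {
        for (String word : line.split(" ")) {
            out.add(word + "=" + (invertValue ? -1 : 1));
        }
    }

    public static void main(String[] args) {
        System.out.println(map(Kind.INSERTED, "buy the banana"));
        System.out.println(map(Kind.DELETED, "throw the peel away"));
        System.out.println(map(Kind.OVERWRITE_RESULT, "banana=3"));
        System.out.println(map(Kind.PRESERVED, "eat the banana"));
    }
}
```

The point of the design is that the user writes `translate` once; inversion for deleted input and pass-through of old results are handled by the framework.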
  38. GenericReducer: 1. aggregate; 2. serialize; 3. write. IncDec: putToIncrement(…). Overwrite: PUT → write; IGNORE → don't write; DELETE → putToDelete(...)
  39. GenericCombiner. "Write A Combiner" -- 7 Tips for Improving MapReduce Performance (Tip 4). 1. aggregate
  40. TextWindowInputFormat
  41. Example: 1. WordCount
      Translator:
      void translate(key, value) {
        for (word : value.split(" ")) {
          write(new WordAbelian(word, 1));
        }
      }
      Serializer:
      Writable serialize(WordAbelian w) {
        Put p = new Put(w.getWord());
        p.add(…);
        return p;
      }
      WordAbelian:
      WordAbelian invert() {
        return new WordAbelian(this.word, -1 * this.count);
      }
      WordAbelian aggregate(WordAbelian other) {
        return new WordAbelian(this.word, this.count + other.count);
      }
      WordAbelian neutral() {
        return new WordAbelian(this.word, 0);
      }
      boolean isNeutral() {
        return (this.count == 0);
      }
  42. Example: 2. Friends Of Friends. [Diagram: FRIENDS and FRIENDS OF FRIENDS relations among persons A to E]
  43. Example: 2. Friends Of Friends. translate(person, friends); aggregate(…): merge friends-of-friends sets
  44. Example: 3. Reverse WebLink-Graph. aggregate(…): merge link sets.
      REVERSE WEB LINK GRAPH (Row-ID -> Columns)
      Google -> {eBay, Wikipedia}
      eBay -> {Google, Wikipedia}
      Mensa-KL -> {Google}
      Facebook -> {Google, Mensa-KL, Uni-KL}
      Wikipedia -> {Google}
      Uni-KL -> {Google, Wikipedia}
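
For the reverse web-link graph, the aggregate step is a set union. The sketch below computes the table from this slide in plain Java. The assumptions are significant: the forward links are not shown in the transcript and were derived here by inverting the slide's result, and `ReverseLinkGraphSketch` with its methods is a hypothetical illustration that covers only insertions (how link sets are inverted for deletions is not shown on the slide).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: reverse a link graph by emitting (target, {source})
// per link and merging the sets per target.
class ReverseLinkGraphSketch {

    // Aggregate step: union of two link sets.
    static Set<String> aggregate(Set<String> a, Set<String> b) {
        Set<String> merged = new TreeSet<>(a);
        merged.addAll(b);
        return merged;
    }

    static Map<String, Set<String>> reverse(Map<String, List<String>> forward) {
        Map<String, Set<String>> reversed = new TreeMap<>();
        for (Map.Entry<String, List<String>> page : forward.entrySet()) {
            for (String target : page.getValue()) {
                // Translate step: one singleton set per outgoing link.
                Set<String> single = new TreeSet<>(Collections.singleton(page.getKey()));
                reversed.merge(target, single, ReverseLinkGraphSketch::aggregate);
            }
        }
        return reversed;
    }

    // Forward links (assumed; derived by inverting the table on the slide).
    static Map<String, List<String>> sampleForward() {
        Map<String, List<String>> forward = new LinkedHashMap<>();
        forward.put("eBay", Arrays.asList("Google"));
        forward.put("Wikipedia", Arrays.asList("Google", "eBay", "Uni-KL"));
        forward.put("Google",
            Arrays.asList("eBay", "Mensa-KL", "Facebook", "Wikipedia", "Uni-KL"));
        forward.put("Mensa-KL", Arrays.asList("Facebook"));
        forward.put("Uni-KL", Arrays.asList("Facebook"));
        return forward;
    }

    public static void main(String[] args) {
        reverse(sampleForward()).forEach((page, sources) ->
            System.out.println(page + " -> " + sources));
    }
}
```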
  45. Example: 4. Bigrams. "Hi, kannst du mich ___?___ am Bahnhof abholen? So in etwa 10 ___?___. Viele liebe ___?___. P.S. Ich habe viel ___?___." (English: "Hi, can you ___ pick me up at the station? In about 10 ___. Lots of love ___. P.S. I have a lot of ___.")
  46. Idea: analyze large amounts of text data
  47. Example: 4. Bigrams. NGramAbelian with fields a, b, count: extractKey(); write(…); invert(): count *= -1; aggregate(… other): count += other.count; neutral(): count = 0; isNeutral(): count == 0. NGramStep2Abelian
  48. Example applications: 4. Bigrams. NGramAbelian with fields a, b, count: extractKey(); write(…); invert(): count *= -1; aggregate(… other): count += other.count; neutral(): count = 0; isNeutral(): count == 0. NGramStep2Abelian
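
The bigram example counts how often a word b follows a word a and uses those counts to suggest the next word. A rough sketch of that idea follows, assuming a two-level in-memory map instead of HBase tables. `BigramSketch`, `apply`, and `suggest` are hypothetical names, the sample sentences are invented, and the second job (`NGramStep2Abelian`) is only approximated by picking the most frequent follower.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of incremental bigram counting and next-word suggestion.
class BigramSketch {

    // Adds (sign = +1) or removes (sign = -1) the bigrams of one text.
    // counts.get(a).get(b) is the number of times b directly follows a.
    static void apply(Map<String, Map<String, Integer>> counts, String text, int sign) {
        String[] words = text.trim().toLowerCase().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            counts.computeIfAbsent(words[i], k -> new TreeMap<>())
                  .merge(words[i + 1], sign, Integer::sum);
        }
    }

    // Returns the most frequent follower of a, or null if none has a positive count.
    static String suggest(Map<String, Map<String, Integer>> counts, String a) {
        Map<String, Integer> followers =
            counts.getOrDefault(a, Collections.<String, Integer>emptyMap());
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : followers.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        apply(counts, "lots of fun", 1);
        apply(counts, "lots of fun", 1);
        apply(counts, "lots of time", 1);
        System.out.println(suggest(counts, "of"));

        // Deleting input texts is the same operation with inverted counts.
        apply(counts, "lots of fun", -1);
        apply(counts, "lots of fun", -1);
        System.out.println(suggest(counts, "of"));
    }
}
```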
  49. "Which input data?"
  50. Suggested next words: "Hi, kannst du mich ___ ___ am Bahnhof abholen?" (bitte, nicht); "So in etwa 10 ___ ___." (<num> Minuten, <num> Jahre); "Viele liebe ___ ___." (Grüße, dich); "P.S. Ich habe viel ___ ___." (zu, Spaß)
  51. Fundamentals & Related Work; The Marimba Framework; Evaluation
  52. [Chart: WordCount. Y-axis: time [hh:mm], 00:00 to 01:10; x-axis: changes, 0% to 80%; series: FULL, INCDEC, OVERWRITE]
  53. [Chart: Reverse Weblink-Graph. Y-axis: time [hh:mm], 00:00 to 02:51; x-axis: changes, 0% to 80%; series: FULL, INCDEC, OVERWRITE]
  54. Conclusion: Full Recomputation; IncDec / Overwrite
  55. Images
      Slides 5-9: Flames and smiley: Microsoft Office 2010
      Slide 10: Google: http://www.google.de
      Slide 11: Amazon: http://www.amazon.de
      Slide 12: Hadoop: http://hadoop.apache.org; Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
      Slide 16: Hadoop: http://hadoop.apache.org
      Slide 17: Hadoop: http://hadoop.apache.org; Notebook: Microsoft Office 2010
      Slide 18: HBase: http://hbase.apache.org
      Slide 31: Hadoop: http://hadoop.apache.org; Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
      Slide 32: Scaffolding: http://www.flickr.com/photos/michale/94538528/; Hadoop: http://hadoop.apache.org; Boy: Microsoft Office 2010
      Slides 37-44: Puzzle: http://www.flickr.com/photos/dps/136565237/
      Slides 46-48: Boy: Microsoft Office 2010
      Slide 49: Google: http://www.google.de; eBay: http://www.ebay.de; Mensa-KL: http://www.mensa-kl.de; facebook: http://www.facebook.de; Wikipedia: http://de.wikipedia.org; TU Kaiserslautern: http://www.uni-kl.de
      Slides 50-51: Mobile phone: Microsoft Office 2010
      Slide 56: Wikipedia: http://de.wikipedia.org; Twitter: http://www.twitter.com
      Slide 57: Mobile phone: Microsoft Office 2010
      Slide 58: Hadoop: http://hadoop.apache.org; Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
  56. Bibliography (1/2)
      [0] Johannes Schildgen. Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggregatsichten. Master's thesis, TU Kaiserslautern, August 2012.
      [1] Apache Hadoop project. http://hadoop.apache.org/.
      [2] Virga: Incremental Recomputations in MapReduce. http://wwwlgis.informatik.uni-kl.de/cms/?id=526.
      [3] Philippe Adjiman. Hadoop Tutorial Series, Issue #4: To Use Or Not To Use A Combiner, 2010. http://www.philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/.
      [4] Kai Biermann. Big Data: Twitter wird zum Fieberthermometer der Gesellschaft, April 2012. http://www.zeit.de/digital/internet/2012-04/twitter-krankheiten-nowcast.
      [5] Julie Bort. 8 Crazy Things IBM Scientists Have Learned Studying Twitter, January 2012. http://www.businessinsider.com/8-crazy-things-ibm-scientists-have-learned-studying-twitter-2012-1.
      [6] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI, pages 137–150, 2004.
      [7] Lars George. HBase: The Definitive Guide. O'Reilly Media, 1st edition, 2011.
      [8] Brown University Data Management Group. A Comparison of Approaches to Large-Scale Data Analysis. http://database.cs.brown.edu/projects/mapreduce-vs-dbms/.
      [9] Ricky Ho. Map/Reduce to recommend people connection, August 2010. http://horicky.blogspot.de/2010/08/mapreduce-to-recommend-people.html.
      [10] Yong Hu. Efficiently Extracting Change Data from HBase. April 2012.
      [11] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Can MapReduce learn from materialized views? In LADIS 2011, pages 1–5, September 2011.
      [12] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Incremental recomputations in MapReduce. In CloudDB 2011, October 2011.
      [13] Steve Krenzel. MapReduce: Finding Friends, 2010. http://stevekrenzel.com/finding-friends-with-mapreduce.
      [14] Todd Lipcon. 7 Tips for Improving MapReduce Performance, 2009. http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/.
  57. Bibliography (2/2)
      [15] TouchType Ltd. SwiftKey X - Android Apps auf Google Play, February 2012. http://play.google.com/store/apps/details?id=com.touchtype.swiftkey.
      [16] Karl H. Marbaise. Hadoop - Think Large!, 2011. http://www.soebes.de/files/RuhrJUGEssenHadoop-20110217.pdf.
      [17] Arnab Nandi, Cong Yu, Philip Bohannon, and Raghu Ramakrishnan. Distributed Cube Materialization on Holistic Measures. ICDE, pages 183–194, 2011.
      [18] Alexander Neumann. Studie: Hadoop wird ähnlich erfolgreich wie Linux, May 2012. http://heise.de/-1569837.
      [19] Owen O'Malley, Jack Hebert, Lohit Vijayarenu, and Amar Kamat. Partitioning your job into maps and reduces, September 2009. http://wiki.apache.org/hadoop/HowManyMapsAndReduces?action=recall&rev=7.
      [20] Roya Parvizi. Inkrementelle Neuberechnungen mit MapReduce. Bachelor's thesis, TU Kaiserslautern, June 2011.
      [21] Arnd Poetzsch-Heffter. Konzepte objektorientierter Programmierung. eXamen.press. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
      [22] Dave Rosenberg. Hadoop, the elephant in the enterprise, June 2012. http://news.cnet.com/8301-1001_3-57452061-92/hadoop-the-elephant-in-the-enterprise/.
      [23] Marc Schäfer. Inkrementelle Wartung von Data Cubes. Bachelor's thesis, TU Kaiserslautern, January 2012.
      [24] Sanjay Sharma. Advanced Hadoop Tuning and Optimizations, 2009. http://www.slideshare.net/ImpetusInfo/ppt-on-advanced-hadoop-tuning-n-optimisation.
      [25] Jason Venner. Pro Hadoop. Apress, Berkeley, CA, 2009.
      [26] Dick Weisinger. Big Data: Think of NoSQL As Complementary to Traditional RDBMS, June 2012. http://www.formtek.com/blog/?p=3032.
      [27] Tom White. 10 MapReduce Tips, May 2009. http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/.
