A MapReduce-Based Programming Model for Self-Maintainable Aggregate Views
2012-08-31

Johannes Schildgen
TU Kaiserslautern
schildgen@cs.uni-kl.de
Motivation
11 26 14 23 37 39 41 26 19 8 25 19 22 15 18 10 16 27 8 9 12 14 15
Kinderriegel: 26
Balisto: 39
Hanuta: 14
Snickers: 19
Ritter Sport: 41
Pickup: 12
…
Kinderriegel: 26
Balisto: 39
Hanuta: 14
Snickers: 19
Ritter Sport: 41
Pickup: 12
…

Δ
Balisto: -1
Kinderriegel: 26
Balisto: 39
Hanuta: 14
Snickers: 19
Ritter Sport: 41
Pickup: 12
…

Δ
Balisto: -8
Snickers: +24
Ritter Sport: -7
Increment Installation

Kinderriegel: 26
Balisto: 31
Hanuta: 14
Snickers: 43
Ritter Sport: 34
Pickup: 12
…

Δ
Balisto: -8
Snickers: +24
Ritter Sport: -7
Overwrite Installation

Old result:
Kinderriegel: 26
Balisto: 39
Hanuta: 14
Snickers: 19
Ritter Sport: 41
Pickup: 12
…

Δ
Balisto: -8
Snickers: +24
Ritter Sport: -7

New result:
Kinderriegel: 26
Balisto: 31
Hanuta: 14
Snickers: 43
Ritter Sport: 34
Pickup: 12
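The two installation variants can be sketched in plain Java, using the stock numbers from the motivation slides. This is only an illustration under assumptions: the class `DeltaInstallation` and its methods are hypothetical names, and an in-memory map stands in for the stored result table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Marimba API.
final class DeltaInstallation {

    // Increment installation: add every delta entry onto the stored value in place.
    static void installIncrement(Map<String, Integer> stored, Map<String, Integer> delta) {
        for (Map.Entry<String, Integer> e : delta.entrySet()) {
            stored.merge(e.getKey(), e.getValue(), Integer::sum);
        }
    }

    // Overwrite installation: combine the old result and the delta into a complete
    // new result; the old result itself is left untouched.
    static Map<String, Integer> installOverwrite(Map<String, Integer> oldResult,
                                                 Map<String, Integer> delta) {
        Map<String, Integer> newResult = new LinkedHashMap<>(oldResult);
        installIncrement(newResult, delta);
        return newResult;
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = new LinkedHashMap<>();
        stock.put("Kinderriegel", 26);
        stock.put("Balisto", 39);
        stock.put("Hanuta", 14);
        stock.put("Snickers", 19);
        stock.put("Ritter Sport", 41);
        stock.put("Pickup", 12);

        Map<String, Integer> delta = new LinkedHashMap<>();
        delta.put("Balisto", -8);
        delta.put("Snickers", 24);
        delta.put("Ritter Sport", -7);

        System.out.println(installOverwrite(stock, delta)); // complete new result
        installIncrement(stock, delta);                     // same numbers, updated in place
        System.out.println(stock);
    }
}
```

Both variants end with the same numbers; they differ in whether the stored result is updated in place or rewritten as a whole.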
Fundamentals & Previous Work | The Marimba Framework | Evaluation
public class WordCount extends Configured implements Tool {

    public static class WordCountMapper extends
            Mapper<LongWritable, Text, ImmutableBytesWritable, LongWritable> {

        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                this.word.set(tokenizer.nextToken());
                …
A   A   E
B   E   F
C   F   B
D   C   D
> create 'person', 'default'
> put 'person', 'p27', 'default:vorname', 'Anton'
> put 'person', 'p27', 'default:nachname', 'Schmidt'

> get 'person', 'p27'
COLUMN                CELL
 default:nachname     timestamp=1338991497408, value=Schmidt
 default:vorname      timestamp=1338991436688, value=Anton
2 row(s) in 0.0640 seconds
Result:
banane    3
die       4
iss       1
nimm      1
schale    1
schäle    1
schmeiß   1
weg       1

Δ:
kaufe     1
die       0
banane    1
schmeiß   -1
schale    -1
weg       -1
Result after increment():
banane    4
die       4
iss       1
kaufe     1
nimm      1
schale    0
schäle    1
schmeiß   0
weg       0

Δ:
kaufe     1
die       0
banane    1
schmeiß   -1
schale    -1
weg       -1
Result before overwrite():
banane    3
die       4
iss       1
nimm      1
schale    1
schäle    1
schmeiß   1
weg       1

Δ:
kaufe     1
die       0
banane    1
schmeiß   -1
schale    -1
weg       -1
Result after overwrite():
banane    4
die       4
iss       1
kaufe     1
nimm      1
schale    0
schäle    1
schmeiß   0
weg       0

Δ:
kaufe     1
die       0
banane    1
schmeiß   -1
schale    -1
weg       -1
void map(key, value) {
  if (value is inserted) {
    for (word : value.split(" ")) {
      write(word, 1);
    }
  } else if (value is deleted) {
    for (word : value.split(" ")) {
      write(word, -1);
    }
  }
}
void map(key, value) {
  if (value is inserted) {
    for (word : value.split(" ")) {
      write(word, 1);
    }
  } else if (value is deleted) {
    for (word : value.split(" ")) {
      write(word, -1);
    }
  } else { // old result
    write(key, value);
  }
}
Overwrite Installation

void reduce(key, values) {
  sum = 0;
  for (value : values) {
    sum += value;
  }
  put = new Put(key);
  put.add("fam", "col", sum);
  context.write(key, put);
}
Increment Installation

void reduce(key, values) {
  sum = 0;
  for (value : values) {
    sum += value;
  }
  inc = new Increment(key);
  inc.add("fam", "col", sum);
  context.write(key, inc);
}
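The map and reduce pseudocode above can be tried without a cluster. The following plain-Java sketch simulates both phases in memory; the class `DeltaWordCount`, its method names, and the two input sentences are assumptions chosen so that the computed delta matches the Δ table from the word-count slides. It does not use the Hadoop or HBase APIs.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the delta word count; no Hadoop involved.
final class DeltaWordCount {

    enum Change { INSERTED, DELETED }

    // Map phase: emit (word, +1) for inserted lines and (word, -1) for deleted lines.
    static List<Map.Entry<String, Long>> map(String line, Change change) {
        long sign = (change == Change.INSERTED) ? 1L : -1L;
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, sign));
            }
        }
        return out;
    }

    // Reduce phase: sum all values that share the same word.
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> sums = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        // Assumed change set: one deleted and one inserted sentence.
        pairs.addAll(map("schmei\u00df die schale weg", Change.DELETED));
        pairs.addAll(map("kaufe die banane", Change.INSERTED));
        System.out.println(reduce(pairs)); // the delta to install
    }
}
```

A word that occurs in both a deleted and an inserted line, such as "die" here, ends up with a delta of 0.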
Formalization
General Mapper
General Reducer
Fundamentals & Previous Work | The Marimba Framework | Evaluation
Core functionality: distributed computing with MapReduce
Core functionality: incremental computation

"I take care of: IncDec, Overwrite, reading old results, generating increments, …"

"I'll tell you how to read the input data, aggregate, invert, and write the output."
public class WordTranslator extends
    Translator<LongWritable, Text> {
  public void translate(…) {
    …
  }
}


IncJob job = new IncJobOverwrite(conf);
job.setTranslatorClass(
             WordTranslator.class);
job.setAbelianClass(WordAbelian.class);
public class WordAbelian implements
    Abelian<WordAbelian> {
 WordAbelian invert() { … }
 WordAbelian aggregate(WordAbelian
                        other) { … }
 WordAbelian neutral() { … }
 boolean isNeutral() { … }
 Writable extractKey() { … }
 void write(…) { … }
 void readFields(…) { … }
}
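What the Abelian methods are expected to guarantee can be checked with a stripped-down version. In the sketch below, `SimpleAbelian` and `WordCountValue` are hypothetical stand-ins: they keep only `invert()`, `aggregate()`, `neutral()` and `isNeutral()` and leave out `extractKey()` and the Hadoop `Writable` methods, so this is not the interface Marimba ships.

```java
// Simplified stand-in for the Abelian interface on the slide (assumption: no Writable methods).
interface SimpleAbelian<T extends SimpleAbelian<T>> {
    T invert();
    T aggregate(T other);
    T neutral();
    boolean isNeutral();
}

// Hypothetical word-count value: a word together with a (possibly negative) count.
final class WordCountValue implements SimpleAbelian<WordCountValue> {
    final String word;
    final long count;

    WordCountValue(String word, long count) {
        this.word = word;
        this.count = count;
    }

    @Override
    public WordCountValue invert() {
        return new WordCountValue(word, -count);
    }

    @Override
    public WordCountValue aggregate(WordCountValue other) {
        return new WordCountValue(word, count + other.count);
    }

    @Override
    public WordCountValue neutral() {
        return new WordCountValue(word, 0);
    }

    @Override
    public boolean isNeutral() {
        return count == 0;
    }

    public static void main(String[] args) {
        WordCountValue v = new WordCountValue("banane", 3);
        // Aggregating a value with its inverse yields the neutral element.
        System.out.println(v.aggregate(v.invert()).isNeutral());
        // Aggregating with the neutral element leaves the count unchanged.
        System.out.println(v.aggregate(v.neutral()).count == v.count);
    }
}
```

These two properties are what make it possible to cancel a deleted input by aggregating its inverse.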
public class WordSerializer
 implements Serializer<WordAbelian> {

 Writable serialize(Writable key,
                    WordAbelian v) {
    …
 }
 WordAbelian deserializeHBase(
    byte[] rowId, byte[] colFamily,
    byte[] qualifier, byte[] value) {
    …
 }
}
How To Write A Marimba Job

1. Abelian class
2. Translator class
3. Serializer class
4. Write a Hadoop job that uses the IncJob class
Implementation

IncJob
  setInputTable(…)
  setOutputTable(…)

Subclasses of IncJob:
  IncJobFull: recomputation
  IncJobIncDec
  IncJobOverwrite: setResultInputTable(…), NeutralOutputStrategy
public interface Abelian<T extends
 Abelian<?>> extends
 WritableComparable<BinaryComparable>{

 T invert();
 T aggregate(T other);
 T neutral();
 boolean isNeutral();
 Writable extractKey();
 void write(…);
 void readFields(…);
}
public interface Serializer<T extends
 Abelian<?>> {

 Writable serialize(T obj);
 T deserializeHBase(
    byte[] rowId, byte[] colFamily,
    byte[] qualifier, byte[] value);
}
public abstract class Translator
 <KEYIN, VALUEIN> {

 public abstract void translate
    (KEYIN key, VALUEIN value,
     Context context);
}

this.mapContext.write(
    abelianValue.extractKey(),
    this.invertValue ?
         abelianValue.invert() :
         abelianValue);
GenericMapper

The input format delivers a Value of one of four kinds:

OverwriteResult    deserialize
InsertedValue      pass on
DeletedValue       set invertValue=true; pass on
PreservedValue     ignore
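The four-way dispatch of the GenericMapper can be sketched as a small routing function. The class `MapperDispatch`, the enum constants, and the use of a plain `long` as the value are assumptions for illustration. In particular, the sketch treats an OverwriteResult as already deserialized and simply forwards it.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the GenericMapper dispatch; values are plain counts here.
final class MapperDispatch {

    enum Kind { OVERWRITE_RESULT, INSERTED_VALUE, DELETED_VALUE, PRESERVED_VALUE }

    // Returns the value to emit, or an empty result if the input is ignored.
    static OptionalLong route(Kind kind, long value) {
        switch (kind) {
            case OVERWRITE_RESULT: // old result, assumed to be deserialized already
            case INSERTED_VALUE:   // new input, passed on unchanged
                return OptionalLong.of(value);
            case DELETED_VALUE:    // deleted input, passed on inverted
                return OptionalLong.of(-value);
            case PRESERVED_VALUE:  // unchanged input, contributes nothing to the delta
            default:
                return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        for (Kind kind : Kind.values()) {
            System.out.println(kind + " -> " + route(kind, 1));
        }
    }
}
```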
GenericReducer

1. Aggregate
2. Serialize
3. Write

IncDec:
  putToIncrement(…)

Overwrite:
  PUT → write
  IGNORE → do not write
  DELETE → putToDelete(...)
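The Overwrite branch of the reducer can be read as a decision that only matters when the aggregated value is neutral. The sketch below encodes that reading. It is an assumption based on the name NeutralOutputStrategy, not a statement about the actual implementation, and `ReducerDecision`, `Action` and `decide` are hypothetical names.

```java
// Hypothetical sketch of the reducer's write decision in Overwrite mode.
final class ReducerDecision {

    enum NeutralOutputStrategy { PUT, IGNORE, DELETE }

    enum Action { WRITE_PUT, WRITE_NOTHING, WRITE_DELETE }

    // Step 1 of the reducer: aggregate all values of one key (here: sum of counts).
    static long aggregate(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // Assumption: the strategy is consulted only if the aggregate is neutral (0).
    static Action decide(long aggregated, NeutralOutputStrategy strategy) {
        if (aggregated != 0) {
            return Action.WRITE_PUT;
        }
        switch (strategy) {
            case PUT:
                return Action.WRITE_PUT;     // write the neutral value
            case DELETE:
                return Action.WRITE_DELETE;  // remove the row from the result
            case IGNORE:
            default:
                return Action.WRITE_NOTHING; // write nothing at all
        }
    }

    public static void main(String[] args) {
        long sum = aggregate(new long[] {1, -1});
        System.out.println(decide(sum, NeutralOutputStrategy.DELETE));
    }
}
```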
GenericCombiner

"Write A Combiner"
   -- 7 Tips for Improving MapReduce Performance (tip 4)

1. Aggregate
TextWindowInputFormat
Example applications: 1. WordCount

Translator

void translate(key, value) {
 for(word : value.split(" ")) {
  write(
    new WordAbelian(word, 1));
 }
}

Serializer

Writable serialize(
    WordAbelian w) {
 Put p = new Put(
        w.getWord());
 p.add(…);
 return p;
}

WordAbelian

WordAbelian invert() {
 return new WordAbelian(
    this.word,
    -1 * this.count);
}

WordAbelian aggregate(
 WordAbelian other) {
 return new WordAbelian(
    this.word,
    this.count
    + other.count);
}

WordAbelian neutral() {
 return new WordAbelian(
    this.word, 0);
}

boolean isNeutral() {
 return (this.count == 0);
}
Example applications: 2. Friends Of Friends

[Figure: graph over the persons A, B, C, D, E showing FRIENDS and FRIENDS OF FRIENDS relationships]
Example applications: 2. Friends Of Friends

translate(person, friends):

aggregate(…): merge the sets of friends of friends
Example applications: 3. Reverse Web-Link Graph

REVERSE WEB LINK GRAPH
(Row-ID -> Columns)

Google -> {eBay, Wikipedia}
eBay -> {Google, Wikipedia}
Mensa-KL -> {Google}
Facebook -> {Google, Mensa-KL, Uni-KL}
Wikipedia -> {Google}
Uni-KL -> {Google, Wikipedia}

aggregate(…): merge the sets of links
Example applications: 4. Bigrams

Hi, kannst du mich ___?___ am
Bahnhof abholen? So in etwa
10 ___?___. Viele liebe ___?__.
P.S. Ich habe viel ___?___.

Idea: analyze large amounts of text with MapReduce.
Example applications: 4. Bigrams

NGramAbelian
 fields: a, b, count
 extractKey(): a
 invert(): count*=-1
 aggregate(… other): count+=other.count
 neutral(): count=0
 isNeutral(): count==0
 write(…)

NGramStep2Abelian
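The bigram bookkeeping can be sketched in memory as follows. The class `BigramSketch` and its methods are hypothetical. The counting follows the NGramAbelian rules above (add for inserted text, subtract for deleted text), while the `suggest` step is only an assumption about how the counts might be used, since the slide does not say what NGramStep2Abelian computes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory bigram counter; not the Marimba implementation.
final class BigramSketch {

    // Adds sign (+1 for inserted text, -1 for deleted text) to the count of each bigram (a, b).
    static void apply(Map<String, Map<String, Long>> counts, String text, long sign) {
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i++) {
            counts.computeIfAbsent(tokens[i], k -> new HashMap<>())
                  .merge(tokens[i + 1], sign, Long::sum);
        }
    }

    // Returns the most frequent successor of a, or null if none has a positive count.
    static String suggest(Map<String, Map<String, Long>> counts, String a) {
        String best = null;
        long bestCount = 0;
        Map<String, Long> successors = counts.getOrDefault(a, new HashMap<>());
        for (Map.Entry<String, Long> e : successors.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> counts = new HashMap<>();
        apply(counts, "viel Zeit viel zu viel zu", 1L); // inserted text
        System.out.println(suggest(counts, "viel"));
        apply(counts, "viel zu viel zu", -1L);          // deleted text
        System.out.println(suggest(counts, "viel"));
    }
}
```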
"Where do we get the data?"

Hi, kannst du mich [bitte | nicht] am
Bahnhof abholen? So in etwa
10 [<num> Minuten | <num> Jahre]. Viele liebe [Grüße | dich].
P.S. Ich habe viel [zu | Spaß].
Fundamentals & Previous Work | The Marimba Framework | Evaluation
WordCount

[Chart: runtime in hh:mm (axis from 00:00 to 01:10) over the share of changes (0% to 80%) for FULL, INCDEC, and OVERWRITE]
Reverse Web-Link Graph

[Chart: runtime in hh:mm (axis from 00:00 to 02:51) over the share of changes (0% to 80%) for FULL, INCDEC, and OVERWRITE]
Conclusion
Full recomputation
IncDec / Overwrite
Image credits

Slides 5-9:
Flames and smiley: Microsoft Office 2010

Slide 10:
Google: http://www.google.de

Slide 11:
Amazon: http://www.amazon.de

Slide 12:
Hadoop: http://hadoop.apache.org
Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252

Slide 16:
Hadoop: http://hadoop.apache.org

Slide 17:
Hadoop: http://hadoop.apache.org
Notebook: Microsoft Office 2010

Slide 18:
HBase: http://hbase.apache.org

Slide 31:
Hadoop: http://hadoop.apache.org
Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252

Slide 32:
Scaffold: http://www.flickr.com/photos/michale/94538528/
Hadoop: http://hadoop.apache.org
Boy: Microsoft Office 2010

Slides 37-44:
Puzzle: http://www.flickr.com/photos/dps/136565237/

Slides 46-48:
Boy: Microsoft Office 2010

Slide 49:
Google: http://www.google.de
eBay: http://www.ebay.de
Mensa-KL: http://www.mensa-kl.de
facebook: http://www.facebook.de
Wikipedia: http://de.wikipedia.org
TU Kaiserslautern: http://www.uni-kl.de

Slides 50-51:
Mobile phone: Microsoft Office 2010

Slide 56:
Wikipedia: http://de.wikipedia.org
Twitter: http://www.twitter.com

Slide 57:
Mobile phone: Microsoft Office 2010

Slide 58:
Hadoop: http://hadoop.apache.org
Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
References (1/2)
[0] Johannes Schildgen. Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggregatsichten.
Master's thesis, TU Kaiserslautern, August 2012.

[1] Apache Hadoop project. http://hadoop.apache.org/.
[2] Virga: Incremental Recomputations in MapReduce. http://wwwlgis.informatik.uni-kl.de/cms/?id=526.
[3] Philippe Adjiman. Hadoop Tutorial Series, Issue #4: To Use Or Not To Use A Combiner, 2010.
http://www.philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/.
[4] Kai Biermann. Big Data: Twitter wird zum Fieberthermometer der Gesellschaft, April 2012.
http://www.zeit.de/digital/internet/2012-04/twitter-krankheiten-nowcast.
[5] Julie Bort. 8 Crazy Things IBM Scientists Have Learned Studying Twitter, January 2012.
http://www.businessinsider.com/8-crazy-things-ibm-scientists-have-learned-studying-twitter-2012-1.
[6] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI, pages 137–150, 2004.
[7] Lars George. HBase: The Definitive Guide. O’Reilly Media, 1 edition, 2011.
[8] Brown University Data Management Group. A Comparison of Approaches to Large-Scale Data Analysis.
http://database.cs.brown.edu/projects/mapreduce-vs-dbms/.
[9] Ricky Ho. Map/Reduce to recommend people connection, August 2010.
http://horicky.blogspot.de/2010/08/mapreduce-to-recommend-people.html.
[10] Yong Hu. Efficiently Extracting Change Data from HBase. April 2012.
[11] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Can MapReduce learn from materialized views?
In LADIS 2011, pages 1–5, 9 2011.
[12] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Incremental recomputations in mapreduce. In CloudDB 2011, 10 2011.
[13] Steve Krenzel. MapReduce: Finding Friends, 2010. http://stevekrenzel.com/finding-friends-with-mapreduce.
[14] Todd Lipcon. 7 Tips for Improving MapReduce Performance, 2009.
http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/.
References (2/2)
[15] TouchType Ltd. SwiftKey X - Android Apps auf Google Play, February 2012.
http://play.google.com/store/apps/details?id=com.touchtype.swiftkey.
[16] Karl H. Marbaise. Hadoop - Think Large!, 2011. http://www.soebes.de/files/RuhrJUGEssenHadoop-20110217.pdf.
[17] Arnab Nandi, Cong Yu, Philip Bohannon, and Raghu Ramakrishnan. Distributed Cube Materialization on Holistic Measures.
ICDE, pages 183–194, 2011.
[18] Alexander Neumann. Studie: Hadoop wird ähnlich erfolgreich wie Linux, May 2012.
http://heise.de/-1569837.
[19] Owen O’Malley, Jack Hebert, Lohit Vijayarenu, and Amar Kamat. Partitioning your job into maps and reduces, September 2009.
http://wiki.apache.org/hadoop/HowManyMapsAndReduces?action=recall&rev=7.
[20] Roya Parvizi. Inkrementelle Neuberechnungen mit MapReduce. Bachelor's thesis, TU Kaiserslautern,
June 2011.
[21] Arnd Poetzsch-Heffter. Konzepte objektorientierter Programmierung. eXamen.press.
Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
[22] Dave Rosenberg. Hadoop, the elephant in the enterprise, June 2012.
http://news.cnet.com/8301-1001 3-57452061-92/hadoop-the-elephant-in-the-enterprise/.
[23] Marc Schäfer. Inkrementelle Wartung von Data Cubes. Bachelor's thesis, TU Kaiserslautern, January 2012.
[24] Sanjay Sharma. Advanced Hadoop Tuning and Optimizations, 2009.
http://www.slideshare.net/ImpetusInfo/ppt-on-advanced-hadoop-tuning-n-optimisation.
[25] Jason Venner. Pro Hadoop. Apress, Berkeley, CA, 2009.
[26] Dick Weisinger. Big Data: Think of NoSQL As Complementary to Traditional RDBMS, June 2012.
http://www.formtek.com/blog/?p=3032.
[27] Tom White. 10 MapReduce Tips, May 2009. http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/.

Weitere ähnliche Inhalte

Was ist angesagt?

Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
MongoSF
 

Was ist angesagt? (20)

Empathic Programming - How to write comprehensible code
Empathic Programming - How to write comprehensible codeEmpathic Programming - How to write comprehensible code
Empathic Programming - How to write comprehensible code
 
Scala in a Java 8 World
Scala in a Java 8 WorldScala in a Java 8 World
Scala in a Java 8 World
 
TDC2016SP - Código funcional em Java: superando o hype
TDC2016SP - Código funcional em Java: superando o hypeTDC2016SP - Código funcional em Java: superando o hype
TDC2016SP - Código funcional em Java: superando o hype
 
Introduction to Groovy
Introduction to GroovyIntroduction to Groovy
Introduction to Groovy
 
Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010
 
Exhibition of Atrocity
Exhibition of AtrocityExhibition of Atrocity
Exhibition of Atrocity
 
The Ring programming language version 1.10 book - Part 81 of 212
The Ring programming language version 1.10 book - Part 81 of 212The Ring programming language version 1.10 book - Part 81 of 212
The Ring programming language version 1.10 book - Part 81 of 212
 
Unit testing pig
Unit testing pigUnit testing pig
Unit testing pig
 
집단지성 프로그래밍 08-가격모델링
집단지성 프로그래밍 08-가격모델링집단지성 프로그래밍 08-가격모델링
집단지성 프로그래밍 08-가격모델링
 
Functional Programming & Event Sourcing - a pair made in heaven
Functional Programming & Event Sourcing - a pair made in heavenFunctional Programming & Event Sourcing - a pair made in heaven
Functional Programming & Event Sourcing - a pair made in heaven
 
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
 
Go &lt;-> Ruby
Go &lt;-> RubyGo &lt;-> Ruby
Go &lt;-> Ruby
 
Programming Lisp Clojure - 2장 : 클로저 둘러보기
Programming Lisp Clojure - 2장 : 클로저 둘러보기Programming Lisp Clojure - 2장 : 클로저 둘러보기
Programming Lisp Clojure - 2장 : 클로저 둘러보기
 
Corona sdk
Corona sdkCorona sdk
Corona sdk
 
Ruby 1.9
Ruby 1.9Ruby 1.9
Ruby 1.9
 
Pig Introduction to Pig
Pig Introduction to PigPig Introduction to Pig
Pig Introduction to Pig
 
A Taste of Python - Devdays Toronto 2009
A Taste of Python - Devdays Toronto 2009A Taste of Python - Devdays Toronto 2009
A Taste of Python - Devdays Toronto 2009
 
Postgres can do THAT?
Postgres can do THAT?Postgres can do THAT?
Postgres can do THAT?
 
Lập trình Python cơ bản
Lập trình Python cơ bảnLập trình Python cơ bản
Lập trình Python cơ bản
 
Begin with Python
Begin with PythonBegin with Python
Begin with Python
 

Ähnlich wie Marimba - Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggregatsichten

Coffee script
Coffee scriptCoffee script
Coffee script
timourian
 
JavaSE7 Launch Event: Java7xGroovy
JavaSE7 Launch Event: Java7xGroovyJavaSE7 Launch Event: Java7xGroovy
JavaSE7 Launch Event: Java7xGroovy
Yasuharu Nakano
 
Feel of Kotlin (Berlin JUG 16 Apr 2015)
Feel of Kotlin (Berlin JUG 16 Apr 2015)Feel of Kotlin (Berlin JUG 16 Apr 2015)
Feel of Kotlin (Berlin JUG 16 Apr 2015)
intelliyole
 

Ähnlich wie Marimba - Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggregatsichten (20)

Coffee script
Coffee scriptCoffee script
Coffee script
 
Modern Application Foundations: Underscore and Twitter Bootstrap
Modern Application Foundations: Underscore and Twitter BootstrapModern Application Foundations: Underscore and Twitter Bootstrap
Modern Application Foundations: Underscore and Twitter Bootstrap
 
Functional Programming with Groovy
Functional Programming with GroovyFunctional Programming with Groovy
Functional Programming with Groovy
 
Davide Cerbo - Kotlin: forse è la volta buona - Codemotion Milan 2017
Davide Cerbo - Kotlin: forse è la volta buona - Codemotion Milan 2017 Davide Cerbo - Kotlin: forse è la volta buona - Codemotion Milan 2017
Davide Cerbo - Kotlin: forse è la volta buona - Codemotion Milan 2017
 
Programmation fonctionnelle Scala
Programmation fonctionnelle ScalaProgrammation fonctionnelle Scala
Programmation fonctionnelle Scala
 
Monadologie
MonadologieMonadologie
Monadologie
 
Apache PIG - User Defined Functions
Apache PIG - User Defined FunctionsApache PIG - User Defined Functions
Apache PIG - User Defined Functions
 
Coding in Style
Coding in StyleCoding in Style
Coding in Style
 
Python Functions (PyAtl Beginners Night)
Python Functions (PyAtl Beginners Night)Python Functions (PyAtl Beginners Night)
Python Functions (PyAtl Beginners Night)
 
OSDC.fr 2012 :: Cascalog : progammation logique pour Hadoop
OSDC.fr 2012 :: Cascalog : progammation logique pour HadoopOSDC.fr 2012 :: Cascalog : progammation logique pour Hadoop
OSDC.fr 2012 :: Cascalog : progammation logique pour Hadoop
 
Introduction to Kotlin.pptx
Introduction to Kotlin.pptxIntroduction to Kotlin.pptx
Introduction to Kotlin.pptx
 
01 Introduction to Kotlin - Programming in Kotlin.pptx
01 Introduction to Kotlin - Programming in Kotlin.pptx01 Introduction to Kotlin - Programming in Kotlin.pptx
01 Introduction to Kotlin - Programming in Kotlin.pptx
 
JavaSE7 Launch Event: Java7xGroovy
JavaSE7 Launch Event: Java7xGroovyJavaSE7 Launch Event: Java7xGroovy
JavaSE7 Launch Event: Java7xGroovy
 
Python speleology
Python speleologyPython speleology
Python speleology
 
A Few of My Favorite (Python) Things
A Few of My Favorite (Python) ThingsA Few of My Favorite (Python) Things
A Few of My Favorite (Python) Things
 
Feel of Kotlin (Berlin JUG 16 Apr 2015)
Feel of Kotlin (Berlin JUG 16 Apr 2015)Feel of Kotlin (Berlin JUG 16 Apr 2015)
Feel of Kotlin (Berlin JUG 16 Apr 2015)
 
Model-Driven Software Development - Pretty-Printing, Editor Services, Term Re...
Model-Driven Software Development - Pretty-Printing, Editor Services, Term Re...Model-Driven Software Development - Pretty-Printing, Editor Services, Term Re...
Model-Driven Software Development - Pretty-Printing, Editor Services, Term Re...
 
Pydiomatic
PydiomaticPydiomatic
Pydiomatic
 
Python idiomatico
Python idiomaticoPython idiomatico
Python idiomatico
 
Tuga IT 2017 - What's new in C# 7
Tuga IT 2017 - What's new in C# 7Tuga IT 2017 - What's new in C# 7
Tuga IT 2017 - What's new in C# 7
 

Mehr von Johannes Schildgen (6)

Precision and Recall
Precision and RecallPrecision and Recall
Precision and Recall
 
Visualization of NotaQL Transformations using Sampling
Visualization of NotaQL Transformations using SamplingVisualization of NotaQL Transformations using Sampling
Visualization of NotaQL Transformations using Sampling
 
NotaQL Is Not a Query Language! It's for Data Transformation on Wide-Column S...
NotaQL Is Not a Query Language! It's for Data Transformation on Wide-Column S...NotaQL Is Not a Query Language! It's for Data Transformation on Wide-Column S...
NotaQL Is Not a Query Language! It's for Data Transformation on Wide-Column S...
 
Incremental Data Transformations on Wide-Column Stores with NotaQL
Incremental Data Transformations on Wide-Column Stores with NotaQLIncremental Data Transformations on Wide-Column Stores with NotaQL
Incremental Data Transformations on Wide-Column Stores with NotaQL
 
Big-Data-Analyse und NoSQL-Datenbanken
Big-Data-Analyse und NoSQL-DatenbankenBig-Data-Analyse und NoSQL-Datenbanken
Big-Data-Analyse und NoSQL-Datenbanken
 
Precision und Recall
Precision und RecallPrecision und Recall
Precision und Recall
 

Kürzlich hochgeladen

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Kürzlich hochgeladen (20)

Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Marimba: A MapReduce-Based Programming Model for Self-Maintainable Aggregate Views

  • 1. A MapReduce-Based Programming Model for Self-Maintainable Aggregate Views. 2012-08-31. Johannes Schildgen, TU Kaiserslautern, schildgen@cs.uni-kl.de
  • 3. 11 26 14 23 37 39 41 26 19 8 25 19 22 15 18 10 16 27 8 9 12 14 15
  • 4. Kinderriegel: 26, Balisto: 39, Hanuta: 14, Snickers: 19, Ritter Sport: 41, Pickup: 12, …
  • 5. Kinderriegel: 26, Balisto: 39, Hanuta: 14, Snickers: 19, Ritter Sport: 41, Pickup: 12, …
  • 6. View: Kinderriegel: 26, Balisto: 39, Hanuta: 14, Snickers: 19, Ritter Sport: 41, Pickup: 12, … Δ: Balisto: -1
  • 7. View: Kinderriegel: 26, Balisto: 39, Hanuta: 14, Snickers: 19, Ritter Sport: 41, Pickup: 12, … Δ: Balisto: -8, Snickers: +24, Ritter Sport: -7
  • 8. Increment Installation. View with Δ applied in place: Kinderriegel: 26, Balisto: 31, Hanuta: 14, Snickers: 43, Ritter Sport: 34, Pickup: 12, … Δ: Balisto: -8, Snickers: +24, Ritter Sport: -7
  • 9. Overwrite Installation. Old view: Kinderriegel: 26, Balisto: 39, Hanuta: 14, Snickers: 19, Ritter Sport: 41, Pickup: 12, … Δ: Balisto: -8, Snickers: +24, Ritter Sport: -7. New view: Kinderriegel: 26, Balisto: 31, Hanuta: 14, Snickers: 43, Ritter Sport: 34, Pickup: 12
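The two installation strategies on slides 8 and 9 can be illustrated with a minimal in-memory sketch. It assumes the view and the Δ are plain maps from product name to count; the class and method names (`ViewInstallation`, `installIncrement`, `installOverwrite`) are hypothetical and are not part of Marimba or HBase.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: the slides install deltas into HBase tables,
// whereas this sketch uses in-memory maps.
class ViewInstallation {

    /** Increment installation: add every delta to the existing view in place. */
    static void installIncrement(Map<String, Long> view, Map<String, Long> delta) {
        for (Map.Entry<String, Long> e : delta.entrySet()) {
            view.merge(e.getKey(), e.getValue(), Long::sum);
        }
    }

    /** Overwrite installation: combine old view and delta into a new view; the old view is not modified. */
    static Map<String, Long> installOverwrite(Map<String, Long> oldView, Map<String, Long> delta) {
        Map<String, Long> newView = new HashMap<>(oldView);
        installIncrement(newView, delta);
        return newView;
    }

    public static void main(String[] args) {
        Map<String, Long> view = new HashMap<>();
        view.put("Balisto", 39L);
        view.put("Snickers", 19L);
        view.put("Ritter Sport", 41L);

        Map<String, Long> delta = new HashMap<>();
        delta.put("Balisto", -8L);
        delta.put("Snickers", 24L);
        delta.put("Ritter Sport", -7L);

        Map<String, Long> newView = installOverwrite(view, delta);
        System.out.println("old view: " + view);
        System.out.println("new view: " + newView);
    }
}
```

With the numbers from slide 7, both strategies arrive at Balisto: 31, Snickers: 43, Ritter Sport: 34; they differ only in whether the old view is updated in place or replaced.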
  • 12. Foundations & Previous Work | The Marimba Framework | Evaluation
  • 16.
        public class WordCount extends Configured implements Tool {

            public static class WordCountMapper
                    extends Mapper<LongWritable, Text, ImmutableBytesWritable, LongWritable> {

                private Text word = new Text();

                @Override
                public void map(LongWritable key, Text value, Context context)
                        throws IOException, InterruptedException {
                    String line = value.toString();
                    StringTokenizer tokenizer = new StringTokenizer(line);
                    while (tokenizer.hasMoreTokens()) {
                        this.word.set(tokenizer.nextToken());
                        …
  • 17. A A E B E F C F B D C D
  • 18.
        > create 'person', 'default'
        > put 'person', 'p27', 'default:vorname', 'Anton'
        > put 'person', 'p27', 'default:nachname', 'Schmidt'
        > get 'person', 'p27'
        COLUMN              CELL
        default:nachname    timestamp=1338991497408, value=Schmidt
        default:vorname     timestamp=1338991436688, value=Anton
        2 row(s) in 0.0640 seconds
  • 19. View: banane: 3, die: 4, iss: 1, nimm: 1. Δ: kaufe 1, schale 1, die 0, schäle 1, banane 1, schmeiß 1, schmeiß -1, weg 1, schale -1, weg -1
  • 20. increment(). View: banane: 4, die: 4, iss: 1, kaufe: 1, nimm: 1. Δ: kaufe 1, schale 0, die 0, schäle 1, banane 1, schmeiß 0, schmeiß -1, weg 0, schale -1, weg -1
  • 21. overwrite(). View: banane: 3, die: 4, iss: 1, nimm: 1. Δ: kaufe 1, schale 1, die 0, schäle 1, banane 1, schmeiß 1, schmeiß -1, weg 1, schale -1, weg -1
  • 22. overwrite(). View: banane: 4, die: 4, iss: 1, kaufe: 1, nimm: 1. Δ: kaufe 1, schale 0, die 0, schäle 1, banane 1, schmeiß 0, schmeiß -1, weg 0, schale -1, weg -1
  • 23.
        void map(key, value) {
            if (value is inserted) {
                for (word : value.split(" ")) {
                    write(word, 1);
                }
            } else if (value is deleted) {
                for (word : value.split(" ")) {
                    write(word, -1);
                }
            }
        }
  • 24.
        void map(key, value) {
            if (value is inserted) {
                for (word : value.split(" ")) {
                    write(word, 1);
                }
            } else if (value is deleted) {
                for (word : value.split(" ")) {
                    write(word, -1);
                }
            } else {
                // old result
                write(key, value);
            }
        }
  • 25. Overwrite Installation
        void reduce(key, values) {
            sum = 0;
            for (value : values) {
                sum += value;
            }
            put = new Put(key);
            put.add("fam", "col", sum);
            context.write(key, put);
        }
  • 26. Increment Installation
        void reduce(key, values) {
            sum = 0;
            for (value : values) {
                sum += value;
            }
            inc = new Increment(key);
            inc.add("fam", "col", sum);
            context.write(key, inc);
        }
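The map and reduce pseudocode on slides 23 to 26 can be tried out without a Hadoop cluster. The following is a minimal plain-Java sketch under that assumption: the `Change` enum and the class `IncrementalWordCount` are hypothetical names, and the resulting delta is returned as a map instead of being written as an HBase `Put` or `Increment`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the MapReduce job on the slides; no Hadoop or HBase involved.
class IncrementalWordCount {

    /** Whether an input line was inserted into or deleted from the base data. */
    enum Change { INSERTED, DELETED }

    /** Map phase: emit (word, +1) for an inserted line and (word, -1) for a deleted line. */
    static List<Map.Entry<String, Long>> map(String line, Change change) {
        long sign = (change == Change.INSERTED) ? 1L : -1L;
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, sign));
            }
        }
        return out;
    }

    /** Reduce phase: sum all emitted values per word; the result is the delta for the view. */
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> delta = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            delta.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return delta;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        pairs.addAll(map("kaufe die banane", Change.INSERTED));
        pairs.addAll(map("die schale", Change.DELETED));
        System.out.println(reduce(pairs));
    }
}
```

A word that is inserted as often as it is deleted ends up with a delta of 0, as "die" does on slide 19.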
  • 31. Foundations & Previous Work | The Marimba Framework | Evaluation
  • 32. Hadoop, core functionality: distributed computing with MapReduce. Marimba, core functionality: incremental computing. Framework: "I take care of IncDec, Overwrite, reading old results, generating increments, …" Developer: "I tell you how to read the input data, aggregate, invert, and write the output."
  • 33.
        public class WordTranslator extends Translator<LongWritable, Text> {
            public void translate(…) { … }
        }

        IncJob job = new IncJobOverwrite(conf);
        job.setTranslatorClass(WordTranslator.class);
        job.setAbelianClass(WordAbelian.class);
  • 34.
        public class WordAbelian implements Abelian<WordAbelian> {
            WordAbelian invert() { … }
            WordAbelian aggregate(WordAbelian other) { … }
            WordAbelian neutral() { … }
            boolean isNeutral() { … }
            Writable extractKey() { … }
            void write(…) { … }
            void readFields(…) { … }
        }
  • 35.
        public class WordSerializer implements Serializer<WordAbelian> {
            Writable serialize(Writable key, WordAbelian v) { … }
            WordAbelian deserializeHBase(
                    byte[] rowId, byte[] colFamily,
                    byte[] qualifier, byte[] value) { … }
        }
  • 36. How To Write A Marimba-Job
        1. Abelian class
        2. Translator class
        3. Serializer class
        4. Write a Hadoop job using the class IncJob
  • 37. Implementation. IncJob: setInputTable(…), setOutputTable(…). IncJobFull, IncJobIncDec, IncJobOverwrite. Recomputation. setResultInputTable(…)
  • 39.
        public interface Abelian<T extends Abelian<?>>
                extends WritableComparable<BinaryComparable> {
            T invert();
            T aggregate(T other);
            T neutral();
            boolean isNeutral();
            Writable extractKey();
            void write(…);
            void readFields(…);
        }
  • 40.
        public interface Serializer<T extends Abelian<?>> {
            Writable serialize(T obj);
            T deserializeHBase(
                    byte[] rowId, byte[] colFamily,
                    byte[] qualifier, byte[] value);
        }
  • 41.
        public abstract class Translator<KEYIN, VALUEIN> {
            public abstract void translate(KEYIN key, VALUEIN value, Context context);
            …
            this.mapContext.write(
                    abelianValue.extractKey(),
                    this.invertValue ? abelianValue.invert() : abelianValue);
            …
        }
  • 42. GenericMapper. The input format delivers a Value of type OverwriteResult, InsertedValue, DeletedValue, or PreservedValue. Actions, in slide order: deserialize; pass on; set invertValue=true; ignore; pass on.
  • 43. GenericReducer. 1. Aggregate 2. Serialize 3. Write. IncDec: putToIncrement(…). Overwrite: PUT → write; IGNORE → do not write; DELETE → putToDelete(...)
  • 44. GenericCombiner. "Write A Combiner" (7 Tips for Improving MapReduce Performance, tip 4). 1. Aggregate
  • 46. Example applications: 1. WordCount
        Translator:
        void translate(key, value) {
            for (word : value.split(" ")) {
                write(new WordAbelian(word, 1));
            }
        }

        Serializer:
        Writable serialize(WordAbelian w) {
            Put p = new Put(w.getWord());
            p.add(…);
            return p;
        }

        WordAbelian:
        WordAbelian invert() {
            return new WordAbelian(this.word, -1 * this.count);
        }
        WordAbelian aggregate(WordAbelian other) {
            return new WordAbelian(this.word, this.count + other.count);
        }
        WordAbelian neutral() {
            return new WordAbelian(this.word, 0);
        }
        boolean isNeutral() {
            return (this.count == 0);
        }
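The algebraic part of the WordCount example can be exercised in isolation. The sketch below assumes a reduced interface, `SimpleAbelian`, that keeps only `invert`, `aggregate`, `neutral`, and `isNeutral`; it is a hypothetical stand-in and omits the Hadoop-specific methods (`extractKey`, `write`, `readFields`) of the `Abelian` interface on slide 39. `SimpleWordAbelian` and `aggregateAll` are likewise illustrative names.

```java
import java.util.Arrays;
import java.util.List;

class AbelianSketch {

    /** Hypothetical, simplified stand-in for the Abelian interface (no Writable methods). */
    interface SimpleAbelian<T extends SimpleAbelian<T>> {
        T invert();
        T aggregate(T other);
        T neutral();
        boolean isNeutral();
    }

    /** A word with a count; aggregation adds counts, inversion negates them. */
    static final class SimpleWordAbelian implements SimpleAbelian<SimpleWordAbelian> {
        final String word;
        final long count;

        SimpleWordAbelian(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public SimpleWordAbelian invert() {
            return new SimpleWordAbelian(word, -count);
        }

        @Override
        public SimpleWordAbelian aggregate(SimpleWordAbelian other) {
            return new SimpleWordAbelian(word, count + other.count);
        }

        @Override
        public SimpleWordAbelian neutral() {
            return new SimpleWordAbelian(word, 0);
        }

        @Override
        public boolean isNeutral() {
            return count == 0;
        }
    }

    /** Folds a non-empty list of values for one key, starting from the neutral element. */
    static <T extends SimpleAbelian<T>> T aggregateAll(List<T> values) {
        T result = values.get(0).neutral();
        for (T value : values) {
            result = result.aggregate(value);
        }
        return result;
    }

    public static void main(String[] args) {
        SimpleWordAbelian one = new SimpleWordAbelian("banane", 1);
        // Two insertions and one deletion of the same word.
        SimpleWordAbelian total = aggregateAll(Arrays.asList(one, one, one.invert()));
        System.out.println(total.word + ": " + total.count);
        System.out.println("cancels out: " + one.aggregate(one.invert()).isNeutral());
    }
}
```

The point of the inverse element is that a deletion is handled by the same aggregation as an insertion: the deleted value is inverted and then aggregated like any other value.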
  • 47. Example applications: 2. Friends Of Friends (figure: FRIENDS and FRIENDS OF FRIENDS graphs over the persons A, B, C, D, E)
  • 48. Example applications: 2. Friends Of Friends. translate(person, friends); aggregate(…): merge the sets of friends of friends
  • 49. Example applications: 3. Reverse WebLink-Graph. aggregate(…): merge the sets of links
        REVERSE WEB LINK GRAPH (Row-ID -> Columns)
        Google    -> {eBay, Wikipedia}
        eBay      -> {Google, Wikipedia}
        Mensa-KL  -> {Google}
        Facebook  -> {Google, Mensa-KL, Uni-KL}
        Wikipedia -> {Google}
        Uni-KL    -> {Google, Wikipedia}
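The set-merging aggregation of the reverse web-link graph can be sketched in plain Java. The forward graph used here is an assumption, obtained by inverting the table on slide 49; the class `ReverseLinkGraph` and its methods are hypothetical, and the sketch covers insertions only, since the slides do not show how deletions are inverted for sets.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ReverseLinkGraph {

    /** aggregate: merge two sets of linking pages into a new set. */
    static Set<String> aggregate(Set<String> a, Set<String> b) {
        Set<String> merged = new TreeSet<>(a);
        merged.addAll(b);
        return merged;
    }

    /** Turns "page -> pages it links to" into "target -> pages that link to it". */
    static Map<String, Set<String>> reverse(Map<String, Set<String>> forward) {
        Map<String, Set<String>> reversed = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : forward.entrySet()) {
            for (String target : entry.getValue()) {
                // translate: emit (target, {source}); values with equal keys are aggregated.
                reversed.merge(target, Collections.singleton(entry.getKey()),
                        ReverseLinkGraph::aggregate);
            }
        }
        return reversed;
    }

    public static void main(String[] args) {
        // Assumed forward graph, derived by inverting the table on slide 49.
        Map<String, Set<String>> forward = new TreeMap<>();
        forward.put("Google", new TreeSet<>(
                Arrays.asList("eBay", "Mensa-KL", "Facebook", "Wikipedia", "Uni-KL")));
        forward.put("Wikipedia", new TreeSet<>(Arrays.asList("Google", "eBay", "Uni-KL")));
        forward.put("eBay", new TreeSet<>(Arrays.asList("Google")));
        forward.put("Mensa-KL", new TreeSet<>(Arrays.asList("Facebook")));
        forward.put("Uni-KL", new TreeSet<>(Arrays.asList("Facebook")));

        for (Map.Entry<String, Set<String>> row : reverse(forward).entrySet()) {
            System.out.println(row.getKey() + " -> " + row.getValue());
        }
    }
}
```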
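  • 50. Example applications: 4. Bigrams. German example message with gaps to predict: "Hi, kannst du mich ___?___ am Bahnhof abholen? So in etwa 10 ___?___. Viele liebe ___?___. P.S. Ich habe viel ___?___." (English: "Hi, can you ___ pick me up at the station? In about 10 ___. Lots of love ___. P.S. I have a lot of ___.")
  • 51. Idea: analyze large amounts of text with MapReduce.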
  • 54. Example applications: 4. Bigrams. NGramAbelian: fields a, b, count; extractKey(); write(…); invert(): count *= -1; aggregate(… other): count += other.count; neutral(): count = 0; isNeutral(): count == 0. NGramStep2Abelian
  • 55. Example applications: 4. Bigrams. NGramAbelian: fields a, b, count; extractKey(); write(…); invert(): count *= -1; aggregate(… other): count += other.count; neutral(): count = 0; isNeutral(): count == 0. NGramStep2Abelian
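The count-based operations listed for NGramAbelian (invert: count *= -1, aggregate: count += other.count, isNeutral: count == 0) can be sketched with an in-memory bigram table. The class `BigramCounts` and its methods are hypothetical; how the original job splits the work between NGramAbelian and NGramStep2Abelian is not shown on the slides, so this sketch only assumes a single table of counts and a most-frequent-successor suggestion.

```java
import java.util.HashMap;
import java.util.Map;

class BigramCounts {

    /** counts.get(a).get(b) is the number of times word b directly follows word a. */
    private final Map<String, Map<String, Long>> counts = new HashMap<>();

    /** Adds (sign = +1) or retracts (sign = -1) all bigrams of the given text. */
    void apply(String text, long sign) {
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            Map<String, Long> successors =
                    counts.computeIfAbsent(words[i], k -> new HashMap<>());
            // aggregate: count += other.count; a neutral result (0) removes the entry.
            successors.merge(words[i + 1], sign, (old, delta) -> {
                long sum = old + delta;
                return sum == 0 ? null : Long.valueOf(sum);
            });
            if (successors.isEmpty()) {
                counts.remove(words[i]);
            }
        }
    }

    /** Current count of the bigram (a, b); 0 if it is not stored. */
    long count(String a, String b) {
        return counts.getOrDefault(a, new HashMap<>()).getOrDefault(b, 0L);
    }

    /** Most frequent successor of a, or null if a has no stored successor. */
    String suggest(String a) {
        String best = null;
        long bestCount = Long.MIN_VALUE;
        for (Map.Entry<String, Long> e : counts.getOrDefault(a, new HashMap<>()).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        BigramCounts bigrams = new BigramCounts();
        bigrams.apply("viele liebe gruesse", 1);
        bigrams.apply("viele liebe gruesse", 1);
        bigrams.apply("viele liebe leute", 1);
        System.out.println("after 'liebe': " + bigrams.suggest("liebe"));
    }
}
```

Retracting a text with `apply(text, -1)` corresponds to aggregating the inverted counts, so deleted input lowers the counts without recomputing the whole table.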
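  • 57. German example message with suggested words for each gap: "Hi, kannst du mich ___ ___ am Bahnhof abholen?" (suggestions: bitte, nicht) "So in etwa 10 ___ ___." (suggestions: <num> Minuten, <num> Jahre) "Viele liebe ___ ___." (suggestions: Grüße, dich) "P.S. Ich habe viel ___ ___." (suggestions: zu, Spaß)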
  • 58. Foundations & Previous Work | The Marimba Framework | Evaluation
  • 59. WordCount (figure: runtime [hh:mm], axis from 00:00 to 01:10, versus share of changes from 0% to 80%, for FULL, INCDEC, and OVERWRITE)
  • 60. Reverse Weblink-Graph (figure: runtime [hh:mm], axis from 00:00 to 02:51, versus share of changes from 0% to 80%, for FULL, INCDEC, and OVERWRITE)
  • 62. Image credits
        Slides 5-9: flames and smiley: Microsoft Office 2010
        Slide 10: Google: http://www.google.de
        Slide 11: Amazon: http://www.amazon.de
        Slide 12: Hadoop: http://hadoop.apache.org; Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
        Slide 16: Hadoop: http://hadoop.apache.org
        Slide 17: Hadoop: http://hadoop.apache.org; notebook: Microsoft Office 2010
        Slide 18: HBase: http://hbase.apache.org
        Slide 31: Hadoop: http://hadoop.apache.org; Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
        Slide 32: scaffold: http://www.flickr.com/photos/michale/94538528/; Hadoop: http://hadoop.apache.org; boy: Microsoft Office 2010
        Slides 37-44: puzzle: http://www.flickr.com/photos/dps/136565237/
        Slides 46-48: boy: Microsoft Office 2010
        Slide 49: Google: http://www.google.de; eBay: http://www.ebay.de; Mensa-KL: http://www.mensa-kl.de; facebook: http://www.facebook.de; Wikipedia: http://de.wikipedia.org; TU Kaiserslautern: http://www.uni-kl.de
        Slides 50-51: mobile phone: Microsoft Office 2010
        Slide 56: Wikipedia: http://de.wikipedia.org; Twitter: http://www.twitter.com
        Slide 57: mobile phone: Microsoft Office 2010
        Slide 58: Hadoop: http://hadoop.apache.org; Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
  • 63. References (1/2)
        [0] Johannes Schildgen. Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggregatsichten. Master's thesis, TU Kaiserslautern, August 2012.
        [1] Apache Hadoop project. http://hadoop.apache.org/.
        [2] Virga: Incremental Recomputations in MapReduce. http://wwwlgis.informatik.uni-kl.de/cms/?id=526.
        [3] Philippe Adjiman. Hadoop Tutorial Series, Issue #4: To Use Or Not To Use A Combiner, 2010. http://www.philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/.
        [4] Kai Biermann. Big Data: Twitter wird zum Fieberthermometer der Gesellschaft, April 2012. http://www.zeit.de/digital/internet/2012-04/twitter-krankheiten-nowcast.
        [5] Julie Bort. 8 Crazy Things IBM Scientists Have Learned Studying Twitter, January 2012. http://www.businessinsider.com/8-crazy-things-ibm-scientists-have-learned-studying-twitter-2012-1.
        [6] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI, pages 137–150, 2004.
        [7] Lars George. HBase: The Definitive Guide. O'Reilly Media, 1st edition, 2011.
        [8] Brown University Data Management Group. A Comparison of Approaches to Large-Scale Data Analysis. http://database.cs.brown.edu/projects/mapreduce-vs-dbms/.
        [9] Ricky Ho. Map/Reduce to recommend people connection, August 2010. http://horicky.blogspot.de/2010/08/mapreduce-to-recommend-people.html.
        [10] Yong Hu. Efficiently Extracting Change Data from HBase. April 2012.
        [11] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Can mapreduce learn from materialized views? In LADIS 2011, pages 1–5, September 2011.
        [12] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Incremental recomputations in mapreduce. In CloudDB 2011, October 2011.
        [13] Steve Krenzel. MapReduce: Finding Friends, 2010. http://stevekrenzel.com/finding-friends-with-mapreduce.
        [14] Todd Lipcon. 7 Tips for Improving MapReduce Performance, 2009. http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/.
  • 64. References (2/2)
        [15] TouchType Ltd. SwiftKey X - Android Apps auf Google Play, February 2012. http://play.google.com/store/apps/details?id=com.touchtype.swiftkey.
        [16] Karl H. Marbaise. Hadoop - Think Large!, 2011. http://www.soebes.de/files/RuhrJUGEssenHadoop-20110217.pdf.
        [17] Arnab Nandi, Cong Yu, Philip Bohannon, and Raghu Ramakrishnan. Distributed Cube Materialization on Holistic Measures. ICDE, pages 183–194, 2011.
        [18] Alexander Neumann. Studie: Hadoop wird ähnlich erfolgreich wie Linux, May 2012. http://heise.de/-1569837.
        [19] Owen O'Malley, Jack Hebert, Lohit Vijayarenu, and Amar Kamat. Partitioning your job into maps and reduces, September 2009. http://wiki.apache.org/hadoop/HowManyMapsAndReduces?action=recall&rev=7.
        [20] Roya Parvizi. Inkrementelle Neuberechnungen mit MapReduce. Bachelor's thesis, TU Kaiserslautern, June 2011.
        [21] Arnd Poetzsch-Heffter. Konzepte objektorientierter Programmierung. eXamen.press. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
        [22] Dave Rosenberg. Hadoop, the elephant in the enterprise, June 2012. http://news.cnet.com/8301-1001_3-57452061-92/hadoop-the-elephant-in-the-enterprise/.
        [23] Marc Schäfer. Inkrementelle Wartung von Data Cubes. Bachelor's thesis, TU Kaiserslautern, January 2012.
        [24] Sanjay Sharma. Advanced Hadoop Tuning and Optimizations, 2009. http://www.slideshare.net/ImpetusInfo/ppt-on-advanced-hadoop-tuning-n-optimisation.
        [25] Jason Venner. Pro Hadoop. Apress, Berkeley, CA, 2009.
        [26] Dick Weisinger. Big Data: Think of NoSQL As Complementary to Traditional RDBMS, June 2012. http://www.formtek.com/blog/?p=3032.
        [27] Tom White. 10 MapReduce Tips, May 2009. http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/.