A MapReduce-Based Programming Model for
Self-Maintainable Aggregate Views
                          2012-08-31




                            Johannes Schildgen
                              TU Kaiserslautern
                                schildgen@cs.uni-kl.de
Motivation
Kinderriegel: 26
Balisto: 39
Hanuta: 14
Snickers: 19
Ritter Sport: 41
Pickup: 12
…
Δ (changes not yet reflected in the view):
Balisto: -8
Snickers: +24
Ritter Sport: -7
Increment Installation
Each Δ value is added to the stored value in place:

Kinderriegel: 26
Balisto: 31
Hanuta: 14
Snickers: 43
Ritter Sport: 34
Pickup: 12
…
Overwrite Installation
The old view is read, combined with Δ, and the complete result overwrites the old view:

Old view: Kinderriegel: 26, Balisto: 39, Hanuta: 14, Snickers: 19, Ritter Sport: 41, Pickup: 12, …
New view: Kinderriegel: 26, Balisto: 31, Hanuta: 14, Snickers: 43, Ritter Sport: 34, Pickup: 12, …
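The two installation strategies can be sketched on an in-memory map that stands in for the view table. This is a minimal illustration assuming the view is a `Map<String, Long>`; `ViewInstallation` and its methods are hypothetical names, not part of Marimba or HBase.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: two ways to install a delta into a materialized count view. */
class ViewInstallation {

    /** Increment installation: add each delta value to the stored value in place. */
    static void incrementInstall(Map<String, Long> view, Map<String, Long> delta) {
        for (Map.Entry<String, Long> d : delta.entrySet()) {
            // merge() inserts the delta for a missing key, otherwise adds it to the stored value
            view.merge(d.getKey(), d.getValue(), Long::sum);
        }
    }

    /** Overwrite installation: read the old values, add the delta, write complete new values. */
    static Map<String, Long> overwriteInstall(Map<String, Long> oldView, Map<String, Long> delta) {
        Map<String, Long> newView = new HashMap<>(oldView);       // read the old results
        for (Map.Entry<String, Long> d : delta.entrySet()) {
            long updated = oldView.getOrDefault(d.getKey(), 0L) + d.getValue();
            newView.put(d.getKey(), updated);                     // overwrite, no increment
        }
        return newView;
    }
}
```

With the candy-bar view and Δ above, both methods yield Balisto: 31, Snickers: 43, Ritter Sport: 34. The difference lies in store access: increment needs only the delta, while overwrite must read the old values first.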
Fundamentals & Previous Work | The Marimba Framework | Evaluation
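import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.Tool;

public class WordCount extends Configured implements Tool {
    public static class WordCountMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                this.word.set(tokenizer.nextToken());
                // emit (word, 1); the word's bytes serve as the HBase row key
                context.write(new ImmutableBytesWritable(Bytes.toBytes(this.word.toString())), ONE);
            }
        }
    }
    // … (reducer and run(), required by Tool, are not shown on the slide)
}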
> create 'person', 'default'
> put 'person', 'p27', 'default:vorname', 'Anton'
> put 'person', 'p27', 'default:nachname', 'Schmidt'

> get 'person', 'p27'
COLUMN                CELL
 default:nachname     timestamp=1338991497408, value=Schmidt
 default:vorname      timestamp=1338991436688, value=Anton
2 row(s) in 0.0640 seconds
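HBase addresses each cell by row key, column (family:qualifier) and timestamp, which is why the shell prints a timestamp next to every value and lists the columns sorted by name. The sketch below models that logical layout as nested sorted maps; `ToyHTable` is a hypothetical teaching class that only mimics put/get semantics, not the HBase client API or its storage format.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Toy model of HBase's logical data layout (hypothetical class, not the HBase client API):
 * row key -> "family:qualifier" -> timestamp -> value.
 */
class ToyHTable {
    // rows and columns sorted by name; versions of a cell sorted newest first
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows = new TreeMap<>();

    /** Like the shell's put: adds a new version of one cell. */
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    /** Like the shell's get: the newest version of every column of one row. */
    NavigableMap<String, String> get(String row) {
        NavigableMap<String, String> result = new TreeMap<>();
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) return result;                       // unknown row: empty result
        for (Map.Entry<String, NavigableMap<Long, String>> col : columns.entrySet()) {
            result.put(col.getKey(), col.getValue().firstEntry().getValue()); // first = newest
        }
        return result;
    }
}
```

Replaying the two puts from the shell session gives a row whose columns come back in the order `default:nachname`, `default:vorname`; a later put to the same column adds a newer version that `get` then returns.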
WordCount example (the counted words are German):
View:  banane 3, die 4, iss 1, nimm 1, schale 1, schäle 1, schmeiß 1, weg 1
Δ:     kaufe 1, die 0, banane 1, schmeiß -1, schale -1, weg -1
After increment():
View:  banane 4, die 4, iss 1, kaufe 1, nimm 1, schale 0, schäle 1, schmeiß 0, weg 0
After overwrite(), applied to the same view and Δ:
View:  banane 4, die 4, iss 1, kaufe 1, nimm 1, schale 0, schäle 1, schmeiß 0, weg 0
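Map function for the changed input (inserted and deleted lines):

void map(key, value) {
  if (value is inserted) {
    for (word : value.split(" ")) {
      write(word, 1);    // inserted occurrence: +1
    }
  } else if (value is deleted) {
    for (word : value.split(" ")) {
      write(word, -1);   // deleted occurrence: -1
    }
  }
}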
Extended map function that also reads the old results:

void map(key, value) {
  if (value is inserted) {
    for (word : value.split(" ")) {
      write(word, 1);
    }
  } else if (value is deleted) {
    for (word : value.split(" ")) {
      write(word, -1);
    }
  } else { // old result: pass the stored count on unchanged
    write(key, value);
  }
}
Overwrite Installation

void reduce(key, values) {
  sum = 0;
  for (value : values) {
    sum += value;
  }
  // the sum already includes the old result: overwrite the stored value
  put = new Put(key);
  put.add("fam", "col", sum);
  context.write(key, put);
}
Increment Installation

void reduce(key, values) {
  sum = 0;
  for (value : values) {
    sum += value;
  }
  // the sum is only the change: add it to the stored value
  inc = new Increment(key);
  inc.add("fam", "col", sum);
  context.write(key, inc);
}
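The map and reduce functions above can be simulated in plain Java without a cluster. This is a minimal sketch with hypothetical names (not Marimba's or Hadoop's API): `map()` emits ±1 per word and passes old results through, and `run()` groups by key like the shuffle phase before summing. Feeding only the delta yields the increments; adding the old view as `OLD_RESULT` records yields the complete new view needed for overwrite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory simulation of the delta WordCount job (plain Java, no Hadoop; names are hypothetical). */
class DeltaWordCount {

    enum Kind { INSERTED, DELETED, OLD_RESULT }

    /** One map input: a text line (INSERTED/DELETED) or a stored count for a word (OLD_RESULT). */
    static final class Input {
        final Kind kind;
        final String key;    // the word; only used for OLD_RESULT
        final String value;  // the text line, or the stored count as a string
        Input(Kind kind, String key, String value) { this.kind = kind; this.key = key; this.value = value; }
    }

    /** map(): +1 per inserted word, -1 per deleted word, old results passed on unchanged. */
    static List<Map.Entry<String, Long>> map(Input in) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        switch (in.kind) {
            case INSERTED:
                for (String w : in.value.split(" ")) out.add(Map.entry(w, 1L));
                break;
            case DELETED:
                for (String w : in.value.split(" ")) out.add(Map.entry(w, -1L));
                break;
            case OLD_RESULT:
                out.add(Map.entry(in.key, Long.parseLong(in.value)));
                break;
        }
        return out;
    }

    /** reduce(): sum all values of one key. */
    static long reduce(List<Long> values) {
        long sum = 0;
        for (long v : values) sum += v;
        return sum;
    }

    /** Shuffle: group the map output by key (sorted), then reduce each group. */
    static SortedMap<String, Long> run(List<Input> inputs) {
        SortedMap<String, List<Long>> groups = new TreeMap<>();
        for (Input in : inputs) {
            for (Map.Entry<String, Long> kv : map(in)) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        SortedMap<String, Long> result = new TreeMap<>();
        groups.forEach((word, values) -> result.put(word, reduce(values)));
        return result;
    }
}
```

For instance, deleting the hypothetical line "schmeiß die schale weg" and inserting "kaufe die banane" produces exactly the Δ of the WordCount example (kaufe 1, die 0, banane 1, schmeiß -1, schale -1, weg -1). Adding the old view's counts as `OLD_RESULT` inputs reproduces the overwrite result (banane 4, die 4, schale 0, …).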
Formalization
General mapper
General reducer
The Marimba Framework
Hadoop, core functionality:
Distributed computing with MapReduce

Marimba, core functionality:
Incremental computing
"I take care of: IncDec, Overwrite, reading old results, generating increments, …"

Programmer:
"I tell you how to read the input data, how to aggregate and invert it, and how to write the output."
public class WordTranslator extends
    Translator<LongWritable, Text> {
  public void translate(…) {
    …
  }
}


IncJob job = new IncJobOverwrite(conf);
job.setTranslatorClass(
             WordTranslator.class);
job.setAbelianClass(WordAbelian.class);
public class WordAbelian implements
    Abelian<WordAbelian> {
 WordAbelian invert() { … }
 WordAbelian aggregate(WordAbelian
                        other) { … }
 WordAbelian neutral() { … }
 boolean isNeutral() { … }
 Writable extractKey() { … }
 void write(…) { … }
 void readFields(…) { … }
}
public class WordSerializer
 implements Serializer<WordAbelian> {

 Writable serialize(Writable key,
                    WordAbelian v) {
    …
 }
 WordAbelian deserializeHBase(
    byte[] rowId, byte[] colFamily,
    byte[] qualifier, byte[] value) {
    …
 }
}
How to Write a Marimba Job

1. Abelian class
2. Translator class
3. Serializer class
4. Write the Hadoop job using the class IncJob
Implementation

IncJob: setInputTable(…), setOutputTable(…)
  IncJobFull: recomputation
  IncJobIncDec
  IncJobOverwrite: setResultInputTable(…), NeutralOutputStrategy
public interface Abelian<T extends
 Abelian<?>> extends
 WritableComparable<BinaryComparable>{

 T invert();
 T aggregate(T other);
 T neutral();
 boolean isNeutral();
 Writable extractKey();
 void write(…);
 void readFields(…);
}
public interface Serializer<T extends
 Abelian<?>> {

 Writable serialize(T obj);
 T deserializeHBase(
    byte[] rowId, byte[] colFamily,
    byte[] qualifier, byte[] value);
}
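public abstract class Translator
 <KEYIN, VALUEIN> {

 public abstract void translate
    (KEYIN key, VALUEIN value,
     Context context);
}

// framework side: each value written by translate() is emitted under its key,
// inverted if the input record was deleted
this.mapContext.write(
    abelianValue.extractKey(),
    this.invertValue ?
         abelianValue.invert() :
         abelianValue);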
GenericMapper

Value delivered by the input format, and what the mapper does with it:
OverwriteResult   →  deserialize
InsertedValue     →  pass on
DeletedValue      →  set invertValue=true; pass on
PreservedValue    →  ignore
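The GenericMapper's case distinction can be sketched as a single switch. All names here are hypothetical: `Count` stands in for an Abelian value, and the two small interfaces play the roles of the user's Translator and of deserializing a stored result. This is an illustration of the dispatch, not Marimba's actual class.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the GenericMapper's case distinction (all names here are hypothetical). */
class GenericMapperSketch {

    enum ValueKind { OVERWRITE_RESULT, INSERTED, DELETED, PRESERVED }

    /** Minimal stand-in for an Abelian value: a key plus a count. */
    static final class Count {
        final String key;
        final long count;
        Count(String key, long count) { this.key = key; this.count = count; }
        Count invert() { return new Count(key, -count); }
    }

    /** User code: turns one input value into Abelian values (the Translator's role). */
    interface LineTranslator { List<Count> translate(String value); }

    /** User code: reads a stored old result back (the deserializing side of the Serializer). */
    interface ResultDeserializer { Count deserialize(String stored); }

    static List<Count> map(ValueKind kind, String value, LineTranslator t, ResultDeserializer d) {
        List<Count> out = new ArrayList<>();
        switch (kind) {
            case OVERWRITE_RESULT:   // old aggregate from the result table
                out.add(d.deserialize(value));
                break;
            case INSERTED:           // new input: pass the translated values on
                out.addAll(t.translate(value));
                break;
            case DELETED:            // removed input: invertValue = true, pass on the inverses
                for (Count c : t.translate(value)) out.add(c.invert());
                break;
            case PRESERVED:          // unchanged input is already part of the old result
                break;
        }
        return out;
    }
}
```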
GenericReducer

            1. Aggregate
            2. Serialize
            3. Write

IncDec:       putToIncrement(…)
Overwrite:    PUT → write
              IGNORE → do not write
              DELETE → putToDelete(...)
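For plain long counts, the GenericReducer's first step is a fold that starts at the neutral element and aggregates every value. The slide does not spell out when Overwrite chooses PUT, IGNORE or DELETE, so the sketch assumes one plausible rule: a neutral result deletes an existing row and is skipped if no row existed. This assumption should not be read as Marimba's actual NeutralOutputStrategy; the class and names are hypothetical.

```java
import java.util.List;

/** Sketch of a generic reducer over long counts; the PUT/IGNORE/DELETE rule is an assumption. */
class GenericReducerSketch {

    enum Action { PUT, IGNORE, DELETE }

    /** Step 1: aggregate all values of one key, starting from the neutral element 0. */
    static long aggregate(List<Long> values) {
        long acc = 0;                        // neutral()
        for (long v : values) acc += v;      // aggregate(other)
        return acc;
    }

    /**
     * Overwrite mode, assumed rule: a non-neutral result is written (PUT);
     * a neutral result deletes an existing row, or is ignored if there was none.
     */
    static Action overwriteAction(long aggregated, boolean hadOldResult) {
        if (aggregated != 0) return Action.PUT;
        return hadOldResult ? Action.DELETE : Action.IGNORE;
    }
}
```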
GenericCombiner

"Write a Combiner"
   -- 7 Tips for Improving MapReduce Performance (tip 4) [14]

                1. Aggregate
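Because aggregation is associative and commutative, the same function can run as a combiner on each mapper's local output before the shuffle and again in the reducer without changing the result; the combiner only reduces the number of pairs sent over the network. A small sketch with hypothetical names, summing long values per key:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: combiner and reducer share one associative, commutative aggregation. */
class CombinerSketch {

    /** Sum the values per key; used as the combiner (per mapper) and as the reducer (global). */
    static Map<String, Long> aggregateByKey(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> out = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) out.merge(p.getKey(), p.getValue(), Long::sum);
        return out;
    }

    /** Combine each mapper's output locally, then reduce the smaller shuffled data. */
    static Map<String, Long> withCombiner(List<List<Map.Entry<String, Long>>> mapperOutputs) {
        List<Map.Entry<String, Long>> shuffled = new ArrayList<>();
        for (List<Map.Entry<String, Long>> output : mapperOutputs) {
            for (Map.Entry<String, Long> e : aggregateByKey(output).entrySet()) {
                shuffled.add(Map.entry(e.getKey(), e.getValue()));   // one pair per key and mapper
            }
        }
        return aggregateByKey(shuffled);
    }
}
```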
TextWindowInputFormat
Example applications:
                   1. WordCount

Translator:
void translate(key, value) {
  for (word : value.split(" ")) {
    write(new WordAbelian(word, 1));
  }
}

Serializer:
Writable serialize(WordAbelian w) {
  Put p = new Put(w.getWord());
  p.add(…);
  return p;
}

WordAbelian:
WordAbelian invert() {
  return new WordAbelian(this.word, -1 * this.count);
}

WordAbelian aggregate(WordAbelian other) {
  return new WordAbelian(this.word, this.count + other.count);
}

WordAbelian neutral() {
  return new WordAbelian(this.word, 0);
}

boolean isNeutral() {
  return (this.count == 0);
}
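Stripped of the Hadoop serialization methods, the WordAbelian above amounts to the integers under addition, keyed by word. The sketch assumes a simplified, hypothetical interface `SimpleAbelian` (no `write`/`readFields`, and a `String` key instead of a `Writable`) so that the group laws the framework relies on can be checked directly.

```java
import java.util.Objects;

/** Simplified Abelian contract (hypothetical: Writable methods left out, String key). */
interface SimpleAbelian<T extends SimpleAbelian<T>> {
    T invert();
    T aggregate(T other);
    T neutral();
    boolean isNeutral();
    String extractKey();
}

/** Word count value: the key is the word, the group operation is integer addition. */
final class WordAbelian implements SimpleAbelian<WordAbelian> {
    final String word;
    final long count;

    WordAbelian(String word, long count) { this.word = word; this.count = count; }

    public WordAbelian invert()                 { return new WordAbelian(word, -count); }
    public WordAbelian aggregate(WordAbelian o) { return new WordAbelian(word, count + o.count); }
    public WordAbelian neutral()                { return new WordAbelian(word, 0); }
    public boolean isNeutral()                  { return count == 0; }
    public String extractKey()                  { return word; }

    @Override public boolean equals(Object o) {
        return o instanceof WordAbelian
            && ((WordAbelian) o).word.equals(word) && ((WordAbelian) o).count == count;
    }

    @Override public int hashCode() { return Objects.hash(word, count); }
}
```

The checks that matter for incremental maintenance are that `a.aggregate(a.invert())` is neutral (a deletion cancels an earlier insertion) and that `aggregate` is commutative (the order in which deltas arrive does not matter).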
Example applications:
     2. Friends of Friends

[Figure: a FRIENDS graph over the persons A to E and the FRIENDS OF FRIENDS graph derived from it]

translate(person, friends):
aggregate(…): merge the sets of friends-of-friends
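A sketch of the friends-of-friends job under two assumptions the slide does not state. First, `translate(person, friends)` emits, for every friend f, the person's other friends as candidates for f. Second, the "set" is kept as a multiset with counts, because a plain set under union has no inverse and therefore cannot serve as an Abelian value. `FofAbelian` is a hypothetical class; it also does not filter out people who are already direct friends.

```java
import java.util.*;

/**
 * Hypothetical FofAbelian: friends-of-friends candidates of one person as a multiset.
 * Counts (number of connecting friends) make the value invertible, unlike a plain set.
 */
final class FofAbelian {
    final String person;
    final Map<String, Integer> candidates;   // candidate -> number of common friends

    FofAbelian(String person, Map<String, Integer> candidates) {
        this.person = person;
        this.candidates = Collections.unmodifiableMap(new TreeMap<>(candidates));
    }

    FofAbelian invert() {
        Map<String, Integer> m = new TreeMap<>();
        candidates.forEach((p, c) -> m.put(p, -c));
        return new FofAbelian(person, m);
    }

    /** Merge both multisets; entries whose counts cancel out to zero are removed. */
    FofAbelian aggregate(FofAbelian other) {
        Map<String, Integer> m = new TreeMap<>(candidates);
        other.candidates.forEach((p, c) -> m.merge(p, c, (x, y) -> x + y == 0 ? null : x + y));
        return new FofAbelian(person, m);
    }

    FofAbelian neutral() { return new FofAbelian(person, Map.of()); }

    boolean isNeutral() { return candidates.isEmpty(); }

    /** translate(person, friends): each friend f gets the person's other friends as candidates. */
    static List<FofAbelian> translate(String person, Set<String> friends) {
        List<FofAbelian> out = new ArrayList<>();
        for (String f : friends) {                 // person itself only links its friends
            Map<String, Integer> others = new TreeMap<>();
            for (String g : friends) if (!g.equals(f)) others.put(g, 1);
            out.add(new FofAbelian(f, others));
        }
        return out;
    }
}
```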
Example applications:
     3. Reverse Web Link Graph

REVERSE WEB LINK GRAPH (row ID -> columns)
Google -> {eBay, Wikipedia}
eBay -> {Google, Wikipedia}
Mensa-KL -> {Google}
Facebook -> {Google, Mensa-KL, Uni-KL}
Wikipedia -> {Google}
Uni-KL -> {Google, Wikipedia}

aggregate(…): merge the sets of links
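The reversal itself fits in a few lines: translate flips every link page -> target into (target, {page}), and aggregation merges the sets per target. In this hypothetical sketch, `aggregate()` is plain set union for brevity. Union has no inverse, so handling deleted links would need an invertible representation, for example counts per linking page.

```java
import java.util.*;

/** Hypothetical sketch of the reverse web link graph: translate() flips links, aggregation merges sets. */
class ReverseLinks {

    /** translate(page, outlinks): emit (target, {page}) for every link page -> target. */
    static List<Map.Entry<String, Set<String>>> translate(String page, Set<String> outlinks) {
        List<Map.Entry<String, Set<String>>> out = new ArrayList<>();
        for (String target : outlinks) out.add(Map.entry(target, Set.of(page)));
        return out;
    }

    /** Group by target and merge the link sets (plain union, so no inverse for deletions). */
    static SortedMap<String, SortedSet<String>> run(Map<String, Set<String>> graph) {
        SortedMap<String, SortedSet<String>> reverse = new TreeMap<>();
        graph.forEach((page, links) -> {
            for (Map.Entry<String, Set<String>> e : translate(page, links)) {
                reverse.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            }
        });
        return reverse;
    }
}
```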
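Example applications:
         4. Bigrams
Hi, kannst du mich ___?___ am
Bahnhof abholen? So in etwa
10 ___?___. Viele liebe ___?__.
P.S. Ich habe viel ___?___.
(Roughly: "Hi, can you ___ pick me up at the station? In about 10 ___. Many warm ___. P.S. I have a lot of ___.")
Idea: analyze large amounts of text with MapReduce.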
Example applications:
                     4. Bigrams

NGramAbelian (fields: a, b, count)
  extractKey(), write(…)
  invert():             count *= -1
  aggregate(… other):   count += other.count
  neutral():            count = 0
  isNeutral():          count == 0

NGramStep2Abelian
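Step 1 of the bigram application counts how often word b directly follows word a, which is exactly the NGramAbelian arithmetic above (count 1 per occurrence, aggregation by addition). A hypothetical sketch; the number normalization to `<num>` is an assumption based on the `<num>` token that appears on the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of step 1: count how often word b directly follows word a. */
class Bigrams {

    /** Split a line into words; numbers become "<num>" (assumed from the slide's <num> token). */
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        for (String t : line.split("[^\\p{L}\\p{N}]+")) {
            if (t.isEmpty()) continue;                   // leading separators give an empty token
            out.add(t.matches("\\p{N}+") ? "<num>" : t);
        }
        return out;
    }

    /** translate(): one (a, b) pair with count 1 per adjacent word pair; aggregate(): add counts. */
    static Map<String, Map<String, Long>> count(List<String> lines) {
        Map<String, Map<String, Long>> counts = new TreeMap<>();
        for (String line : lines) {
            List<String> w = tokens(line);
            for (int i = 0; i + 1 < w.size(); i++) {
                counts.computeIfAbsent(w.get(i), k -> new TreeMap<>())
                      .merge(w.get(i + 1), 1L, Long::sum);
            }
        }
        return counts;
    }
}
```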
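"Where do we get the data from?"

Predicted words (two suggestions per gap):
Hi, kannst du mich [bitte | nicht] am
Bahnhof abholen? So in etwa
10 [Minuten | Jahre]. Viele liebe [Grüße | dich].
P.S. Ich habe viel [zu | Spaß].
(bitte = please, nicht = not, Minuten = minutes, Jahre = years, Grüße = greetings, dich = you, zu = too, Spaß = fun; the number 10 is matched as the token <num>)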
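A second step turns the bigram counts into suggestions: for the previous word, propose the k most frequent successors. The slide names NGramStep2Abelian but does not show its logic, so this top-k selection over a count map is an assumed, simplified stand-in; `Suggestions.suggest` is a hypothetical method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of step 2: suggest the k most frequent successors of a word. */
class Suggestions {

    static List<String> suggest(Map<String, Map<String, Long>> bigramCounts, String previousWord, int k) {
        Map<String, Long> followers = bigramCounts.getOrDefault(previousWord, Map.of());
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(followers.entrySet());
        // highest count first; ties broken alphabetically so the result is deterministic
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(k, sorted.size()); i++) out.add(sorted.get(i).getKey());
        return out;
    }
}
```

With k = 2 this yields the two suggestions per gap shown above, provided the counts are available for the preceding word.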
Evaluation
[Chart: WordCount, runtime (time [hh:mm], 00:00 to 01:10) vs. share of changes (0% to 80%), for FULL, INCDEC and OVERWRITE]
[Chart: Reverse Weblink-Graph, runtime (time [hh:mm], 00:00 to 02:51) vs. share of changes (0% to 80%), for FULL, INCDEC and OVERWRITE]
Conclusion
Full recomputation
IncDec / Overwrite
Graphics Used
Slides 5-9: Flames and smiley: Microsoft Office 2010
Slide 10: Google: http://www.google.de
Slide 11: Amazon: http://www.amazon.de
Slide 12: Hadoop: http://hadoop.apache.org; Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
Slide 16: Hadoop: http://hadoop.apache.org
Slide 17: Hadoop: http://hadoop.apache.org; Notebook: Microsoft Office 2010
Slide 18: HBase: http://hbase.apache.org
Slide 31: Hadoop: http://hadoop.apache.org; Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
Slide 32: Scaffold: http://www.flickr.com/photos/michale/94538528/; Hadoop: http://hadoop.apache.org; Boy: Microsoft Office 2010
Slides 37-44: Puzzle: http://www.flickr.com/photos/dps/136565237/
Slides 46-48: Boy: Microsoft Office 2010
Slide 49: Google: http://www.google.de; eBay: http://www.ebay.de; Mensa-KL: http://www.mensa-kl.de; facebook: http://www.facebook.de; Wikipedia: http://de.wikipedia.org; TU Kaiserslautern: http://www.uni-kl.de
Slides 50-51: Mobile phone: Microsoft Office 2010
Slide 56: Wikipedia: http://de.wikipedia.org; Twitter: http://www.twitter.com
Slide 57: Mobile phone: Microsoft Office 2010
Slide 58: Hadoop: http://hadoop.apache.org; Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
References (1/2)
[0] Johannes Schildgen. Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggregatsichten.
Master's thesis, TU Kaiserslautern, August 2012.

[1] Apache Hadoop project. http://hadoop.apache.org/.
[2] Virga: Incremental Recomputations in MapReduce. http://wwwlgis.informatik.uni-kl.de/cms/?id=526.
[3] Philippe Adjiman. Hadoop Tutorial Series, Issue #4: To Use Or Not To Use A Combiner, 2010.
http://www.philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/.
[4] Kai Biermann. Big Data: Twitter wird zum Fieberthermometer der Gesellschaft, April 2012.
http://www.zeit.de/digital/internet/2012-04/twitter-krankheiten-nowcast.
[5] Julie Bort. 8 Crazy Things IBM Scientists Have Learned Studying Twitter, January 2012.
http://www.businessinsider.com/8-crazy-things-ibm-scientists-have-learned-studying-twitter-2012-1.
[6] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI, pages 137–150, 2004.
[7] Lars George. HBase: The Definitive Guide. O’Reilly Media, 1st edition, 2011.
[8] Brown University Data Management Group. A Comparison of Approaches to Large-Scale Data Analysis.
http://database.cs.brown.edu/projects/mapreduce-vs-dbms/.
[9] Ricky Ho. Map/Reduce to recommend people connection, August 2010.
http://horicky.blogspot.de/2010/08/mapreduce-to-recommend-people.html.
[10] Yong Hu. Efficiently Extracting Change Data from HBase. April 2012.
[11] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Can mapreduce learn from materialized views?
In LADIS 2011, pages 1–5, 9 2011.
[12] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Incremental recomputations in mapreduce. In CloudDB 2011, 10 2011.
[13] Steve Krenzel. MapReduce: Finding Friends, 2010. http://stevekrenzel.com/finding-friends-with-mapreduce.
[14] Todd Lipcon. 7 Tips for Improving MapReduce Performance, 2009.
http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/.
References (2/2)
[15] TouchType Ltd. SwiftKey X - Android Apps auf Google Play, February 2012.
http://play.google.com/store/apps/details?id=com.touchtype.swiftkey.
[16] Karl H. Marbaise. Hadoop - Think Large!, 2011. http://www.soebes.de/files/RuhrJUGEssenHadoop-20110217.pdf.
[17] Arnab Nandi, Cong Yu, Philip Bohannon, and Raghu Ramakrishnan. Distributed Cube Materialization on Holistic Measures.
ICDE, pages 183–194, 2011.
[18] Alexander Neumann. Studie: Hadoop wird ähnlich erfolgreich wie Linux, May 2012.
http://heise.de/-1569837.
[19] Owen O’Malley, Jack Hebert, Lohit Vijayarenu, and Amar Kamat. Partitioning your job into maps and reduces, September 2009.
http://wiki.apache.org/hadoop/HowManyMapsAndReduces?action=recall&rev=7.
[20] Roya Parvizi. Inkrementelle Neuberechnungen mit MapReduce. Bachelor's thesis, TU Kaiserslautern,
June 2011.
[21] Arnd Poetzsch-Heffter. Konzepte objektorientierter Programmierung. eXamen.press.
Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
[22] Dave Rosenberg. Hadoop, the elephant in the enterprise, June 2012.
http://news.cnet.com/8301-1001_3-57452061-92/hadoop-the-elephant-in-the-enterprise/.
[23] Marc Schäfer. Inkrementelle Wartung von Data Cubes. Bachelor's thesis, TU Kaiserslautern, January 2012.
[24] Sanjay Sharma. Advanced Hadoop Tuning and Optimizations, 2009.
http://www.slideshare.net/ImpetusInfo/ppt-on-advanced-hadoop-tuning-n-optimisation.
[25] Jason Venner. Pro Hadoop. Apress, Berkeley, CA, 2009.
[26] Dick Weisinger. Big Data: Think of NoSQL As Complementary to Traditional RDBMS, June 2012.
http://www.formtek.com/blog/?p=3032.
[27] Tom White. 10 MapReduce Tips, May 2009. http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/.

Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 

Kürzlich hochgeladen (20)

Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 

Marimba - A MapReduce-Based Programming Model for Self-Maintainable Aggregate Views

  • 1. A MapReduce-Based Programming Model for Self-Maintainable Aggregate Views, 2012-08-31, Johannes Schildgen, TU Kaiserslautern, schildgen@cs.uni-kl.de
  • 3. 11 26 14 23 37 39 41 26 19 8 25 19 22 15 18 10 16 27 8 9 12 14 15
  • 4. Kinderriegel: 26, Balisto: 39, Hanuta: 14, Snickers: 19, Ritter Sport: 41, Pickup: 12, …
  • 6. View: Kinderriegel: 26, Balisto: 39, Hanuta: 14, Snickers: 19, Ritter Sport: 41, Pickup: 12, … | Δ: Balisto: -1
  • 7. View: Kinderriegel: 26, Balisto: 39, Hanuta: 14, Snickers: 19, Ritter Sport: 41, Pickup: 12, … | Δ: Balisto: -8, Snickers: +24, Ritter Sport: -7
  • 8. Increment Installation. View after installing Δ: Kinderriegel: 26, Balisto: 31, Hanuta: 14, Snickers: 43, Ritter Sport: 34, Pickup: 12, … | Δ: Balisto: -8, Snickers: +24, Ritter Sport: -7
  • 9. Overwrite Installation. Old view: Kinderriegel: 26, Balisto: 39, Hanuta: 14, Snickers: 19, Ritter Sport: 41, Pickup: 12, … | Δ: Balisto: -8, Snickers: +24, Ritter Sport: -7 | New view: Kinderriegel: 26, Balisto: 31, Hanuta: 14, Snickers: 43, Ritter Sport: 34, Pickup: 12, …
  • 12. Fundamentals & Previous Work | The Marimba Framework | Evaluation
  • 16.
      public class WordCount extends Configured implements Tool {
          public static class WordCountMapper extends
                  Mapper<LongWritable, Text, ImmutableBytesWritable, LongWritable> {
              private Text word = new Text();
              @Override
              public void map(LongWritable key, Text value, Context context)
                      throws IOException, InterruptedException {
                  String line = value.toString();
                  StringTokenizer tokenizer = new StringTokenizer(line);
                  while (tokenizer.hasMoreTokens()) {
                      this.word.set(tokenizer.nextToken());
  • 17. A A E B E F C F B D C D
  • 18.
      > create 'person', 'default'
      > put 'person', 'p27', 'default:vorname', 'Anton'
      > put 'person', 'p27', 'default:nachname', 'Schmidt'
      > get 'person', 'p27'
      COLUMN              CELL
      default:nachname    timestamp=1338991497408, value=Schmidt
      default:vorname     timestamp=1338991436688, value=Anton
      2 row(s) in 0.0640 seconds
  • 19. View: banane 3, die 4, iss 1, nimm 1 | Δ: kaufe 1, schale 1, die 0, schäle 1, banane 1, schmeiß 1, schmeiß -1, weg 1, schale -1, weg -1
  • 20. increment() | View: banane 4, die 4, iss 1, kaufe 1, nimm 1 | Δ: kaufe 1, schale 0, die 0, schäle 1, banane 1, schmeiß 0, schmeiß -1, weg 0, schale -1, weg -1
  • 21. overwrite() | View: banane 3, die 4, iss 1, nimm 1 | Δ: kaufe 1, schale 1, die 0, schäle 1, banane 1, schmeiß 1, schmeiß -1, weg 1, schale -1, weg -1
  • 22. overwrite() | View: banane 4, die 4, iss 1, kaufe 1, nimm 1 | Δ: kaufe 1, schale 0, die 0, schäle 1, banane 1, schmeiß 0, schmeiß -1, weg 0, schale -1, weg -1
  • 23.
      void map(key, value) {
          if (value is inserted) {
              for (word : value.split(" ")) {
                  write(word, 1);
              }
          } else if (value is deleted) {
              for (word : value.split(" ")) {
                  write(word, -1);
              }
          }
      }
  • 24.
      void map(key, value) {
          if (value is inserted) {
              for (word : value.split(" ")) {
                  write(word, 1);
              }
          } else if (value is deleted) {
              for (word : value.split(" ")) {
                  write(word, -1);
              }
          } else { // old result
              write(key, value);
          }
      }
  • 25. Overwrite Installation
      void reduce(key, values) {
          sum = 0;
          for (value : values) {
              sum += value;
          }
          put = new Put(key);
          put.add("fam", "col", sum);
          context.write(key, put);
      }
  • 26. Increment Installation
      void reduce(key, values) {
          sum = 0;
          for (value : values) {
              sum += value;
          }
          inc = new Increment(key);
          inc.add("fam", "col", sum);
          context.write(key, inc);
      }
  • 31. Fundamentals & Previous Work | The Marimba Framework | Evaluation
  • 32. Core functionality: distributed computing with MapReduce. "I take care of: IncDec, Overwrite, reading old results, generating increments, …" "I'll tell you how to read the input data, aggregate, invert, and write the output." Core functionality: incremental computing.
  • 33.
      public class WordTranslator extends Translator<LongWritable, Text> {
          public void translate(…) { … }
      }
      IncJob job = new IncJobOverwrite(conf);
      job.setTranslatorClass(WordTranslator.class);
      job.setAbelianClass(WordAbelian.class);
  • 34.
      public class WordAbelian implements Abelian<WordAbelian> {
          WordAbelian invert() { … }
          WordAbelian aggregate(WordAbelian other) { … }
          WordAbelian neutral() { … }
          boolean isNeutral() { … }
          Writable extractKey() { … }
          void write(…) { … }
          void readFields(…) { … }
      }
  • 35.
      public class WordSerializer implements Serializer<WordAbelian> {
          Writable serialize(Writable key, WordAbelian v) { … }
          WordAbelian deserializeHBase(byte[] rowId, byte[] colFamily,
                  byte[] qualifier, byte[] value) { … }
      }
  • 36. How to write a Marimba job: 1. Abelian class, 2. Translator class, 3. Serializer class, 4. write the Hadoop job using the class IncJob
  • 37. Implementation: class IncJob (setInputTable(…), setOutputTable(…)) with the subclasses IncJobFullRecomputation, IncJobIncDec and IncJobOverwrite (setResultInputTable(…))
  • 39.
      public interface Abelian<T extends Abelian<?>>
              extends WritableComparable<BinaryComparable> {
          T invert();
          T aggregate(T other);
          T neutral();
          boolean isNeutral();
          Writable extractKey();
          void write(…);
          void readFields(…);
      }
  • 40.
      public interface Serializer<T extends Abelian<?>> {
          Writable serialize(T obj);
          T deserializeHBase(byte[] rowId, byte[] colFamily,
                  byte[] qualifier, byte[] value);
      }
  • 41.
      public abstract class Translator<KEYIN, VALUEIN> {
          public abstract void translate(KEYIN key, VALUEIN value, Context context);
      this.mapContext.write(abelianValue.extractKey(),
              this.invertValue ? abelianValue.invert() : abelianValue);
  • 42. GenericMapper, depending on what the input format delivers: Value → pass on; OverwriteResult → deserialize; InsertedValue → pass on; DeletedValue → set invertValue=true; PreservedValue → ignore
  • 43. GenericReducer: 1. Aggregate, 2. Serialize, 3. Write. IncDec: putToIncrement(…). Overwrite: PUT → write; IGNORE → do not write; DELETE → putToDelete(...)
  • 44. GenericCombiner: 1. Aggregate. "Write A Combiner": 7 Tips for Improving MapReduce Performance (tip 4)
  • 46. Example applications: 1. WordCount
      Translator:
          void translate(key, value) {
              for (word : value.split(" ")) {
                  write(new WordAbelian(word, 1));
              }
          }
      WordAbelian:
          WordAbelian invert() {
              return new WordAbelian(this.word, -1 * this.count);
          }
          WordAbelian aggregate(WordAbelian other) {
              return new WordAbelian(this.word, this.count + other.count);
          }
          WordAbelian neutral() {
              return new WordAbelian(this.word, 0);
          }
          boolean isNeutral() {
              return (this.count == 0);
          }
      Serializer:
          Writable serialize(WordAbelian w) {
              Put p = new Put(w.getWord());
              p.add(…);
              return p;
          }
  • 47. Example applications: 2. Friends Of Friends (FRIENDS: A, D, B; FRIENDS OF FRIENDS: C, E)
  • 48. Example applications: 2. Friends Of Friends. translate(person, friends); aggregate(…): merge the sets of friends of friends
  • 49. Example applications: 3. Reverse web-link graph (row ID → columns): Google → {eBay, Wikipedia}; eBay → {Google, Wikipedia}; Mensa-KL → {Google}; Facebook → {Google, Mensa-KL, Uni-KL}; Wikipedia → {Google}; Uni-KL → {Google, Wikipedia}. aggregate(…): merge the sets of links
  • 50. Example applications: 4. Bigrams. "Hi, kannst du mich ___?___ am Bahnhof abholen?" (Hi, can you ___ pick me up at the station?) "So in etwa 10 ___?___." (In about 10 ___.) "Viele liebe ___?___." (a letter closing, literally "Many kind ___.") "P.S. Ich habe viel ___?___." (P.S. I have a lot of ___.)
  • 51. Idea: analyze large amounts of text with MapReduce.
  • 54. Example applications: 4. Bigrams. NGramAbelian (fields a, b, count): extractKey(); invert(): count *= -1; aggregate(… other): count += other.count; neutral(): count = 0; isNeutral(): count == 0; write(…). NGramStep2Abelian
  • 57. Suggestions: "Hi, kannst du mich ___ am Bahnhof abholen?" → bitte (please), nicht (not); "So in etwa 10 ___." → <num> Minuten (minutes), <num> Jahre (years); "Viele liebe ___." → Grüße (greetings), dich (you); "P.S. Ich habe viel ___." → zu (too), Spaß (fun)
  • 58. Fundamentals & Previous Work | The Marimba Framework | Evaluation
  • 59. Chart "WordCount": time [hh:mm] (00:00 to 01:10) versus changes (0% to 80%); series FULL, INCDEC, OVERWRITE
  • 60. Chart "Reverse Weblink-Graph": time [hh:mm] (00:00 to 02:51) versus changes (0% to 80%); series FULL, INCDEC, OVERWRITE
  • 62. Image credits (slides 5 to 58): Microsoft Office 2010 (flames and smiley, boy, mobile phone, notebook); puzzle: http://www.flickr.com/photos/dps/136565237/; Casio wristwatch: http://www.flickr.com/photos/andresrueda/3448240252; scaffold: http://www.flickr.com/photos/michale/94538528/; Google: http://www.google.de; Amazon: http://www.amazon.de; eBay: http://www.ebay.de; Mensa-KL: http://www.mensa-kl.de; Facebook: http://www.facebook.de; Wikipedia: http://de.wikipedia.org; TU Kaiserslautern: http://www.uni-kl.de; Twitter: http://www.twitter.com; Hadoop: http://hadoop.apache.org; HBase: http://hbase.apache.org
  • 63. References (1/2)
      [0] Johannes Schildgen. Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggregatsichten. Master's thesis, TU Kaiserslautern, August 2012.
      [1] Apache Hadoop project. http://hadoop.apache.org/.
      [2] Virga: Incremental Recomputations in MapReduce. http://wwwlgis.informatik.uni-kl.de/cms/?id=526.
      [3] Philippe Adjiman. Hadoop Tutorial Series, Issue #4: To Use Or Not To Use A Combiner, 2010. http://www.philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/.
      [4] Kai Biermann. Big Data: Twitter wird zum Fieberthermometer der Gesellschaft, April 2012. http://www.zeit.de/digital/internet/2012-04/twitter-krankheiten-nowcast.
      [5] Julie Bort. 8 Crazy Things IBM Scientists Have Learned Studying Twitter, January 2012. http://www.businessinsider.com/8-crazy-things-ibm-scientists-have-learned-studying-twitter-2012-1.
      [6] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI, pages 137–150, 2004.
      [7] Lars George. HBase: The Definitive Guide. O'Reilly Media, 1st edition, 2011.
      [8] Brown University Data Management Group. A Comparison of Approaches to Large-Scale Data Analysis. http://database.cs.brown.edu/projects/mapreduce-vs-dbms/.
      [9] Ricky Ho. Map/Reduce to recommend people connection, August 2010. http://horicky.blogspot.de/2010/08/mapreduce-to-recommend-people.html.
      [10] Yong Hu. Efficiently Extracting Change Data from HBase. April 2012.
      [11] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Can MapReduce learn from materialized views? In LADIS 2011, pages 1–5, September 2011.
      [12] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Incremental recomputations in MapReduce. In CloudDB 2011, October 2011.
      [13] Steve Krenzel. MapReduce: Finding Friends, 2010. http://stevekrenzel.com/finding-friends-with-mapreduce.
      [14] Todd Lipcon. 7 Tips for Improving MapReduce Performance, 2009. http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/.
  • 64. References (2/2)
      [15] TouchType Ltd. SwiftKey X - Android Apps auf Google Play, February 2012. http://play.google.com/store/apps/details?id=com.touchtype.swiftkey.
      [16] Karl H. Marbaise. Hadoop - Think Large!, 2011. http://www.soebes.de/files/RuhrJUGEssenHadoop-20110217.pdf.
      [17] Arnab Nandi, Cong Yu, Philip Bohannon, and Raghu Ramakrishnan. Distributed Cube Materialization on Holistic Measures. ICDE, pages 183–194, 2011.
      [18] Alexander Neumann. Studie: Hadoop wird ähnlich erfolgreich wie Linux, May 2012. http://heise.de/-1569837.
      [19] Owen O'Malley, Jack Hebert, Lohit Vijayarenu, and Amar Kamat. Partitioning your job into maps and reduces, September 2009. http://wiki.apache.org/hadoop/HowManyMapsAndReduces?action=recall&rev=7.
      [20] Roya Parvizi. Inkrementelle Neuberechnungen mit MapReduce. Bachelor's thesis, TU Kaiserslautern, June 2011.
      [21] Arnd Poetzsch-Heffter. Konzepte objektorientierter Programmierung. eXamen.press. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
      [22] Dave Rosenberg. Hadoop, the elephant in the enterprise, June 2012. http://news.cnet.com/8301-1001_3-57452061-92/hadoop-the-elephant-in-the-enterprise/.
      [23] Marc Schäfer. Inkrementelle Wartung von Data Cubes. Bachelor's thesis, TU Kaiserslautern, January 2012.
      [24] Sanjay Sharma. Advanced Hadoop Tuning and Optimizations, 2009. http://www.slideshare.net/ImpetusInfo/ppt-on-advanced-hadoop-tuning-n-optimisation.
      [25] Jason Venner. Pro Hadoop. Apress, Berkeley, CA, 2009.
      [26] Dick Weisinger. Big Data: Think of NoSQL As Complementary to Traditional RDBMS, June 2012. http://www.formtek.com/blog/?p=3032.
      [27] Tom White. 10 MapReduce Tips, May 2009. http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/.
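Slides 8 and 9 show two installation strategies, Increment Installation and Overwrite Installation. Both can be sketched on an in-memory map that stands in for the HBase view table. This is a minimal sketch under stated assumptions, not Marimba code. The class `ViewInstallation` and its methods are hypothetical, and a `Map<String, Long>` replaces the result table.

- **Increment installation** adds each Δ value to the stored row in place.
- **Overwrite installation** reads the old view, combines it with Δ, and writes a new view.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: an in-memory stand-in for the aggregate view table (not Marimba code).
final class ViewInstallation {

    // Increment installation: add each delta to the stored value in place,
    // as an HBase Increment would per cell. Keys missing from the view start at 0.
    static void installIncrement(Map<String, Long> view, Map<String, Long> delta) {
        for (Map.Entry<String, Long> d : delta.entrySet()) {
            view.merge(d.getKey(), d.getValue(), Long::sum);
        }
    }

    // Overwrite installation: read the old view, combine it with the delta and
    // return a complete new view; the old view itself is not modified.
    static Map<String, Long> installOverwrite(Map<String, Long> oldView, Map<String, Long> delta) {
        Map<String, Long> newView = new HashMap<>(oldView);
        for (Map.Entry<String, Long> d : delta.entrySet()) {
            long updated = oldView.getOrDefault(d.getKey(), 0L) + d.getValue();
            newView.put(d.getKey(), updated);
        }
        return newView;
    }
}
```

Take the candy-bar numbers from slides 7 to 9. Both strategies turn Balisto: 39, Snickers: 19 and Ritter Sport: 41 into 31, 43 and 34. Only the overwrite variant leaves the old view untouched.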
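The incremental word-count pseudocode on slides 23 to 26 can be simulated in memory. In this hedged sketch, the class `IncrementalWordCount`, the `Kind` enum and the `Input` holder are hypothetical. They stand in for however the real input format marks inserted lines, deleted lines and old result rows.

- Feeding only the changed lines yields the per-word deltas that an Increment job would add.
- Feeding the changed lines plus the old result rows yields the complete new counts that an Overwrite job would Put.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the map/reduce pseudocode on slides 23-26
// (not Hadoop or HBase code).
final class IncrementalWordCount {

    // Kinds of mapper input: a changed line, or (overwrite only) an old result row.
    enum Kind { INSERTED, DELETED, OLD_RESULT }

    static final class Input {
        final Kind kind;
        final String text;   // the line, or the word of an old result row
        final long value;    // old count, only used for OLD_RESULT

        Input(Kind kind, String text, long value) {
            this.kind = kind;
            this.text = text;
            this.value = value;
        }

        static Input inserted(String line) { return new Input(Kind.INSERTED, line, 0); }
        static Input deleted(String line) { return new Input(Kind.DELETED, line, 0); }
        static Input oldResult(String word, long count) { return new Input(Kind.OLD_RESULT, word, count); }
    }

    // map(): +1 per word of an inserted line, -1 per word of a deleted line,
    // and an old result row passed through unchanged (slide 24).
    static List<Map.Entry<String, Long>> map(Input in) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        switch (in.kind) {
            case INSERTED:
                for (String word : in.text.split(" ")) {
                    out.add(Map.entry(word, 1L));
                }
                break;
            case DELETED:
                for (String word : in.text.split(" ")) {
                    out.add(Map.entry(word, -1L));
                }
                break;
            case OLD_RESULT:
                out.add(Map.entry(in.text, in.value));
                break;
        }
        return out;
    }

    // Shuffle + reduce(): sum all emitted values per key. In the real job each sum
    // would become a Put (overwrite, slide 25) or an Increment (slide 26).
    static Map<String, Long> mapReduce(List<Input> inputs) {
        Map<String, Long> sums = new TreeMap<>();
        for (Input in : inputs) {
            for (Map.Entry<String, Long> kv : map(in)) {
                sums.merge(kv.getKey(), kv.getValue(), Long::sum);
            }
        }
        return sums;
    }
}
```

Here is a worked example with made-up sentences. Start from the old view on slide 19: banane 3, die 4, iss 1, nimm 1. Inserting "kaufe die banane" and deleting "iss die banane" gives the deltas kaufe +1, iss -1, die 0 and banane 0. Keys whose sum is 0 are still emitted in this sketch.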
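The Abelian interface on slide 39 and the WordAbelian methods on slide 46 describe an Abelian group:

- aggregate() is the commutative, associative operation.
- neutral() is the identity element.
- invert() is the inverse.

The following Hadoop-free sketch keeps only those methods plus extractKey(). The names `SimpleAbelian` and `WordCountValue` are hypothetical, and the Writable, write() and readFields() parts are deliberately left out. The method bodies follow slide 46: negate the count, add counts, count 0 is neutral. The property incremental maintenance depends on is that a value aggregated with its inverse is neutral. That is what lets a deleted row cancel its earlier contribution.

```java
// Hypothetical, Hadoop-free reduction of the Abelian interface from slide 39.
interface SimpleAbelian<T extends SimpleAbelian<T>> {
    T invert();              // inverse element
    T aggregate(T other);    // group operation (associative and commutative)
    T neutral();             // identity element
    boolean isNeutral();
    String extractKey();     // key under which values are aggregated
}

// A word count as a group element: (word, count) under addition, as on slide 46.
final class WordCountValue implements SimpleAbelian<WordCountValue> {
    final String word;
    final long count;

    WordCountValue(String word, long count) {
        this.word = word;
        this.count = count;
    }

    @Override
    public WordCountValue invert() {
        return new WordCountValue(word, -1 * count);
    }

    @Override
    public WordCountValue aggregate(WordCountValue other) {
        return new WordCountValue(word, count + other.count);
    }

    @Override
    public WordCountValue neutral() {
        return new WordCountValue(word, 0);
    }

    @Override
    public boolean isNeutral() {
        return count == 0;
    }

    @Override
    public String extractKey() {
        return word;
    }
}
```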
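The incremental cases of the GenericMapper on slide 42 can be combined with the invertValue flag from slide 41 and sketched as follows. This is an illustration under assumptions, not Marimba's source:

- The `ValueKind` enum, the `Emitter` callback and the class `GenericMapperSketch` are hypothetical.
- Values are plain `long` counts instead of Abelian objects, so inverting means negating.
- Deserializing an old result is reduced to parsing a "key=value" string.

The sketch shows why a user's translate() only has to describe inserted rows: for a deleted row, the framework inverts whatever the translator emits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the GenericMapper case distinction (not Marimba's source).
final class GenericMapperSketch {

    // What a (hypothetical) incremental input format reports for a row.
    enum ValueKind { OVERWRITE_RESULT, INSERTED, DELETED, PRESERVED }

    // Callback the translator writes to; stands in for mapContext.write().
    interface Emitter {
        void write(String key, long value);
    }

    // User code, analogous to Translator.translate(): it only describes inserted rows.
    interface Translator {
        void translate(String row, Emitter out);
    }

    static List<Map.Entry<String, Long>> map(ValueKind kind, String row, Translator translator) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        switch (kind) {
            case OVERWRITE_RESULT: {
                // Old result row "key=value": deserialize it and pass it on as-is.
                String[] kv = row.split("=", 2);
                out.add(Map.entry(kv[0], Long.parseLong(kv[1])));
                break;
            }
            case INSERTED:
            case DELETED: {
                // invertValue is true for deleted rows, so every emitted value is negated.
                boolean invertValue = kind == ValueKind.DELETED;
                translator.translate(row, (k, v) -> out.add(Map.entry(k, invertValue ? -v : v)));
                break;
            }
            case PRESERVED:
            default:
                // Unchanged rows do not affect the delta and are ignored.
                break;
        }
        return out;
    }
}
```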
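Slide 49 covers the reverse web-link graph. One plausible translate() emits, for every outgoing link of a page, that page as an incoming link of the target row; aggregate() then merges the link sets. A plain set union has no inverse, so this sketch assumes a counted set instead. Each linking page carries a multiplicity, and a deleted page row contributes -1. That design choice is an assumption, not taken from the slides, and the class `LinkSet` is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical invertible "set of links": linking page -> multiplicity, merged by adding counts.
final class LinkSet {
    final String target;               // row key: the page being linked to
    final Map<String, Long> sources;   // linking page -> how often it was added

    LinkSet(String target, Map<String, Long> sources) {
        this.target = target;
        this.sources = sources;
    }

    // translate() for one page row: each outgoing link yields "target <- page"
    // with multiplicity sign (+1 for an inserted row, -1 for a deleted one).
    static Map<String, LinkSet> translate(String page, Iterable<String> outLinks, long sign) {
        Map<String, LinkSet> out = new TreeMap<>();
        for (String target : outLinks) {
            Map<String, Long> src = new TreeMap<>();
            src.put(page, sign);
            out.merge(target, new LinkSet(target, src), LinkSet::aggregate);
        }
        return out;
    }

    // aggregate(): add multiplicities and drop entries that cancel out to 0.
    LinkSet aggregate(LinkSet other) {
        Map<String, Long> merged = new TreeMap<>(sources);
        other.sources.forEach((page, n) -> merged.merge(page, n, Long::sum));
        merged.values().removeIf(n -> n == 0);
        return new LinkSet(target, merged);
    }

    // invert(): negate every multiplicity.
    LinkSet invert() {
        Map<String, Long> negated = new TreeMap<>();
        sources.forEach((page, n) -> negated.put(page, -n));
        return new LinkSet(target, negated);
    }

    boolean isNeutral() {
        return sources.isEmpty();
    }
}
```

Counting multiplicities rather than storing bare members means a page linked from two places stays in the set until both links are deleted.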
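The bigram application on slides 50 to 57 can be sketched in two steps:

1. Count each word pair (a, b). This step uses the same rules as NGramAbelian on slide 54 (invert: count *= -1, aggregate: count += other.count), so passing sign -1 removes the bigrams of deleted text.
2. For a given first word a, suggest the successor b with the highest count. The slides name the second step NGramStep2Abelian but do not show its logic, so this selection rule is an assumption.

The class `BigramSuggester`, the lower-casing and the whitespace tokenization are also assumptions. The `<num>` normalization of numbers shown on slide 57 is omitted.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical two-step bigram suggester (not the thesis implementation).
final class BigramSuggester {

    // Step 1: count bigrams "a b" by addition; sign = -1 inverts the contribution
    // of deleted text, mirroring invert() and aggregate() of NGramAbelian.
    static void countBigrams(Map<String, Map<String, Long>> counts, String text, long sign) {
        String[] words = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            counts.computeIfAbsent(words[i], k -> new HashMap<>())
                  .merge(words[i + 1], sign, Long::sum);
        }
    }

    // Step 2 (assumed rule): suggest the successor with the highest positive count,
    // or null if no successor of this word has been seen.
    static String suggest(Map<String, Map<String, Long>> counts, String first) {
        String best = null;
        long bestCount = 0;
        Map<String, Long> successors = counts.getOrDefault(first.toLowerCase(Locale.ROOT), Map.of());
        for (Map.Entry<String, Long> e : successors.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```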