Hadoop Conference Japan 2013 Winter

   Huahin Framework
          for
       Hadoop


                       Jan 21, 2013
                      @ryu_kobayashi
•   Ryu Kobayashi (@ryu_kobayashi)

•   BrainPad Inc.

•   Hadoop, Cassandra, Machine Learning, ...


      AD



                    Now on sale!!!
What is
  Huahin
Framework?
Huahin Framework
http://huahinframework.org
Hadoop Family




   Logo is ...
Huahin logo is ...




Very very very cute!

In June 2012 we released software developed in-house as OSS.

 * It was used for panel log analysis.
 * For more information, please refer to the slides from
"Hadoop Conference Japan 2011 Fall".

  http://goo.gl/C9tzf

Huahin Framework is the umbrella name for multiple
products.

The origin of the name Huahin Framework
  Our office has a custom of using wine regions as code names.

  Huahin = Hua Hin = tourist destination in Thailand = wine region

  And when it comes to Thailand...

  It is the elephant!

  Hence, Huahin.

Huahin Framework Configuration

 It mainly consists of the following elements:
 •   Huahin Core
 •   Huahin Tools
 •   Huahin Manager

Huahin Core
 •   Simplifies MapReduce programs
 •   No need to write Writable classes or Secondary Sort yourself
 •   Basic grouping, sorting, etc. follow ideas from SQL
 •   If you want to, you can still write natural MapReduce
 •   Just as C++ is a superset of C
 •   Hive or Pig can do the same things. Use Huahin when you
     really need performance (parallel computation, etc.)

 • Huahin Unit is provided as a test driver
   • Wraps MRUnit
 • Example of implementation



Huahin Example
 •   Page top 10 rank example

First, natural MapReduce.
Second, Huahin MapReduce.
 Input data for the page top 10 rank
    Example: tab-delimited format

Jan 21, 2013   user1    /index.html
Jan 21, 2013   user1    /index2.html
Jan 21, 2013   user2    /contents/foo.html
Jan 21, 2013   user42   /bar.html
Jan 21, 2013   user3    /index.html
Jan 21, 2013   user7    /news/index.html
Jan 21, 2013   user4    /release/2013.html
Jan 21, 2013   user3    /index2.html
Jan 21, 2013   user7    /download.html
Jan 21, 2013   user5    /bar.html
Jan 21, 2013   user12   /release/2012.html
Jan 21, 2013   user5    /contents/foo.html
Jan 21, 2013   user23   /page2.html
Jan 21, 2013   user53   /news.html
Jan 21, 2013   user6    /download.html
Jan 21, 2013   user21   /bar.html
Jan 21, 2013   user18   /index.html
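
To make the expected result concrete, here is a small in-memory sketch in plain Java (no Hadoop) of what the two jobs compute: page views per date and page, then the top N pages per date. The class `PageRankSketch` and the method `topPages` are hypothetical names used only for illustration, and breaking ties by page name is an assumption, not something the slides specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Hadoop or Huahin.
final class PageRankSketch {
    /** Returns, for each date, up to n "page<TAB>pv" strings ordered by pv descending. */
    static Map<String, List<String>> topPages(List<String> lines, int n) {
        // Step 1 (what the first job does): count page views per date and page.
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String line : lines) {
            String[] s = line.split("\t"); // DATE, USER, URL
            counts.computeIfAbsent(s[0], d -> new TreeMap<>()).merge(s[2], 1, Integer::sum);
        }
        // Step 2 (what the second job does): per date, sort by pv descending, keep n.
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> byDate : counts.entrySet()) {
            List<Map.Entry<String, Integer>> pages = new ArrayList<>(byDate.getValue().entrySet());
            pages.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.<Integer>reverseOrder())
                    .thenComparing(Map.Entry.<String, Integer>comparingByKey())); // tie-break: assumption
            List<String> top = new ArrayList<>();
            for (Map.Entry<String, Integer> e : pages.subList(0, Math.min(n, pages.size()))) {
                top.add(e.getKey() + "\t" + e.getValue());
            }
            result.put(byDate.getKey(), top);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Jan 21, 2013\tuser1\t/index.html",
                "Jan 21, 2013\tuser3\t/index.html",
                "Jan 21, 2013\tuser42\t/bar.html");
        System.out.println(topPages(lines, 10));
    }
}
```

The MapReduce versions that follow distribute the same two steps across a cluster.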
Page top 10 rank of natural MapReduce
                                                      JobTools
         public class PathTop10RankJobTool extends Configured implements Tool {
           @Override
           public int run(String[] arg0) throws Exception {
             Job firstJob = new Job(getConf(), "first");
             firstJob.setJarByClass(PathTop10RankJobTool.class);

                 TextInputFormat.setInputPaths(firstJob, "input");
                 firstJob.setInputFormatClass(TextInputFormat.class);

                 firstJob.setMapperClass(PathTop10RankFirstMapper.class);
                 firstJob.setMapOutputKeyClass(FirstKeyWritable.class);
                 firstJob.setMapOutputValueClass(IntWritable.class);

                 firstJob.setReducerClass(PathTop10RankFirstReducer.class);
                 firstJob.setOutputKeyClass(SecondKeyWritable.class);
                 firstJob.setOutputValueClass(IntWritable.class);

                 SequenceFileOutputFormat.setOutputPath(firstJob, new Path("first"));
                 firstJob.setOutputFormatClass(SequenceFileOutputFormat.class);

                 if (!firstJob.waitForCompletion(true)) {
                    return -1;
                 }

                 Job secondJob = new Job(getConf(), "second");
                 secondJob.setJarByClass(PathTop10RankJobTool.class);

                 SequenceFileInputFormat.setInputPaths(secondJob, "first");
                 secondJob.setInputFormatClass(SequenceFileInputFormat.class);

                 secondJob.setMapperClass(Mapper.class);
                 secondJob.setMapOutputKeyClass(SecondKeyWritable.class);
                 secondJob.setMapOutputValueClass(IntWritable.class);

                 secondJob.setGroupingComparatorClass(PathTop10RankGroupingComparatorClass.class);
                 secondJob.setPartitionerClass(PathTop10RankPartitioner.class);
                 secondJob.setSortComparatorClass(PathTop10RankingSortComparator.class);

                 secondJob.setReducerClass(PathTop10RankSecondReducer.class);
                 secondJob.setOutputKeyClass(SecondKeyWritable.class);
                 secondJob.setOutputValueClass(IntWritable.class);

                 TextOutputFormat.setOutputPath(secondJob, new Path("output"));
                 secondJob.setOutputFormatClass(TextOutputFormat.class);

                 return secondJob.waitForCompletion(true) ? 0 : -1;
             }
         }
Page top 10 rank of natural MapReduce
                                                FirstMapper
         public class PathTop10RankFirstMapper
             extends Mapper<LongWritable, Text, FirstKeyWritable, IntWritable> {
           private IntWritable ONE = new IntWritable(1);

             @Override
             protected void map(LongWritable key, Text value, Context context)
                 throws IOException, InterruptedException {
               String[] s = value.toString().split("\t");
               context.write(new FirstKeyWritable(s[0], s[2]), ONE);
             }
         }




                                                FirstReducer
         public class PathTop10RankFirstReducer
             extends Reducer<FirstKeyWritable, IntWritable, SecondKeyWritable, IntWritable> {
           @Override
           protected void reduce(FirstKeyWritable key, Iterable<IntWritable> values, Context context)
               throws IOException, InterruptedException {
             int pv = 0;
             for (IntWritable i : values) {
               pv += i.get();
             }

                 context.write(
                     new SecondKeyWritable(key.getDate().toString(), key.getPage().toString(), pv),
                     new IntWritable(pv));
             }
         }
Page top 10 rank of natural MapReduce

                                              SecondReducer
         public class PathTop10RankSecondReducer
             extends Reducer<SecondKeyWritable, IntWritable, SecondKeyWritable, IntWritable> {
           @Override
           protected void reduce(SecondKeyWritable key, Iterable<IntWritable> values, Context context)
               throws IOException, InterruptedException {
             int rank = 0;
             for (IntWritable i : values) {
               // rank starts at 0, so stop once 10 records have been written
               if (rank >= 10) {
                  break;
               }

                     context.write(key, i);
                     rank++;
                 }
             }

         }
Page top 10 rank of natural MapReduce

FirstKeyWritable
public class FirstKeyWritable implements WritableComparable<FirstKeyWritable> {
  private Text date = new Text();
  private Text page = new Text();

  public FirstKeyWritable() {
  }

  public FirstKeyWritable(String date, String page) {
    this.date.set(date);
    this.page.set(page);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    this.date.readFields(in);
    this.page.readFields(in);
  }

  @Override
  public void write(DataOutput out) throws IOException {
    this.date.write(out);
    this.page.write(out);
  }

  @Override
  public int compareTo(FirstKeyWritable o) {
    int compare = this.date.toString().compareTo(o.date.toString());
    if (compare != 0) {
      return compare;
    }
    return this.page.toString().compareTo(o.page.toString());
  }

  @Override
  public boolean equals(Object obj) {
    if (obj == null) {
      return false;
    }

    if (!(obj instanceof FirstKeyWritable)) {
      return false;
    }

    FirstKeyWritable o = (FirstKeyWritable) obj;
    return this.date.equals(o.getDate()) &&
        this.page.equals(o.getPage());
  }

  // Not on the original slide: the default HashPartitioner uses hashCode(),
  // so equal keys must hash alike to reach the same reducer.
  @Override
  public int hashCode() {
    return 31 * this.date.hashCode() + this.page.hashCode();
  }

  /**
   * @return the date
   */
  public Text getDate() {
    return date;
  }

  /**
   * @param date the date to set
   */
  public void setDate(Text date) {
    this.date = date;
  }

  /**
   * @return the page
   */
  public Text getPage() {
    return page;
  }

  /**
   * @param page the page to set
   */
  public void setPage(Text page) {
    this.page = page;
  }
}

SecondKeyWritable
public class SecondKeyWritable implements WritableComparable<SecondKeyWritable> {
  private Text date = new Text();
  private Text page = new Text();
  private IntWritable pv = new IntWritable();

  public SecondKeyWritable() {
  }

  public SecondKeyWritable(String date, String page, int pv) {
    this.date.set(date);
    this.page.set(page);
    this.pv.set(pv);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    this.date.readFields(in);
    this.page.readFields(in);
    this.pv.readFields(in);
  }

  @Override
  public void write(DataOutput out) throws IOException {
    this.date.write(out);
    this.page.write(out);
    this.pv.write(out);
  }

  @Override
  public int compareTo(SecondKeyWritable o) {
    return this.date.toString().compareTo(o.date.toString());
  }

  @Override
  public boolean equals(Object obj) {
    if (obj == null) {
      return false;
    }

    if (!(obj instanceof SecondKeyWritable)) {
      return false;
    }

    SecondKeyWritable o = (SecondKeyWritable) obj;
    return this.date.equals(o.getDate());
  }

  @Override
  public String toString() {
    return this.date + "\t" + this.page;
  }

  /**
   * @return the date
   */
  public Text getDate() {
    return date;
  }

  /**
   * @param date the date to set
   */
  public void setDate(Text date) {
    this.date = date;
  }

  /**
   * @return the page
   */
  public Text getPage() {
    return page;
  }

  /**
   * @param page the page to set
   */
  public void setPage(Text page) {
    this.page = page;
  }

  /**
   * @return the pv
   */
  public IntWritable getPv() {
    return pv;
  }

  /**
   * @param pv the pv to set
   */
  public void setPv(IntWritable pv) {
    this.pv = pv;
  }
}
Page top 10 rank of natural MapReduce

GroupingComparator
public class PathTop10RankGroupingComparatorClass extends WritableComparator {
  public PathTop10RankGroupingComparatorClass() {
    super(SecondKeyWritable.class, true);
  }

  @SuppressWarnings({ "rawtypes", "unchecked" })
  @Override
  public int compare(Object a, Object b) {
    if (a instanceof SecondKeyWritable && b instanceof SecondKeyWritable) {
      // Group by date only, so one reduce call sees all pages of a date.
      Comparable one = SecondKeyWritable.class.cast(a).getDate();
      Comparable another = SecondKeyWritable.class.cast(b).getDate();
      return one.compareTo(another);
    }
    return super.compare(a, b);
  }
}

SortComparator
public class PathTop10RankingSortComparator extends WritableComparator {
  public PathTop10RankingSortComparator() {
    super(SecondKeyWritable.class, true);
  }

  @SuppressWarnings({ "rawtypes", "unchecked" })
  @Override
  public int compare(Object a, Object b) {
    if (a instanceof SecondKeyWritable && b instanceof SecondKeyWritable) {
      Comparable one = SecondKeyWritable.class.cast(a).getDate();
      Comparable another = SecondKeyWritable.class.cast(b).getDate();

      int compare = one.compareTo(another);
      if (compare != 0) {
        return compare;
      }

      Comparable oneOrder = SecondKeyWritable.class.cast(a).getPv();
      Comparable anotherOrder = SecondKeyWritable.class.cast(b).getPv();
      // Descending by pv, so the most viewed pages come first in each date.
      return anotherOrder.compareTo(oneOrder);
    }
    return super.compare(a, b);
  }
}

Partitioner
public class PathTop10RankPartitioner extends Partitioner<SecondKeyWritable, IntWritable> {
  @Override
  public int getPartition(SecondKeyWritable key, IntWritable value, int numPartitioner) {
    // Mask the sign bit instead of Math.abs(), which stays negative for Integer.MIN_VALUE.
    return (key.getDate().hashCode() & Integer.MAX_VALUE) % numPartitioner;
  }
}
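
The three classes above work together as a secondary sort. The following plain Java sketch (no Hadoop) imitates that interplay on a single machine: the sort order is date ascending and then pv descending, grouping is by date only, and the partition is derived from the date. `SecondarySortSketch`, `Key` and `shuffle` are hypothetical names, and the descending pv order is what a top 10 ranking is assumed to need.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; it mimics the shuffle phase in memory.
final class SecondarySortSketch {
    /** Stand-in for SecondKeyWritable: date, page and page views. */
    static final class Key {
        final String date;
        final String page;
        final int pv;

        Key(String date, String page, int pv) {
            this.date = date;
            this.page = page;
            this.pv = pv;
        }
    }

    // Role of the sort comparator: date ascending, then pv descending.
    static final Comparator<Key> SORT =
            Comparator.comparing((Key k) -> k.date)
                    .thenComparing((Key k) -> k.pv, Comparator.<Integer>reverseOrder());

    // Role of the grouping comparator: keys with the same date share one reduce call.
    static final Comparator<Key> GROUP = Comparator.comparing((Key k) -> k.date);

    // Role of the partitioner: the same date always goes to the same reducer.
    static int partition(Key k, int numPartitions) {
        return (k.date.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Sorts the keys, then cuts them into groups the way reduce() would receive them. */
    static List<List<Key>> shuffle(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<Key>> groups = new ArrayList<>();
        for (Key k : sorted) {
            boolean newGroup = groups.isEmpty()
                    || GROUP.compare(groups.get(groups.size() - 1).get(0), k) != 0;
            if (newGroup) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<Key>> groups = shuffle(Arrays.asList(
                new Key("Jan 21, 2013", "/bar.html", 3),
                new Key("Jan 21, 2013", "/index.html", 5)));
        for (Key k : groups.get(0)) {
            System.out.println(k.date + "\t" + k.page + "\t" + k.pv);
        }
    }
}
```

Because each group arrives already ordered by pv, the reducer only has to emit the first 10 values and stop.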




Page top 10 rank of natural MapReduce
 •   This is very long...
 •   About 307 lines
Page top 10 rank of Huahin MapReduce

JobTools
public class PathRankingJobTool extends SimpleJobTool {
  @Override
  protected String setInputPath(String[] args) {
    return args[0];
  }

  @Override
  protected String setOutputPath(String[] args) {
    return args[1];
  }

  /* (non-Javadoc)
   * @see org.huahin.core.SimpleJobTool#setup()
   */
  @Override
  protected void setup() throws Exception {
    final String[] labels = new String[] { "DATE", "USER", "URL" };

    SimpleJob job1 = addJob(labels, StringUtil.TAB);
    job1.setFilter(FirstFilter.class);
    job1.setSummarizer(FirstSummarizer.class);

    SimpleJob job2 = addJob();
    job2.setSummarizer(SecondSummarizer.class);
  }
}

FirstFilter
public class FirstFilter extends Filter {
  @Override
  public void init() {
  }

  @Override
  public void filter(Record record, Writer writer)
      throws IOException, InterruptedException {
    // Emit one PV per input line, grouped by date and path.
    Record emitRecord = new Record();
    emitRecord.addGrouping("DATE", record.getValueString("DATE"));
    emitRecord.addGrouping("PATH", record.getValueString("URL"));
    emitRecord.addValue("PV", 1);
    writer.write(emitRecord);
  }

  @Override
  public void filterSetup() {
  }
}

FirstSummarizer
public class FirstSummarizer extends Summarizer {
  @Override
  public void init() {
  }

  @Override
  public void summarize(Writer writer)
      throws IOException, InterruptedException {
    // Sum the page views of one (DATE, PATH) group.
    int pv = 0;
    while (hasNext()) {
      Record record = next(writer);
      pv += record.getValueInteger("PV");
    }

    Record emitRecord = new Record();
    emitRecord.addGrouping("DATE", getGroupingRecord().getGroupingString("DATE"));
    emitRecord.addSort(pv, Record.SORT_UPPER, 1);
    emitRecord.addValue("PATH", getGroupingRecord().getGroupingString("PATH"));
    emitRecord.addValue("PV", pv);
    writer.write(emitRecord);
  }

  @Override
  public void summarizerSetup() {
  }
}

SecondSummarizer
public class SecondSummarizer extends Summarizer {
  @Override
  public void init() {
  }

  @Override
  public void summarize(Writer writer)
      throws IOException, InterruptedException {
    int rank = 1;
    while (hasNext()) {
      if (rank > 10) {
        break;
      }

      Record record = next(writer);
      Record emitRecord = new Record();
      emitRecord.addValue("PATH", record.getValueString("PATH"));
      // The slide read "UU" here, but FirstSummarizer emits the count as "PV".
      emitRecord.addValue("PV", record.getValueInteger("PV"));

      writer.write(emitRecord);
      rank++;
    }
  }

  @Override
  public void summarizerSetup() {
  }
}




Page top 10 rank of Huahin MapReduce
 •   This is very short!!
 •   About 100 lines

Huahin Core
 •   Other
     • Simple Join
     • Big Join
     • etc ...

Huahin Tools
 • A collection of tools for generic operations.
   • Currently only Apache log formatting...
 • Operating environment
   • On Premises Hadoop
   • Stand Alone
     • Multi Thread execution for small data
   • EMR
     • S3://huahin/tools/huahin-tools.0.1.0.jar

Huahin Manager
 •   Manages MapReduce Jobs
     • Get the Job list
     • Get the Job detail
     • Kill a Job
     • Execute a Job
         • Run queue management
             • MapReduce Jar
             • Hive Scripts
             • Pig Scripts
     • Execute Hive Queries
     • Execute Pig Latin
 •   All operations are performed through the REST API.
 •   Supports Apache Hadoop 1.0.X and 2.0.2-alpha
 •   Supports CDH3 and CDH4

Huahin Manager
 •   For 2.0.2-alpha and CDH4
     • Getting the Application list
     • Getting the Cluster info
     • Kill Application
     • Proxy to YARN APIs
Huahin Manager
 •   EMR Support
     • Set the bootstrap action
       s3://huahin/manager/configure
     • Configure a security group in order to access the REST API.
         • The security group to configure is created when
           EMR starts up:
           ElasticMapReduce-master
             • Values to be set
                 • Port range: 9010
                 • Source: IP addresses that are allowed to connect

Huahin Manager
 Operating environment of Huahin Manager

 [Diagram: clients call Huahin Manager through the REST API; Huahin Manager
 performs various operations on HiveServer (1 and 2) and the Hadoop Cluster.]
Huahin EManager
 A manager that specializes in EMR

 •   Manages Job Flows
     • Get the Job Flow list
     • Get the Job Flow detail
     • Kill a Job Flow Step
     • Execute a Job
         • Run queue management
             • Register a queue
             • Get the queue detail
             • Remove a queue

Huahin EManager
 •   Register queue
     • The following job types supported by EMR can be
       registered in the queue:
         • Hive
         • Pig
         • Streaming
         • Custom JAR
     • EManager can specify the size of the cluster to start.
       EManager assigns a queued job to a cluster that is free.
       (Being able to bring up multiple clusters is a strong
       point of EMR!)
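
The assignment of queued jobs to free clusters can be pictured with the small sketch below. It is an assumption-laden illustration, not EManager's implementation: the class `QueueAssignerSketch`, its methods, and the first-in-first-out order are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; EManager's real scheduling may differ.
final class QueueAssignerSketch {
    private final Deque<String> waiting = new ArrayDeque<>();      // registered jobs
    private final List<String> freeClusters = new ArrayList<>();   // idle cluster ids

    void register(String job) {
        waiting.addLast(job);
    }

    void clusterBecameFree(String clusterId) {
        freeClusters.add(clusterId);
    }

    /** Hands waiting jobs (FIFO, an assumption) to free clusters; returns job -> cluster. */
    Map<String, String> assign() {
        Map<String, String> assigned = new LinkedHashMap<>();
        while (!waiting.isEmpty() && !freeClusters.isEmpty()) {
            assigned.put(waiting.pollFirst(), freeClusters.remove(0));
        }
        return assigned;
    }

    public static void main(String[] args) {
        QueueAssignerSketch q = new QueueAssignerSketch();
        q.register("hive-job");
        q.register("pig-job");
        q.clusterBecameFree("cluster-1");
        System.out.println(q.assign()); // pig-job keeps waiting until another cluster is free
    }
}
```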

Huahin EManager
    Operating environment of Huahin EManager

    [Diagram: Huahin EManager runs on premises or on an EC2 instance and is
    driven through the REST API. It performs various operations on multiple
    Amazon Elastic MapReduce clusters. On each cluster, Huahin Manager is
    started by the master node bootstrap.]

※ NOTICE: Set up the security group

Huahin EManager
 Operating environment of Huahin EManager

 How starting EMR from EManager differs from starting it from
 the Management Console or the tools:
  •  EManager reuses one Job Flow.
     It does not start and terminate EMR for every job, which
     saves cost and improves performance.
     ※ This currently cannot be done from the Management Console,
     but it can be done from the command line and the SDK.

  •   However, the Job Flow is restarted automatically when the
      number of Steps reaches the upper limit of 255.
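
The reuse rule can be sketched as a counter, assuming only what the slide states (a limit of 255 Steps per Job Flow). `JobFlowRecycler` and `submitStep` are hypothetical names, and the "generation" number merely stands in for terminating the old Job Flow and starting a new one.

```java
// Hypothetical illustration only; it models the rule, not the EMR API.
final class JobFlowRecycler {
    static final int MAX_STEPS = 255; // upper limit stated on the slide

    private int flowGeneration = 1;      // which Job Flow is currently in use
    private int stepsInCurrentFlow = 0;  // Steps already added to that Job Flow

    /** Registers one Step and returns the generation of the Job Flow that runs it. */
    int submitStep() {
        if (stepsInCurrentFlow >= MAX_STEPS) {
            flowGeneration++;            // stands for "restart the Job Flow"
            stepsInCurrentFlow = 0;
        }
        stepsInCurrentFlow++;
        return flowGeneration;
    }

    public static void main(String[] args) {
        JobFlowRecycler recycler = new JobFlowRecycler();
        int generation = 0;
        for (int i = 0; i < 300; i++) {
            generation = recycler.submitStep();
        }
        System.out.println("Job Flow generation after 300 Steps: " + generation);
    }
}
```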

Huahin EManager
 Operating environment of Huahin EManager

 How EManager differs from starting EMR from the
 Management Console or the command-line tools:
     •  The Job Flow is kept running in one-hour units.
         • This is for cost reasons (billing and performance).
         • It shuts down automatically just before the next
           charge is incurred.
         • However, if a Job is still running, the Job Flow is
           carried over to the next billing period.
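
A minimal sketch of this shutdown decision, under stated assumptions: the one-hour billing unit comes from the slide, while the class name `BillingHourPolicy`, the method `shouldShutDown`, and the five-minute safety margin are hypothetical choices made for illustration.

```java
import java.time.Duration;

/** Hypothetical sketch of the "shut down just before the next charge" rule. */
final class BillingHourPolicy {
    /** Billing unit of one hour, as described on the slide. */
    static final Duration BILLING_UNIT = Duration.ofHours(1);
    /** Assumed safety margin before the hour boundary (not from the slides). */
    static final Duration MARGIN = Duration.ofMinutes(5);

    /**
     * @param uptime     time since the Job Flow was started
     * @param jobRunning whether a Job is currently running
     * @return true if the Job Flow is idle and close to the next charge
     */
    static boolean shouldShutDown(Duration uptime, boolean jobRunning) {
        if (jobRunning) {
            return false; // carried over to the next billing period
        }
        long intoCurrentUnit = uptime.toMillis() % BILLING_UNIT.toMillis();
        return intoCurrentUnit >= BILLING_UNIT.minus(MARGIN).toMillis();
    }

    public static void main(String[] args) {
        System.out.println(shouldShutDown(Duration.ofMinutes(30), false)); // false
        System.out.println(shouldShutDown(Duration.ofMinutes(57), false)); // true
        System.out.println(shouldShutDown(Duration.ofMinutes(57), true));  // false
    }
}
```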

Huahin EManager
 Register queue

 Queue entries are registered with the PUT or POST method.
     •  PUT: If the script or JAR is already on S3, start a Job
         Flow or simply run a Step.
     •   POST: Upload a local JAR or script to S3, then start the
         Job Flow and run the Step. EMR itself does not offer
         this feature. There is also an option to remove the
         files that were POSTed.
     •   All registration is done in JSON.

     Huahin EManager
        Register queue
Example of PUT for Hive:
$ curl -X PUT http://localhost:9020/queue/register/hive \
  -F ARGUMENTS='{"script":"s3://huahin/wordcount.hql","arguments":["arg1","arg2"]}'

The "arguments" entry of the JSON is optional.


Example of POST for Hive:
$ curl -X POST http://localhost:9020/queue/register/hive \
  -F SCRIPT=@wordcount.hql \
  -F ARGUMENTS='{"script":"s3://huahin/wordcount.hql","arguments":["arg1","arg2"]}'

The "arguments" entry of the JSON is optional.
Setting "deleteOnExit" to "true" deletes the uploaded files after execution.
By default they are not deleted.
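
The ARGUMENTS value passed to curl is plain JSON, so it can also be generated programmatically. The sketch below is an assumption-laden illustration: `QueueArguments`, `quote`, and `toJson` are hypothetical names, only the keys "script", "arguments", and "deleteOnExit" come from the slides, and writing the flag as the string "true" follows the slide's wording rather than a confirmed specification.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that builds the ARGUMENTS JSON used when registering a queue entry. */
final class QueueArguments {
    /** Quotes a string for JSON; only backslashes and double quotes are escaped in this sketch. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds the JSON object; "arguments" is omitted when the list is empty. */
    static String toJson(String script, List<String> arguments, boolean deleteOnExit) {
        StringBuilder json = new StringBuilder("{");
        json.append("\"script\":").append(quote(script));
        if (!arguments.isEmpty()) {
            String list = arguments.stream()
                    .map(QueueArguments::quote)
                    .collect(Collectors.joining(",", "[", "]"));
            json.append(",\"arguments\":").append(list);
        }
        if (deleteOnExit) {
            // Written as the string "true", following the slide's wording (assumption).
            json.append(",\"deleteOnExit\":\"true\"");
        }
        return json.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson("s3://huahin/wordcount.hql", List.of("arg1", "arg2"), false));
    }
}
```

For anything beyond simple strings, a real JSON library would be a safer choice than hand-built escaping.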

     Huahin EManager
        List of Job Flow


Example of getting the list of all Job Flows:
$ curl -X GET http://localhost:9020/jobflow/list

Example of getting the list of running Job Flows:
$ curl -X GET http://localhost:9020/jobflow/runnings

Example of getting the details of a Job Flow:
$ curl -X GET http://localhost:9020/jobflow/describe/j-XXXXXXXXXXXX
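
The same three calls could be issued from Java. The sketch below only builds the requests with the standard `java.net.http` API (Java 11 or later) and does not send them, because the slides do not show the response format. `JobFlowRequests` and its method names are hypothetical; the paths and the base address are the ones from the curl examples.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical sketch: builds (but does not send) GET requests for the Job Flow endpoints. */
final class JobFlowRequests {
    private final URI base;

    JobFlowRequests(String baseUrl) {
        // For example "http://localhost:9020", the address used in the curl examples.
        this.base = URI.create(baseUrl.endsWith("/") ? baseUrl : baseUrl + "/");
    }

    private HttpRequest get(String path) {
        return HttpRequest.newBuilder(base.resolve(path)).GET().build();
    }

    HttpRequest listAll() {
        return get("jobflow/list");
    }

    HttpRequest listRunning() {
        return get("jobflow/runnings");
    }

    HttpRequest describe(String jobFlowId) {
        return get("jobflow/describe/" + jobFlowId);
    }

    public static void main(String[] args) {
        JobFlowRequests requests = new JobFlowRequests("http://localhost:9020");
        HttpRequest request = requests.describe("j-XXXXXXXXXXXX");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send a request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.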

     Huahin EManager
        Queue API
Example of listing registered queue entries:
$ curl -X GET http://localhost:9020/queue/list

Example of listing running queue entries:
$ curl -X GET http://localhost:9020/queue/runnings

Example of getting the details of a queue entry:
$ curl -X GET http://localhost:9020/queue/describe/S_XXXXXXXXXXXX

Example of deleting a queue entry:
$ curl -X DELETE http://localhost:9020/queue/kill/S_XXXXXXXXXXXX
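
The four queue calls differ only in HTTP method, path, and whether an ID is appended, so they fit naturally into a small lookup table. The enum below is a hypothetical sketch (the names `QueueOperation` and `toCurl` are not part of Huahin); the methods and paths are copied from the curl examples.

```java
/** Hypothetical lookup table of the queue endpoints shown in the curl examples. */
enum QueueOperation {
    LIST("GET", "/queue/list", false),
    RUNNINGS("GET", "/queue/runnings", false),
    DESCRIBE("GET", "/queue/describe/", true),
    KILL("DELETE", "/queue/kill/", true);

    final String method;
    final String path;
    final boolean needsId;

    QueueOperation(String method, String path, boolean needsId) {
        this.method = method;
        this.path = path;
        this.needsId = needsId;
    }

    /** Returns a curl command line in the same form as the examples for this operation. */
    String toCurl(String baseUrl, String queueId) {
        if (needsId && (queueId == null || queueId.isEmpty())) {
            throw new IllegalArgumentException(name() + " requires a queue ID");
        }
        return "curl -X " + method + " " + baseUrl + path + (needsId ? queueId : "");
    }

    public static void main(String[] args) {
        for (QueueOperation op : values()) {
            System.out.println(op.toCurl("http://localhost:9020", "S_XXXXXXXXXXXX"));
        }
    }
}
```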

Huahin EManager
  Kill of Job
Hadoop has a command to kill a running Job:

hadoop job -kill job_XXXXXXXXXX


However, EMR has no such function. If you start a Job by
mistake, there is no choice but to terminate the Job Flow.

Alternatively, you can connect to the EMR master node by SSH
and run the command above.

Troublesome...

Huahin EManager
   Kill of Job
EManager (and Manager) make this possible through a Kill API!

 Example of killing a Step:
 $ curl -X DELETE http://localhost:9020/jobflow/kill/step/S_XXXXXXXXXXXX

Conclusion

 •   Huahin Core
     • Unlike Hive and Pig
     • For when you want to write MapReduce in a fairly
       natural way
 •   Huahin Tools
   •   Still...
 •   Huahin Manager
   •   All operations go through the REST API
   •   Integration with other systems
 •   Huahin EManager
   •   Integration with other systems
   •   Cost and performance management
   •   Kill a Step of a Job Flow!

Current versions
 •   Huahin Core 0.1.4
 •   Huahin Unit 0.1.4
 •   Huahin Tools 0.1.0
 •   Huahin Manager
     • 0.1.4 for Apache Hadoop 1.0.4
     • 0.1.4 for CDH3
     • 0.2.1 for Apache Hadoop 2.0.2-alpha
     • 0.2.1 for CDH4
 •   Huahin EManager 0.1.1
Thanks!!!

Huahin Framework for Hadoop, Hadoop Conference Japan 2013 Winter

  • 1. Hadoop Conference Japan 2013 Winter Huahin Framework for Hadoop JJaann 2211,, 22001133 @@rryyuu__kkoobbaayyaasshhii
  • 2. Ryu Kobayashi (@ryu_kobayashi) • BrainPad Inc. • Hadoop, Cassandra, Machine Learning, ... AD Now on sale!!!
  • 3. What is Huahin Framework?
  • 5. Hadoop Family Logo is ...
  • 6. Huahin logo is ... Very very very cute!
  • 7. Huahin Framework http://huahinframework.org We released some software which developed in an office in June 2012 as OSS.  * It is what was used in the panel log analysis.  * Please refer to the slide of the "Hadoop Conference Japan 2011 Fall" for more information.   http://goo.gl/C9tzf Huahin Framework is a general term for multiple products.
  • 8. Huahin Framework http://huahinframework.org The origin of the name of Huahin Framework There is a custom to decide on a wine region in the code name of the office. Huahin = Hua Hin = Tourist destinations in Thailand = Wine region When it comes to Thailand... Tt is the elephant ! As such, Huahin image
  • 9. Huahin Framework http://huahinframework.org Huahin Framework Configuration  Main is consists of the following elements: • Huahin Core • Huahin Tools • Huahin Manager
  • 10. Huahin Framework http://huahinframework.org Huahin Core • Simplified MapReduce programs • Do not have to write it yourself Writable and Secondary Sort • The basic grouping, sorting, etc., the idea from SQL • If you want to write, can write natural MapReduce • C++ is the same as a superset of C • It can do Hive or Pig. However, if it really want to give the performances.(Parallel computation, etc...) • There Huahin Unit as a test driver • Wraps the MRUnit • Example of implementation
  • 11. Huahin Framework http://huahinframework.org Huahin Example • Page top 10 rank example First, natural MapReduce. Second, Huahin MapReduce.
  • 12. Huahin Framework http://huahinframework.org Data of page top 10 rank Example: format is Tab delimited Jan 21, 2013 user1 /index.html Jan 21, 2013 user1 /index2.html Jan 21, 2013 user2 /contents/foo.html Jan 21, 2013 user42 /bar.html Jan 21, 2013 user3 /index.html Jan 21, 2013 user7 /news/index.html Jan 21, 2013 user4 /release/2013.html Jan 21, 2013 user3 /index2.html Jan 21, 2013 user7 /download.html Jan 21, 2013 user5 /bar.html Jan 21, 2013 user12 /release/2012.html Jan 21, 2013 user5 /contents/foo.html Jan 21, 2013 user23 /page2.html Jan 21, 2013 user53 /news.html Jan 21, 2013 user6 /download.html Jan 21, 2013 user21 /bar.html Jan 21, 2013 user18 /index.html
  • 13. Huahin Framework http://huahinframework.org Page top 10 rank of natural MapReduce JobTools public class PathTop10RankJobTool extends Configured implements Tool { @Override public int run(String[] arg0) throws Exception { Job firstJob = new Job(getConf(), "first"); firstJob.setJarByClass(PathTop10RankJobTool.class); TextInputFormat.setInputPaths(firstJob, "input"); firstJob.setInputFormatClass(TextInputFormat.class); firstJob.setMapperClass(PathTop10RankFirstMapper.class); firstJob.setMapOutputKeyClass(FirstKeyWritable.class); firstJob.setMapOutputValueClass(IntWritable.class); firstJob.setReducerClass(PathTop10RankFirstReducer.class); firstJob.setOutputKeyClass(SecondKeyWritable.class); firstJob.setOutputValueClass(IntWritable.class); SequenceFileOutputFormat.setOutputPath(firstJob, new Path("first")); firstJob.setOutputFormatClass(SequenceFileOutputFormat.class); if (!firstJob.waitForCompletion(true)) { return -1; } Job secondJob = new Job(getConf(), "second"); secondJob.setJarByClass(PathTop10RankJobTool.class); SequenceFileInputFormat.setInputPaths(secondJob, "first"); secondJob.setInputFormatClass(SequenceFileInputFormat.class); secondJob.setMapperClass(Mapper.class); secondJob.setMapOutputKeyClass(SecondKeyWritable.class); secondJob.setMapOutputValueClass(IntWritable.class); secondJob.setGroupingComparatorClass(PathTop10RankGroupingComparatorClass.class); secondJob.setPartitionerClass(PathTop10RankPartitioner.class); secondJob.setSortComparatorClass(PathTop10RankingSortComparator.class); secondJob.setReducerClass(PathTop10RankSecondReducer.class); secondJob.setOutputKeyClass(SecondKeyWritable.class); secondJob.setOutputValueClass(IntWritable.class); TextOutputFormat.setOutputPath(secondJob, new Path("output")); secondJob.setOutputFormatClass(TextOutputFormat.class); return secondJob.waitForCompletion(true) ? 0 : -1; } }
  • 14. Huahin Framework http://huahinframework.org Page top 10 rank of natural MapReduce FirstMapper public class PathTop10RankFirstMapper extends Mapper<LongWritable, Text, FirstKeyWritable, IntWritable> { private IntWritable ONE = new IntWritable(1); @Override protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String[] s = value.toString().split("t"); context.write(new FirstKeyWritable(s[0], s[2]), ONE); } } FirstReducer public class PathTop10RankFirstReducer extends Reducer<FirstKeyWritable, IntWritable, SecondKeyWritable, IntWritable> { @Override protected void reduce(FirstKeyWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int pv = 0; for (IntWritable i : values) { pv += i.get(); } context.write( new SecondKeyWritable(key.getDate().toString(), key.getPage().toString(), pv), new IntWritable(pv)); } }
  • 15. Huahin Framework http://huahinframework.org Page top 10 rank of natural MapReduce SecondReducer public class PathTop10RankSecondReducer extends Reducer<SecondKeyWritable, IntWritable, SecondKeyWritable, IntWritable> { @Override protected void reduce(SecondKeyWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int rank = 0; for (IntWritable i : values) { if (rank > 10) { break; } context.write(key, i); rank++; } } }
  • 16. Huahin Framework http://huahinframework.org Page top 10 rank of natural MapReduce FirstKeyWritable SecondKeyWritable public class SecondKeyWritable implements WritableComparable<SecondKeyWritable> { public class FirstKeyWritable implements WritableComparable<FirstKeyWritable> { private Text date = new Text(); private Text date = new Text(); private Text page = new Text(); private Text page = new Text(); private IntWritable pv = new IntWritable(); public FirstKeyWritable() { public SecondKeyWritable() { } } public FirstKeyWritable(String date, String page) { public SecondKeyWritable(String date, String page, int pv) { this.date.set(date); this.date.set(date); this.page.set(page); this.page.set(page); } this.pv.set(pv); } @Override public void readFields(DataInput in) throws IOException { @Override this.date.readFields(in); public void readFields(DataInput in) throws IOException { this.page.readFields(in); this.date.readFields(in); } this.page.readFields(in); this.pv.readFields(in); @Override } public void write(DataOutput out) throws IOException { this.date.write(out); @Override this.page.write(out); public void write(DataOutput out) throws IOException { } this.date.write(out); this.page.write(out); @Override this.pv.write(out); public int compareTo(FirstKeyWritable o) { } int compare = this.date.toString().compareTo(o.date.toString()); if (compare != 0) { @Override return compare; public int compareTo(SecondKeyWritable o) { } return this.date.toString().compareTo(o.date.toString()); return this.page.toString().compareTo(o.page.toString()); } } @Override @Override public boolean equals(Object obj) { public boolean equals(Object obj) { if (obj == null) { if (obj == null) { return false; return false; } } if (!(obj instanceof SecondKeyWritable)) { if (!(obj instanceof FirstKeyWritable)) { return false; return false; } } SecondKeyWritable o = (SecondKeyWritable) obj; FirstKeyWritable o = (FirstKeyWritable) obj; return this.date.equals(o.getDate()); return 
this.date.equals(o.getDate()) && } this.page.equals(o.getPage()); } @Override public String toString() { /** return this.date + "t" + this.page; * @return the date } */ public Text getDate() { /** return date; * @return the date } */ public Text getDate() { /** return date; * @param date the date to set } */ public void setDate(Text date) { /** this.date = date; * @param date the date to set } */ public void setDate(Text date) { /** this.date = date; * @return the page } */ public Text getPage() { /** return page; * @return the page } */ public Text getPage() { /** return page; * @param page the page to set } */ public void setPage(Text page) { /** this.page = page; * @param page the page to set } */ } public void setPage(Text page) { this.page = page; } /** * @return the pv */ public IntWritable getPv() { return pv; } /** * @param pv the pv to set */ public void setPv(IntWritable pv) { this.pv = pv; } }
  • 17. Huahin Framework http://huahinframework.org Page top 10 rank of natural MapReduce GroupingComparator SortComparator public class PathTop10RankGroupingComparatorClass extends WritableComparator { public class PathTop10RankingSortComparator extends WritableComparator { public PathTop10RankGroupingComparatorClass() { public PathTop10RankingSortComparator() { super(SecondKeyWritable.class, true); super(SecondKeyWritable.class, true); } } @SuppressWarnings({ "rawtypes", "unchecked" }) @SuppressWarnings({ "rawtypes", "unchecked" }) @Override @Override public int compare(Object a, Object b) { public int compare(Object a, Object b) { if (a instanceof SecondKeyWritable && b instanceof SecondKeyWritable) { if (a instanceof SecondKeyWritable && b instanceof SecondKeyWritable) { Comparable one = SecondKeyWritable.class.cast(a).getDate(); Comparable one = SecondKeyWritable.class.cast(a).getDate(); Comparable another = SecondKeyWritable.class.cast(b).getDate(); Comparable another = SecondKeyWritable.class.cast(b).getDate(); return one.compareTo(another); } int compare = one.compareTo(another); return super.compare(a, b); if (compare != 0) { } return compare; } } Comparable oneOrder = SecondKeyWritable.class.cast(a).getPv(); Comparable anotherOrder = SecondKeyWritable.class.cast(b).getPv(); return oneOrder.compareTo(anotherOrder); } return super.compare(a, b); Partitioner } } public class PathTop10RankPartitioner extends Partitioner<SecondKeyWritable, IntWritable> { @Override public int getPartition(SecondKeyWritable key, IntWritable value, int numPartitioner) { return Math.abs(key.getDate().hashCode()) % numPartitioner; } }
  • 18. Huahin Framework http://huahinframework.org Page top 10 rank of natural MapReduce • This is a very long ... • About 307 lines
  • 19. Top 10 page ranking in Huahin MapReduce
      • JobTools

          public class PathRankingJobTool extends SimpleJobTool {
              @Override
              protected String setInputPath(String[] args) {
                  return args[0];
              }

              @Override
              protected String setOutputPath(String[] args) {
                  return args[1];
              }

              /* (non-Javadoc)
               * @see org.huahin.core.SimpleJobTool#setup()
               */
              @Override
              protected void setup() throws Exception {
                  final String[] labels = new String[] { "DATE", "USER", "URL" };

                  SimpleJob job1 = addJob(labels, StringUtil.TAB);
                  job1.setFilter(FirstFilter.class);
                  job1.setSummarizer(FirstSummarizer.class);

                  SimpleJob job2 = addJob();
                  job2.setSummarizer(SecondSummarizer.class);
              }
          }

      • FirstFilter

          public class FirstFilter extends Filter {
              @Override
              public void init() {
              }

              @Override
              public void filter(Record record, Writer writer)
                      throws IOException, InterruptedException {
                  Record emitRecord = new Record();
                  emitRecord.addGrouping("DATE", record.getValueString("DATE"));
                  emitRecord.addGrouping("PATH", record.getValueString("URL"));
                  emitRecord.addValue("PV", 1);
                  writer.write(emitRecord);
              }

              @Override
              public void filterSetup() {
              }
          }

      • FirstSummarizer

          public class FirstSummarizer extends Summarizer {
              @Override
              public void init() {
              }

              @Override
              public void summarize(Writer writer)
                      throws IOException, InterruptedException {
                  int pv = 0;
                  while (hasNext()) {
                      Record record = next(writer);
                      pv += record.getValueInteger("PV");
                  }

                  Record emitRecord = new Record();
                  emitRecord.addGrouping("DATE", getGroupingRecord().getGroupingString("DATE"));
                  emitRecord.addSort(pv, Record.SORT_UPPER, 1);
                  emitRecord.addValue("PATH", getGroupingRecord().getGroupingString("PATH"));
                  emitRecord.addValue("PV", pv);
                  writer.write(emitRecord);
              }

              @Override
              public void summarizerSetup() {
              }
          }

      • SecondSummarizer

          public class SecondSummarizer extends Summarizer {
              @Override
              public void init() {
              }

              @Override
              public void summarize(Writer writer)
                      throws IOException, InterruptedException {
                  int rank = 1;
                  while (hasNext()) {
                      if (rank > 10) {
                          break;
                      }

                      Record record = next(writer);
                      Record emitRecord = new Record();
                      emitRecord.addValue("PATH", record.getValueString("PATH"));
                      emitRecord.addValue("UU", record.getValueInteger("UU"));
                      writer.write(emitRecord);
                      rank++;
                  }
              }

              @Override
              public void summarizerSetup() {
              }
          }
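To make the data flow of the two jobs on slide 19 easier to follow, here is a minimal plain-Java sketch of what they appear to compute. It uses no Hadoop or Huahin classes. The class name `TopPathsSketch` and the method `topPaths` are hypothetical, and two things are assumptions rather than facts from the slides: that `Record.SORT_UPPER` means "highest PV first", and that the second job ranks within each DATE. The sketch also carries the PV value through to the output, whereas the slide's SecondSummarizer reads a value named "UU".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration only; this is not Huahin or Hadoop code. */
final class TopPathsSketch {

    /**
     * @param rows  log rows as {DATE, USER, URL}, the labels used by job1
     * @param limit how many paths to keep per date (10 on the slide)
     * @return for each date, "PATH<TAB>PV" lines ordered by PV, highest first
     */
    static Map<String, List<String>> topPaths(List<String[]> rows, int limit) {
        // Job 1 (FirstFilter + FirstSummarizer): group by DATE and PATH,
        // count PV = 1 per row, and sum PV per group.
        Map<String, Map<String, Integer>> pvByDate = new TreeMap<>();
        for (String[] row : rows) {
            String date = row[0];
            String path = row[2];
            pvByDate.computeIfAbsent(date, d -> new TreeMap<>())
                    .merge(path, 1, Integer::sum);
        }

        // Job 2 (SecondSummarizer): within each DATE, read records in
        // descending PV order (assumed) and stop after `limit` records.
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> byDate : pvByDate.entrySet()) {
            List<Map.Entry<String, Integer>> ranked =
                    new ArrayList<>(byDate.getValue().entrySet());
            ranked.sort(Map.Entry.comparingByValue(Comparator.reverseOrder()));

            List<String> lines = new ArrayList<>();
            for (Map.Entry<String, Integer> e : ranked) {
                if (lines.size() >= limit) {
                    break;
                }
                lines.add(e.getKey() + "\t" + e.getValue());
            }
            result.put(byDate.getKey(), lines);
        }
        return result;
    }
}
```

Calling `TopPathsSketch.topPaths(rows, 10)` on a list of parsed log rows returns one ranked list per date. In the real framework the grouping and sorting are done by the MapReduce shuffle rather than by in-memory maps.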
  • 20. Top 10 page ranking in Huahin MapReduce
      • The code is very short!!
      • About 100 lines
  • 21. Huahin Core
      • Other features
          • Simple Join
          • Big Join
          • etc.
  • 22. Huahin Tools
      • A collection of tools for generic operations.
      • Currently only Apache log formatting ...
      • Operating environment
          • On-premises Hadoop
          • Standalone
              • Multi-threaded execution for small data
          • EMR
              • S3://huahin/tools/huahin-tools.0.1.0.jar
  • 23. Huahin Manager
      • A manager for MapReduce Jobs
          • Get the Job list
          • Get the Job detail
          • Kill a Job
          • Execute a Job
      • Run queue management
          • MapReduce JAR
          • Hive scripts
          • Pig scripts
      • Execute Hive queries
      • Execute Pig Latin
      • All operations are performed through the REST API.
      • Supports Apache Hadoop 1.0.X and 2.0.2-alpha
      • Supports CDH3 and CDH4
  • 24. Huahin Manager
      • For 2.0.2-alpha and CDH4
          • Get the Application list
          • Get the Cluster info
          • Kill an Application
          • Proxy to the YARN APIs
  • 25. Huahin Manager
      • EMR support
          • Set the bootstrap action: s3://huahin/manager/configure
          • Configure the security group so that the REST API can be accessed.
          • The security group to configure is created when EMR starts: ElasticMapReduce-master
          • Values to set
              • Port range: 9010
              • Source: IP addresses that are allowed to connect
  • 26. Huahin Manager: Operating environment of Huahin Manager
      [Diagram with the labels: Various operations, REST API, Huahin Manager, Hadoop Cluster, HiveServer (1 and 2)]
  • 27. Huahin EManager
      A manager that specializes in EMR
      • A manager for Job Flows
          • Get the Job Flow list
          • Get the Job Flow detail
          • Kill a Job Flow Step
          • Execute a Job
      • Run queue management
          • Register a queue
          • Get the queue detail
          • Remove a queue
  • 28. Huahin EManager
      • Register queue
          • The following job types supported by EMR can be assigned to the queue:
              • Hive
              • Pig
              • Streaming
              • Custom JAR
          • EManager can specify the cluster size to be started. EManager assigns a queue to a cluster that is free. (Being able to bring up multiple clusters is a strong point of EMR!)
  • 29. Huahin EManager: Operating environment of Huahin EManager
      [Diagram with the labels: Various operations, REST API, Huahin EManager (on premises or EC2 instance), Amazon Elastic MapReduce (shown twice), Master node, "Huahin Manager will be started by the bootstrap."]
      ※ NOTICE: Set up the security group
  • 30. Huahin EManager: Operating environment of Huahin EManager
      How starting EMR through EManager differs from the Management Console and tools:
      • EManager reuses a single Job Flow. It does not start and terminate EMR every time, which saves cost and improves performance.
        ※ This is currently not possible from the Management Console. However, it can be done from the command line and the SDK.
      • However, the Job Flow is restarted automatically when the number of Steps reaches the upper limit of 255.
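The reuse rule on slide 30 can be sketched as a small decision function. This is only an illustration under stated assumptions: the class `JobFlowReuseSketch`, the enum `Action`, and the method `decide` are hypothetical names, the code does not call the EMR API, and how EManager actually tracks the Step count is not shown in the slides. Only the limit of 255 Steps comes from the source.

```java
/** Illustration of the reuse rule only; it does not call the EMR API. */
final class JobFlowReuseSketch {

    /** Upper limit of Steps per Job Flow, as stated on the slide. */
    static final int MAX_STEPS = 255;

    /** What to do before submitting the next Step (hypothetical names). */
    enum Action { START_NEW, REUSE, RESTART }

    /**
     * @param jobFlowRunning whether a Job Flow is already up
     * @param stepsUsed      number of Steps already added to that Job Flow
     */
    static Action decide(boolean jobFlowRunning, int stepsUsed) {
        if (!jobFlowRunning) {
            return Action.START_NEW;   // nothing to reuse yet
        }
        if (stepsUsed >= MAX_STEPS) {
            return Action.RESTART;     // Step limit reached: replace the Job Flow
        }
        return Action.REUSE;           // keep using the running Job Flow
    }
}
```

Reusing a running Job Flow avoids the startup wait and the minimum billing of a fresh cluster for every job, which is the cost and performance point the slide makes.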
  • 31. Huahin EManager: Operating environment of Huahin EManager
      How starting EMR through EManager differs from the Management Console and tools:
      • The Job Flow is kept running in one-hour units
          • for cost reasons (billing and performance)
      • It shuts down automatically just before the next billing time.
      • However, if a Job is running, the Job Flow is carried over to the next billing time.
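The shutdown timing described on slide 31 can be sketched as follows. This is a minimal sketch under assumptions, not EManager's implementation: the class `BillingHourSketch`, its methods, and the safety margin parameter are hypothetical, and billing is assumed to run in whole hours counted from the Job Flow's start time.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustration of "shut down just before the next billing time". */
final class BillingHourSketch {

    /** Assumed billing unit: one hour, counted from the start time. */
    static final Duration BILLING_UNIT = Duration.ofHours(1);

    /** Time left until the next billing boundary (assumes now >= startedAt). */
    static Duration untilNextBoundary(Instant startedAt, Instant now) {
        long elapsedMs = Duration.between(startedAt, now).toMillis();
        long unitMs = BILLING_UNIT.toMillis();
        return Duration.ofMillis(unitMs - (elapsedMs % unitMs));
    }

    /**
     * Shut down only when no Job is running and the next billing boundary
     * is within the margin; a running Job carries the Job Flow over into
     * the next billing unit.
     */
    static boolean shouldShutDown(Instant startedAt, Instant now,
                                  boolean jobRunning, Duration margin) {
        if (jobRunning) {
            return false;
        }
        return untilNextBoundary(startedAt, now).compareTo(margin) <= 0;
    }
}
```

With a five-minute margin, an idle Job Flow started at 00:00 would be shut down from 00:55 onward, while one that is still running a Job at 00:57 stays up and is checked again near 01:55.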
  • 32. Huahin EManager: Register queue
      A queue is registered with the PUT or POST method.
      • PUT: Use this when the script or JAR is already on S3. EManager only starts the Job Flow or executes the Step.
      • POST: Uploads a local JAR or script to S3, then starts the Job Flow and executes the Step. This feature is not available in EMR itself. There is also an option to remove the files that were POSTed.
      • All registration is done in JSON.
  • 33. Huahin EManager: Register queue
      Example of PUT for Hive:
        $ curl -X PUT http://localhost:9020/queue/register/hive \
            -F ARGUMENTS='{"script":"s3://huahin/wordcount.hql","arguments":["arg1","arg2"]}'
      Example of POST for Hive:
        $ curl -X POST http://localhost:9020/queue/register/hive \
            -F SCRIPT=@wordcount.hql \
            -F ARGUMENTS='{"script":"s3://huahin/wordcount.hql","arguments":["arg1","arg2"]}'
      • "arguments": optional arguments, given in the JSON.
      • Setting "deleteOnExit" to "true" deletes the POSTed files after execution. They are not deleted by default.
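When the ARGUMENTS value is built by a program rather than typed by hand, the JSON needs correct quoting. The following is a minimal sketch that produces the same JSON shape as the curl examples on slide 33, using only the "script" and "arguments" keys shown there. The class `QueueArgumentsSketch` and its methods are hypothetical helper names, not part of Huahin EManager, and "deleteOnExit" is left out because the slide does not show its exact value format.

```java
import java.util.List;

/** Builds the ARGUMENTS JSON shown on the slide; hypothetical helper. */
final class QueueArgumentsSketch {

    /** Returns s as a JSON string literal, escaping quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"') {
                sb.append("\\\"");
            } else if (c == '\\') {
                sb.append("\\\\");
            } else if (c < 0x20) {
                // control characters become a six-character hex escape
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Returns {"script":...,"arguments":[...]} for the given values. */
    static String arguments(String script, List<String> args) {
        StringBuilder sb = new StringBuilder("{\"script\":")
                .append(quote(script))
                .append(",\"arguments\":[");
        for (int i = 0; i < args.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(args.get(i)));
        }
        return sb.append("]}").toString();
    }
}
```

The returned string can be passed as the ARGUMENTS form field of the PUT or POST request. In a real project a JSON library would normally replace the hand-written escaping.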
  • 34. Huahin EManager: List of Job Flows
      Example of getting the list of all Job Flows:
        $ curl -X GET http://localhost:9020/jobflow/list
      Example of getting the list of running Job Flows:
        $ curl -X GET http://localhost:9020/jobflow/runnings
      Example of getting the Job Flow detail:
        $ curl -X GET http://localhost:9020/jobflow/describe/j-XXXXXXXXXXXX
  • 35. Huahin EManager: Queue API
      Example of getting the list of registered queues:
        $ curl -X GET http://localhost:9020/queue/list
      Example of getting the list of running queues:
        $ curl -X GET http://localhost:9020/queue/runnings
      Example of getting the queue detail:
        $ curl -X GET http://localhost:9020/queue/describe/S_XXXXXXXXXXXX
      Example of deleting a queue:
        $ curl -X DELETE http://localhost:9020/queue/kill/S_XXXXXXXXXXXX
  • 36. Huahin EManager: Killing a Job
      Hadoop has a command to kill a running Job:
        hadoop job -kill job_XXXXXXXXXX
      However, EMR has no such function. If you start a Job by mistake, there is no choice but to terminate the Job Flow.
      You can kill the Job by connecting to the EMR master node over SSH and typing the command above.
      Troublesome...
  • 37. Huahin EManager: Killing a Job
      EManager (Manager) makes the kill possible through an API!
      Example of killing a Step:
        $ curl -X DELETE http://localhost:9020/jobflow/kill/step/S_XXXXXXXXXXXX
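The curl calls on slides 34 to 37 can also be issued from a program, which is what the "integration with other systems" use case needs. Below is a minimal sketch of a client using only the JDK. The endpoint paths are the ones shown in the curl examples; the class `EManagerClientSketch` and its method names are hypothetical, and the response is returned as raw text because the slides do not show the response format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Scanner;

/** Hypothetical thin client for the EManager REST endpoints on the slides. */
final class EManagerClientSketch {
    private final URI base;

    /** @param baseUrl for example "http://localhost:9020" */
    EManagerClientSketch(String baseUrl) {
        this.base = URI.create(baseUrl.endsWith("/") ? baseUrl : baseUrl + "/");
    }

    // Endpoints used with GET in the examples.
    URI jobFlowList()              { return base.resolve("jobflow/list"); }
    URI jobFlowRunnings()          { return base.resolve("jobflow/runnings"); }
    URI jobFlowDescribe(String id) { return base.resolve("jobflow/describe/" + id); }
    URI queueList()                { return base.resolve("queue/list"); }
    URI queueRunnings()            { return base.resolve("queue/runnings"); }
    URI queueDescribe(String id)   { return base.resolve("queue/describe/" + id); }

    // Endpoints used with DELETE in the examples.
    URI queueKill(String id)       { return base.resolve("queue/kill/" + id); }
    URI stepKill(String id)        { return base.resolve("jobflow/kill/step/" + id); }

    /** Sends the request and returns the response body as text. */
    String send(String method, URI uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setRequestMethod(method);
        try (InputStream in = conn.getInputStream();
             Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, `client.send("DELETE", client.stepKill("S_XXXXXXXXXXXX"))` corresponds to the Step kill shown with curl, and `client.send("GET", client.jobFlowList())` to the Job Flow list.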
  • 38. Conclusion
      • Huahin Core
          • Unlike Hive and Pig
          • For when you want to write MapReduce in a fairly plain way.
      • Huahin Tools
          • Still...
      • Huahin Manager
          • All operations through the REST API
          • Integration with other systems
      • Huahin EManager
          • Integration with other systems
          • Cost and performance management
          • Kill a Step of a Job Flow!
  • 39. The current version
      • Huahin Core 0.1.4
      • Huahin Unit 0.1.4
      • Huahin Tools 0.1.0
      • Huahin Manager
          • 0.1.4 for Apache Hadoop 1.0.4
          • 0.1.4 for CDH3
          • 0.2.1 for Apache Hadoop 2.0.2-alpha
          • 0.2.1 for CDH4
      • Huahin EManager 0.1.1