Rakuten tech conf

A Distributed Parallel Processing Framework
Built with JRuby:
Hadoop Papyrus
and more...

2010/10/16
Rakuten Technology Conference 2010

Japan JRuby Users Group / Hapyrus Inc.
FUJIKAWA Koichi (藤川幸一) @fujibee

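JRuby Users Group
・Founded in May 2010
・A place for JRuby users to meet; holds study sessions and similar events
・Meeting #0: founding preparation meeting
・Meeting #1: Google AppEngine with JRuby
・Meeting #2: JRuby Users Group in RubyKaigi2010
・Meeting #3: <you are here>
・To take part, sign up on the mailing list (Google Group)!
　http://groups.google.com/group/jruby-users-jp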

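What is Hadoop?

A framework for parallel, distributed processing of large-scale data
An open-source clone of Google MapReduce

Needed for terabyte-scale data processing

Assuming a typical HDD reads at 50 MB/s,
just reading 400 TB (web scale) takes about 2,000 hours

A distributed file system and a distributed processing
framework are both required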
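
The read-time estimate above can be checked with a few lines of Ruby. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1,000,000 MB), and the 1,000-disk case is a hypothetical figure added to show why spreading the read across many disks matters.

```ruby
# Back-of-the-envelope read time.
# Assumption: decimal units, 1 TB = 1,000,000 MB (the slide does not say).
MB_PER_TB = 1_000_000

# Hours needed to read data_tb terabytes at throughput_mb_per_s per disk,
# with the read spread evenly over `disks` disks.
def read_hours(data_tb, throughput_mb_per_s, disks = 1)
  seconds = data_tb * MB_PER_TB / (throughput_mb_per_s * disks).to_f
  seconds / 3600.0
end

single  = read_hours(400, 50)        # one HDD, as on the slide
cluster = read_hours(400, 50, 1000)  # hypothetical: 1,000 disks in parallel

puts format("1 disk:     %.0f hours", single)
puts format("1000 disks: %.1f hours", cluster)
```

Straight division gives a little over 2,200 hours for a single disk, in line with the rounded 2,000-hour figure, and a couple of hours once the same read is split across a thousand disks.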

Hadoop Papyrus

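An open-source framework that lets you run
Hadoop jobs written in a Ruby DSL

Hadoop jobs are normally written in Java

What takes complex code in Java can be written in just a few lines

Supported by the IPA MITOH Program (main track), first half of FY2009

Jobs can be written and run on Hudson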

Step.1
Write jobs in Ruby instead of Java

Step.2
Make MapReduce simple
with a Ruby DSL

Map Reduce Job
Description

Log Analysis
DSL

Step.3
Make a Hadoop server setup easy to use

The same kind of processing takes about 70 lines in Java,
but only about 10 lines with Hadoop Papyrus!

Java

    package org.apache.hadoop.examples;

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class WordCount {

      public static class TokenizerMapper extends
          Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }

      public static class IntSumReducer extends
          Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args)
            .getRemainingArgs();
        if (otherArgs.length != 2) {
          System.err.println("Usage: wordcount <in> <out>");
          System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Hadoop Papyrus

    dsl 'LogAnalysis'

    from 'test/in'
    to 'test/out'

    pattern /\[\[([^|\]:]+)[^\]:]*\]\]/
    column_name :link

    topic "link num", :label => 'n' do
      count_uniq column[:link]
    end
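
To get a feel for what the `pattern` line in the DSL picks out, here is a small plain-Ruby sketch. It assumes the pattern is the wiki-link regex with its backslash escapes in place, uses made-up sample lines instead of real files under test/in, and tallies occurrences per link. That tally is only a guess at the kind of number the `topic` block reports; it is not taken from the Papyrus source.

```ruby
# Assumed escaped form of the pattern shown in the DSL.
LINK_PATTERN = /\[\[([^|\]:]+)[^\]:]*\]\]/

# Hypothetical sample input standing in for files under 'test/in'.
lines = [
  "See [[Ruby]] and [[Hadoop|Apache Hadoop]].",
  "[[File:logo.png]] is skipped, [[Ruby]] is not."
]

# String#scan returns one array per match, holding the single capture group.
links = lines.flat_map { |line| line.scan(LINK_PATTERN).map(&:first) }

# Tally occurrences per link name.
counts = Hash.new(0)
links.each { |link| counts[link] += 1 }

counts.each { |link, n| puts "#{link}\t#{n}" }
```

The capture group stops at `|`, `]`, or `:`, so piped links yield only the target name and namespaced links such as `File:` do not match at all.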

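Hadoop Papyrus in detail
Inside the Map/Reduce processing, which has to be written in Java,
JRuby is used to call Ruby scripts

Hadoop Papyrus in detail (continued)
In addition, a DSL describing the processing you want (log analysis,
for example) is prepared in advance. It behaves differently in the Map
phase and in the Reduce phase, so a single DSL script is enough to run
the whole MapReduce job.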
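
As a rough sketch of the Ruby side of this arrangement: the Java Mapper and Reducer would pass each record to Ruby callbacks like the ones below. The names `map_record`, `reduce_group`, and `run_locally` are hypothetical and are not the Papyrus API; `run_locally` is a stand-in for Hadoop so that the example runs without a cluster.

```ruby
# Hypothetical Ruby-side callbacks (illustrative names, not Papyrus API).

# Called once per input record: emit [word, 1] for every word in the line.
def map_record(_key, value, output)
  value.split.each { |word| output << [word, 1] }
end

# Called once per key with all of its values: emit the sum.
def reduce_group(key, values, output)
  output << [key, values.sum]
end

# Local stand-in for Hadoop: map, group by key (the "shuffle"), then reduce.
def run_locally(lines)
  mapped = []
  lines.each_with_index { |line, i| map_record(i, line, mapped) }

  reduced = []
  mapped.group_by(&:first).each do |key, pairs|
    reduce_group(key, pairs.map(&:last), reduced)
  end
  reduced.to_h
end

result = run_locally(["a b a", "b c"])
result.each { |word, n| puts "#{word}\t#{n}" }
```

In the real setup the grouping step is done by Hadoop between the Map and Reduce phases; only the two callbacks would live in the Ruby script.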
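
The idea of one script with two behaviours can be sketched as follows. This is a toy illustration, not how Papyrus is implemented: `MiniDsl`, `count_matches`, and `run_phase` are hypothetical names, and the same block is simply evaluated against an object that knows which phase it is in.

```ruby
# Hypothetical mini-DSL: one script, two behaviours depending on the phase.
class MiniDsl
  attr_reader :emitted

  def initialize(phase, input)
    @phase = phase    # :map or :reduce
    @input = input    # a line of text (map) or [key, values] (reduce)
    @emitted = []
  end

  # DSL word: remember the regex used to extract keys.
  def pattern(regex)
    @regex = regex
  end

  # DSL word: in the map phase emit [match, 1] per match;
  # in the reduce phase sum the values collected for one key.
  def count_matches
    if @phase == :map
      @input.scan(@regex).each { |groups| @emitted << [groups.first, 1] }
    else
      key, values = @input
      @emitted << [key, values.sum]
    end
  end
end

# The single "script", written once and used in both phases.
SCRIPT = proc do
  pattern(/(\w+)/)
  count_matches
end

# Evaluate the script in the given phase and return what it emitted.
def run_phase(phase, input)
  dsl = MiniDsl.new(phase, input)
  dsl.instance_eval(&SCRIPT)
  dsl.emitted
end

p run_phase(:map, "GET a GET")
p run_phase(:reduce, ["GET", [1, 1]])
```

Using `instance_eval` keeps the script free of any mention of phases; the phase is decided by whoever runs it, which matches the description of a Mapper and a Reducer each loading the same DSL file.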

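Hapyrus
・Hapyrus is a service for sharing and running best practices
for large-scale parallel distributed processing such as Hadoop jobs
・Built on Amazon EC2, so Hadoop can be used as a service
・Uses JRuby internally
– as the bridge between Hadoop and Ruby (using RoR)
・Development started in October 2010 as Hapyrus Inc.
and is actively under way!
・Alpha release planned for the end of the year
Stay tuned!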

Accessing Hadoop from JRuby

[Diagram: Hadoop Client <JRuby> and Hadoop JobTracker <Java>, connected via Hadoop IPC]

Object data inside Hadoop can be
accessed directly!

Thank you

Twitter ID: @fujibee
