Let's Talk a Bit About Hadoop (Working Title)

Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)

✓Open-source implementation of Google's Map/Reduce and GFS

✓Written in Java
- Apache project
- http://hadoop.apache.org/

•JobTracker and TaskTracker: the Map/Reduce daemons
•NameNode and DataNode: the HDFS daemons

•JobTracker NameNode
•SecondaryNameNode NameNode

•TaskTracker DataNode

•JobTracker/NameNode
•TaskTracker/DataNode

✓
- MapTask
- ReduceTask
- JobClient JobTracker Job
- HDFS
- Map/Reduce

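import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {
    // Map: body elided on the slide; assumed here to emit (word, 1) per token.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                output.collect(word, ONE);
            }
        }
    }

    // Reduce: body elided on the slide; assumed here to sum the counts per word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        // Types of the keys and values the job writes out
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        // Line-oriented text in, text out
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // args[0]: input path, args[1]: output path
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // Submit the job and block until it finishes
        JobClient.runJob(conf);
    }
}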

✓Hadoop Streaming
✓LibHDFS
✓Hadoop Pipes
✓Amazon Elastic MapReduce

✓Hadoop hadoop-streaming.jar
✓ Map/Reduce

- Map/Reduce in C, Perl, Ruby, Python, etc.
- Map/Reduce

✓Map:cat / Reduce:wc
✓HDFS
$ hadoop dfs -copyFromLocal /usr/local/hadoop/*.txt /dfs/test/input/
$ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/output -mapper cat -reducer "wc -l"

09/09/26 17:00:29 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
09/09/26 17:00:30 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
09/09/26 17:00:30 INFO mapred.FileInputFormat: Total input paths to process : 4
09/09/26 17:00:33 INFO streaming.StreamJob: map 0% reduce 0%
09/09/26 17:00:44 INFO streaming.StreamJob: Output: /dfs/test/output

$ hadoop dfs -cat /dfs/test/output/*
8842
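
To see why this job prints a single number, here is a minimal local sketch in Python of what Streaming does with these two commands, assuming a single reduce task: the mapper passes every line through unchanged (like `cat`), the framework sorts the map output, and the reducer counts the lines it receives (like `wc -l`). The names `simulate_streaming`, `cat_mapper`, and `wc_l_reducer` and the sample lines are hypothetical and not part of Hadoop.

```python
from typing import Callable, Iterable, List

LineFn = Callable[[Iterable[str]], Iterable[str]]


def cat_mapper(lines: Iterable[str]) -> Iterable[str]:
    """Identity mapper: behaves like `cat`."""
    for line in lines:
        yield line


def wc_l_reducer(lines: Iterable[str]) -> Iterable[str]:
    """Counting reducer: behaves like `wc -l`."""
    yield str(sum(1 for _ in lines))


def simulate_streaming(splits: List[List[str]], mapper: LineFn, reducer: LineFn) -> List[str]:
    """Run the mapper once per input split, sort the combined map output,
    and feed it to one reducer (assumption: a single reduce task)."""
    map_output: List[str] = []
    for split in splits:
        map_output.extend(mapper(split))
    map_output.sort()  # stands in for the shuffle/sort phase
    return list(reducer(map_output))


if __name__ == "__main__":
    # Hypothetical input: two files with 3 and 2 lines.
    files = [["a b", "c", "d e f"], ["g", "h i"]]
    print(simulate_streaming(files, cat_mapper, wc_l_reducer))
```

With one reducer the result is simply the total number of input lines, which matches the single value in the job output above.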

✓python
http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python

$ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/output -mapper "python map.py" -reducer "python reduce.py"

09/09/26 17:29:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
09/09/26 17:29:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
09/09/26 17:29:35 INFO streaming.StreamJob: Output: /dfs/test/output

$ hadoop dfs -cat /dfs/test/output/*
via 1942
the 1476
to 1394
in 819
a 816
cutting) 740
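
The deck does not show the contents of `map.py` and `reduce.py`. The following is a rough sketch of what a word-count pair for Streaming could look like, assuming tab-separated `word<TAB>count` lines and plain whitespace tokenization (which would also explain a key such as `cutting)` in the output). The function names `map_lines` and `reduce_lines` are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Mapper: emit 'word<TAB>1' for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Reducer: sum the counts of consecutive lines that share a key.

    Relies on the input being sorted by key, as it is for the lines
    a Streaming reducer reads.
    """
    pairs = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if "\t" in line  # skip lines that are not key<TAB>value
    )
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Local dry run on hypothetical input; sorted() stands in for
    # Hadoop's sort between the map and reduce phases.
    sample = ["the quick fox", "the lazy dog"]
    for line in reduce_lines(sorted(map_lines(sample))):
        print(line)
```

In an actual Streaming job each script would apply its function to `sys.stdin` and print the results to stdout; keeping the logic in plain functions makes it easy to check locally before submitting.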

✓C HDFS
http://wiki.apache.org/hadoop/LibHDFS

✓C C++ HDFS Map/Reduce
API
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/pipes/package-summary.html

✓Amazon EC2 MapReduce
http://aws.amazon.com/elasticmapreduce/

✓
- NameNode/TaskTracker
-
✓HDFS
- HDFS DataNode

✓JMX metrics
- Hadoop

- Hadoop Java jmxremote
- http://www.cloudera.com/blog/2009/03/12/hadoop-metrics/

✓metrics
- DFS / MapReduce / JVM / RPC
- Map/Reduce Task

(Keyword Tracker

✓JobTracker
- (http://jobtracker:50030/jobtracker.jsp)

- Map/Reduce

✓ wiki
- http://wiki.apache.org/hadoop/FAQ
✓Yahoo
- http://www.docstoc.com/docs/3766688/Hadoop-Map-Reduce-Tuning-and-Debugging-Arun-C-Murthy-acmurthy

✓TaskTracker Map
Reduce
- (hadoop-site.xml)

mapred.tasktracker.reduce.tasks.maximum

- TaskTracker
- TaskTracker
4 8GB
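
As a hedged illustration of how such a setting ends up in `hadoop-site.xml`, the Python sketch below renders name/value pairs in Hadoop's `<configuration>` / `<property>` layout. The helper `render_hadoop_site` is hypothetical, `mapred.tasktracker.map.tasks.maximum` is included as the map-side counterpart on the assumption that both limits are tuned together, and the values are placeholders rather than tuning advice.

```python
from typing import Dict
from xml.sax.saxutils import escape


def render_hadoop_site(properties: Dict[str, str]) -> str:
    """Render name/value pairs in Hadoop's <configuration> XML layout."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(str(value))}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(render_hadoop_site({
        "mapred.tasktracker.map.tasks.maximum": "4",
        "mapred.tasktracker.reduce.tasks.maximum": "2",
    }))
```

These limits are read by each TaskTracker, so the file has to be changed on the worker nodes and the TaskTrackers restarted for new values to apply.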

✓Map→Reduce

- io.sort.mb
- io.sort.factor
- io.sort.record.percent
- io.sort.spill.percent
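
To give a feel for how these settings interact, here is a simplified Python model of the map-side sort buffer as I understand it for Hadoop 0.20: `io.sort.mb` is the buffer size, `io.sort.record.percent` is the share reserved for per-record accounting (assumed to cost 16 bytes per record), and a spill to disk starts once either region reaches `io.sort.spill.percent`. The function `estimate_spills` is hypothetical, the defaults are the 0.20 defaults as I recall them, and `io.sort.factor` (how many spill files are merged at once) is not modeled.

```python
import math

ACCOUNTING_BYTES_PER_RECORD = 16  # assumption for the 0.20-era buffer layout


def estimate_spills(output_bytes: int, records: int,
                    sort_mb: int = 100,
                    record_percent: float = 0.05,
                    spill_percent: float = 0.80) -> int:
    """Roughly estimate how many spills one map task performs."""
    total = sort_mb * 1024 * 1024
    accounting_space = total * record_percent
    data_space = total - accounting_space
    # Thresholds at which a spill is triggered.
    record_limit = accounting_space / ACCOUNTING_BYTES_PER_RECORD * spill_percent
    byte_limit = data_space * spill_percent
    return math.ceil(max(output_bytes / byte_limit, records / record_limit))


if __name__ == "__main__":
    # Hypothetical map task: 50 MB of output made of one million small records.
    fifty_mb = 50 * 1024 * 1024
    print(estimate_spills(fifty_mb, 1_000_000))                      # defaults
    print(estimate_spills(fifty_mb, 1_000_000, record_percent=0.2))  # more record space
```

Under this model the hypothetical task spills 4 times with the defaults and once with `io.sort.record.percent` raised to 0.2, which is the kind of effect to look for when map output consists of many small records.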

✓Reduce

- mapred.reduce.parallel.copies

✓Map

- mapred.compress.map.output (true)

✓Map→Reduce
HDFS

- fs.inmemory.size.mb

✓Reduce HDFS

✓ HDFS
- org.apache.hadoop.mapred.lib.NullOutputFormat

✓Reduce
✓ Reduce

-

-

✓MRUnit
- MapTask/ReduceTask
- cloudera Hadoop
- http://www.cloudera.com/hadoop-mrunit

✓JMock
- Mock
- http://www.jmock.org/

✓Hadoop

- (´ ω `)

✓Hudson Hadoop
- zero conf Hudson Hadoop

- Hudson Hadoop

- http://d.hatena.ne.jp/kkawa/20090315/p1
- http://weblogs.java.net/blog/kohsuke/archive/2009/03/instantly_turni.html

✓

✓Hadoop Streaming Java

✓Let's Try Hadoop Programming!
