SlideShare ist ein Scribd-Unternehmen logo
1 von 61
Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)
✓Google Map/Reduce             GFS



✓Java
 - Apache
 - http://hadoop.apache.org/
✓

✓
✓

- HDD
-
✓
✓


✓
Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)
✓

✓Map   Reduce
✓
- grep
-
-
✓
-

-
-
PV UU
30+2
5+1
Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)
•JobTracker TaskTracker Map/Reduce
•NameNode DataNode HDFS
•JobTracker NameNode
•SecondaryNameNode NameNode

•TaskTracker DataNode

•JobTracker/NameNode
•TaskTracker/DataNode
Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)
✓
-   MapTask
-   ReduceTask
-   JobClient    JobTracker Job
-   HDFS
-   Map/Reduce
public class WordCount {
    public static class Map extends MapReduceBase implements Mapper<LongWritable,
Text, Text, IntWritable> {
        //Map
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text,
IntWritable, Text, IntWritable> {
       //Reduce
    }

    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(WordCount.class);
      conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
public class WordCount {
    public static class Map extends MapReduceBase implements Mapper<LongWritable,
Text, Text, IntWritable> {
        //Map
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text,
IntWritable, Text, IntWritable> {
       //Reduce
    }

    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(WordCount.class);
      conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
✓Hadoop Streaming
✓LibHDFS
✓Hadoop Pipes
✓Amazon Elastic MapReduce
✓Hadoop      hadoop-streaming.jar
✓                 Map/Reduce


 - C Perl Ruby Python
     Map/Reduce
 -      Map/Reduce
✓Map:cat / Reduce:wc
✓HDFS
$ hadoop dfs -copyFromLocal /usr/local/hadoop/*.txt /dfs/test/input/
$ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/
output -mapper cat -reducer "wc -l"

09/09/26 17:00:29 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
09/09/26 17:00:30 WARN mapred.JobClient: No job jar file set. User classes may not be found. See
JobConf(Class) or JobConf#setJar(String).
09/09/26 17:00:30 INFO mapred.FileInputFormat: Total input paths to process : 4
09/09/26 17:00:31 INFO mapred.FileInputFormat: Total input paths to process : 4
09/09/26 17:00:33 INFO streaming.StreamJob: map 0% reduce 0%
09/09/26 17:00:42 INFO streaming.StreamJob: map 100% reduce 0%
09/09/26 17:00:44 INFO streaming.StreamJob: map 100% reduce 100%
09/09/26 17:00:44 INFO streaming.StreamJob: Output: /dfs/test/output

$ hadoop dfs -cat /dfs/test/output/*
   8842
✓python
   http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python


$ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/
output -mapper "python map.py" -reducer "python reduce.py"

09/09/26 17:29:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
09/09/26 17:29:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See
JobConf(Class) or JobConf#setJar(String).
09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
09/09/26 17:29:30 INFO streaming.StreamJob: map 0% reduce 0%
09/09/26 17:29:33 INFO streaming.StreamJob: map 100% reduce 0%
09/09/26 17:29:35 INFO streaming.StreamJob: map 100% reduce 100%
09/09/26 17:29:35 INFO streaming.StreamJob: Output: /dfs/test/output

$ hadoop dfs -cat /dfs/test/output/*
via 1942
the 1476
to    1394
in    819
a     816
cutting)   740
✓C HDFS
http://wiki.apache.org/hadoop/LibHDFS
✓C C++ HDFS                                Map/Reduce
                      API
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/pipes/
package-summary.html
✓Amazon EC2                                MapReduce
 http://aws.amazon.com/elasticmapreduce/
Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)
✓
- NameNode/TaskTracker
-
✓HDFS
- HDFS    DataNode
✓JMX              metrics
 - Hadoop

 - Hadoop Java                    jmxremote
 - http://www.cloudera.com/blog/2009/03/12/hadoop-metrics/

✓metrics
 - DFS / MapReduce / JVM / RPC
 - Map/Reduce                         Task


  (Keyword Tracker
✓JobTracker
 -              (http://jobtracker:50030/jobtracker.jsp)

 - Map/Reduce
✓      wiki
 - http://wiki.apache.org/hadoop/FAQ
✓Yahoo
 - http://www.docstoc.com/docs/3766688/Hadoop-
  Map-Reduce-Tuning-and-Debugging-Arun-C-
  Murthy-acmurthy
✓TaskTracker                         Map
 Reduce
 -               hadoop-site.xml)


     mapred.tasktracker.reduce.tasks.maximum


 -       TaskTracker
 - TaskTracker
     4   8GB
✓Map→Reduce


 -   io.sort.mb
 -   io.sort.factor
 -   io.sort.record.parcent
 -   io.sort.spill.parcent
✓Reduce


 - mapred.reduce.parallel.copies
✓Map

- mapred.compress.map.output (true )
✓Map→Reduce
                         HDFS


 - fs.inmemory.size.mb
✓Reduce              HDFS


✓        HDFS
 - org.apache.hadoop.mapred.lib.NullOutputFormat
✓Reduce
✓     Reduce


 -

 -
✓MRUnit
 - MapTask/ReduceTask
 - cloudera               Hadoop
 - http://www.cloudera.com/hadoop-mrunit

✓JMock
 -          Mock
 - http://www.jmock.org/

✓Hadoop

 -                                         (´ ω `)
✓Hudson Hadoop
 - zero conf Hudson               Hadoop


 - Hudson                Hadoop


 - http://d.hatena.ne.jp/kkawa/20090315/p1
 - http://weblogs.java.net/blog/kohsuke/archive/2009/03/
  instantly_turni.html
✓

✓Hadoop Streaming          Java


✓Letʼs Try Hadoop Programing!
ちょっとHadoopについて語ってみるか(仮題)
ちょっとHadoopについて語ってみるか(仮題)

Weitere ähnliche Inhalte

Was ist angesagt?

MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelMongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
Takahiro Inoue
 
Jan 2013 HUG: Dist cpv2 for hug 20130116
Jan 2013 HUG: Dist cpv2 for hug 20130116Jan 2013 HUG: Dist cpv2 for hug 20130116
Jan 2013 HUG: Dist cpv2 for hug 20130116
Yahoo Developer Network
 
Hadoop 20111117
Hadoop 20111117Hadoop 20111117
Hadoop 20111117
exsuns
 

Was ist angesagt? (20)

Prashant de-ny-project-s1
Prashant de-ny-project-s1Prashant de-ny-project-s1
Prashant de-ny-project-s1
 
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabMapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
 
Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14
 
GoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with DependenciesGoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with Dependencies
 
Failing gracefully
Failing gracefullyFailing gracefully
Failing gracefully
 
Database High Availability Using SHADOW Systems
Database High Availability Using SHADOW SystemsDatabase High Availability Using SHADOW Systems
Database High Availability Using SHADOW Systems
 
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelMongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
 
Pig workshop
Pig workshopPig workshop
Pig workshop
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebook
 
Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configuration
 
Hive data migration (export/import)
Hive data migration (export/import)Hive data migration (export/import)
Hive data migration (export/import)
 
Jan 2013 HUG: Dist cpv2 for hug 20130116
Jan 2013 HUG: Dist cpv2 for hug 20130116Jan 2013 HUG: Dist cpv2 for hug 20130116
Jan 2013 HUG: Dist cpv2 for hug 20130116
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
 
Hadoop 20111117
Hadoop 20111117Hadoop 20111117
Hadoop 20111117
 
Hadoop 20111215
Hadoop 20111215Hadoop 20111215
Hadoop 20111215
 
Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem
 
MapReduce@DirectI
MapReduce@DirectIMapReduce@DirectI
MapReduce@DirectI
 
Gur1009
Gur1009Gur1009
Gur1009
 
TP2 Big Data HBase
TP2 Big Data HBaseTP2 Big Data HBase
TP2 Big Data HBase
 

Ähnlich wie ちょっとHadoopについて語ってみるか(仮題)

Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Yahoo Developer Network
 
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Yahoo Developer Network
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
IndicThreads
 

Ähnlich wie ちょっとHadoopについて語ってみるか(仮題) (20)

Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
 
20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍
 
Clogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewClogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overview
 
Bd class 2 complete
Bd class 2 completeBd class 2 complete
Bd class 2 complete
 
How to develop Big Data Pipelines for Hadoop, by Costin Leau
How to develop Big Data Pipelines for Hadoop, by Costin LeauHow to develop Big Data Pipelines for Hadoop, by Costin Leau
How to develop Big Data Pipelines for Hadoop, by Costin Leau
 
Amazon elastic map reduce
Amazon elastic map reduceAmazon elastic map reduce
Amazon elastic map reduce
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
 
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
 
Big Data Analysis With RHadoop
Big Data Analysis With RHadoopBig Data Analysis With RHadoop
Big Data Analysis With RHadoop
 
AWS Hadoop and PIG and overview
AWS Hadoop and PIG and overviewAWS Hadoop and PIG and overview
AWS Hadoop and PIG and overview
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
 
Cascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUGCascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUG
 
Getting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceGetting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduce
 
06 pig-01-intro
06 pig-01-intro06 pig-01-intro
06 pig-01-intro
 
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
 
Endofday: A Container Workflow Engine for Scalable, Reproducible Computation
Endofday: A Container Workflow Engine for Scalable, Reproducible ComputationEndofday: A Container Workflow Engine for Scalable, Reproducible Computation
Endofday: A Container Workflow Engine for Scalable, Reproducible Computation
 

Mehr von moai kids

FluentdとRedshiftの素敵な関係
FluentdとRedshiftの素敵な関係FluentdとRedshiftの素敵な関係
FluentdとRedshiftの素敵な関係
moai kids
 
Twitterのsnowflakeについて
TwitterのsnowflakeについてTwitterのsnowflakeについて
Twitterのsnowflakeについて
moai kids
 
Programming Hive Reading #4
Programming Hive Reading #4Programming Hive Reading #4
Programming Hive Reading #4
moai kids
 
Programming Hive Reading #3
Programming Hive Reading #3Programming Hive Reading #3
Programming Hive Reading #3
moai kids
 
"Programming Hive" Reading #1
"Programming Hive" Reading #1"Programming Hive" Reading #1
"Programming Hive" Reading #1
moai kids
 
Casual Compression on MongoDB
Casual Compression on MongoDBCasual Compression on MongoDB
Casual Compression on MongoDB
moai kids
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
moai kids
 
Hadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきましたHadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきました
moai kids
 
HBase本輪読会資料(11章)
HBase本輪読会資料(11章)HBase本輪読会資料(11章)
HBase本輪読会資料(11章)
moai kids
 
第四回月次セミナー(公開版)
第四回月次セミナー(公開版)第四回月次セミナー(公開版)
第四回月次セミナー(公開版)
moai kids
 
第三回月次セミナー(公開版)
第三回月次セミナー(公開版)第三回月次セミナー(公開版)
第三回月次セミナー(公開版)
moai kids
 
Pythonで自然言語処理
Pythonで自然言語処理Pythonで自然言語処理
Pythonで自然言語処理
moai kids
 
HandlerSocket plugin Client for Javaとそれを用いたベンチマーク
HandlerSocket plugin Client for Javaとそれを用いたベンチマークHandlerSocket plugin Client for Javaとそれを用いたベンチマーク
HandlerSocket plugin Client for Javaとそれを用いたベンチマーク
moai kids
 
Yammer試用レポート(公開版)
Yammer試用レポート(公開版)Yammer試用レポート(公開版)
Yammer試用レポート(公開版)
moai kids
 
掲示板時間軸コーパスを用いたワードトレンド解析(公開版)
掲示板時間軸コーパスを用いたワードトレンド解析(公開版)掲示板時間軸コーパスを用いたワードトレンド解析(公開版)
掲示板時間軸コーパスを用いたワードトレンド解析(公開版)
moai kids
 
中国と私(仮題)
中国と私(仮題)中国と私(仮題)
中国と私(仮題)
moai kids
 
不自然言語処理コンテストLT資料
不自然言語処理コンテストLT資料不自然言語処理コンテストLT資料
不自然言語処理コンテストLT資料
moai kids
 
n-gramコーパスを用いた類義語自動獲得手法について
n-gramコーパスを用いた類義語自動獲得手法についてn-gramコーパスを用いた類義語自動獲得手法について
n-gramコーパスを用いた類義語自動獲得手法について
moai kids
 

Mehr von moai kids (20)

中国最新ニュースアプリ事情
中国最新ニュースアプリ事情中国最新ニュースアプリ事情
中国最新ニュースアプリ事情
 
FluentdとRedshiftの素敵な関係
FluentdとRedshiftの素敵な関係FluentdとRedshiftの素敵な関係
FluentdとRedshiftの素敵な関係
 
Twitterのsnowflakeについて
TwitterのsnowflakeについてTwitterのsnowflakeについて
Twitterのsnowflakeについて
 
Programming Hive Reading #4
Programming Hive Reading #4Programming Hive Reading #4
Programming Hive Reading #4
 
Programming Hive Reading #3
Programming Hive Reading #3Programming Hive Reading #3
Programming Hive Reading #3
 
"Programming Hive" Reading #1
"Programming Hive" Reading #1"Programming Hive" Reading #1
"Programming Hive" Reading #1
 
Casual Compression on MongoDB
Casual Compression on MongoDBCasual Compression on MongoDB
Casual Compression on MongoDB
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Hadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきましたHadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきました
 
HBase本輪読会資料(11章)
HBase本輪読会資料(11章)HBase本輪読会資料(11章)
HBase本輪読会資料(11章)
 
snappyについて
snappyについてsnappyについて
snappyについて
 
第四回月次セミナー(公開版)
第四回月次セミナー(公開版)第四回月次セミナー(公開版)
第四回月次セミナー(公開版)
 
第三回月次セミナー(公開版)
第三回月次セミナー(公開版)第三回月次セミナー(公開版)
第三回月次セミナー(公開版)
 
Pythonで自然言語処理
Pythonで自然言語処理Pythonで自然言語処理
Pythonで自然言語処理
 
HandlerSocket plugin Client for Javaとそれを用いたベンチマーク
HandlerSocket plugin Client for Javaとそれを用いたベンチマークHandlerSocket plugin Client for Javaとそれを用いたベンチマーク
HandlerSocket plugin Client for Javaとそれを用いたベンチマーク
 
Yammer試用レポート(公開版)
Yammer試用レポート(公開版)Yammer試用レポート(公開版)
Yammer試用レポート(公開版)
 
掲示板時間軸コーパスを用いたワードトレンド解析(公開版)
掲示板時間軸コーパスを用いたワードトレンド解析(公開版)掲示板時間軸コーパスを用いたワードトレンド解析(公開版)
掲示板時間軸コーパスを用いたワードトレンド解析(公開版)
 
中国と私(仮題)
中国と私(仮題)中国と私(仮題)
中国と私(仮題)
 
不自然言語処理コンテストLT資料
不自然言語処理コンテストLT資料不自然言語処理コンテストLT資料
不自然言語処理コンテストLT資料
 
n-gramコーパスを用いた類義語自動獲得手法について
n-gramコーパスを用いた類義語自動獲得手法についてn-gramコーパスを用いた類義語自動獲得手法について
n-gramコーパスを用いた類義語自動獲得手法について
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

ちょっとHadoopについて語ってみるか(仮題)

  • 1.
  • 2.
  • 4.
  • 5. ✓Google Map/Reduce GFS ✓Java - Apache - http://hadoop.apache.org/
  • 8.
  • 9.
  • 11.
  • 12. ✓ ✓Map Reduce
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 22. 5+1
  • 24.
  • 26. •JobTracker NameNode •SecondaryNameNode NameNode •TaskTracker DataNode •JobTracker/NameNode •TaskTracker/DataNode
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 34. ✓ - MapTask - ReduceTask - JobClient JobTracker Job - HDFS - Map/Reduce
  • 35. public class WordCount { public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> { //Map } public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> { //Reduce } public static void main(String[] args) throws Exception { JobConf conf = new JobConf(WordCount.class); conf.setJobName("wordcount"); conf.setOutputKeyClass(Text.class); conf.setOutputValueClass(IntWritable.class); conf.setMapperClass(Map.class); conf.setReducerClass(Reduce.class); conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(TextOutputFormat.class); FileInputFormat.setInputPaths(conf, new Path(args[0])); FileOutputFormat.setOutputPath(conf, new Path(args[1])); JobClient.runJob(conf); } }
  • 36. public class WordCount { public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> { //Map } public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> { //Reduce } public static void main(String[] args) throws Exception { JobConf conf = new JobConf(WordCount.class); conf.setJobName("wordcount"); conf.setOutputKeyClass(Text.class); conf.setOutputValueClass(IntWritable.class); conf.setMapperClass(Map.class); conf.setReducerClass(Reduce.class); conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(TextOutputFormat.class); FileInputFormat.setInputPaths(conf, new Path(args[0])); FileOutputFormat.setOutputPath(conf, new Path(args[1])); JobClient.runJob(conf); } }
  • 38. ✓Hadoop hadoop-streaming.jar ✓ Map/Reduce - C Perl Ruby Python Map/Reduce - Map/Reduce
  • 39. ✓Map:cat / Reduce:wc ✓HDFS $ hadoop dfs -copyFromLocal /usr/local/hadoop/*.txt /dfs/test/input/ $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/ output -mapper cat -reducer "wc -l" 09/09/26 17:00:29 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 09/09/26 17:00:30 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). 09/09/26 17:00:30 INFO mapred.FileInputFormat: Total input paths to process : 4 09/09/26 17:00:31 INFO mapred.FileInputFormat: Total input paths to process : 4 09/09/26 17:00:33 INFO streaming.StreamJob: map 0% reduce 0% 09/09/26 17:00:42 INFO streaming.StreamJob: map 100% reduce 0% 09/09/26 17:00:44 INFO streaming.StreamJob: map 100% reduce 100% 09/09/26 17:00:44 INFO streaming.StreamJob: Output: /dfs/test/output $ hadoop dfs -cat /dfs/test/output/* 8842
  • 40. ✓python http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/ output -mapper "python map.py" -reducer "python reduce.py" 09/09/26 17:29:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 09/09/26 17:29:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). 09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4 09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4 09/09/26 17:29:30 INFO streaming.StreamJob: map 0% reduce 0% 09/09/26 17:29:33 INFO streaming.StreamJob: map 100% reduce 0% 09/09/26 17:29:35 INFO streaming.StreamJob: map 100% reduce 100% 09/09/26 17:29:35 INFO streaming.StreamJob: Output: /dfs/test/output $ hadoop dfs -cat /dfs/test/output/* via 1942 the 1476 to 1394 in 819 a 816 cutting) 740
  • 42. ✓C C++ HDFS Map/Reduce API http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/pipes/ package-summary.html
  • 43. ✓Amazon EC2 MapReduce http://aws.amazon.com/elasticmapreduce/
  • 46. ✓JMX metrics - Hadoop - Hadoop Java jmxremote - http://www.cloudera.com/blog/2009/03/12/hadoop-metrics/ ✓metrics - DFS / MapReduce / JVM / RPC - Map/Reduce Task (Keyword Tracker
  • 47. ✓JobTracker - (http://jobtracker:50030/jobtracker.jsp) - Map/Reduce
  • 48. wiki - http://wiki.apache.org/hadoop/FAQ ✓Yahoo - http://www.docstoc.com/docs/3766688/Hadoop- Map-Reduce-Tuning-and-Debugging-Arun-C- Murthy-acmurthy
  • 49. ✓TaskTracker Map Reduce - hadoop-site.xml) mapred.tasktracker.reduce.tasks.maximum - TaskTracker - TaskTracker 4 8GB
  • 50. ✓Map→Reduce - io.sort.mb - io.sort.factor - io.sort.record.parcent - io.sort.spill.parcent
  • 53. ✓Map→Reduce HDFS - fs.inmemory.size.mb
  • 54. ✓Reduce HDFS ✓ HDFS - org.apache.hadoop.mapred.lib.NullOutputFormat
  • 55. ✓Reduce ✓ Reduce - -
  • 56. ✓MRUnit - MapTask/ReduceTask - cloudera Hadoop - http://www.cloudera.com/hadoop-mrunit ✓JMock - Mock - http://www.jmock.org/ ✓Hadoop - (´ ω `)
  • 57. ✓Hudson Hadoop - zero conf Hudson Hadoop - Hudson Hadoop - http://d.hatena.ne.jp/kkawa/20090315/p1 - http://weblogs.java.net/blog/kohsuke/archive/2009/03/ instantly_turni.html
  • 58.
  • 59. ✓ ✓Hadoop Streaming Java ✓Letʼs Try Hadoop Programing!