(      &Hadoop     )

    April 12, 2013
     Takumi Asai
(26 )
–
– H21 H23 NTT Communications          IP
– H23     NTT

– twitter:@p_i_o4545
– blog:http://pioneerinocean.hatenablog.com/
    •
    •                 R    Hadoop              (   )


–
    •
(           :4/12)

    Hadoop
(       :     )
    R




              Ruby R
/   /
   /
⇒wikipedia
=
21            (   )
⇒




Google,Facebook
1000


       D   R
VS

                IT




                     RDBMS
SPSS   R




           IT
VS




FSP            Web




       FSP

       TESCO
VS




WinWin
Hadoop

  Hadoop
  – Apache
                 Java
  – Google MapReduce, Google File System (GFS)

     •   Google
Hadoop

  Hadoop
  –    HDFS MapReduce
  – HBase
  HDFS
  – Google      GFS
  –

  MapReduce
  – Google      MapReduce
  – Key-Value
       Java
HDFS

Three node types: Namenode, Secondary Namenode, Datanode


       Data Node
                                  Data Node

                      Name Node



       Data Node                  Data Node
                      Secondary
                      Name Node


       Data Node                  Data Node
HDFS

•             HDFS
                      (64MB              )

                     abcdefg   #Block1
                     hijklmn
                               (64MB)
                     opqrstu
        abcdefg
        hijklmn
        opqrstu
        vwxyz        vwxyz
                               #Block2
                               (64MB)



       150M
                               #Block3
                               (22MB)
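
The block arithmetic in the diagram above can be sketched in a few lines of Ruby. This is only an illustration of the division, not HDFS code: `block_sizes` and `BLOCK_SIZE_MB` are hypothetical names, and the 64MB value is simply the block size shown on the slide.

```ruby
# Split a file of a given size into HDFS-style blocks (sizes in MB).
# BLOCK_SIZE_MB = 64 is the block size shown on the slide (assumption:
# a real cluster may be configured with a different value).
BLOCK_SIZE_MB = 64

def block_sizes(file_size_mb, block_size_mb = BLOCK_SIZE_MB)
  full_blocks, remainder = file_size_mb.divmod(block_size_mb)
  sizes = Array.new(full_blocks, block_size_mb)
  sizes << remainder if remainder > 0  # the last block holds the leftover
  sizes
end

# The 150MB file from the slide: two full blocks plus one 22MB block.
block_sizes(150).each_with_index do |size, i|
  puts "#Block#{i + 1} (#{size}MB)"
end
```

The last block is smaller than the block size; it only occupies the space it actually needs.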
HDFS


  –

  –
  –

   abcdefg
             #Block1   Data Node:A has 1,2
   hijklmn
             (64MB)
   opqrstu
                       Data Node:B has 2,3

   vwxyz               Data Node:C has 1,3
             #Block2
             (64MB)

                       Data Node:D has 1

             #Block3
             (22MB)    Data Node:E has 2,3
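
As a rough sketch of what the placement table above implies, the Ruby below inverts it to show which Datanodes hold each block and checks which blocks remain readable after some nodes fail. The table is copied from the slide; `PLACEMENT`, `replicas_by_block`, and `surviving_blocks` are hypothetical names used only for this illustration.

```ruby
# Replica placement copied from the slide: Datanode => block numbers it holds.
PLACEMENT = {
  "A" => [1, 2],
  "B" => [2, 3],
  "C" => [1, 3],
  "D" => [1],
  "E" => [2, 3]
}

# Invert the table: block number => Datanodes holding a replica of it.
def replicas_by_block(placement)
  index = Hash.new { |hash, block| hash[block] = [] }
  placement.each do |node, blocks|
    blocks.each { |block| index[block] << node }
  end
  index
end

# Blocks that still have at least one replica after the given nodes fail.
def surviving_blocks(placement, failed_nodes)
  alive = placement.reject { |node, _| failed_nodes.include?(node) }
  replicas_by_block(alive).keys.sort
end

replicas_by_block(PLACEMENT).sort.each do |block, nodes|
  puts "Block#{block}: #{nodes.join(', ')}"
end
```

In this table every block sits on three different Datanodes, so losing any two nodes (for example A and B) still leaves at least one copy of each block.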
Namenode(NN)
– Namenode
– HDFS
–
–
Datanode(DN)
–
– blk_xxxxxx
–




               Secondary   Data Node
 Name Node
               Name Node
Secondary Namenode

Secondary Namenode(2NN)
– 2NN Namenode
– Namenode
–
     •          3
2NN NN
– Namenode

–        Namenode

     •
– 2NN
Namenode             !
– Namenode         HDFS
–     NN     2NN


– HDFS




–
–
HDFS

                     HDFS


  Data Node                             Data Node

                            Name Node
                              Active


  Data Node                             Data Node

                            Name Node
                             Standby


  Data Node                             Data Node



Standby              2NN
          2NN
HDFS

  HDFS
   – Datanode
   – Datanode


  Namenode
   –              Namenode
   – Namenode                          ⇔Datanode
     Datanode⇔Datanode
       •
       •
       •


                         Linux
   – ls,cat
   –                             rwx
       •          x          HDFS
MapReduce

   MapReduce
   –
   –



   – Map/Reduce    2
   – Map/Reduce         ,Mapper/Reducer
   –       Map,Reduce     Shuffle
MapReduce

HDFS


   Task Tracker
                                 Task Tracker
     (      )



                   Job Tracker
   Task Tracker      (     )     Task Tracker




   Task Tracker                  Task Tracker



  JobTracker      TaskTracker
Data Node
                                  Data Node
Task Tracker
                                 Task Tracker
                   Name Node
                   Job Tracker


 Data Node                        Data Node
Task Tracker                     Task Tracker
                   Secondary
                   Name Node


 Data Node     ※   HDFS
                                  Data Node
Task Tracker   ※   MapReduce
                                 Task Tracker
MapReduce

  YARN
  – HDFS                   MapReduce

  – YARN (MapReduce version 2)
  – MapReduce
  –           YARN
  –                YARN
MapReduce

   WordCount
    – MapReduce                             (Hello World          )


    Hello Hadoop Goodbye World Hello Goodbye World World Hadoop

   Map

   <Hello,1> <Hadoop,1> <Goodbye,1> <World,1> <Hello,1>
   <Goodbye,1> <World,1> <World,1> <Hadoop,1>

  Shuffle

   <Goodbye,[1,1]> <Hadoop,[1,1]> <Hello,[1,1]> <World,[1,1,1]>

  Reduce
   <Goodbye,2> <Hadoop,2> <Hello,2> <World,3>
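
The three phases above can be imitated in a single Ruby process. This is a minimal sketch of the data flow only, assuming the whole input fits in memory; it does not use Hadoop, and the variable names are illustrative.

```ruby
input = "Hello Hadoop Goodbye World Hello Goodbye World World Hadoop"

# Map: emit a <word, 1> pair for every word.
mapped = input.split.map { |word| [word, 1] }

# Shuffle: collect the values for each key, then sort by key.
shuffled = mapped.group_by { |word, _| word }
                 .map { |word, pairs| [word, pairs.map { |_, one| one }] }
                 .sort_by { |word, _| word }

# Reduce: sum the list of values for each key.
reduced = shuffled.map { |word, ones| [word, ones.sum] }

reduced.each { |word, count| puts "<#{word},#{count}>" }
```

Each printed pair matches the Reduce output on the slide, one pair per line.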
MapReduce

   Mapper Reducer
   –
   –
   –
   –                HDFS ”   ”



            Map

                                 reduce

            Map

                                 reduce
            Map
MapReduce


  –                               WordCount

  – Map                   Reduce

  –
      • fizz buzz fizzbuzz fizz
  – Ruby                               Ruby


  – Map: output one line per word in the form #{word}\t1

          OK
  – Reduce
MapReduce


  hdfs Hadoop Hadoop Mapred Mapred Mapred Hadoop Mapred



   Hadoop        3
   hdfs          1
   Mapred        4

  –                                        OK
      • #{word}\t#{count}


  – cat test.txt | ruby map.rb | sort | ruby reduce.rb
      •              Hadoop
MapReduce

            :Map
  hdfs Hadoop Hadoop Mapred Mapred Mapred Hadoop Mapred




   hdfs         1
   Hadoop       1
   Hadoop       1
   Mapred       1
   Mapred       1
Map
#!/usr/bin/env ruby
# Mapper: emit "<word>\t1" for every whitespace-separated word on STDIN.
STDIN.each_line do |line|
  line.split.each do |word|
    puts "#{word}\t1"
  end
end
Reduce
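#!/usr/bin/env ruby
# Reducer: sum the counts for each word read from STDIN.
wordhash = Hash.new(0)  # missing keys start at 0
STDIN.each_line do |line|
  word, count = line.strip.split
  next if word.nil?  # skip blank lines
  wordhash[word] += count.to_i
end

# Emit "<word>\t<count>" for every word.
wordhash.each { |word, count| puts "#{word}\t#{count}" }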
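
The reducer above keeps every distinct word in a hash. As an alternative sketch (not the deck's method), a reducer can rely on its input arriving sorted by key, as it does after `sort` in the local pipeline, and hold only one running total at a time. `reduce_sorted` and `sample` are hypothetical names for this illustration.

```ruby
# Sum counts for runs of identical keys. Assumes the lines are already
# sorted by key, so all lines for one word are adjacent.
def reduce_sorted(lines)
  return enum_for(:reduce_sorted, lines) unless block_given?

  current_word = nil
  current_count = 0
  lines.each do |line|
    word, count = line.chomp.split("\t")
    next if word.nil? || count.nil?  # skip blank or malformed lines
    if word != current_word
      # A new key starts: emit the total for the previous key.
      yield current_word, current_count unless current_word.nil?
      current_word = word
      current_count = 0
    end
    current_count += count.to_i
  end
  # Emit the total for the final key.
  yield current_word, current_count unless current_word.nil?
end

# Sorted mapper output for the sample sentence used on the slides.
sample = ["Hadoop\t1", "Hadoop\t1", "Hadoop\t1",
          "Mapred\t1", "Mapred\t1", "Mapred\t1", "Mapred\t1",
          "hdfs\t1"]
reduce_sorted(sample) { |word, count| puts "#{word}\t#{count}" }
```

To use it in the local pipeline, pass `STDIN.each_line` instead of `sample`. The trade-off is that the result is only correct when the input really is sorted by key.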
Hadoop

          Hadoop
  –

  –                Java   OK
  –

      •
          .
Hadoop
Hadoop
–
    • Pig
    • Hive
–
    • Sqoop
–
    • Mahout
–              Hadoop
    • Whirr
                        etc…
Hadoop
– HDFS
    • RAID
    •
–       HDFS      MapReduce
    • Amazon S3
–
    •


–
    •
–
    •
(Hadoop)


–      RDB


–


– Hive Pig

–

–
(Hadoop)


–


–                       HDD



–         MapReduce
–

–              Hadoop
Introduction to Data Analysis Technology (Hadoop Edition)
