(      &Hadoop     )

    2013/4/12
     Takumi Asai
(26 )
–
– H21 H23 NTT Communications          IP
– H23     NTT

– twitter:@p_i_o4545
– blog:http://pioneerinocean.hatenablog.com/
    •
    •                 R    Hadoop              (   )


–
    •
(           :4/12)

    Hadoop
(       :     )
    R




              Ruby R
/   /
   /
⇒wikipedia
=
21            (   )
⇒




Google,Facebook
1000


       D   R
VS

                IT




                     RDBMS
SPSS   R




           IT
VS




FSP            Web




       FSP

       TESCO
VS




WinWin
Hadoop

  Hadoop
  – Apache
                 Java
  – Google MapReduce, Google File System (GFS)

     •   Google
Hadoop

  Hadoop
  –    HDFS MapReduce
  – HBase
  HDFS
  – Google      GFS
  –

  MapReduce
  – Google      MapReduce
  – Key-Value
       Java
HDFS

3 node types: Namenode, Secondary Namenode (2NN), Datanode


       Data Node
                                  Data Node

                      Name Node



       Data Node                  Data Node
                      Secondary
                      Name Node


       Data Node                  Data Node
HDFS

•             HDFS
                      (64MB              )

                     abcdefg   #Block1
                     hijklmn
                               (64MB)
                     opqrstu
        abcdefg
        hijklmn
        opqrstu
        vwxyz        vwxyz
                               #Block2
                               (64MB)



       150MB
                               #Block3
                               (22MB)
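
The block split pictured on this slide can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the 64MB block size and the 150MB file shown above; `block_sizes` is a hypothetical helper name used only for illustration and is not part of any Hadoop API.

```ruby
# Sketch: split a file of a given size into fixed-size, HDFS-style blocks.
# The 64MB default mirrors the block size shown on the slide.
MB = 1024 * 1024

# Returns the size in bytes of each block; only the last block may be short.
def block_sizes(file_size, block_size = 64 * MB)
  full_blocks, remainder = file_size.divmod(block_size)
  sizes = Array.new(full_blocks, block_size)
  sizes << remainder if remainder > 0
  sizes
end

block_sizes(150 * MB).each_with_index do |size, index|
  puts "#Block#{index + 1} (#{size / MB}MB)"
end
```

For the 150MB file this gives two full 64MB blocks and a final 22MB block, matching #Block1 to #Block3 on the slide.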
HDFS


  –

  –
  –

   abcdefg
             #Block1   Data Node:A has 1,2
   hijklmn
             (64MB)
   opqrstu
                       Data Node:B has 2,3

   vwxyz               Data Node:C has 1,3
             #Block2
             (64MB)

                       Data Node:D has 1

             #Block3
             (22MB)    Data Node:E has 2,3
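
The placement above can be checked by inverting the "Datanode has blocks" table into a "block is on Datanodes" table. This is a minimal sketch that only uses the five Datanodes and three blocks from the slide; the helper names `replicas_by_block` and `surviving_blocks` are hypothetical, and the failure check is an illustration, not how the Namenode tracks replicas.

```ruby
# Placement copied from the slide: which blocks each Datanode holds.
PLACEMENT = {
  "A" => [1, 2],
  "B" => [2, 3],
  "C" => [1, 3],
  "D" => [1],
  "E" => [2, 3]
}

# Invert the table: for each block, list the Datanodes holding a replica.
def replicas_by_block(placement)
  result = Hash.new { |hash, block| hash[block] = [] }
  placement.each do |node, blocks|
    blocks.each { |block| result[block] << node }
  end
  result
end

# Blocks that still have at least one replica if the given Datanodes fail.
def surviving_blocks(placement, failed_nodes)
  replicas_by_block(placement)
    .select { |_block, nodes| (nodes - failed_nodes).any? }
    .keys
    .sort
end

replicas_by_block(PLACEMENT).sort.each do |block, nodes|
  puts "Block#{block}: #{nodes.join(',')} (#{nodes.size} replicas)"
end
```

Every block in the slide's layout ends up on three different Datanodes, so in this layout any two Datanodes can fail without losing a block.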
Namenode(NN)
– Namenode
– HDFS
–
–
Datanode(DN)
–
– blk_xxxxxx
–




               Secondary   Data Node
 Name Node
               Name Node
Secondary Namenode

Secondary Namenode(2NN)
– 2NN Namenode
– Namenode
–
     •          3
2NN NN
– Namenode

–        Namenode

     •
– 2NN
Namenode             !
– Namenode         HDFS
–     NN     2NN


– HDFS




–
–
HDFS

                     HDFS


  Data Node                             Data Node

                            Name Node
                              Active


  Data Node                             Data Node

                            Name Node
                             Standby


  Data Node                             Data Node



Standby              2NN
          2NN
HDFS

  HDFS
   – Datanode
   – Datanode


  Namenode
   –              Namenode
   – Namenode                          ⇔Datanode
     Datanode⇔Datanode
       •
       •
       •


                         Linux
   – ls,cat
   –                             rwx
       •          x          HDFS
MapReduce

   MapReduce
   –
   –



   – Map/Reduce: 2 phases
   – Map/Reduce, Mapper/Reducer
   – Shuffle runs between Map and Reduce
MapReduce

HDFS


   Task Tracker
                                 Task Tracker
     (      )



                   Job Tracker
   Task Tracker      (     )     Task Tracker




   Task Tracker                  Task Tracker



  JobTracker      TaskTracker
Data Node
                                  Data Node
Task Tracker
                                 Task Tracker
                   Name Node
                   Job Tracker


 Data Node                        Data Node
Task Tracker                     Task Tracker
                   Secondary
                   Name Node


 Data Node     ※   HDFS
                                  Data Node
Task Tracker   ※   MapReduce
                                 Task Tracker
MapReduce

  YARN
  – HDFS                   MapReduce

  – YARN (MapReduce Ver2)
  – MapReduce
  –           YARN
  –                YARN
MapReduce

   WordCount
    – MapReduce                             (Hello World          )


    Hello Hadoop Goodbye World Hello Goodbye World World Hadoop

   Map

   <Hello,1> <Hadoop,1> <Goodbye,1> <World,1> <Hello,1>
   <Goodbye,1> <World,1> <World,1> <Hadoop,1>

  Shuffle

   <Goodbye,[1,1]> <Hadoop,[1,1]> <Hello,[1,1]> <World,[1,1,1]>

  Reduce
   <Goodbye,2> <Hadoop,2> <Hello,2> <World,3>
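
The three steps on this slide can be traced in a single process. This is a minimal in-memory sketch, assuming plain Ruby arrays stand in for the distributed data; the function names `map_phase`, `shuffle_phase` and `reduce_phase` are hypothetical and nothing here runs on Hadoop.

```ruby
INPUT = "Hello Hadoop Goodbye World Hello Goodbye World World Hadoop"

# Map: emit a <word, 1> pair for every word in the input.
def map_phase(text)
  text.split.map { |word| [word, 1] }
end

# Shuffle: collect the values of identical keys into one list, sorted by key.
def shuffle_phase(pairs)
  pairs.group_by { |word, _one| word }
       .map { |word, group| [word, group.map { |_word, one| one }] }
       .sort_by { |word, _ones| word }
end

# Reduce: sum the list of values for each key.
def reduce_phase(grouped)
  grouped.map { |word, ones| [word, ones.sum] }
end

mapped   = map_phase(INPUT)
shuffled = shuffle_phase(mapped)
reduced  = reduce_phase(shuffled)

puts reduced.map { |word, count| "<#{word},#{count}>" }.join(" ")
# prints: <Goodbye,2> <Hadoop,2> <Hello,2> <World,3>
```

On a real cluster the map and reduce steps run as separate tasks on different nodes, and the shuffle moves data between them over the network.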
MapReduce

   Mapper Reducer
   –
   –
   –
   –                HDFS ”   ”



            Map

                                 reduce

            Map

                                 reduce
            Map
MapReduce


  –                               WordCount

  – Map                   Reduce

  –
      • fizz buzz fizzbuzz fizz
  – Ruby                               Ruby


  – Map                      #{word}\t1

          OK
  – Reduce
MapReduce


  hdfs Hadoop Hadoop Mapred Mapred Mapred Hadoop Mapred



   Hadoop        3
   hdfs          1
   Mapred        4

  –                                        OK
      • #{word}\t#{count}


  – cat test.txt | ruby map.rb | sort | ruby reduce.rb
      •              Hadoop
MapReduce

            :Map
  hdfs Hadoop Hadoop Mapred Mapred Mapred Hadoop Mapred




   hdfs         1
   Hadoop       1
   Hadoop       1
   Mapred       1
   Mapred       1
Map
#!/usr/bin/env ruby
# Mapper: emit "<word>\t1" for every whitespace-separated word on STDIN.

STDIN.each_line do |line|
  line.split.each do |word|
    puts "#{word}\t1"
  end
end
Reduce
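#!/usr/bin/env ruby
# Reducer: sum the counts for each word read from STDIN.

wordhash = Hash.new(0) # words not seen yet start at 0
STDIN.each_line do |line|
  word, count = line.strip.split
  next if word.nil? # skip blank lines
  wordhash[word] += count.to_i
end

# Emit "<word>\t<count>" for every word.
wordhash.each { |word, count| puts "#{word}\t#{count}" }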
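
The reducer above holds every distinct word in a hash until the input ends. Because the reducer's input arrives sorted by key (the `sort` step in the local pipeline), an alternative is to sum each run of identical keys and emit it as soon as the key changes. This is a sketch of that alternative, not the deck's own reducer; `reduce_sorted` is a hypothetical name, and the sample lines assume byte-order sorting of the mapper output for the earlier example sentence.

```ruby
# Sum counts over runs of identical keys.
# Assumes the lines are already sorted by key, as after `sort`.
def reduce_sorted(lines)
  results = []
  current = nil
  total = 0
  lines.each do |line|
    word, count = line.chomp.split("\t")
    next if word.nil? || word.empty? # skip blank lines
    if word != current
      # Key changed: the previous run is complete, so emit it.
      results << "#{current}\t#{total}" unless current.nil?
      current = word
      total = 0
    end
    total += count.to_i
  end
  results << "#{current}\t#{total}" unless current.nil?
  results
end

# Sorted mapper output for:
# hdfs Hadoop Hadoop Mapred Mapred Mapred Hadoop Mapred
sorted_map_output = (["Hadoop\t1"] * 3) + (["Mapred\t1"] * 4) + ["hdfs\t1"]
puts reduce_sorted(sorted_map_output)
```

To use it as a script, `STDIN.each_line` can be passed in place of the sample array. The trade-off is that this version is only correct when the input is sorted, while the hash-based reducer works on input in any order.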
Hadoop

          Hadoop
  –

  –                Java   OK
  –

      •
          .
Hadoop
Hadoop
–
    • Pig
    • Hive
–
    • Sqoop
–
    • Mahout
–              Hadoop
    • Whirr
                        etc…
Hadoop
– HDFS
    • RAID
    •
–       HDFS      MapReduce
    • Amazon S3
–
    •


–
    •
–
    •
(Hadoop)


–      RDB


–


– Hive Pig

–

–
(Hadoop)


–


–                       HDD



–         MapReduce
–

–              Hadoop
Introduction to Data Analysis Technology (Hadoop Edition)
