Introduction to Hadoop and the Hadoop Distributed File System (HDFS)



     




Public 2009/5/13
• Hadoop
• Hadoop Distributed File System (HDFS)




                    Copyright 2009 - Trend Micro Inc.
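What is Hadoop?

• Hadoop is an Apache top-level project
• Hadoop includes:
    – Hadoop Distributed File System (HDFS)
    – MapReduce
• Written in Java
• Supports C++/Java/Shell/Command…
• Runs on:
    – Linux, Mac OS/X, Windows, Solaris

[Figure: software stack, from top to bottom: Cloud Applications; MapReduce and HBase; Hadoop Distributed File System (HDFS); A Cluster of Machines]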


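Hadoop History

• February 2003
  – First MapReduce library written at Google
• October 2003
  – Google File System (GFS) paper published
• December 2004
  – Google MapReduce paper published
• July 2005
  – Doug Cutting reports that Nutch now uses a new MapReduce implementation
• February 2006
  – Hadoop code moves out of Nutch into a new Lucene sub-project
• November 2006
  – Google Bigtable paper published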



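Hadoop History (continued)

• February 2007
  – First HBase code drop from Mike Cafarella
• April 2007
  – Yahoo! runs Hadoop on a 1000-node cluster
• January 2008
  – Hadoop becomes an Apache top-level project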




Who uses Hadoop?
•   Yahoo!
    – Hadoop          2              CPU        10
•   Google
    –                 Hadoop
•   Amazon
    – Amazon          Hadoop
    –
•   IBM
    – Blue Cloud
•   Trend Micro
    –        Hadoop

•             Hadoop           …
    – http://wiki.apache.org/hadoop/PoweredBy



Hadoop Distributed File System (HDFS)




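HDFS

• Single namespace for the entire cluster
• Very large distributed file system
    – 10K nodes, 100 million files, 10 Peta Bytes
• Data coherency
    – Write-once-read-many access model
• Files are broken up into blocks
    – Typically 128 MB per block
    – Each block has replicas on multiple DataNodes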
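
The block and replica arithmetic can be sketched in a few lines of Python. This is only an illustration: the 128 MB block size comes from the slide, while the function names and the replica count of 3 used as a default are assumptions made for this sketch, not part of any Hadoop API.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size quoted on the slide
REPLICATION = 3                 # assumed replica count for this illustration


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def raw_storage_bytes(file_size, replication=REPLICATION):
    """Bytes stored across the cluster when every block is replicated."""
    return file_size * replication
```

For a hypothetical 300 MB file, `split_into_blocks(300 * 1024 * 1024)` yields two full 128 MB blocks followed by one 44 MB block, and `raw_storage_bytes` reports three times the file size.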




HDFS

• File replication
    – 3 replicas by default
• Designed for batch processing rather than low-latency access

NameNode

• The NameNode manages the HDFS file system namespace
   – Maps each file to its blocks
   – Maps each block to the DataNodes that hold it
• One NameNode per Hadoop cluster




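NameNode Metadata

•   The NameNode keeps Metadata in memory
     – The entire Metadata is in main memory
•   Types of Metadata
     – List of files
     – List of blocks for each file
     – List of DataNodes for each block
     – File attributes
         • e.g. creation time, replication factor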
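
The kinds of metadata listed here (files, the blocks of each file, the DataNodes of each block, and file attributes such as creation time and replication factor) can be pictured as two in-memory maps. The sketch below is a toy model under that reading; the class and method names are hypothetical and do not correspond to NameNode internals.

```python
from dataclasses import dataclass, field
import time


@dataclass
class FileEntry:
    """Per-file attributes plus the ordered list of block IDs."""
    replication_factor: int
    creation_time: float = field(default_factory=time.time)
    block_ids: list = field(default_factory=list)


class NamespaceMetadata:
    """Toy in-memory metadata: files -> blocks -> DataNodes."""

    def __init__(self):
        self.files = {}            # path -> FileEntry
        self.block_locations = {}  # block ID -> set of DataNode names

    def add_file(self, path, replication_factor=3):
        self.files[path] = FileEntry(replication_factor)

    def add_block(self, path, block_id, datanodes):
        self.files[path].block_ids.append(block_id)
        self.block_locations[block_id] = set(datanodes)

    def locate(self, path):
        """Return [(block ID, sorted DataNode names)] for a file."""
        return [(b, sorted(self.block_locations[b]))
                for b in self.files[path].block_ids]
```

A client-style lookup then reduces to `locate("/foodir/myfile.txt")`, which returns each block of the file together with the nodes that hold it.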



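NameNode Metadata (continued)

•   Transaction log (EditLog)
    – Records every change to the file system metadata
•   FsImage
    – On-disk image of the NameNode metadata:
         • The file system namespace
         • The mapping of blocks to files
         • File system properties
    – At startup the NameNode reads both the FsImage and the EditLog
•   Checkpoint
    – Occurs when the NameNode starts up
    – The NameNode applies the EditLog transactions to the FsImage,
      writes out a new FsImage, and truncates the old EditLog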
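
The relationship between FsImage, EditLog, and a checkpoint can be illustrated as replaying a log of changes on top of a snapshot. The following is a minimal sketch, assuming the namespace is a plain dictionary and the log holds three made-up operation types; the real on-disk formats and operation set are not shown on the slide.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to a namespace dict (path -> attributes)."""
    op, path, attrs = edit
    if op == "create":
        namespace[path] = dict(attrs)
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "set_replication":
        namespace[path]["replication"] = attrs["replication"]
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, editlog):
    """Replay editlog on a copy of fsimage.

    Returns (new_fsimage, empty_editlog); the inputs are left untouched.
    """
    namespace = {path: dict(attrs) for path, attrs in fsimage.items()}
    for edit in editlog:
        apply_edit(namespace, edit)
    return namespace, []
```

After a checkpoint the new image already contains every logged change, which is why the log can start again from empty.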



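Secondary NameNode

•   Periodically copies the FsImage and EditLog from the NameNode
•   Merges the FsImage and EditLog into a new FsImage
•   Uploads the new FsImage to the NameNode
    – The NameNode can then truncate its EditLog
•   The Secondary NameNode does not provide fail over for the NameNode

[Figure: FsImage + EditLog → FsImage (new)]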



NameNode

•   The NameNode is a SPOF (single point of failure)
•   (High Availability)




DataNode

•   Stores blocks
    – In the local file system (e.g. ext3)
    – Stores block metadata
        • e.g. checksum (CRC)
•   Block report
    – Periodically sends a list of all its blocks to the NameNode
    –   NameNode
      block
            NameNode
      block



HDFS – Replication

• 3 replicas by default
• Block size and replication factor are configurable per file
• Replica placement is rack-aware




Block Placement

• Policy (v0.19)
    –
    –
    –
    –
•




Heartbeats

• DataNodes send Heartbeats to the NameNode
   – Once every 3 seconds
• The NameNode uses Heartbeats to detect DataNode failure
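
Heartbeat-based failure detection can be sketched as a table of last-seen times plus a timeout. The class below is illustrative only: its name, its methods, and the 30-second default timeout are assumptions for this sketch, since the slide gives the heartbeat interval but not the threshold at which a DataNode is declared dead.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat time of each DataNode."""

    def __init__(self, timeout_seconds=30.0):
        # timeout_seconds is a hypothetical threshold chosen for this sketch.
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # DataNode name -> time of last heartbeat

    def record_heartbeat(self, datanode, now):
        """Remember that datanode reported in at time now (in seconds)."""
        self.last_seen[datanode] = now

    def dead_datanodes(self, now):
        """Return the sorted names of nodes silent for longer than the timeout."""
        return sorted(dn for dn, seen in self.last_seen.items()
                      if now - seen > self.timeout_seconds)
```

Passing `now` explicitly keeps the sketch deterministic; a real monitor would read a clock instead.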




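Data Correctness

•   Checksums validate data
    – Cyclic Redundancy Check (CRC32)
•   File creation
    – The client computes a Checksum for every 512 bytes
    – The DataNode stores the Checksum
•   File access
    – The client retrieves the data and Checksum from the DataNode
    – If validation fails, the client tries other replicas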
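
Per-chunk CRC32 checksums can be illustrated with Python's standard `zlib.crc32`. This is a sketch of the idea, not Hadoop's implementation: the 512-byte chunk size is taken from the slide, and the helper names are hypothetical.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # chunk size quoted on the slide


def chunk_checksums(data, chunk_size=BYTES_PER_CHECKSUM):
    """Compute one CRC32 value for every chunk_size bytes of data."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def corrupt_chunks(data, expected, chunk_size=BYTES_PER_CHECKSUM):
    """Return the indexes of chunks whose checksum differs from expected."""
    actual = chunk_checksums(data, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

A writer would keep the output of `chunk_checksums` next to the data; a reader recomputes the checksums and, if `corrupt_chunks` returns anything, fetches the data from another copy.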




User Interface

•   API
     – Java API
     – A C language wrapper for the Java API is also available

•   POSIX-like commands
     – hadoop dfs -mkdir /foodir
     – hadoop dfs -cat /foodir/myfile.txt
     – hadoop dfs -rm /foodir/myfile.txt

•   DFSAdmin
     – bin/hadoop dfsadmin -safemode
     – bin/hadoop dfsadmin -report
     – bin/hadoop dfsadmin -refreshNodes

•   Web
     – http://host:port/dfshealth.jsp


Web interface (http://172.16.203.136:50070)

[Screenshots not preserved: Web interface, POSIX-like commands, Java API]

• Hadoop documentation and installation
   – http://hadoop.apache.org/
• Hadoop Wiki
   – http://wiki.apache.org/hadoop/
• Google File System Paper
   – http://labs.google.com/papers/gfs.html




