Students: An Du – Tan Tran – Toan Do – Vinh Nguyen
      Instructor: Professor Lothar Piepmayer




  HDFS at a glance
Agenda

1. Design of HDFS
2.1. HDFS Concepts – Blocks
2.2. HDFS Concepts – Namenode and datanodes
3.1. Dataflow – Anatomy of a file read
3.2. Dataflow – Anatomy of a file write
3.3. Dataflow – Coherency model
4. Parallel copying
5. Demo – Command line
The Design of HDFS

Very large distributed file system
  Up to 10K nodes, 1 billion files, 100PB
Streaming data access
  Write once, read many times
Commodity hardware
  Files are replicated to handle hardware failure
        Detect failures and recover from them
Poor fit for

Low-latency data access
Lots of small files
Multiple writers, arbitrary file modifications
HDFS Blocks

Blocks in a normal filesystem are a few kilobytes
HDFS has a large block size
    Default: 64 MB
    Typical: 128 MB
Unlike in a filesystem for a single disk, a file in HDFS that is
 smaller than a single block does not occupy a full block of storage
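
The last point can be illustrated with a small sketch. It splits a file into HDFS-style blocks and shows that the final block only holds the leftover bytes. The helper name split_into_blocks is hypothetical and not part of any Hadoop API; the 64 MB constant is the default quoted above.

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # default block size quoted on the slide


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes in bytes of the blocks needed to hold file_size bytes.

    Hypothetical helper for illustration only; real HDFS does this
    bookkeeping inside the client and the namenode.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block holds only the leftover bytes, not a full block.
        sizes.append(remainder)
    return sizes


small = split_into_blocks(1 * MB)
large = split_into_blocks(200 * MB)
print([s // MB for s in small])  # block sizes in MB for a 1 MB file
print([s // MB for s in large])  # block sizes in MB for a 200 MB file
```

A 1 MB file therefore uses a single 1 MB block, while a 200 MB file uses three full 64 MB blocks and one 8 MB block.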
HDFS Blocks


A file is stored as blocks on various nodes in the Hadoop cluster.
HDFS creates several replicas of each data block.
Each data block is replicated to multiple nodes
 across the cluster.
HDFS Blocks




Figure source: Dhruba Borthakur, Design and Evolution of the Apache Hadoop File System (HDFS)
Why are blocks in HDFS so large?

Minimize the cost of seeks
=> With large blocks, seek time is small compared with transfer time,
   so reading a file runs at close to the disk transfer rate
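
A back-of-the-envelope calculation makes the trade-off concrete. The figures used here (10 ms seek time, 100 MB/s transfer rate) are illustrative assumptions, not measurements from this deck, and the function names are hypothetical.

```python
def block_size_for_seek_ratio(seek_time_s, transfer_rate_mb_s, target_ratio):
    """Block size in MB at which seek time equals target_ratio of transfer time."""
    transfer_time_s = seek_time_s / target_ratio
    return transfer_time_s * transfer_rate_mb_s


def seek_ratio(block_size_mb, seek_time_s, transfer_rate_mb_s):
    """Seek time divided by the time needed to transfer one block."""
    transfer_time_s = block_size_mb / transfer_rate_mb_s
    return seek_time_s / transfer_time_s


SEEK_TIME_S = 0.010         # assumed: 10 ms per seek
TRANSFER_RATE_MB_S = 100.0  # assumed: 100 MB/s sequential transfer

# Block size needed so that seeking costs about 1% of the transfer time.
print(block_size_for_seek_ratio(SEEK_TIME_S, TRANSFER_RATE_MB_S, 0.01))

# Seek overhead for a 4 KB block versus a 64 MB block.
print(seek_ratio(4 / 1024, SEEK_TIME_S, TRANSFER_RATE_MB_S))
print(seek_ratio(64, SEEK_TIME_S, TRANSFER_RATE_MB_S))
```

Under these assumptions a block of about 100 MB keeps the seek cost near 1% of the transfer time, whereas with 4 KB blocks the seek would take far longer than the transfer itself.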
Benefits of the block abstraction

A file can be larger than any single disk in the network
Simplifies the storage subsystem
Provides fault tolerance and availability
Namenode & Datanodes

 Namenode (master)
 – manages the filesystem namespace
 – maintains the filesystem tree and the metadata for all
 files and directories in the tree
 Datanodes (slaves)
 – store block data in their local file system
 – periodically report back to the namenode with lists of all
 the blocks they store
 Clients communicate with both the namenode and the datanodes.
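
The division of labour above can be sketched as a toy model: the namenode keeps the namespace (file to blocks) and learns block locations from datanode reports. The class ToyNamenode, its methods, and the names dn1, blk_1 and so on are hypothetical and greatly simplified; they are not the real Hadoop classes.

```python
from collections import defaultdict


class ToyNamenode:
    """Toy model of namenode metadata; not the real Hadoop implementation."""

    def __init__(self):
        self.namespace = {}                      # file path -> ordered block ids
        self.block_locations = defaultdict(set)  # block id -> datanode names

    def add_file(self, path, block_ids):
        """Register a file and the ordered list of blocks it consists of."""
        self.namespace[path] = list(block_ids)

    def block_report(self, datanode, block_ids):
        """Record the full list of blocks that one datanode says it stores."""
        # Forget what this datanode reported earlier, then apply the new list.
        for holders in self.block_locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode)

    def locate(self, path):
        """Return (block id, sorted datanodes) pairs a client would read from."""
        return [(b, sorted(self.block_locations[b])) for b in self.namespace[path]]


nn = ToyNamenode()
nn.add_file("/logs/a.txt", ["blk_1", "blk_2"])
nn.block_report("dn1", ["blk_1", "blk_2"])
nn.block_report("dn2", ["blk_1"])
nn.block_report("dn3", ["blk_2"])
print(nn.locate("/logs/a.txt"))
```

A client would ask the namenode for these locations and then fetch the block data from the listed datanodes directly.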
Anatomy of a File Read


Benefits:
- The namenode does not become a bottleneck, because clients read data directly from the datanodes
- Many clients can read concurrently
Writing in HDFS


[Figure: writing a file in HDFS (labels: Namenode, Datanode, Block)]


Exception: a datanode fails during a write
  The pipeline is closed, and the failed node's address and its
   partial block are removed
  The namenode arranges for a replica on a new datanode
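
The failure handling above can be sketched as a toy simulation. It is a simplification under stated assumptions: the function names are hypothetical, the node names dn1 to dn5 are invented, and the target of 3 replicas is the usual HDFS default rather than a figure from these slides.

```python
def write_block(pipeline, failed=frozenset()):
    """Return the datanodes that end up holding the block.

    pipeline: ordered datanode names; failed: nodes that fail mid-write.
    Failed nodes are dropped and the write continues on the survivors.
    """
    survivors = [dn for dn in pipeline if dn not in failed]
    if not survivors:
        raise OSError("all datanodes in the pipeline failed")
    return survivors


def re_replicate(holders, live_nodes, replication=3):
    """Toy namenode step: add live datanodes until the replica target is met."""
    holders = list(holders)
    for dn in live_nodes:
        if len(holders) >= replication:
            break
        if dn not in holders:
            holders.append(dn)
    return holders


holders = write_block(["dn1", "dn2", "dn3"], failed={"dn2"})
print(holders)  # the block is under-replicated after the failure
print(re_replicate(holders, ["dn1", "dn3", "dn4", "dn5"]))
```

The write itself succeeds on the remaining datanodes; restoring the replica count is a separate step that the namenode schedules afterwards.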
Coherency Model


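Data being written is not guaranteed to be visible to other readers
Use sync() to force written data to become visible
Applications should call sync() at suitable points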
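
The visibility rule can be pictured with a toy model. This is not the Hadoop API: ToyHdfsFile is a hypothetical class that only mimics the idea that written data becomes visible to readers once sync() is called. Real HDFS tracks visibility per block and is more subtle.

```python
class ToyHdfsFile:
    """Toy model of the coherency rule; not the Hadoop API."""

    def __init__(self):
        self._visible = b""  # what other readers can see
        self._buffer = b""   # written by the client but not yet synced

    def write(self, data):
        """Buffer data on the writer side; readers do not see it yet."""
        self._buffer += data

    def sync(self):
        """Make everything written so far visible to readers."""
        self._visible += self._buffer
        self._buffer = b""

    def read(self):
        """Return the content a new reader would see."""
        return self._visible


f = ToyHdfsFile()
f.write(b"record-1\n")
print(f.read())  # nothing visible before sync()
f.sync()
print(f.read())  # the record is visible after sync()
```

Calling sync() after each logical unit of work limits how much data can be lost if the client fails, at the cost of some extra overhead per call.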
Parallel copying in HDFS

Transfer data between clusters
   % hadoop distcp hdfs://namenode1/foo hdfs://namenode2/bar
Implemented as a MapReduce job; each file is copied by a single map
Each map copies at least 256 MB
Default maximum is 20 maps per node
Copying between different HDFS versions needs a version-independent protocol such as webhdfs:
   % hadoop distcp webhdfs://namenode1:50070/foo \
      webhdfs://namenode2:50070/bar
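
The two sizing rules above (at least 256 MB per map, at most 20 maps per node) can be combined into a rough estimate of how many maps a copy will use. This is a sketch of the rule of thumb only; estimate_distcp_maps is a hypothetical helper, and the actual scheduling in distcp may differ between Hadoop versions.

```python
MB = 1024 * 1024
GB = 1024 * MB


def estimate_distcp_maps(total_bytes, cluster_nodes,
                         bytes_per_map=256 * MB, max_maps_per_node=20):
    """Estimate the number of map tasks for a copy of total_bytes.

    Hypothetical helper: one map per 256 MB of input (rounded up, at
    least one map), capped at 20 maps per cluster node.
    """
    if total_bytes < 0 or cluster_nodes <= 0:
        raise ValueError("total_bytes must be >= 0 and cluster_nodes > 0")
    wanted = max(1, (total_bytes + bytes_per_map - 1) // bytes_per_map)
    return min(wanted, max_maps_per_node * cluster_nodes)


print(estimate_distcp_maps(1 * GB, cluster_nodes=3))       # small copy
print(estimate_distcp_maps(1000 * GB, cluster_nodes=100))  # capped by node count
```

With these rules a 1 GB copy gets 4 maps, while a 1000 GB copy on 100 nodes is capped at 2000 maps even though the data would fill 4000 maps of 256 MB each.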
Setup

Cluster with 3 nodes:
    4 GB RAM
    2 CPUs @ 2.0 GHz+
    100 GB HDD
Running in VMware on 3 different servers
Network: 100 Mbps
Operating system: Ubuntu 11.04
    Windows: not tested
Setup Guide - Single Node


Prerequisites: Java runtime, ssh
  http://hadoop.apache.org/common/docs/r1.0.3/single_node_setup.html
/etc/hadoop/core-site.xml
/etc/hadoop/hdfs-site.xml
Cluster


/etc/hadoop/masters
/etc/hadoop/slaves
http://hadoop.apache.org/common/docs/r1.0.3/cluster_setup.html
Command Line

Similar to *nix
    hadoop fs -ls /
    hadoop fs -mkdir /test
    hadoop fs -rmr /test
    hadoop fs -cp /1 /2
    hadoop fs -copyFromLocal /3 hdfs://localhost/
Namenode-specific:
    hadoop namenode -format
    start-all.sh
Command Line

Sorting: a standard way to test a cluster
    TeraGen: generates dummy data
    TeraSort: sorts the data
    TeraValidate: validates the sort result
Command line:
    hadoop jar /usr/share/hadoop/hadoop-examples-1.0.3.jar \
     terasort hdfs://ubuntu/10GdataUnsorted /10GDataSorted41
Benchmark Result

2 Nodes, 1GB data: 0:03:38
3 Nodes, 1GB data: 0:03:13

2 Nodes, 10GB data: 0:38:07
3 Nodes, 10GB data: 0:31:28

The virtual machines' hard disks are the bottleneck
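
To put the timings in perspective, the following sketch computes the speedup from 2 to 3 nodes for both data sizes. It uses only the four timings listed above; the helper names are hypothetical, and the "ideal" factor of 1.5 assumes perfectly linear scaling.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Timings copied from the benchmark slide: data size -> {nodes: duration}
RESULTS = {
    "1GB": {2: "0:03:38", 3: "0:03:13"},
    "10GB": {2: "0:38:07", 3: "0:31:28"},
}


def speedup(times):
    """Speedup when going from 2 nodes to 3 nodes."""
    return to_seconds(times[2]) / to_seconds(times[3])


IDEAL = 3 / 2  # linear scaling from 2 to 3 nodes

for size, times in RESULTS.items():
    print(size, round(speedup(times), 2), "ideal:", IDEAL)
```

Both speedups come out well below the linear-scaling factor of 1.5, which is consistent with a shared bottleneck such as disk I/O in the virtual machines.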
Who wins…?
References

Hadoop: The Definitive Guide
