SlideShare ist ein Scribd-Unternehmen logo
1 von 23
++
Hadoop 기본 과정
++
Overview
HDFSHDFSHDFSHDFS
ImpalaImpalaImpalaImpala
MapReduceMapReduceMapReduceMapReduce
CascadingCascadingCascadingCascading HiveHiveHiveHive
++
Big Data for What?
Service
CAP Theorem, Fast Response ,Scale Out , Schema
Free ...
Distributor with RDBMS
NoSQL
MongoDB , HBASE , CouchDB ...
Analysis
Hadoop <--- today’s topic!!!
++
What’s Hadoop
Consist of
HDFS (Hadoop Distributed File System)
MapReduce
++
HDFS Architecture
master
namenode
slave
bunch of
datanode
NameNodeNameNodeNameNodeNameNode
DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode
++
single master
Strong Point
simple architecture
master have global knowledge.
file and block namespace (memory and disk)
mapping from files to blocks (memory and disk)
location of each block’s replicas ( only memory)
master can make sophisticated decisions.
++
single master
Weak Point
SPOF(= single point of failure )
bottleneck
minimizing master’s involvement is important
++
Fast Recovery for NameNode
Secondary Namenode
crawls namenode’s
operation log
maintains
namenode’s data
NameNodeNameNodeNameNodeNameNode
DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode
Secondary NameNodeSecondary NameNodeSecondary NameNodeSecondary NameNode
++
HA for NameNode
active namenode
do normal
namenode’s
operation
standby namenode
maintain
namenode’s data
ready to be active
namenode
NameNode(active)NameNode(active)NameNode(active)NameNode(active)
DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode
NameNode(standby)NameNode(standby)NameNode(standby)NameNode(standby)
++
block
each file consists of blocks
size
default 64M
replication ( default 3 )
++
write operation
client send ‘write request’ to
namenode
namenode lock file and select
datanode to be written.
namenode response datanode list
to client.
client send file content to datanode.
datanode store file and relay to
other datanode.
finally client send close request to
namenode.
namenode release write lock
NameNodeNameNodeNameNodeNameNode
DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode
clientclientclientclient
write lock & allocate datanodewrite lock & allocate datanodewrite lock & allocate datanodewrite lock & allocate datanode
++
read operation
client send ‘read request’ to
namenode
namenode lock file and select
datanode to be written.
namenode response datanode list
to client.
client send read request to
datanode.
datanode send content to client
finally client send close request to
namenode.
namenode release read lock
NameNodeNameNodeNameNodeNameNode
DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode
clientclientclientclient
read lockread lockread lockread lock
++
block(again)
reason to use big-size-block
reduce client’s need to interact with namenode
reduce the size of metadata stored on namenode
++
namenode’s operation
namespace management and locking
replica placement
creation, re-replication, rebalancing
garbage collection
stale replica detection
++ namespace management and locking
goal
ensure proper serialization
use read lock/write lock
++
block replica placement
goal
maximize data reliability and availability
maximize network bandwidth utilization
default strategy is ...
one on same datanode.
one on other datanode in same rack.
one on other datanode in other rack.
++ creation, re-replication, rebalancing
creation
client create new files
consider
disk space utilization
number of recent creation
spread replicas
re-replication
number of available replica falls below proper goal
datanode down, replica corruption ...
rebalancing
move replicas for better disk space and load balancing
++
garbage collection
what’s garbage?
block not in namenode’s metadata.
mechanism
when exchanging HeartBeat with namenode,
datanode reports subset of block it has.
master replies with garbage blocks.
datanode deletes grabage blocks.
++
stale replica detection
mechanism
storing with generation timestamp.
when restarting, datanode reports its set of blocks with its
generation timestamp
++
Datanode’s operation
check data integrity
datanode use checksumming to detect corruption.
++
filesystem api
hdfs provide basic linux utilities.
ex)
hdfs dfs -mkdir -p /foo
hdfs dfs -ls /foo
hdfs dfs -cat /foo/bar.txt
hdfs dfs -rm -r /foo
++
etc
raid?
native library?
++
end
thanks ....

Weitere ähnliche Inhalte

Was ist angesagt?

HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User ReferenceBiju Nair
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFSApache Apex
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File Systemelliando dias
 
Ravi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS ArchitectureRavi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS ArchitectureRavi namboori
 
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisStorage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisSameer Tiwari
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Designsudhakara st
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHanborq Inc.
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file systemAnshul Bhatnagar
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemAnand Kulkarni
 
Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14jijukjoseph
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemRutvik Bapat
 

Was ist angesagt? (19)

HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User Reference
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFS
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
 
Ravi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS ArchitectureRavi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS Architecture
 
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisStorage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
March 2011 HUG: HDFS Federation
March 2011 HUG: HDFS FederationMarch 2011 HUG: HDFS Federation
March 2011 HUG: HDFS Federation
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed Introduction
 
Hadoop HDFS
Hadoop HDFSHadoop HDFS
Hadoop HDFS
 
Hadoop Introduction
Hadoop IntroductionHadoop Introduction
Hadoop Introduction
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
HDFS_Command_Reference
HDFS_Command_ReferenceHDFS_Command_Reference
HDFS_Command_Reference
 
Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14
 
Hadoop hdfs
Hadoop hdfsHadoop hdfs
Hadoop hdfs
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Redis vs Memcached
Redis vs MemcachedRedis vs Memcached
Redis vs Memcached
 

Ähnlich wie HDFS introduction

Hadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solutionHadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solutionEdureka!
 
Design for a Distributed Name Node
Design for a Distributed Name NodeDesign for a Distributed Name Node
Design for a Distributed Name NodeAaron Cordova
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-servicesSreenu Musham
 
Hadoop training by keylabs
Hadoop training by keylabsHadoop training by keylabs
Hadoop training by keylabsSiva Sankar
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreducesenthil0809
 
Hadoop and big data training
Hadoop and big data trainingHadoop and big data training
Hadoop and big data trainingagiamas
 
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xModule 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xNPN Training
 
Design and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryDesign and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryIJRESJOURNAL
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFSEdureka!
 
Construindo Data Lakes - Visão Prática com Hadoop e BigData
Construindo Data Lakes - Visão Prática com Hadoop e BigDataConstruindo Data Lakes - Visão Prática com Hadoop e BigData
Construindo Data Lakes - Visão Prática com Hadoop e BigDataMarco Garcia
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherJanBask Training
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfsshrey mehrotra
 
Fredrick Ishengoma - HDFS+- Erasure Coding Based Hadoop Distributed File System
Fredrick Ishengoma -  HDFS+- Erasure Coding Based Hadoop Distributed File SystemFredrick Ishengoma -  HDFS+- Erasure Coding Based Hadoop Distributed File System
Fredrick Ishengoma - HDFS+- Erasure Coding Based Hadoop Distributed File SystemFredrick Ishengoma
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideDanairat Thanabodithammachari
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作James Chen
 
Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges DataWorks Summit
 

Ähnlich wie HDFS introduction (20)

Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solutionHadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solution
 
Design for a Distributed Name Node
Design for a Distributed Name NodeDesign for a Distributed Name Node
Design for a Distributed Name Node
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-services
 
Hadoop training by keylabs
Hadoop training by keylabsHadoop training by keylabs
Hadoop training by keylabs
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreduce
 
Hadoop and big data training
Hadoop and big data trainingHadoop and big data training
Hadoop and big data training
 
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xModule 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
 
Design and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryDesign and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on Raspberry
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Construindo Data Lakes - Visão Prática com Hadoop e BigData
Construindo Data Lakes - Visão Prática com Hadoop e BigDataConstruindo Data Lakes - Visão Prática com Hadoop e BigData
Construindo Data Lakes - Visão Prática com Hadoop e BigData
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for Fresher
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
Fredrick Ishengoma - HDFS+- Erasure Coding Based Hadoop Distributed File System
Fredrick Ishengoma -  HDFS+- Erasure Coding Based Hadoop Distributed File SystemFredrick Ishengoma -  HDFS+- Erasure Coding Based Hadoop Distributed File System
Fredrick Ishengoma - HDFS+- Erasure Coding Based Hadoop Distributed File System
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guide
 
Hadoop 1.x vs 2
Hadoop 1.x vs 2Hadoop 1.x vs 2
Hadoop 1.x vs 2
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges
 
Nextag talk
Nextag talkNextag talk
Nextag talk
 

Kürzlich hochgeladen

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Kürzlich hochgeladen (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

HDFS introduction

  • 3. ++ Big Data for What? Service CAP Theorem, Fast Response ,Scale Out , Schema Free ... Distributor with RDBMS NoSQL MongoDB , HBASE , CouchDB ... Analysis Hadoop <--- today’s topic!!!
  • 4. ++ What’s Hadoop Consist of HDFS (Hadoop Distributed File System) MapReduce
  • 5. ++ HDFS Architecture master namenode slave bunch of datanode NameNodeNameNodeNameNodeNameNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode
  • 6. ++ single master Strong Point simple architecture master have global knowledge. file and block namespace (memory and disk) mapping from files to blocks (memory and disk) location of each block’s replicas ( only memory) master can make sophisticated decisions.
  • 7. ++ single master Weak Point SPOF(= single point of failure ) bottleneck minimizing master’s involvement is important
  • 8. ++ Fast Recovery for NameNode Secondary Namenode crawls namenode’s operation log maintains namenode’s data NameNodeNameNodeNameNodeNameNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode Secondary NameNodeSecondary NameNodeSecondary NameNodeSecondary NameNode
  • 9. ++ HA for NameNode active namenode do normal namenode’s operation standby namenode maintain namenode’s data ready to be active namenode NameNode(active)NameNode(active)NameNode(active)NameNode(active) DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode NameNode(standby)NameNode(standby)NameNode(standby)NameNode(standby)
  • 10. ++ block each file consists of blocks size default 64M replication ( default 3 )
  • 11. ++ write operation client send ‘write request’ to namenode namenode lock file and select datanode to be written. namenode response datanode list to client. client send file content to datanode. datanode store file and relay to other datanode. finally client send close request to namenode. namenode release write lock NameNodeNameNodeNameNodeNameNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode clientclientclientclient write lock & allocate datanodewrite lock & allocate datanodewrite lock & allocate datanodewrite lock & allocate datanode
  • 12. ++ read operation client send ‘read request’ to namenode namenode lock file and select datanode to be written. namenode response datanode list to client. client send read request to datanode. datanode send content to client finally client send close request to namenode. namenode release read lock NameNodeNameNodeNameNodeNameNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode DataNodeDataNodeDataNodeDataNode clientclientclientclient read lockread lockread lockread lock
  • 13. ++ block(again) reason to use big-size-block reduce client’s need to interact with namenode reduce the size of metadata stored on namenode
  • 14. ++ namenode’s operation namespace management and locking replica placement creation, re-replication, rebalancing garbage collection stale replica detection
  • 15. ++ namespace management and locking goal ensure proper serialization use read lock/write lock
  • 16. ++ block replica placement goal maximize data reliability and availability maximize network bandwidth utilization default strategy is ... one on same datanode. one on other datanode in same rack. one on other datanode in other rack.
  • 17. ++ creation, re-replication, rebalancing creation client create new files consider disk space utilization number of recent creation spread replicas re-replication number of available replica falls below proper goal datanode down, replica corruption ... rebalancing move replicas for better disk space and load balancing
  • 18. ++ garbage collection what’s garbage? block not in namenode’s metadata. mechanism when exchanging HeartBeat with namenode, datanode reports subset of block it has. master replies with garbage blocks. datanode deletes grabage blocks.
  • 19. ++ stale replica detection mechanism storing with generation timestamp. when restarting, datanode reports its set of blocks with its generation timestamp
  • 20. ++ Datanode’s operation check data integrity datanode use checksumming to detect corruption.
  • 21. ++ filesystem api hdfs provide basic linux utilities. ex) hdfs dfs -mkdir -p /foo hdfs dfs -ls /foo hdfs dfs -cat /foo/bar.txt hdfs dfs -rm -r /foo