GOOGLE FILE SYSTEM
INTRODUCTION
Designed by Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung of Google in 2002-03.
Provides fault tolerance while serving a large number of clients with high aggregate performance.
Google's data needs extend well beyond search.
Google stores its data on more than 15,000 commodity machines.
GFS handles the component failures and other Google-specific challenges that arise in a distributed file system of this scale.
DESIGN OVERVIEW
Assumptions
The system is built from many inexpensive commodity components that often fail.
It stores a modest number of large files.
Workloads consist of large streaming reads and small random reads.
Workloads also have many large, sequential writes that append data to files.
The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file.
High sustained bandwidth is more important than low latency.
GOOGLE FILE SYSTEM ARCHITECTURE
A GFS cluster consists of a single master and multiple chunkservers.
The three basic roles in GFS are master, client and chunkserver.
Files are divided into fixed-size chunks.
Chunkservers store chunks on local disks as Linux files.
The master maintains all file system metadata, including the namespace, access control information, the mapping from files to chunks, and the current locations of chunks.
Clients interact with the master for metadata operations.
Chunkservers need not cache file data, because chunks are stored as local files and the Linux buffer cache already keeps frequently accessed data in memory.
Chunk
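A chunk is similar to the concept of a block in file systems.
The chunk size is 64 MB, much larger than typical file system block sizes.
A large chunk size means fewer chunks per file and therefore less chunk metadata in the master.
The drawback of this chunk size is that a small file occupying a single chunk can become a hotspot when many clients access it.
Each chunk is stored on a chunkserver as a plain file and is identified by its chunk handle, which acts as the chunk's file name.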
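
A small worked example of why a large chunk size reduces the amount of metadata the master must hold. The 1 TB file size and the 4 KB comparison block size are illustrative assumptions, not figures from the slides.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size, as stated in the slides
BLOCK_SIZE = 4 * 1024          # assumed block size of a conventional file system


def units_needed(file_size: int, unit_size: int) -> int:
    """Number of fixed-size units needed to hold file_size bytes (ceiling division)."""
    return (file_size + unit_size - 1) // unit_size


one_tb = 1024 ** 4                           # hypothetical 1 TB file
chunks = units_needed(one_tb, CHUNK_SIZE)    # entries the master tracks with 64 MB chunks
blocks = units_needed(one_tb, BLOCK_SIZE)    # entries it would track with 4 KB blocks
print(chunks, blocks, blocks // chunks)
```

Under these assumptions the master tracks 16,384 chunks instead of roughly 268 million blocks for the same file.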

Metadata
Master stores three major types of metadata: the file and
chunk namespaces, the mapping from files to chunks, and
the location of each chunk’s replicas.
The first two types are kept persistent by logging mutations to an operation log stored on the master's local disk.
Because metadata is stored in memory, master operations are fast.
It is also easy and efficient for the master to periodically scan its entire state.
Periodic scanning is used to implement chunk garbage collection, re-replication and chunk migration.
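
A minimal sketch of the three metadata types and the operation log, assuming a JSON-lines log format and the hypothetical names `MasterMetadata`, `report_location` and `replay`. It is not the actual GFS data layout. It illustrates that the namespace and the file-to-chunk mapping can be rebuilt from the log, while chunk locations live only in memory and must be learned again from the chunkservers.

```python
import json
from dataclasses import dataclass, field


@dataclass
class MasterMetadata:
    namespace: set = field(default_factory=set)          # file names (logged)
    file_chunks: dict = field(default_factory=dict)      # file -> chunk handles (logged)
    chunk_locations: dict = field(default_factory=dict)  # handle -> servers (memory only)
    log: list = field(default_factory=list)              # stand-in for the on-disk log

    def create_file(self, name: str) -> None:
        # Log the mutation first, then apply it to the in-memory state.
        self.log.append(json.dumps({"op": "create", "file": name}))
        self.namespace.add(name)
        self.file_chunks[name] = []

    def add_chunk(self, name: str, handle: str) -> None:
        self.log.append(json.dumps({"op": "add_chunk", "file": name, "handle": handle}))
        self.file_chunks[name].append(handle)

    def report_location(self, handle: str, server: str) -> None:
        # Not logged: replica locations are reported by the chunkservers.
        self.chunk_locations.setdefault(handle, set()).add(server)


def replay(log: list) -> MasterMetadata:
    """Rebuild the persistent metadata types from an operation log."""
    master = MasterMetadata()
    for line in log:
        record = json.loads(line)
        if record["op"] == "create":
            master.create_file(record["file"])
        elif record["op"] == "add_chunk":
            master.add_chunk(record["file"], record["handle"])
    return master
```

After a restart, `replay(old.log)` restores the namespace and chunk mapping, and `chunk_locations` starts empty until chunkservers report in.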

Master
The master is a single process, running on a separate machine, that stores all metadata.
Clients contact the master only to obtain the metadata they need to reach the chunkservers.
SYSTEM INTERACTION
Read Algorithm
1. Application originates the read request.
2. GFS client translates the request from (filename, byte range) to (filename, chunk index) and sends it to the master.
3. Master responds with the chunk handle and replica locations (i.e. the chunkservers where the replicas are stored).
4. Client picks a location and sends the (chunk handle, byte range) request to that location.
5. Chunkserver sends the requested data to the client.
6. Client forwards the data to the application.
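
The read steps can be sketched with in-memory dictionaries standing in for the master and the chunkservers. The function names, the dictionary layout and the choice of the first listed replica are assumptions for illustration; a real client would talk to these components over the network.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB


def to_chunk_requests(offset: int, length: int, chunk_size: int = CHUNK_SIZE):
    """Step 2: split a byte range into (chunk index, offset in chunk, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        index = offset // chunk_size
        within = offset % chunk_size
        n = min(chunk_size - within, end - offset)
        pieces.append((index, within, n))
        offset += n
    return pieces


def read(master, chunkservers, filename, offset, length, chunk_size=CHUNK_SIZE):
    """master: (filename, chunk index) -> (handle, [server names]).
    chunkservers: server name -> {handle: chunk bytes}."""
    data = b""
    for index, within, n in to_chunk_requests(offset, length, chunk_size):
        handle, locations = master[(filename, index)]  # step 3: master lookup
        server = chunkservers[locations[0]]            # step 4: pick a replica
        data += server[handle][within:within + n]      # step 5: chunkserver returns bytes
    return data                                        # step 6: hand back to the application
```

A range that crosses a chunk boundary simply becomes two requests, one per chunk.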

Write Algorithm
1. Application originates the write request.
2. GFS client translates the request from (filename, data) to (filename, chunk index) and sends it to the master.
3. Master responds with the chunk handle and the (primary + secondary) replica locations.
4. Client pushes the write data to all locations. The data is stored in the chunkservers' internal buffers.
5. Client sends the write command to the primary.
6. Primary determines a serial order for the data instances stored in its buffer and writes the instances in that order to the chunk.
7. Primary sends the serial order to the secondaries and tells them to perform the write.
8. Secondaries respond to the primary.
9. Primary responds to the client.
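
A minimal sketch of steps 4 to 9, assuming hypothetical `Replica` and `Primary` classes. As a simplification, buffered data is appended to the chunk in serial order rather than written at client-specified offsets, and failures are not modelled.

```python
class Replica:
    def __init__(self):
        self.buffer = {}           # data id -> bytes pushed by the client (step 4)
        self.chunk = bytearray()   # stand-in for the chunk contents

    def push(self, data_id: str, data: bytes) -> None:
        self.buffer[data_id] = data

    def apply(self, ordered_ids) -> bool:
        # Write buffered data to the chunk in the given serial order.
        for data_id in ordered_ids:
            self.chunk += self.buffer.pop(data_id)
        return True


class Primary(Replica):
    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.pending = []          # write commands in arrival order

    def write_command(self, data_id: str) -> None:
        self.pending.append(data_id)                        # step 5

    def commit(self) -> bool:
        order, self.pending = self.pending, []              # step 6: fix the serial order
        self.apply(order)                                   # step 6: write locally
        acks = [s.apply(order) for s in self.secondaries]   # steps 7 and 8
        return all(acks)                                    # step 9: reply to the client
```

Because every replica applies the same order chosen by the primary, all replicas end up with identical chunk contents even when write commands arrive in a different order from the data pushes.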
Record Append Algorithm
1. Application originates the record append request.
2. GFS client translates the request and sends it to the master.
3. Master responds with the chunk handle and the (primary + secondary) replica locations.
4. Client pushes the write data to all replicas of the last chunk of the file.
5. Primary checks whether the record fits in the specified chunk.
6. If the record doesn't fit, the primary:
   a. Pads the chunk.
   b. Tells the secondaries to do the same.
   c. Informs the client.
   The client then retries the append with the next chunk.
7. If the record fits, the primary:
   a. Appends the record.
   b. Tells the secondaries to write the data at the exact same offset.
   c. Receives responses from the secondaries.
   d. Sends the final response to the client.
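
The fit-or-pad decision in steps 5 to 7 can be sketched as follows. This is a single-replica illustration under assumptions: `record_append` is a hypothetical helper, chunks are plain `bytearray` objects, padding uses zero bytes, and the client retry is folded into a loop.

```python
def record_append(chunks: list, record: bytes, chunk_size: int):
    """Append record to the last chunk of a file, padding and moving on if it
    does not fit. Returns (chunk index, offset) where the record was written."""
    if len(record) > chunk_size:
        raise ValueError("record larger than a chunk")
    while True:
        last = chunks[-1]
        if len(last) + len(record) > chunk_size:
            # Step 6: pad the chunk to its full size, then retry on the next chunk.
            last.extend(b"\0" * (chunk_size - len(last)))
            chunks.append(bytearray())
            continue
        # Step 7: the record fits, so append it at the current end of the chunk.
        offset = len(last)
        last.extend(record)
        return len(chunks) - 1, offset
```

For example, with an 8-byte chunk already holding 6 bytes, a 3-byte record causes the first chunk to be padded and the record to land at offset 0 of a new chunk.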
MASTER OPERATION
Namespace management and locking
Multiple master operations can be active at the same time; locks over regions of the namespace keep them properly serialized.
GFS does not have a per-directory data structure.
GFS logically represents its namespace as a lookup table mapping full pathnames to metadata.
Each master operation acquires a set of locks before it runs.
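
A sketch of which locks an operation might acquire, following the scheme described in the GFS paper: read locks on every ancestor path and a read or write lock on the full pathname. The helper names `locks_for` and `conflicts` are hypothetical, and no real locking is performed here.

```python
def locks_for(path: str, write: bool = True):
    """Return the (path, mode) locks an operation on `path` would acquire."""
    parts = path.strip("/").split("/")
    # Read locks on each ancestor: /d1, /d1/d2, ...
    ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    locks = [(p, "read") for p in ancestors]
    # Read or write lock on the full pathname itself.
    locks.append(("/" + "/".join(parts), "write" if write else "read"))
    return locks


def conflicts(locks_a, locks_b) -> bool:
    """Two lock sets conflict if they share a path and at least one side writes it."""
    held = dict(locks_a)
    return any(path in held and "write" in (mode, held[path]) for path, mode in locks_b)


print(locks_for("/home/user/foo"))
```

With this scheme, two file creations in the same directory do not conflict, because each takes only a read lock on the directory name, while an operation that write-locks `/home/user` blocks the creation of `/home/user/foo`.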

Replica placement
A GFS cluster is highly distributed.
The chunk replica placement policy serves two purposes: maximize data reliability and availability, and maximize network bandwidth utilization.
For both purposes, chunk replicas are spread across racks as well as across machines.
Creation, Re-replication and Balancing Chunks
Factors for choosing where to place the initially empty replicas:
(1) Place new replicas on chunkservers with below-average disk space utilization.
(2) Limit the number of "recent" creations on each chunkserver.
(3) Spread replicas of a chunk across racks.
The master re-replicates a chunk as soon as the number of available replicas falls below its replication goal.
Each chunk that needs to be re-replicated is prioritized based on how far it is from its replication goal.
Finally, the master rebalances replicas periodically.
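
One way the three placement factors and the re-replication priority might be combined is sketched below. The `Chunkserver` fields, the `max_recent` threshold and the two-pass rack selection are assumptions made for illustration, not the actual GFS policy.

```python
from dataclasses import dataclass


@dataclass
class Chunkserver:
    name: str
    rack: str
    disk_utilization: float   # fraction of disk space in use, 0.0 to 1.0
    recent_creations: int


def choose_servers(servers, replicas=3, max_recent=5):
    """Pick chunkservers for a new chunk's replicas."""
    avg = sum(s.disk_utilization for s in servers) / len(servers)
    # Factors (1) and (2): prefer below-average utilization and few recent creations.
    preferred = [s for s in servers
                 if s.disk_utilization < avg and s.recent_creations < max_recent]
    rest = [s for s in servers if s not in preferred]
    ranked = (sorted(preferred, key=lambda s: s.disk_utilization)
              + sorted(rest, key=lambda s: s.disk_utilization))
    chosen, racks = [], set()
    # Factor (3): the first pass takes at most one server per rack.
    for s in ranked:
        if len(chosen) < replicas and s.rack not in racks:
            chosen.append(s)
            racks.add(s.rack)
    # Second pass fills any remaining slots when there are too few racks.
    for s in ranked:
        if len(chosen) < replicas and s not in chosen:
            chosen.append(s)
    return chosen


def rereplication_order(live_replicas: dict, goal: int = 3):
    """Chunks below their replication goal, furthest from the goal first."""
    needy = [h for h, n in live_replicas.items() if n < goal]
    return sorted(needy, key=lambda h: goal - live_replicas[h], reverse=True)
```

In this sketch rack diversity takes precedence over disk utilization, so a fuller server on an unused rack can be chosen ahead of an emptier server on a rack that already holds a replica.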
GARBAGE COLLECTION
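Garbage collection takes place at both the file and chunk levels.
When a file is deleted by the application, the master logs the deletion immediately.
The file is not removed at once; it is just renamed to a hidden name.
Until it is reclaimed, the file can still be read under the new, special name and can be undeleted.
When the master later removes the hidden file from the namespace, its in-memory metadata is erased.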
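
A minimal sketch of lazy deletion by renaming, assuming a hypothetical `.deleted.` name prefix, a caller-supplied grace period and an in-memory `Namespace` class. The actual hidden-name format and reclamation schedule are not given in the slides.

```python
class Namespace:
    def __init__(self):
        self.files = {}    # file name -> metadata (e.g. list of chunk handles)
        self.hidden = {}   # hidden name -> (original name, deletion time)
        self.log = []      # stand-in for the operation log

    def delete(self, name: str, now: float) -> str:
        self.log.append(("delete", name))   # the deletion is logged immediately
        hidden = f".deleted.{name}"         # assumed hidden-name scheme
        self.files[hidden] = self.files.pop(name)
        self.hidden[hidden] = (name, now)
        return hidden

    def undelete(self, hidden: str) -> None:
        # Renaming back to the original name restores the file.
        name, _ = self.hidden.pop(hidden)
        self.files[name] = self.files.pop(hidden)

    def scan(self, now: float, grace: float) -> None:
        # Periodic scan: erase metadata of hidden files older than the grace period.
        for hidden, (_, deleted_at) in list(self.hidden.items()):
            if now - deleted_at >= grace:
                del self.files[hidden]
                del self.hidden[hidden]
```

Until `scan` reclaims it, a deleted file remains readable under its hidden name and `undelete` is a simple rename.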
FAULT TOLERANCE
High Availability
Fast Recovery
Chunk Replication
Master Replication

Data Integrity
Each chunkserver uses checksumming to detect corruption of stored data.
Each chunk is broken up into 64 KB blocks, and each block has its own checksum.
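
Per-block checksumming over 64 KB blocks can be sketched as below. CRC-32 from Python's `zlib` module is an assumed stand-in for whatever checksum function GFS uses, and `block_checksums` and `corrupted_blocks` are hypothetical helper names.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB blocks, as stated in the slides


def block_checksums(chunk: bytes, block_size: int = BLOCK_SIZE):
    """Compute one checksum per block of the chunk."""
    return [zlib.crc32(chunk[i:i + block_size])
            for i in range(0, len(chunk), block_size)]


def corrupted_blocks(chunk: bytes, checksums, block_size: int = BLOCK_SIZE):
    """Return the indices of blocks whose current checksum differs from the stored one."""
    current = block_checksums(chunk, block_size)
    return [i for i, (now, stored) in enumerate(zip(current, checksums)) if now != stored]
```

Checking block by block means a read only has to verify the blocks it touches, and corruption can be located to a single 64 KB block rather than the whole chunk.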
CHALLENGES
Storage size.
Bottleneck for the clients.
Time.
CONCLUSION
Supports large-scale data processing workloads.
Provides fault tolerance.
Tolerates chunkserver failures.
Delivers high throughput.
Serves as a storage platform for research and development.
THANK YOU
QUESTIONS

