1. Google File System
Lalit Kumar
M.Tech Final Year
Computer Science & Engineering Dept.
KEC Dwarahat, Almora
2. Overview
Introduction To GFS
Architecture
Data Flow
System Interactions
Master Operations
Metadata Management
Garbage Collection
Fault tolerance
Latest Advancement
Drawbacks
Conclusion
References
3. Introduction
More than 15,000 commodity-class PCs.
Multiple clusters distributed worldwide.
Thousands of queries served per second.
One query reads hundreds of MB of data.
One query consumes tens of billions of CPU cycles.
Google stores dozens of copies of the entire Web!
Conclusion: Need large, distributed, highly fault tolerant file
system.
4. Architecture
A GFS cluster consists of a single master and multiple chunk-servers
and is accessed by multiple clients
Figure 1: GFS Architecture
Source: Howard Gobioff, “The GFS” Presented at SOSP 2003
5. Master
Manages the namespace and metadata.
Manages chunk creation, replication, and placement.
Performs the snapshot operation to create a duplicate of a file or directory tree.
Performs checkpointing and logging of changes to metadata.
Chunkservers
On startup or failure recovery, report their chunks to the master.
Periodically report a subset of their chunks to the master (to detect chunks that are no longer needed).
Metadata
Types of metadata: file and chunk namespaces, the mapping from files to chunks, and the location of each chunk's replicas.
Easy and efficient for the master to periodically scan.
Periodic scanning is used to implement chunk garbage collection, re-replication, and chunk migration.
6. Data Flow
Data is pushed linearly along a carefully picked chain of chunkservers in a TCP pipelined fashion.
Once a chunkserver receives some data, it starts forwarding immediately to the next chunkserver.
Each machine forwards the data to the closest machine in the network topology that has not yet received it.
Figure 2: Data flow in chunkservers
Source: http://research.google.com/archive/gfs‐sosp2003.pdf
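The forwarding rule above can be sketched as a greedy nearest-neighbor walk. This is an illustrative sketch only; the function name and the distance table are hypothetical, not part of GFS.

```python
# Sketch of the GFS forwarding rule: each machine sends the data to the
# closest machine (by network distance) that has not yet received it.
# `forwarding_chain` and the `dist` table are hypothetical, for illustration.

def forwarding_chain(client, servers, dist):
    """Return the order in which servers receive the data.

    dist[(a, b)] is a network-distance estimate from node a to node b.
    """
    chain, current, pending = [], client, set(servers)
    while pending:
        # Pick the nearest server that has not received the data yet.
        nxt = min(pending, key=lambda s: dist[(current, s)])
        chain.append(nxt)
        pending.remove(nxt)
        current = nxt  # the receiver forwards onward, pipelining the transfer
    return chain
```

In real GFS each hop overlaps transmission with reception over TCP, so total latency approaches one full transfer plus small per-hop delays.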
7. System Interactions
Read Algorithm
1. Application originates the read request
2. GFS client translates the request from
(filename, byte range) -> (filename, chunk
index), and sends it to the master
3. Master responds with chunk handle and replica
locations (i.e. chunkservers where the replicas
are stored)
4. Client picks a location and sends the (chunk
handle, byte range) request to the location
5. Chunkserver sends requested data to the client
6. Client forwards the data to the application.
Figure 3: Block diagram for Read operation
Source: Howard Gobioff, “The GFS” Presented at SOSP 2003
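The six read steps can be sketched end to end. This is a toy model, not the GFS API: `FakeMaster`, `gfs_read`, and the in-memory chunkserver dictionaries are invented stand-ins.

```python
# Hypothetical sketch of the GFS read path (steps 1-6 above).
CHUNK_SIZE = 64 * 2**20  # GFS chunks are 64 MB

class FakeMaster:
    def __init__(self, chunk_table):
        # chunk_table: (filename, chunk_index) -> (chunk_handle, [replica locations])
        self.chunk_table = chunk_table

    def lookup(self, filename, chunk_index):
        # Step 3: master returns chunk handle and replica locations.
        return self.chunk_table[(filename, chunk_index)]

def gfs_read(master, chunkservers, filename, offset, length):
    chunk_index = offset // CHUNK_SIZE               # step 2: byte range -> chunk index
    handle, locations = master.lookup(filename, chunk_index)
    server = chunkservers[locations[0]]              # step 4: pick a replica location
    start = offset % CHUNK_SIZE
    return server[handle][start:start + length]      # steps 5-6: data back to client
```

A real client would cache the master's reply and choose the nearest replica rather than the first one listed.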
8. Write Algorithm
1. Application originates the request
2. GFS client translates request from
(filename, data) -> (filename, chunk
index), and sends it to master
3. Master responds with chunk handle and
(primary + secondary) replica locations
4. Client pushes write data to all locations.
Data is stored in chunkserver’s internal
buffers
5. Client sends write command to primary
6. Primary determines serial order for data
instances stored in its buffer and writes the
instances in that order to the chunk
7. Primary sends the serial order to the
secondaries and tells them to perform the
write
8. Secondaries respond to the primary, and
the primary responds back to the client
Figure 4: Block Diagram for Write operation
Source: Howard Gobioff, “The GFS” Presented at SOSP 2003
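The write steps separate the data push (step 4) from the control flow that fixes a serial order (steps 5-8). A minimal sketch, with invented `Replica`/`gfs_write` names:

```python
# Sketch of the GFS write path: data is pushed into every replica's buffer
# first, then the primary assigns a serial order that all replicas apply.
# All class and function names here are hypothetical.

class Replica:
    def __init__(self):
        self.buffer = []   # data pushed but not yet applied (step 4)
        self.chunk = []    # applied mutations, in the primary's serial order

    def push(self, data):
        self.buffer.append(data)

    def apply(self, order):
        # Apply buffered mutations in the serial order chosen by the primary.
        self.chunk.extend(self.buffer[i] for i in order)
        self.buffer.clear()

def gfs_write(primary, secondaries, data_items):
    for replica in [primary] + secondaries:    # step 4: push data to all replicas
        for d in data_items:
            replica.push(d)
    order = list(range(len(data_items)))       # step 6: primary picks a serial order
    primary.apply(order)
    for s in secondaries:                      # step 7: secondaries follow that order
        s.apply(order)
    return "ack"                               # step 8: acknowledgment to the client
```

Decoupling data flow from control flow lets GFS schedule the bulk transfer along the network topology while keeping mutation order decided in one place.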
9. Master Operation
1. Namespace Management and Locking
GFS maps each full pathname to its metadata in a lookup table.
Each master operation acquires a set of locks.
The locking scheme allows concurrent mutations in the same directory.
Locks are acquired in a consistent total order to prevent deadlock.
2. Replica Placement
3. Chunk Creation
4. Re-Replication
5. Balancing
10. 1. Namespace Management & Locking
Each master operation acquires a set of locks before it runs.
To operate on /dir1/dir2/dir3/leaf, it first needs the
following locks:
– Read-lock on /dir1
– Read-lock on /dir1/dir2
– Read-lock on /dir1/dir2/dir3
– Read-lock or write-lock on /dir1/dir2/dir3/leaf
File creation doesn't require a write-lock on the parent directory;
a read-lock on its name is sufficient to protect the parent directory
from deletion, rename, or snapshot.
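The lock set for a path can be computed mechanically: read-locks on every ancestor, plus a read- or write-lock on the leaf, acquired in path order to prevent deadlock. A sketch with a hypothetical helper:

```python
# Sketch of the GFS lock set for an operation on a full pathname.
# `lock_set` is an invented helper; real locks live in the master's
# namespace table.

def lock_set(path, leaf_write):
    """Return (mode, path) pairs in the consistent acquisition order."""
    parts = path.strip("/").split("/")
    # Read-locks on every proper prefix (the ancestors).
    prefixes = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    locks = [("read", p) for p in prefixes]
    # Read- or write-lock on the leaf itself, depending on the operation.
    locks.append(("write" if leaf_write else "read", path))
    return locks
```

Because every operation acquires locks in the same (path) order, two operations can never each hold a lock the other is waiting for.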
11. 2. Chunk Creation
Master considers several factors
Place new replicas on chunkservers with below-average disk
space utilization
Limit the number of “recent” creations on each chunkserver
Spread replicas of a chunk across racks
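The three factors can be combined into a simple selection pass. The paper states the criteria but no formula, so the scoring below is invented for illustration; `pick_servers` and its fields are hypothetical.

```python
# Sketch of chunk-creation placement: prefer below-average disk utilization,
# cap "recent" creations per server, and spread replicas across racks.
# The selection logic is an illustrative guess, not the real GFS policy.

def pick_servers(servers, n_replicas, recent_cap=3):
    """servers: dicts with 'name', 'rack', 'disk_used' (0..1), 'recent'."""
    avg = sum(s["disk_used"] for s in servers) / len(servers)
    # Factor 1 and 2: below-average utilization, few recent creations.
    eligible = [s for s in servers
                if s["recent"] < recent_cap and s["disk_used"] <= avg]
    eligible.sort(key=lambda s: s["disk_used"])
    chosen, racks = [], set()
    for s in eligible:                 # Factor 3: at most one replica per rack
        if s["rack"] not in racks:
            chosen.append(s["name"])
            racks.add(s["rack"])
        if len(chosen) == n_replicas:
            break
    return chosen
```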
12. 3. Re-replication
The master re-replicates a chunk as soon as the number of available
replicas falls below a user-specified goal:
When a chunkserver becomes unavailable.
When a chunkserver reports a corrupted chunk.
When the replication goal is increased.
Re-replication placement is similar to that for creation.
13. 4. Balancing
The master rebalances replicas periodically for better disk space
usage and load balancing.
The master gradually fills up a new chunkserver rather than instantly
swamping it with new chunks (and the heavy write traffic that comes
with them!).
14. Metadata Management
The master stores three major types of metadata:
File and chunk namespaces
Mapping from files to chunks
Locations of each chunk’s replicas
All metadata is kept in the master’s memory.
Figure 5: Logical structure of metadata
Source: Naushad UzZaman,“Survey on Google File System”,CSC 456,2007
15. Garbage Collection
Storage is reclaimed lazily by garbage collection.
A deleted file is first renamed to a hidden name.
Hidden files are removed if more than three days old.
When a hidden file is removed, its in-memory metadata is removed.
The master regularly scans the chunk namespace, identifying orphaned
chunks; these are removed.
Chunkservers periodically report the chunks they have, and the master
replies with the identity of all chunks that are no longer present in the
master's metadata. The chunkserver is free to delete its replicas of such chunks.
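The report/reply exchange at the end of the slide reduces to a set difference. A minimal sketch with a hypothetical helper name:

```python
# Sketch of the GC handshake: the chunkserver reports its chunk handles,
# the master replies with handles absent from its metadata, and the
# chunkserver may delete those replicas. `gc_exchange` is hypothetical.

def gc_exchange(master_chunks, chunkserver_chunks):
    """Return (orphans the master tells the server to drop, chunks kept)."""
    orphans = chunkserver_chunks - master_chunks   # master's reply
    keep = chunkserver_chunks - orphans            # what survives deletion
    return orphans, keep
```

Because deletion is driven by this regular exchange rather than eager messages, lost deletion requests are harmless: the orphan simply shows up in the next report.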
16. Fault Tolerance
High availability:
Fast recovery.
Chunk replication.
Master replication.
Data integrity:
Chunkservers use checksumming.
Each chunk is broken up into 64 KB blocks, each with its own checksum.
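Per-block checksumming can be sketched as follows. CRC32 is used here as a stand-in; the slide does not specify the checksum function, and the helper names are invented.

```python
# Sketch of per-64KB-block checksumming on a chunkserver. CRC32 is an
# assumed stand-in for the (unspecified) checksum function.
import zlib

BLOCK = 64 * 1024  # 64 KB blocks, as in GFS

def checksum_blocks(chunk):
    """Compute one checksum per 64 KB block of a chunk."""
    return [zlib.crc32(chunk[i:i + BLOCK]) for i in range(0, len(chunk), BLOCK)]

def verify(chunk, checksums):
    """Re-check every block, as a chunkserver would before serving a read."""
    return checksum_blocks(chunk) == checksums
```

Checking per block rather than per chunk means a read touching one block only has to verify 64 KB, and a detected mismatch pinpoints which block to re-replicate.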
17. Latest Advancement
1. Gmail - an easily configurable email service with 15 GB of web storage.
2. Blogger - a free web-based service that helps consumers publish on the
web without writing code or installing software.
3. Google "next-generation corporate software" - a smaller version of the
Google software, modified for private use.
18. Drawbacks
Small files have a small number of chunks, possibly only one. The
chunkservers storing these files can become hot spots under many client
requests.
Internal fragmentation.
Many small files increase master involvement and can lead to a
potential bottleneck; having a single master node can become an issue.
Master memory is a limitation.
Performance might degrade as the number of writers and random writes
grows.
No reasoning is provided for the choice of the standard chunk size (64 MB).
19. Conclusion
GFS meets Google's storage requirements:
Incremental growth.
Regular checks for component failure.
Data optimization from special operations.
Simple architecture.
Fault tolerance.
20. References
[1] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google
File System. ACM SIGOPS Operating Systems Review, Volume 37, Issue 5,
2003.
[2] Sean Quinlan and Kirk McKusick. GFS: Evolution on Fast-forward.
Communications of the ACM, Vol. 53, 2010.
[3] Thomas Anderson, Michael Dahlin, Jeanna Neefe, David Patterson, Drew
Roselli, and Randolph Wang. Serverless network file systems. In Proceedings of
the 15th ACM Symposium on Operating System Principles, pages 109–126,
Copper Mountain Resort, Colorado, December 1995.
[4] Luis-Felipe Cabrera and Darrell D. E. Long. Swift: Using distributed disk
striping to provide high I/O data rates. Computer Systems, 4(4):405–436, 1991.
[5] InterMezzo. http://www.inter-mezzo.org, 2003.