20. Architecture
• One NameNode, many DataNodes
• A file is a sequence of blocks (64 MB each by default)
• Metadata is kept in the NameNode's memory
• Clients exchange file data directly with the DataNodes (the NameNode handles only metadata)
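The block abstraction can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the 64 MB block size from the slide; the class BlockLayout and its methods are hypothetical illustrations, not part of the Hadoop API. A file needs the ceiling of its length divided by the block size, and only the last block may be shorter than the others.

```java
/** Hypothetical helper: how a file of a given length splits into fixed-size blocks. */
final class BlockLayout {
    // 64 MB, the block size quoted on the slide (an assumption here; HDFS makes it configurable).
    static final long BLOCK_SIZE = 64L * 1024 * 1024;

    /** Number of blocks needed to hold fileLength bytes (ceiling division). */
    static long blockCount(long fileLength, long blockSize) {
        if (fileLength == 0) return 0;
        return (fileLength + blockSize - 1) / blockSize;
    }

    /** Length of the final block; it only holds the leftover bytes. */
    static long lastBlockLength(long fileLength, long blockSize) {
        if (fileLength == 0) return 0;
        long remainder = fileLength % blockSize;
        return remainder == 0 ? blockSize : remainder;
    }

    public static void main(String[] args) {
        long fileLength = 200L * 1024 * 1024; // a 200 MB file
        System.out.println(blockCount(fileLength, BLOCK_SIZE));      // 4
        System.out.println(lastBlockLength(fileLength, BLOCK_SIZE)); // 8388608 (8 MB)
    }
}
```

For a 200 MB file this gives four blocks: three full 64 MB blocks and a final 8 MB block.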
24. NameNode
• Manages the file system namespace
• Makes all decisions regarding block replication
• Single point of failure (SPOF)
• Keeps the metadata of all files and blocks in memory
• Executes namespace operations such as opening, closing, and renaming files and directories
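Keeping the whole namespace in memory makes operations such as rename cheap, because they only touch metadata. Below is a simplified sketch, assuming a flat map from full path to block IDs instead of a real directory tree; InMemoryNamespace and its methods are hypothetical, not NameNode code. Each method is synchronized so that namespace operations are applied one at a time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory namespace: full file path -> ordered list of block IDs. */
final class InMemoryNamespace {
    private final Map<String, List<Long>> files = new HashMap<>();
    private long nextBlockId = 1;

    /** Creates an empty file; fails if the path already exists. */
    synchronized void create(String path) {
        if (files.containsKey(path)) throw new IllegalStateException("File exists: " + path);
        files.put(path, new ArrayList<>());
    }

    /** Allocates a new block ID and appends it to the file's block list. */
    synchronized long addBlock(String path) {
        List<Long> blocks = files.get(path);
        if (blocks == null) throw new IllegalStateException("No such file: " + path);
        long id = nextBlockId++;
        blocks.add(id);
        return id;
    }

    /** Rename only rewrites metadata; no block data moves between DataNodes. */
    synchronized void rename(String src, String dst) {
        if (!files.containsKey(src)) throw new IllegalStateException("No such file: " + src);
        if (files.containsKey(dst)) throw new IllegalStateException("File exists: " + dst);
        files.put(dst, files.remove(src));
    }

    /** Read-only copy of a file's block IDs, or null if the file does not exist. */
    synchronized List<Long> blocks(String path) {
        List<Long> blocks = files.get(path);
        return blocks == null ? null : Collections.unmodifiableList(new ArrayList<>(blocks));
    }

    public static void main(String[] args) {
        InMemoryNamespace ns = new InMemoryNamespace();
        ns.create("/user/test-hdfs/somefile");
        ns.addBlock("/user/test-hdfs/somefile");
        ns.addBlock("/user/test-hdfs/somefile");
        ns.rename("/user/test-hdfs/somefile", "/user/test-hdfs/renamed");
        System.out.println(ns.blocks("/user/test-hdfs/renamed")); // [1, 2]
    }
}
```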
25. NameNode
• The namespace is persisted in two files: the namespace image (fsimage) and the edit log
• Determines the mapping of blocks to DataNodes
• Receives periodic Heartbeats and Blockreports from each DataNode
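The block-to-DataNode mapping is not stored in the namespace image; the NameNode rebuilds it from Blockreports and uses Heartbeats to judge which DataNodes are alive. The sketch below models that, assuming a fixed heartbeat timeout supplied by the caller and full (non-incremental) block reports; BlockMapSketch is a hypothetical class, not NameNode code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: tracking block locations and DataNode liveness on the NameNode. */
final class BlockMapSketch {
    private final long heartbeatTimeoutMillis;                             // assumed fixed timeout
    private final Map<String, Long> lastHeartbeat = new HashMap<>();       // DataNode -> last heartbeat time
    private final Map<Long, Set<String>> blockLocations = new HashMap<>(); // block ID -> DataNodes holding it

    BlockMapSketch(long heartbeatTimeoutMillis) {
        this.heartbeatTimeoutMillis = heartbeatTimeoutMillis;
    }

    /** Records that the DataNode is alive at the given time. */
    void heartbeat(String dataNode, long nowMillis) {
        lastHeartbeat.put(dataNode, nowMillis);
    }

    /** In this model a block report lists every block the DataNode holds and replaces its old entries. */
    void blockReport(String dataNode, Set<Long> blockIds, long nowMillis) {
        heartbeat(dataNode, nowMillis);
        for (Set<String> holders : blockLocations.values()) {
            holders.remove(dataNode);
        }
        for (long id : blockIds) {
            blockLocations.computeIfAbsent(id, k -> new HashSet<>()).add(dataNode);
        }
    }

    /** Locations of a block on DataNodes whose last heartbeat is within the timeout. */
    Set<String> locations(long blockId, long nowMillis) {
        Set<String> live = new HashSet<>();
        for (String dn : blockLocations.getOrDefault(blockId, Set.of())) {
            Long last = lastHeartbeat.get(dn);
            if (last != null && nowMillis - last <= heartbeatTimeoutMillis) {
                live.add(dn);
            }
        }
        return live;
    }
}
```

A DataNode that stops sending Heartbeats simply drops out of the returned locations, which is what would let a NameNode notice under-replicated blocks.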
30. DataNode
• Usually one DataNode per machine
• Serves read and write requests from clients
• Performs block creation, deletion, and replication on instruction from the NameNode
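The DataNode's duties can be sketched as a small local block store. This is a hypothetical in-memory model (DataNodeSketch is not a Hadoop class): client reads and writes go straight to it, while delete and replicate correspond to commands issued by the NameNode. In HDFS those commands travel back in the NameNode's replies to Heartbeats; here they are plain method calls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical DataNode sketch: stores block replicas locally and executes block commands. */
final class DataNodeSketch {
    private final Map<Long, byte[]> blocks = new HashMap<>(); // block ID -> replica bytes

    /** Client write: create (or overwrite) the local replica. */
    void writeBlock(long blockId, byte[] data) {
        blocks.put(blockId, data.clone());
    }

    /** Client read: return a copy of the replica if this node holds it. */
    Optional<byte[]> readBlock(long blockId) {
        byte[] data = blocks.get(blockId);
        return data == null ? Optional.empty() : Optional.of(data.clone());
    }

    /** NameNode command: delete a replica, e.g. an excess one. Returns false if absent. */
    boolean deleteBlock(long blockId) {
        return blocks.remove(blockId) != null;
    }

    /** NameNode command: copy a replica this node holds to another DataNode. */
    boolean replicateBlock(long blockId, DataNodeSketch target) {
        byte[] data = blocks.get(blockId);
        if (data == null) return false;
        target.writeBlock(blockId, data);
        return true;
    }
}
```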
31. DataNode
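• Each block replica is stored as two local files: one holding the data, one holding its metadata
• The metadata includes checksums of the block data and the block's generation stamp
• At cluster startup, each DataNode performs a handshake with the NameNode to verify the namespace ID and the software version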
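The checksum metadata lets a reader detect corrupted block data, and the handshake stops a DataNode from joining the wrong namespace or running incompatible software. The sketch below assumes one checksum per 512-byte chunk and uses java.util.zip.CRC32 as the checksum function; BlockMetaSketch and its methods are hypothetical, and neither the on-disk format of real metadata files nor the generation stamp is modelled.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

/** Hypothetical sketch: per-chunk block checksums and the DataNode startup handshake check. */
final class BlockMetaSketch {
    // Assumption: one checksum per 512-byte chunk of block data.
    static final int BYTES_PER_CHECKSUM = 512;

    /** Computes one CRC32 value per chunk; the last chunk may be shorter. */
    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int from = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - from);
            CRC32 crc = new CRC32();
            crc.update(data, from, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** A reader recomputes the checksums and compares them with the stored ones. */
    static boolean verify(byte[] data, long[] stored) {
        return Arrays.equals(checksums(data), stored);
    }

    /** Handshake: the DataNode may join only if namespace ID and software version both match. */
    static boolean handshakeOk(int nameNodeNamespaceId, String nameNodeVersion,
                               int dataNodeNamespaceId, String dataNodeVersion) {
        return nameNodeNamespaceId == dataNodeNamespaceId
                && nameNodeVersion.equals(dataNodeVersion);
    }
}
```

Checksumming per chunk rather than per block means a reader only has to recompute the chunks it actually reads.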
39. Read
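import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

final Configuration conf = new Configuration();
final FileSystem fs = FileSystem.get(conf);
InputStream in = null;
try {
    in = fs.open(new Path("/user/test-hdfs/somefile"));
    // do whatever with the input stream, e.g. copy its contents to stdout
    IOUtils.copyBytes(in, System.out, 4096, false);
} finally {
    IOUtils.closeStream(in);
}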
40. Write
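• The client asks the NameNode to create the file
• The NameNode performs some checks (for example, that the file does not already exist and that the client has permission to create it)
• The client asks the NameNode for a list of DataNodes to store the block's replicas
• The DataNodes form a pipeline, and the client keeps an ack queue of packets awaiting acknowledgment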
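The write setup ends with packets flowing through a pipeline while the client tracks which ones are still unacknowledged. The following single-threaded model is a hypothetical sketch of that bookkeeping, loosely following the description above rather than Hadoop's DFSOutputStream code: a packet moves from the data queue to the ack queue when it is sent, and leaves the ack queue once the whole pipeline has acknowledged it. Forwarding from one DataNode to the next is modelled simply by logging each hop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical single-threaded model of the client-side queues in a write pipeline. */
final class WritePipelineSketch {
    final Deque<Integer> dataQueue = new ArrayDeque<>(); // packets waiting to be sent
    final Deque<Integer> ackQueue = new ArrayDeque<>();  // packets sent but not yet acknowledged
    final List<String> pipeline;                          // DataNodes supplied by the NameNode, in order
    final List<String> log = new ArrayList<>();           // records each hop a packet makes

    WritePipelineSketch(List<String> pipeline) {
        this.pipeline = new ArrayList<>(pipeline);
    }

    /** The client splits its data into numbered packets and queues them. */
    void enqueue(int packetSeqNo) {
        dataQueue.addLast(packetSeqNo);
    }

    /** Sends the oldest queued packet; each DataNode stores it and forwards it to the next. */
    boolean sendNext() {
        Integer packet = dataQueue.pollFirst();
        if (packet == null) return false;
        for (String dn : pipeline) {
            log.add("packet " + packet + " -> " + dn);
        }
        ackQueue.addLast(packet);
        return true;
    }

    /** Called when every DataNode in the pipeline has acknowledged the oldest outstanding packet. */
    void ackReceived(int packetSeqNo) {
        Integer oldest = ackQueue.peekFirst();
        if (oldest == null || oldest != packetSeqNo) {
            throw new IllegalStateException("Unexpected ack for packet " + packetSeqNo);
        }
        ackQueue.pollFirst();
    }
}
```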
41. Write
• Send packets to the pipeline
• In case of failure:
  • close the pipeline
  • remove the bad DataNode and notify the NameNode
  • retry sending to the remaining good DataNodes
  • the block will be re-replicated asynchronously to restore the replication factor
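The failure steps can be sketched as one recovery function over the client's two queues. This is a hypothetical illustration (PipelineRecoverySketch.recover is not a Hadoop method), assuming packets are identified by sequence number and that the caller passes in a list standing in for the report to the NameNode. The key detail is that unacknowledged packets go back to the front of the data queue in their original order, so the DataNodes left in the pipeline do not miss any of them.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of client-side pipeline recovery after a DataNode fails mid-write. */
final class PipelineRecoverySketch {
    /**
     * Models recovery once the current pipeline has been closed: requeues unacknowledged
     * packets, drops the failed DataNode, and returns the remaining good DataNodes.
     * nameNodeReports stands in for telling the NameNode which DataNode was removed.
     */
    static List<String> recover(Deque<Integer> dataQueue, Deque<Integer> ackQueue,
                                List<String> pipeline, String badNode,
                                List<String> nameNodeReports) {
        // Move in-flight packets back to the front of the data queue, keeping their order.
        while (!ackQueue.isEmpty()) {
            dataQueue.addFirst(ackQueue.pollLast());
        }
        // Remove the bad DataNode and report it.
        List<String> good = new ArrayList<>(pipeline);
        good.remove(badNode);
        nameNodeReports.add(badNode);
        // Sending resumes to the good DataNodes. The block is now under-replicated;
        // restoring the missing replica is left to the NameNode, asynchronously.
        return good;
    }
}
```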
43. Write
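import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

final Configuration conf = new Configuration();
final FileSystem fs = FileSystem.get(conf);
FSDataOutputStream out = null;
try {
    out = fs.create(new Path("/user/test-hdfs/newfile"));
    // write to out, e.g. some UTF-8 text
    out.write("hello, hdfs\n".getBytes(StandardCharsets.UTF_8));
} finally {
    IOUtils.closeStream(out); // close the output stream opened in the try block
}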
44. Block Placement
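• Trade-off between reliability and bandwidth
• By default: the first replica on the writer's node, the second and third on two different nodes in another rack, and any further replicas on random nodes
• No DataNode contains more than one replica of any block
• No rack contains more than two replicas of the same block (as long as the number of replicas is less than twice the number of racks)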
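The placement rules can be expressed as a target-selection function. This is a minimal sketch, assuming the writer runs on a DataNode and the topology is given as a node-to-rack map; PlacementSketch.chooseTargets is hypothetical and ignores factors a real policy also weighs, such as free space and node load. It enforces the two-replicas-per-rack limit unconditionally, so with too few racks it returns fewer targets than requested.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeSet;

/** Hypothetical sketch of the default replica placement rules; not Hadoop's placement code. */
final class PlacementSketch {
    /**
     * Picks up to `replication` targets: the writer's node, then two different nodes on one
     * other rack, then random nodes, never two replicas per node or more than two per rack.
     */
    static List<String> chooseTargets(String writer, int replication,
                                      Map<String, String> rackOf, Random random) {
        List<String> chosen = new ArrayList<>();
        if (replication < 1) return chosen;
        chosen.add(writer);
        String localRack = rackOf.get(writer);

        // Replicas 2 and 3: two different nodes on a single remote rack.
        List<String> remoteRacks = new ArrayList<>();
        for (String rack : new TreeSet<>(rackOf.values())) {
            if (!rack.equals(localRack) && nodesOn(rack, rackOf).size() >= 2) {
                remoteRacks.add(rack);
            }
        }
        if (replication >= 2 && !remoteRacks.isEmpty()) {
            String rack = remoteRacks.get(random.nextInt(remoteRacks.size()));
            List<String> nodes = nodesOn(rack, rackOf);
            Collections.shuffle(nodes, random);
            chosen.addAll(nodes.subList(0, Math.min(2, replication - 1)));
        }

        // Remaining replicas: random nodes that respect the per-node and per-rack limits.
        List<String> candidates = new ArrayList<>(new TreeSet<>(rackOf.keySet()));
        Collections.shuffle(candidates, random);
        for (String node : candidates) {
            if (chosen.size() >= replication) break;
            if (chosen.contains(node)) continue;                              // one replica per node
            if (countOnRack(rackOf.get(node), chosen, rackOf) >= 2) continue; // two per rack at most
            chosen.add(node);
        }
        return chosen;
    }

    /** All nodes on the given rack, sorted for reproducibility. */
    private static List<String> nodesOn(String rack, Map<String, String> rackOf) {
        List<String> nodes = new ArrayList<>();
        for (Map.Entry<String, String> e : rackOf.entrySet()) {
            if (e.getValue().equals(rack)) nodes.add(e.getKey());
        }
        Collections.sort(nodes);
        return nodes;
    }

    /** How many already-chosen nodes sit on the given rack. */
    private static int countOnRack(String rack, List<String> chosen, Map<String, String> rackOf) {
        int count = 0;
        for (String n : chosen) {
            if (rackOf.get(n).equals(rack)) count++;
        }
        return count;
    }
}
```

Putting two replicas on one remote rack is the bandwidth side of the trade-off: only one copy crosses between racks during the write, while losing a whole rack still leaves at least one replica.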
46. Balancer
• Compares each node's disk utilization with the utilization of the whole cluster, and moves replicas from more utilized to less utilized nodes
• Guarantees that a move reduces neither the number of replicas of a block nor the number of racks holding them
• Minimizes inter-rack data copying
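The balancer's checks can be sketched as two functions: one classifies a node by comparing its utilization with the cluster's, the other validates a proposed replica move. This is a hypothetical sketch (BalancerSketch is not the Hadoop balancer's code), assuming utilization is used bytes over capacity and that the caller supplies the tolerance in percentage points; the value 10 in the example is only illustrative.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical balancer sketch: utilization classification and a safe-move check. */
final class BalancerSketch {
    enum Status { OVER_UTILIZED, UNDER_UTILIZED, BALANCED }

    /** Utilization as a percentage of capacity. */
    static double utilization(long used, long capacity) {
        return 100.0 * used / capacity;
    }

    /** Compares a node's utilization with the cluster's, allowing +/- thresholdPercent. */
    static Status classify(long nodeUsed, long nodeCapacity,
                           long clusterUsed, long clusterCapacity, double thresholdPercent) {
        double node = utilization(nodeUsed, nodeCapacity);
        double cluster = utilization(clusterUsed, clusterCapacity);
        if (node > cluster + thresholdPercent) return Status.OVER_UTILIZED;
        if (node < cluster - thresholdPercent) return Status.UNDER_UTILIZED;
        return Status.BALANCED;
    }

    /**
     * A replica may move from source to target only if the target holds no replica of the block
     * (so the replica count is unchanged) and the replicas still span at least as many racks.
     */
    static boolean isSafeMove(Set<String> replicaNodes, String source, String target,
                              Map<String, String> rackOf) {
        if (!replicaNodes.contains(source) || replicaNodes.contains(target)) return false;
        Set<String> after = new HashSet<>(replicaNodes);
        after.remove(source);
        after.add(target);
        return rackCount(after, rackOf) >= rackCount(replicaNodes, rackOf);
    }

    /** Number of distinct racks the given nodes live on. */
    private static int rackCount(Set<String> nodes, Map<String, String> rackOf) {
        Set<String> racks = new HashSet<>();
        for (String n : nodes) racks.add(rackOf.get(n));
        return racks.size();
    }
}
```

For example, with the cluster at 50% utilization and a tolerance of 10 points, a node at 70% is over-utilized, one at 35% is under-utilized, and one at 55% is left alone.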