14. Hadoop Concepts
• Moving processing is faster than moving data
• Data locality
• HDFS was designed for many millions of large files, not billions of small files
• Write once, read many times
• Default replication factor: 3
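The small-files point can be made concrete with rough arithmetic: every file and block is an object in the NameNode's heap, commonly estimated at about 150 bytes each. A minimal sketch, assuming that rule-of-thumb figure (it is an approximation, not an exact measurement):

```python
# Rough estimate of NameNode heap needed for namespace metadata.
# Assumption: ~150 bytes per namespace object (file or block),
# a commonly quoted rule of thumb, not an exact figure.
BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Each file contributes one file object plus its block objects."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

# A billion small one-block files: ~300 GB of heap just for metadata.
small_files = namenode_heap_bytes(1_000_000_000)

# A million large files of 1000 blocks each (holding far more data):
# only ~150 GB of heap. Fewer, larger files keep the NameNode healthy.
large_files = namenode_heap_bytes(1_000_000, blocks_per_file=1000)
```

This is why the slide says "millions of large files, not billions of small files": the NameNode keeps the entire namespace in memory, so metadata volume, not data volume, is the limit.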
15. Why Not Move the Data?
• Moving data from the storage layer to the computing layer is expensive
• Processing cannot start until the data movement completes
• More nodes, more contention
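The cost argument above can be sketched with back-of-the-envelope arithmetic; the dataset size and link bandwidth below are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope: time to ship a dataset to the compute layer
# before any processing can start. All numbers are illustrative.
def transfer_seconds(size_bytes, bandwidth_bytes_per_sec):
    return size_bytes / bandwidth_bytes_per_sec

TB = 10**12
GBIT = 125 * 10**6           # 1 Gbit/s link is ~125 MB/s

# Moving 10 TB over a 1 Gbit/s link takes ~80,000 s (~22 hours)
# before the first byte is processed; every extra node pulling data
# over the same links adds contention on top of that.
t = transfer_seconds(10 * TB, GBIT)
```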
17. HDFS
$ hadoop fs -put /home/frank/data1.txt /user/frank/data1.txt
[Diagram: four DataNodes (1-4) and a NameNode. The NameNode records the metadata for /user/frank/data1.txt: file size 150M, block size 64M, replication 3. The first 64M block is written to DataNodes 1, 2, and 3.]
18. HDFS
$ hadoop fs -put /home/frank/data1.txt /user/frank/data1.txt
[Diagram, continued: the second 64M block is written to DataNodes 2, 3, and 4. The NameNode now maps /user/frank/data1.txt (150M, block size 64M, replication 3) to block 1 on DataNodes 1,2,3 and block 2 on DataNodes 2,3,4.]
19. HDFS
$ hadoop fs -put /home/frank/data1.txt /user/frank/data1.txt
[Diagram, continued: the final 22M block (the remainder of the 150M file) is written to DataNodes 1, 2, and 4. The NameNode's block map for /user/frank/data1.txt is now: block 1 on DataNodes 1,2,3; block 2 on DataNodes 2,3,4; block 3 on DataNodes 1,2,4.]
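The three slides above walk one 150M file through HDFS: it is cut into blocks of at most 64M, and each block is replicated on three DataNodes. A minimal sketch of the splitting arithmetic (sizes in MB for readability; real HDFS works in bytes):

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file is cut into."""
    blocks = []
    remaining = file_size
    while remaining > 0:
        blocks.append(min(block_size, remaining))
        remaining -= block_size
    return blocks

# A 150M file with a 64M block size yields three blocks: 64M, 64M, 22M.
blocks = split_into_blocks(150, 64)

# With replication 3, the cluster stores three copies of every block,
# e.g. block 1 on DataNodes 1,2,3; block 2 on 2,3,4; block 3 on 1,2,4.
total_stored = sum(blocks) * 3   # 450M of raw capacity for 150M of data
```

Note that the last block occupies only its actual 22M, not a full 64M; HDFS blocks are a logical unit, not a fixed on-disk allocation.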
24. Data Locality
[Diagram: each worker host runs a DataNode and a NodeManager as JVM daemons; the master host runs the NameNode and the ResourceManager.]
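Data locality in this picture means the ResourceManager prefers to launch a task on a NodeManager whose co-located DataNode already holds the needed block. A minimal sketch of that preference (the data structures and function are illustrative, not YARN's actual API):

```python
# Hypothetical locality-aware placement: prefer a node that already
# stores the block; fall back to any available node otherwise.
def pick_node(block_replicas, available_nodes):
    """block_replicas: set of nodes holding a copy of the block.
    available_nodes: nodes with free capacity, in preference order."""
    for node in available_nodes:
        if node in block_replicas:
            return node          # node-local: task reads from local disk
    return available_nodes[0]    # remote: block must cross the network

replicas = {"node1", "node2", "node3"}
pick_node(replicas, ["node4", "node2"])   # "node2": a local copy wins
pick_node(replicas, ["node4", "node5"])   # "node4": no local copy exists
```

Replication factor 3 helps here too: three candidate hosts per block raise the odds that at least one of them has a free task slot.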
25. Suggested Host Configurations
[Diagram: co-locate a DataNode and a NodeManager on every worker host; run the NameNode and the ResourceManager on a dedicated master host.]