Meet Hadoop Family: part 1

HDFS

• What is it?
A distributed file system designed to store very large files with streaming data access patterns
• Why is it needed?
Very large files
Streaming data access
Commodity hardware
• Limits of traditional designs
RAC, MPP: data is brought to the computation, so the network becomes the bottleneck
• Trade-offs
High-latency data access
Not suited to lots of small files (every file and block costs NameNode memory)
Write once; multiple writers are not supported

A Client Reading Data From HDFS

Network Distances in Hadoop
• distance(/d1/r1/n1, /d1/r1/n1) = 0 (processes on the same node) 
• distance(/d1/r1/n1, /d1/r1/n2) = 2 (different nodes on the same rack) 
• distance(/d1/r1/n1, /d1/r2/n3) = 4 (nodes on different racks in the same data center)
• distance(/d1/r1/n1, /d2/r3/n4) = 6 (nodes in different data centers)
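These values follow one rule: the distance between two nodes is the sum of their distances to their closest common ancestor in the data center / rack / node tree. A minimal Python sketch of that rule, assuming locations are written as /datacenter/rack/node paths as above; the function name `distance` is only illustrative.

```python
def distance(a: str, b: str) -> int:
    """Network distance between two locations such as "/d1/r1/n1".

    Each path is data center / rack / node. The distance is the number of
    hops from each node up to their closest common ancestor, summed.
    """
    pa = [part for part in a.split("/") if part]
    pb = [part for part in b.split("/") if part]
    # Count the leading path components the two locations share.
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Hops from each node up to the shared ancestor.
    return (len(pa) - common) + (len(pb) - common)


if __name__ == "__main__":
    print(distance("/d1/r1/n1", "/d1/r1/n1"))  # same node
    print(distance("/d1/r1/n1", "/d1/r1/n2"))  # same rack
    print(distance("/d1/r1/n1", "/d1/r2/n3"))  # same data center
    print(distance("/d1/r1/n1", "/d2/r3/n4"))  # different data centers
```

Counting hops to the common ancestor is what makes the same-node, same-rack, same-data-center and cross-data-center cases come out as 0, 2, 4 and 6.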

• HDFS blocks: default size 128 MB (large so that seek time stays small relative to transfer time),
default replication 3x
• NameNode: stores the metadata of all blocks in the cluster; its storage location is configured by
dfs.namenode.name.dir, default /dfs/xx
• DataNodes: store the data blocks, plus metadata (such as checksums) for their local blocks
• POSIX-like (almost) permissions: rw(x), user, group, mode
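The block size and replication factor determine how a file is laid out and how much disk it uses, and the usual argument for large blocks is that seek time should be small relative to transfer time. A minimal Python sketch of both calculations; the 10 ms seek time and 100 MB/s transfer rate are assumed, illustrative figures, not measurements, and the function names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size (128 MB)
REPLICATION = 3                 # default replication factor


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks a file of file_size bytes is split into (ceiling division)."""
    return (file_size + block_size - 1) // block_size


def raw_bytes(file_size: int, replication: int = REPLICATION) -> int:
    """Disk space used across the cluster.

    A partially filled last block only occupies as much disk as the data
    it holds, so this is size * replication, not a multiple of 128 MB.
    """
    return file_size * replication


def seek_fraction(block_size: int = BLOCK_SIZE,
                  seek_s: float = 0.010,
                  transfer_bytes_per_s: float = 100 * 1024 * 1024) -> float:
    """Share of the time to read one block that is spent seeking.

    seek_s and transfer_bytes_per_s are assumed, illustrative disk figures.
    """
    transfer_s = block_size / transfer_bytes_per_s
    return seek_s / (seek_s + transfer_s)


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(block_count(one_gib))               # 8 blocks
    print(raw_bytes(one_gib) // one_gib)      # 3 GiB of raw disk
    print(f"{seek_fraction():.2%}")           # under 1% with 128 MB blocks
    print(f"{seek_fraction(4 * 1024):.2%}")   # 4 KB blocks: seeking dominates
```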

• HDFS logs and web interface:
NameNode on port 50070, DataNode on port 50075
• WebHDFS / HttpFS REST interface, for example:
http://sabtu:50070/webhdfs/v1/tmp?user.name=hdfs&op=GETFILESTATUS
{"FileStatus":{"accessTime":0,"blockSize":0,"childrenNum":4,"fileId":16386,"group":"supergroup",
"length":0,"modificationTime":1467099643710,"owner":"hdfs","pathSuffix":"","permission":"1777",
"replication":0,"type":"DIRECTORY"}}
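The GETFILESTATUS call above can be reproduced from Python with only the standard library. A minimal sketch: it builds the same URL (host `sabtu`, port 50070, user `hdfs` are taken from the example) and decodes the sample response offline; the helper names are hypothetical, and fetching a live response (e.g. with `urllib.request.urlopen`) assumes a reachable NameNode with simple `user.name` authentication.

```python
import json
import stat
from datetime import datetime, timezone
from urllib.parse import urlencode

# Response body copied from the GETFILESTATUS example above.
SAMPLE = (
    '{"FileStatus":{"accessTime":0,"blockSize":0,"childrenNum":4,"fileId":16386,'
    '"group":"supergroup","length":0,"modificationTime":1467099643710,"owner":"hdfs",'
    '"pathSuffix":"","permission":"1777","replication":0,"type":"DIRECTORY"}}'
)


def getfilestatus_url(host: str, port: int, path: str, user: str) -> str:
    """Build a WebHDFS GETFILESTATUS URL using simple user.name authentication."""
    query = urlencode({"user.name": user, "op": "GETFILESTATUS"})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"


def summarize(body: str) -> dict:
    """Decode the interesting fields of a GETFILESTATUS response."""
    fs = json.loads(body)["FileStatus"]
    # "permission" is an octal string; "1777" = sticky bit + rwxrwxrwx.
    mode = int(fs["permission"], 8)
    # Treat DIRECTORY as a directory and anything else as a regular file.
    kind = stat.S_IFDIR if fs["type"] == "DIRECTORY" else stat.S_IFREG
    return {
        "type": fs["type"],
        "owner": fs["owner"],
        "group": fs["group"],
        "mode": stat.filemode(kind | mode),
        # modificationTime is milliseconds since the Unix epoch.
        "modified": datetime.fromtimestamp(fs["modificationTime"] / 1000, tz=timezone.utc),
    }


if __name__ == "__main__":
    print(getfilestatus_url("sabtu", 50070, "/tmp", "hdfs"))
    print(summarize(SAMPLE))  # mode is "drwxrwxrwt"
```

The `1777` permission decodes to `drwxrwxrwt`: a world-writable directory with the sticky bit, the usual setup for /tmp.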

Some Features

• High Availability (HA) mode
• HDFS federation: the namespace is split across independent NameNodes, similar in concept to
database sharding
• HDFS balancer
• Safe mode
• Distributed copy (distcp)
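Federation shards the file-system namespace across NameNodes; a client can still see one tree through a client-side mount table in the spirit of ViewFs. A minimal Python sketch of that routing idea, assuming longest-prefix matching on whole path components; the mount points and the namespace names `ns1` to `ns3` are hypothetical.

```python
from typing import Dict

# Hypothetical mount table: path prefix -> namespace (NameNode) that owns it.
MOUNT_TABLE: Dict[str, str] = {
    "/user": "ns1",
    "/data": "ns2",
    "/data/logs": "ns3",
}


def resolve(path: str, table: Dict[str, str] = MOUNT_TABLE) -> str:
    """Return the namespace that owns `path`, using the longest matching prefix."""
    best = None
    for prefix in table:
        # Match whole path components only: "/data" covers "/data/x", not "/dataset".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    return table[best]


if __name__ == "__main__":
    for p in ["/user/alice/report.csv", "/data/raw/2016", "/data/logs/app.log"]:
        print(p, "->", resolve(p))
```

As with database sharding, each namespace is managed independently; the mount table only decides which NameNode a path belongs to.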

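Common Commands

• Start the cluster
$HADOOP_PREFIX/sbin/start-dfs.sh
• Stop the cluster
$HADOOP_PREFIX/sbin/stop-dfs.sh
• File operations
hdfs dfs -cp x y    (copy x to y within HDFS)
hdfs dfs -ls x      (list x)
hdfs dfs -cat x     (print the contents of x)
hdfs dfs -put x y   (copy local file x into HDFS as y)
hdfs dfs -get x y   (copy HDFS file x to local path y)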
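The file operations above can also be scripted from Python by shelling out to the same `hdfs dfs` commands. A minimal sketch, assuming the `hdfs` client is on PATH and already configured for the cluster; the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def hdfs_dfs_cmd(op: str, *args: str) -> List[str]:
    """Build the argv for an `hdfs dfs` operation, e.g. op="put" -> hdfs dfs -put ..."""
    return ["hdfs", "dfs", f"-{op}", *args]


def run_hdfs_dfs(op: str, *args: str) -> str:
    """Run an `hdfs dfs` command and return its stdout; raise if it fails."""
    if shutil.which("hdfs") is None:
        raise FileNotFoundError("the hdfs client is not on PATH")
    result = subprocess.run(
        hdfs_dfs_cmd(op, *args),
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(hdfs_dfs_cmd("put", "x", "y"))
    if shutil.which("hdfs"):
        print(run_hdfs_dfs("ls", "/tmp"))
```

Passing an argument list (rather than a shell string) avoids quoting problems with paths that contain spaces.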

Questions? 
https://www.meetup.com/Jakarta-Hadoop-Big-Data/
