HDFS User Reference
Biju Nair
Local File System
[Diagram: in a local file system the directory maps FileA, FileB and FileC to Inode-n, Inode-m and Inode-p. Each inode holds the file attributes and the addresses of the file's blocks (block 0, block 1, block 2, ...). The disk layout holds the MBR, partition table, boot block, superblock, free-space tracking, i-nodes and root directory.]
File block size is based on what is used when the FS is defined
2
Hadoop Distributed File System
[Diagram: the master host (NN) keeps the HDFS directory, mapping FileA → H1:blk0, H2:blk1; FileB → H3:blk0, H1:blk1; FileC → H2:blk0, H3:blk1. Hosts 1-3 each keep a local FS directory mapping the block files (FileA0/FileB1 on Host 1, FileA1/FileC0 on Host 2, FileB0/FileC1 on Host 3) to local inodes on their disks.]
Files created are of size equal to the HDFS blksize
3
HDFS
[Diagram: the Hadoop CLI reaches the Name Node over RPC, WebHDFS over HTTP, and the data nodes over RPC; the HDFS UI is served over HTTP. Data Nodes exchange blocks using the HDFS Data Transfer Protocol and HTTP/S. Each Data Node stores ${dfs.data.dir}/current/VERSION, /blk_<id_1>, /blk_<id_1>.meta, /..., /subdir2/. The Name Node stores ${dfs.name.dir}/current/VERSION, /edits, /fsimage, /fstime. The Secondary Name Node stores ${fs.checkpoint.dir}/current/VERSION, /edits, /fsimage, /fstime.]
4
HDFS Config Files and Ports
• Default configuration – core-default.xml, hdfs-default.xml
• Site specific configuration – core-site.xml, hdfs-site.xml under conf
• Configuration of daemon processes – hadoop-env.sh under conf
• List of slave/data nodes – “slaves” file under conf
• Ports
  – Default NN UI port 50070 (HTTP), 50470 (HTTPS)
  – Default NN port 8020/9000
  – Default DN UI port 50075 (HTTP), 50475 (HTTPS)
5
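As a quick sketch of how to read these settings back and check the default ports (the host name below is an illustrative assumption):

    # Effective values come from *-default.xml overlaid by the conf/*-site.xml files.
    hdfs getconf -confKey fs.defaultFS               # e.g. hdfs://nn1.example.com:8020
    hdfs getconf -confKey dfs.namenode.http-address  # NN UI, default port 50070
    # Confirm the NN UI answers on its default port.
    curl -s http://nn1.example.com:50070/ >/dev/null && echo "NN UI (50070) up"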
HDFS - Write Flow
[Diagram: Client, Name Node (namespace metadata, blockmap, fsimage/edit files) and three Data Nodes, with the numbered steps below shown as arrows.]
1. Client requests to open a file to write through the fs.create() call. This will overwrite an existing file.
2. Name node responds with a lease to the file path.
3. Client writes locally and, when the data reaches the block size, requests the Name Node for a write.
4. Name Node responds with a new block id and the destination data nodes for the write and replication.
5. Client sends the first data node the data and the checksum generated on the data to be written.
6. The first data node writes the data and checksum and in parallel pipelines the replication to the other DNs.
7. Each data node where the data is replicated responds with success/failure to the first DN.
8. The first data node in turn informs the Name Node that the write request for the block is complete, and the Name Node updates its block map.
Note: There can be only one write at a time on a file.
6
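A minimal way to exercise this flow from the CLI; the file and path are illustrative, and hadoop fs -put goes through the same create/lease/pipeline steps described above:

    # Copy a local file into HDFS; blocks are pipelined to three DNs by default.
    hadoop fs -put ./events.log /user/biju/events.log
    # -f overwrites an existing file, matching the fs.create() behavior in step 1.
    hadoop fs -put -f ./events.log /user/biju/events.log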
HDFS - Read Flow
[Diagram: Client, Name Node (namespace metadata, blockmap, fsimage/edit files) and three Data Nodes, with the numbered steps below shown as arrows.]
1. Client requests to open a file to read through the fs.open() call.
2. Name node responds with a lease to the file path.
3. Client requests to read the data in the file.
4. Name Node responds with the block ids in sequence and the corresponding data nodes.
5. Client reaches out directly to the DNs for each block of data in the file.
6. When a DN sends back data along with its checksum, the client verifies it by generating its own checksum.
7. If the checksum verification fails, the client reaches out to the other DNs where there is a replica.
7
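The same flow from the CLI, with an illustrative path; client-side checksum verification (step 6) is on by default:

    # Stream a file; the client fetches each block directly from a DN.
    hadoop fs -cat /user/biju/events.log | head
    # -ignoreCrc skips the client-side checksum verification described above.
    hadoop fs -get -ignoreCrc /user/biju/events.log ./events.copy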
HDFS - Name Node
Fsimage (MetaData): namespace, ownership, permissions, create/mod/access time, is hidden
EditFile (Journal): changes to metadata
BlockMap (in-memory): details on file blocks and where they are stored
1. The Name node manages the HDFS file system using the fsimage/editfile and block-map data structures.
2. Fsimage and editfile data are stored on disk. When HDFS starts they are read, merged and stored in memory.
3. Data nodes send details about the blocks they are storing when they start and also at regular intervals.
4. The Name node uses the block maps sent by the data nodes to build the BlockMap data structure.
5. The BlockMap data is used when requests to read files come to the file system.
6. The BlockMap data is also used to identify under/over replicated files which require correction.
7. At no point does the Name node store file data locally or get directly involved in transferring file data to clients.
8. A client reading/writing data receives the metadata details from the NN and then works directly with the DNs.
9. Name nodes require large memory since they need to hold all the in-memory data structures.
10. If the NN is lost, the data in the file system can't be accessed.
8
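These on-disk structures can be inspected with the offline image and edits viewers. A sketch, assuming a classic layout where ${dfs.name.dir} is /data/dfs/name and the files carry their pre-2.x names:

    # Dump the namespace from the checkpointed fsimage (works with the NN down).
    hdfs oiv -i /data/dfs/name/current/fsimage -o /tmp/fsimage.xml -p XML
    # Dump the journal of metadata changes recorded since that checkpoint.
    hdfs oev -i /data/dfs/name/current/edits -o /tmp/edits.xml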
FS Meta Data Change Management
[Diagram: at start-up the NameNode merges the Fsimage (MetaData) and EditFile (Journal) into Fsimage_1 and EditFile_1; periodically the Secondary NameNode performs the same merge and ships the result back.]
1. When HDFS is up and running, changes to file system metadata are stored in edit files.
2. When the NN starts it looks for edit files in the system and merges their content with the fsimage on disk.
3. The merging process creates a new fsimage and edit file, and discards the old fsimage and edit files.
4. Since the edit files can be large for a very active HDFS cluster, the NN start-up can take a long time.
5. The secondary name node, at a regular interval or after a certain edit-file size, merges the edit file and the fsimage file.
6. The merge process creates a new fsimage file and an edit file. The secondary NN copies the new fsimage file back to the NN.
7. This shortens the NN start-up, and the fsimage can also be used for restore if there is a failure on the NN server.
9
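The checkpoint cadence is configurable; the property names below are the classic ones matching the fs.checkpoint.dir naming used earlier in this deck (newer releases use the dfs.namenode.checkpoint.* equivalents):

    hdfs getconf -confKey fs.checkpoint.period   # seconds between merges, default 3600
    hdfs getconf -confKey fs.checkpoint.size     # edits size in bytes that forces a merge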
HDFS - Data Node
[Diagram: Data Nodes send heartbeats and block maps to the Name Node (MetaData, BlockMap).]
1. Data nodes store the blocks of data for each file stored in HDFS; the default block size is 128 MB.
2. Blocks of data are replicated n times; by default n is 3.
3. A data node periodically sends a heartbeat to the name node to inform the NN that it is alive.
4. If the NN doesn't receive a heartbeat, it marks the DN as dead and stops sending further requests to the DN.
5. Also at periodic intervals, a data node sends out a block map which includes all the file blocks it stores.
6. When a DN is dead, all the files which had blocks stored on the DN are marked as under-replicated.
7. The NN rectifies under-replication by replicating the blocks to other data nodes.
10
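Both sides of this exchange can be observed from the CLI (the path is illustrative):

    # Live/dead DNs, capacity, and last-heartbeat times as the NN sees them.
    hdfs dfsadmin -report
    # Replication state and block placement for a subtree.
    hdfs fsck /user/biju -files -blocks -locations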
Ensuring Data Integrity
• Through replication/replication assurance
  – First replica closer to the client node
  – Second replica on a different rack
  – Third replica on the same rack as the second replica
• File system checks run manually
• Block scanning over a period of time
• Storing checksums along with block data
11
Permission and Quotas
• Files and directories use much of the POSIX model
  – Associated with an owner and a group
  – Permissions for owner, group and others
  – r for read, w for append to files
  – r for listing files, w for delete/create files in dirs
  – x to access child directories
  – Sticky bit on dirs prevents deletions by others
  – User identification can be simple (OS) or Kerberos
12
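A short sketch of the model in practice; the user, group and paths are illustrative:

    hadoop fs -chown biju:analysts /data/reports
    hadoop fs -chmod 750 /data/reports   # rwx owner, r-x group, none for others
    hadoop fs -chmod 1777 /data/tmp      # the leading 1 sets the sticky bit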
Permission and Quotas
• Quota for the number of files – name quota
  – dfsadmin -setQuota <N> <dir>...<dir>
  – dfsadmin -clrQuota <dir>...<dir>
• Quota on the size of data – space quota
  – Can be set to restrict space usage
  – dfsadmin -setSpaceQuota <N> <dir>...<dir>
  – dfsadmin -clrSpaceQuota <dir>...<dir>
• Replicated data also consumes quota
• Reporting
  – fs -count -q <dir>...<dir>
13
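For example, capping a project directory (the numbers are illustrative; note the space quota is charged post-replication bytes):

    hdfs dfsadmin -setQuota 1000000 /projects/alpha              # at most 1M names
    hdfs dfsadmin -setSpaceQuota 10995116277760 /projects/alpha  # 10 TB raw
    hadoop fs -count -q /projects/alpha                          # quotas and remaining headroom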
HDFS snapshot
• No copy of data blocks. Only the metadata (block list and file names) is copied
• Allow snapshots on a directory
  – hdfs dfsadmin -allowSnapshot <path>
• Create a snapshot
  – hdfs dfs -createSnapshot <path> [<name>]
  – Default name is ‘s’ + timestamp
• Verify a snapshot
  – hadoop fs -ls <path>/.snapshot
• A directory with snapshots can’t be deleted or renamed
• Disallow snapshots
  – hdfs dfsadmin -disallowSnapshot <path>
  – All existing snapshots need to be deleted before disallowing
• Delete a snapshot
  – hdfs dfs -deleteSnapshot <path> <name>
• Rename a snapshot
  – hdfs dfs -renameSnapshot <path> <oldname> <newname>
• Snapshot differences
  – hdfs snapshotDiff <path> <starting snapshot name> <ending snapshot name>
• List all snapshottable directories
  – hdfs lsSnapshottableDir
14
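A complete round trip with these commands; the directory and snapshot names are illustrative:

    hdfs dfsadmin -allowSnapshot /data/warehouse
    hdfs dfs -createSnapshot /data/warehouse s20150401
    hadoop fs -ls /data/warehouse/.snapshot
    hdfs snapshotDiff /data/warehouse s20150401 s20150402
    hdfs dfs -deleteSnapshot /data/warehouse s20150401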
HDFS back-up using snapshot
• Create a snapshot on the source cluster
• Perform a “distcp” of the snapshot to the backup cluster
• Create a snapshot of the copy on the backup cluster
• Clean up any old back-up copies to comply with the enterprise retention policy
• The reverse can be followed to recover data from the backup
  – Data needs to be removed on the production cluster before the restore
  – During deletion the -skipTrash option of “rm” will help reduce space usage
15
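The flow above sketched as commands, with illustrative cluster hosts and paths:

    hdfs dfs -createSnapshot /data/warehouse backup-20150401
    hadoop distcp hdfs://prod-nn:8020/data/warehouse/.snapshot/backup-20150401 \
                  hdfs://backup-nn:8020/backups/warehouse/20150401
    # On the backup cluster: snapshot the landed copy, then expire old copies.
    hdfs dfs -createSnapshot /backups/warehouse/20150401
    hdfs dfs -rm -r -skipTrash /backups/warehouse/20150301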
distcp
• Tool to perform inter- and intra-cluster copies of data
• Utilizes MapReduce to perform the copy
• It can be used to
  – Copy data within a cluster
  – Copy data between clusters
  – Copy files or directories
  – Copy data from multiple sources
• Can be used to create a backup cluster
• Starts up containers on both source and target
• Consumes network traffic between clusters
• Needs to be scheduled at an appropriate time
• Can control resource utilization using parameters
16
distcp
• hadoop distcp [options] <srcURL> … <srcURL> <destURL>
  – Source paths need to be absolute
  – The destination directory will be created if not present
  – “update” option will copy only the changed files
  – “skipcrccheck” option to disable the checksum check
  – “overwrite” option overwrites existing files, which by default are skipped if present
  – “delete” option to delete files in the destination which are not in the source
  – The “hftp” fs needs to be used to copy between different versions of HDFS
  – “m” option to specify the number of mappers
17
distcp
  – “atomic” option to commit all changes or none
  – “async” to run distcp asynchronously, i.e. non-blocking
  – “i” option to ignore failures during the copy
  – “log” directory on DFS where logs are to be saved
  – “p [rbugp]” preserve file status (replication, block size, user, group, permission) as at the source
  – “strategy [static|dynamic]”
  – “bandwidth [MB]” bandwidth per map in MB
18
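Putting the options together; the URLs, mapper count and bandwidth cap are illustrative:

    # Incremental, attribute-preserving copy with 20 mappers at 10 MB/s per map.
    hadoop distcp -update -prbugp -m 20 -bandwidth 10 \
        hdfs://prod-nn:8020/data/warehouse hdfs://backup-nn:8020/data/warehouse
    # Copying from an older HDFS version: read via hftp on the source NN UI port.
    hadoop distcp hftp://old-nn:50070/data/warehouse hdfs://new-nn:8020/data/warehouse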
HDFS JAVA APIs
Function               API
Directory Create       FileSystem.mkdirs(path, permission)
Directory Rename/Move  FileSystem.rename(oldpath, newpath)
Directory Delete       FileSystem.delete(path, true)
File Create            FileSystem.createNewFile(path)
File Open              FileSystem.open(path)
File Read              FSDataInputStream.read*
File Write             FSDataOutputStream.write*
File Rename/Move       FileSystem.rename(oldpath, newpath)
File Delete            FileSystem.delete(path, false)
File Append            FileSystem.append(path)
File Seek              FSDataInputStream.seek(int)
File System            FileSystem.get(conf)
19
HDFS Federation
[Diagram source: hadoop.apache.org – JIRA HDFS-1052]
HDFS without Federation
- Namespace management and block management together
- Supports one namespace
- Hinders scalability above 4000 nodes
- Doesn’t support some multi-tenancy requirements
HDFS with Federation
- Namespace management and block management separated
- Block management can be on its own node
- Supports more than one namespace/NN
- Scalable beyond 4000 nodes and millions of files
- Can deploy multi-tenancy requirements like NNs for specific departments and isolation
- A namespace and its block pool together are called a namespace volume
20
Enabling HDFS federation
• Identify a unique cluster id
• Identify nameservice ids for the name nodes
• Add dfs.nameservices to hdfs-site.xml
  – Comma separated nameservice (ns) names
• Update hdfs-site.xml on all NNs and DNs
  – dfs.namenode.rpc-address.ns
  – dfs.namenode.http-address.ns
  – dfs.namenode.servicerpc-address.ns
  – dfs.namenode.https-address.ns
  – dfs.namenode.secondary.http-address.ns
  – dfs.namenode.backup.address.ns
• Format all name nodes using the cluster id
  – hdfs namenode -format -clusterId <cluster id>
21
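A sketch of the resulting configuration for two nameservices; the ids and hosts are illustrative assumptions:

    # Properties to merge inside <configuration> of hdfs-site.xml on every NN and DN:
    #   dfs.nameservices              = ns1,ns2
    #   dfs.namenode.rpc-address.ns1  = nn1.example.com:8020
    #   dfs.namenode.http-address.ns1 = nn1.example.com:50070
    #   dfs.namenode.rpc-address.ns2  = nn2.example.com:8020
    #   dfs.namenode.http-address.ns2 = nn2.example.com:50070
    # Then format each name node with the shared cluster id:
    hdfs namenode -format -clusterId CLUSTER01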
HDFS Rack Awareness
• Rack awareness enables efficient data placement
  – Data writes
  – Balancer
  – Decommissioning/commissioning of nodes
• Each node is assigned to a rack (rack id)
  – The rack id is used in the path names
• Data placement
  – The first block replica is placed near the client or on a random node/rack
  – The second replica of a block is placed on a node in a second rack
  – The third replica is placed on a different node in the second rack
  – If HDFS is not rack aware, the second and third replicas are placed on random nodes
22
Enabling HDFS Rack Awareness
• Update core-site.xml with the topology properties
  – topology.script.file.name
    • The script can be a shell script, Python or Java
  – topology.script.number.args
• Copy the script to the conf directory
• Distribute the script and core-site.xml
• Stop and start the name node
• Verify that the racks are recognized by HDFS
  – hdfs fsck -racks
23
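A minimal sketch of a topology script; the subnet-to-rack mapping is purely illustrative:

    #!/bin/sh
    # Print one rack path per host/IP argument handed in by the name node.
    for host in "$@"; do
      case "$host" in
        10.1.1.*) echo /dc1/rack1 ;;
        10.1.2.*) echo /dc1/rack2 ;;
        *)        echo /default-rack ;;
      esac
    done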
HDFS NFS Gateway
• Allows HDFS to be mounted as part of the local FS
• A stateless daemon translates NFS to the HDFS access protocol
• The DFSClient is part of the gateway daemon
  – Averages 30 MB/s for writes
• Multiple gateways can be used for scalability
• The gateway machine requires all the software and configs of an HDFS client
  – The gateway can be run on HDFS cluster nodes
• Random writes are not supported
[Diagram: an HDFS client mounts over NFSv3 to the NFS Gateway (DFSClient), which talks RPC to the HDFS cluster (NN, DNs).]
24
HDFS NFS Gateway Configuration
• Consists of two daemons – portmap and nfs3
• Configuration
  – dfs.namenode.accesstime.precision; 3600000 (1 hr)
    • Requires a name node restart
  – dfs.nfs3.dump.dir; dir to store out-of-sequence data
    • Needs enough space to store the data for all concurrent file writes
    • Use NFS for smaller file transfers, in the order of 1 GB
  – dfs.nfs.exports.allowed.hosts; host access
    • client*.abc.com r;client*.xyc.com rw
  – Update the log4j.properties file
    • log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG
    • log4j.logger.org.apache.hadoop.oncrpc=DEBUG
25
HDFS NFS Gateway Configuration
• Stop the nfs & rpcbind services provided by the OS
  – service nfs stop
  – service rpcbind stop
• Start the hadoop portmap as root
  – hadoop-daemon.sh start portmap
  – To stop, use “stop” instead of “start” as the parameter
• Start mountd and nfsd as the user starting HDFS
  – hadoop-daemon.sh start nfs3
  – To stop, use “stop” instead of “start” as the parameter
26
HDFS NFS Gateway Configuration
• Validate that the NFS services are running
  – rpcinfo -p $nfs_server_ip
  – Should see entries for mountd, portmapper & nfs
• Verify the HDFS namespace is exported for mounting
  – showmount -e $nfs_server_ip
  – Should see the export list
• Mount HDFS on the client
  – Create a mount point as root
  – Change the ownership of the mount point to the user running the HDFS cluster
  – mount -t nfs -o vers=3,proto=tcp,nolock $nfs_server:/ $mount_point
  – The client sends the UID of the user to NFS
  – NFS looks up the username for the UID and uses it to access HDFS
  – The user name and UID should be the same on the client and on NFS
27
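The whole bring-up end to end, with an illustrative gateway host and mount point:

    service nfs stop; service rpcbind stop   # as root on the gateway host
    hadoop-daemon.sh start portmap           # as root
    hadoop-daemon.sh start nfs3              # as the user running HDFS
    rpcinfo -p nfsgw.example.com             # expect mountd, portmapper, nfs
    showmount -e nfsgw.example.com           # expect the export list
    mkdir /hdfs && chown hdfs /hdfs          # as root on the client
    mount -t nfs -o vers=3,proto=tcp,nolock nfsgw.example.com:/ /hdfs
    ls /hdfs/user                            # HDFS now browses as a local FS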
HDFS Name Node HA
[Diagram: active and passive Name Nodes, each with a ZKFC, sharing storage; a Zookeeper quorum (ZK × 3); Data Nodes send heartbeats (HB) to both NNs.]
• Zookeeper does failure detection and helps with active name node election
• ZKFC – ZooKeeper Failover Controller
  – Monitors the health of the name node
  – Holds a session open on ZK, and a lock when its NN is active
  – If no other NN holds the lock, it tries to acquire it to make its NN active
• Shared storage can be an NFS mount or a quorum of journal storage
• Fencing is defined to prevent the split-brain scenario of two NNs writing
28
HDFS NN HA Configuration
• Define dfs.nameservices
  – Nameservice id
• Define dfs.ha.namenodes.[nameservice id]
  – Comma separated list of name nodes
• Define dfs.namenode.rpc-address.[nameservice id].[name node id]
  – Fully qualified machine name and port
• Define dfs.namenode.http-address.[nameservice id].[name node id]
  – Fully qualified machine name and port
• Define dfs.namenode.shared.edits.dir
  – For NFS: file:///mnt/...
  – For journal nodes: qjournal://node1:8485;node2.com:8485;…
• Define dfs.client.failover.proxy.provider.[nameservice id]
  – org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
• Define dfs.ha.fencing.methods
  – sshfence; requires passwordless ssh into the name nodes from one another
  – shell
• Define fs.defaultFS as the HA-enabled logical URI
• For journal nodes
  – Define dfs.journalnode.edits.dir where the edits and other local state used by the JNs will be stored
29
HDFS NN HA Configuration
• Define dfs.ha.automatic-failover.enabled
  – Set to true
• Define ha.zookeeper.quorum
  – Hosts and ports of the ZKs
• To enable HA in an existing cluster
  – Run hdfs dfsadmin -safemode enter
  – Run hdfs dfsadmin -saveNamespace
  – Stop the HDFS cluster: stop-dfs.sh
  – Start the journal node daemons: hadoop-daemon.sh start journalnode
  – Run hdfs zkfc -formatZK on the existing NN
  – Run hdfs namenode -initializeSharedEdits on the existing NN
  – Run hdfs namenode -bootstrapStandby on the new NN
  – Delete the secondary name node
  – Start the HDFS cluster: start-dfs.sh
30
hdfs haadmin
• -ns <nameserviceId>
• -transitionToActive <serviceId>
• -transitionToStandby <serviceId>
• -failover <serviceId> <serviceId>
  – [--forcefence] [--forceactive]
• -getServiceState <serviceId>
• -checkHealth <serviceId>
• -help <command>
31
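Typical usage; the name node ids nn1/nn2 are illustrative:

    hdfs haadmin -getServiceState nn1   # reports active or standby
    hdfs haadmin -failover nn1 nn2      # make nn2 active, fencing nn1 if needed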
hdfs dfsadmin
• -report
• -safemode [enter|leave|get|wait]
• -finalizeUpgrade
• -refreshNodes; uses the files defined in dfs.hosts & dfs.hosts.exclude
• -lsr
• -upgradeProgress status
• -metasave
• -setQuota <quota>/-clrQuota <dirname>…<dirname>
• -setRep [-w] <rep> <path/file>
32
hdfs fsck
• hdfs fsck [options] <path>
  – move
  – delete
  – openforwrite
  – files
  – blocks
  – locations
  – racks
33
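For example, with illustrative paths:

    hdfs fsck /                                            # overall FS health summary
    hdfs fsck /user/biju -files -blocks -locations -racks  # per-file block detail
    # -move sends corrupt files to /lost+found; -delete removes them outright.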
Balancer
• start-balancer.sh
  – policy: datanode|blockpool
  – threshold <percentage>; default 10%
  – dfs.datanode.balance.bandwidthPerSec, specified in bytes
    • Default 1 MB/sec
34
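For example, with an illustrative threshold and bandwidth:

    start-balancer.sh -threshold 5   # run until DNs are within 5% of the cluster average
    # Raise the per-DN balancing bandwidth (bytes/sec) without a restart:
    hdfs dfsadmin -setBalancerBandwidth 10485760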
Adding New Nodes
• Add the node address to the dfs.hosts file
  – Update the mapred.hosts file if using mapred
• Update the namenode with the new set of nodes
  – hadoop dfsadmin -refreshNodes
• Update the jobtracker with the new set of nodes
  – hadoop mradmin -refreshNodes
• Update the “slaves” file with the new node names
• Start the new datanodes (and tasktrackers)
• Check the availability of the new nodes in the UI
• Run the balancer so that data is redistributed
35
Decommissioning Nodes
• Add the node address to the exclude files
  – dfs.hosts.exclude
  – mapred.hosts.exclude
• Update the namenode (and jobtracker)
  – hadoop dfsadmin -refreshNodes
  – hadoop mradmin -refreshNodes
• Verify all the nodes are decommissioned (UI)
• Remove the nodes from the dfs.hosts (and mapred.hosts) files
• Update the namenode (and jobtracker)
• Remove the nodes from the “slaves” file
36
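For example, retiring one data node; the host and the exclude-file path (whatever dfs.hosts.exclude points at) are illustrative:

    echo dn7.example.com >> /etc/hadoop/conf/dfs.exclude
    hadoop dfsadmin -refreshNodes
    hdfs dfsadmin -report | grep -A 2 dn7.example.com   # wait for "Decommissioned"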
HDFS Upgrade
• No file system layout change
  – Install the new version of HDFS (and MapReduce)
  – Stop the old daemons
  – Update the configuration files
  – Start the new daemons
  – Update clients to use the new libraries
  – Remove the old install and configuration files
  – Update application code for deprecated APIs
37
HDFS Upgrade
• With file system layout changes
  – When there is a layout change, the NN will not start
  – Run fsck to make sure that the FS is healthy
  – Keep a copy of the fsck output for verification
  – Clear the HDFS and MapReduce temporary files
  – Make sure that any previous upgrade is finalized
  – Shut down MapReduce and kill orphaned tasks
  – Shut down HDFS and make a copy of the NN directories
  – Install the new versions of HDFS and MapReduce
  – Start HDFS with the -upgrade option
    • start-dfs.sh -upgrade
  – Once the upgrade is complete, perform manual spot checks
    • hadoop dfsadmin -upgradeProgress status
  – Start MapReduce
  – Rollback or finalize the upgrade
    • stop-dfs.sh; start-dfs.sh -rollback
    • hadoop dfsadmin -finalizeUpgrade
38
Key Parameters
Parameter                        Description                                                         Default Value
dfs.blocksize                    File block size                                                     128 MB
dfs.replication                  File block replication count                                        3
dfs.datanode.numblocks           No. of blocks after which a new subdirectory gets created in a DN
io.bytes.per.checksum            Number of data bytes for which a checksum is calculated             512
dfs.datanode.scan.period.hours   Timeframe in hours to complete block scanning                       504 (3 weeks)
39
bnair@asquareb.com
blog.asquareb.com
https://github.com/bijugs
@gsbiju
40

More Related Content

What's hot

HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseSnowflake Computing
 
Snowflake essentials
Snowflake essentialsSnowflake essentials
Snowflake essentialsqureshihamid
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compactionMIJIN AN
 
Hadoop Distributed file system.pdf
Hadoop Distributed file system.pdfHadoop Distributed file system.pdf
Hadoop Distributed file system.pdfvishal choudhary
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Integrations - Thinking outside the box - Presentation Engage 2023 in Amsterdam
Integrations - Thinking outside the box - Presentation Engage 2023 in AmsterdamIntegrations - Thinking outside the box - Presentation Engage 2023 in Amsterdam
Integrations - Thinking outside the box - Presentation Engage 2023 in AmsterdamRoland Driesen
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017 Karan Singh
 
vSAN architecture components
vSAN architecture componentsvSAN architecture components
vSAN architecture componentsDavid Pasek
 
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...OpenStack Korea Community
 
Azure Data Factory presentation with links
Azure Data Factory presentation with linksAzure Data Factory presentation with links
Azure Data Factory presentation with linksChris Testa-O'Neill
 
Introduction - vSphere 5 High Availability (HA)
Introduction - vSphere 5 High Availability (HA)Introduction - vSphere 5 High Availability (HA)
Introduction - vSphere 5 High Availability (HA)Eric Sloof
 
VMware: The Fastest Path to Hybrid Cloud
VMware: The Fastest Path to Hybrid CloudVMware: The Fastest Path to Hybrid Cloud
VMware: The Fastest Path to Hybrid CloudAmazon Web Services
 
Five common customer use cases for Virtual SAN - VMworld US / 2015
Five common customer use cases for Virtual SAN - VMworld US / 2015Five common customer use cases for Virtual SAN - VMworld US / 2015
Five common customer use cases for Virtual SAN - VMworld US / 2015Duncan Epping
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Snowflake free trial_lab_guide
Snowflake free trial_lab_guideSnowflake free trial_lab_guide
Snowflake free trial_lab_guideslidedown1
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueDatabricks
 

What's hot (20)

NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
 
Snowflake essentials
Snowflake essentialsSnowflake essentials
Snowflake essentials
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 
Hadoop Distributed file system.pdf
Hadoop Distributed file system.pdfHadoop Distributed file system.pdf
Hadoop Distributed file system.pdf
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Integrations - Thinking outside the box - Presentation Engage 2023 in Amsterdam
Integrations - Thinking outside the box - Presentation Engage 2023 in AmsterdamIntegrations - Thinking outside the box - Presentation Engage 2023 in Amsterdam
Integrations - Thinking outside the box - Presentation Engage 2023 in Amsterdam
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017
 
vSAN architecture components
vSAN architecture componentsvSAN architecture components
vSAN architecture components
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Snowflake Architecture
Snowflake ArchitectureSnowflake Architecture
Snowflake Architecture
 
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
 
Azure Data Factory presentation with links
Azure Data Factory presentation with linksAzure Data Factory presentation with links
Azure Data Factory presentation with links
 
Introduction - vSphere 5 High Availability (HA)
Introduction - vSphere 5 High Availability (HA)Introduction - vSphere 5 High Availability (HA)
Introduction - vSphere 5 High Availability (HA)
 
VMware: The Fastest Path to Hybrid Cloud
VMware: The Fastest Path to Hybrid CloudVMware: The Fastest Path to Hybrid Cloud
VMware: The Fastest Path to Hybrid Cloud
 
Five common customer use cases for Virtual SAN - VMworld US / 2015
Five common customer use cases for Virtual SAN - VMworld US / 2015Five common customer use cases for Virtual SAN - VMworld US / 2015
Five common customer use cases for Virtual SAN - VMworld US / 2015
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Snowflake free trial_lab_guide
Snowflake free trial_lab_guideSnowflake free trial_lab_guide
Snowflake free trial_lab_guide
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert Xue
 

Viewers also liked

Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar DatabaseBiju Nair
 
Using Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceUsing Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceBiju Nair
 
Project Risk Management
Project Risk ManagementProject Risk Management
Project Risk ManagementBiju Nair
 
Websphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsWebsphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsBiju Nair
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload managementBiju Nair
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developersBiju Nair
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance TuningLars Hofhansl
 
NENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaNENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaBiju Nair
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance ImprovementBiju Nair
 

Viewers also liked (10)

Concurrency
ConcurrencyConcurrency
Concurrency
 
Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar Database
 
Using Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceUsing Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve Performace
 
Project Risk Management
Project Risk ManagementProject Risk Management
Project Risk Management
 
Websphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsWebsphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentals
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload management
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developers
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
NENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaNENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezza
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 

Similar to HDFS User Reference

Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Designsudhakara st
 
Hadoop security
Hadoop securityHadoop security
Hadoop securityBiju Nair
 
Hadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsHadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsDrPDShebaKeziaMalarc
 
Introduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptxIntroduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptxsunithachphd
 
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Simplilearn
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file systemsrikanthhadoop
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemRutvik Bapat
 
HDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once SemanticsHDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once SemanticsDataWorks Summit
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Simplilearn
 

Similar to HDFS User Reference (20)

HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
Hadoop admin
Hadoop adminHadoop admin
Hadoop admin
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Hadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsHadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data Analytics
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
 
Introduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptxIntroduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptx
 
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
 
Hadoop and HDFS
Hadoop and HDFSHadoop and HDFS
Hadoop and HDFS
 
DFSNov1.pptx
DFSNov1.pptxDFSNov1.pptx
DFSNov1.pptx
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Hadoop -HDFS.ppt
Hadoop -HDFS.pptHadoop -HDFS.ppt
Hadoop -HDFS.ppt
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
 
HDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once SemanticsHDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once Semantics
 
Hdfs
HdfsHdfs
Hdfs
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
Hdfs
HdfsHdfs
Hdfs
 

More from Biju Nair

Chef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleChef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleBiju Nair
 
HBase Internals And Operations
HBase Internals And OperationsHBase Internals And Operations
HBase Internals And OperationsBiju Nair
 
Apache Kafka Reference
Apache Kafka ReferenceApache Kafka Reference
Apache Kafka ReferenceBiju Nair
 
Serving queries at low latency using HBase
Serving queries at low latency using HBaseServing queries at low latency using HBase
Serving queries at low latency using HBaseBiju Nair
 
Multi-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalMulti-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalBiju Nair
 
Cursor Implementation in Apache Phoenix
Cursor Implementation in Apache PhoenixCursor Implementation in Apache Phoenix
Cursor Implementation in Apache PhoenixBiju Nair
 
Chef patterns
Chef patternsChef patterns
Chef patternsBiju Nair
 

More from Biju Nair (7)

Chef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleChef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scale
 
HBase Internals And Operations
HBase Internals And OperationsHBase Internals And Operations
HBase Internals And Operations
 
Apache Kafka Reference
Apache Kafka ReferenceApache Kafka Reference
Apache Kafka Reference
 
Serving queries at low latency using HBase
Serving queries at low latency using HBaseServing queries at low latency using HBase
Serving queries at low latency using HBase
 
Multi-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalMulti-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-final
 
Cursor Implementation in Apache Phoenix
Cursor Implementation in Apache PhoenixCursor Implementation in Apache Phoenix
Cursor Implementation in Apache Phoenix
 
Chef patterns
Chef patternsChef patterns
Chef patterns
 

Recently uploaded

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Recently uploaded (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

HDFS User Reference

  • 2. Local File System FileA FileB FileC Inode-­‐n Inode-­‐m Inode-­‐p Inode-­‐n File A0ributes Block 0 Address Block 1 Address Block 2 Address Block 3 Address Inode-­‐m File A0ributes Block 0 Address Block 1 Address Block 2 Address Inode-­‐m File A0ributes Block 0 Address Block 1 Address Block 2 Address Block 3 Address DISK Directory MBR Par@@on Table Boot block Super block Free Space Trk i-­‐nodes Root dir File block size is based on what is used when FS is defined 2
  • 3. Hadoop Distributed File System Master Host (NN) FileA FileB FileC HDFS Directory H1:blk0, H2:blk1 H3:blk0,H1:blk1 H2:blk0;H3:blk1 Local File System File DISK Local FS Directory FileA0 FileB1 Inode-­‐x Inode-­‐y Host 1 Local FS Directory FileA1 FileC0 Inode-­‐a Inode-­‐n Host 2 Local FS Directory FileB0 FileC1 Inode-­‐r Inode-­‐c Host 3 In-­‐x In-­‐y In-­‐a In-­‐n In-­‐r In-­‐c DISK DISK DISK Files created are of size equal to the HDFS blksize 3
  • 4. HDFS HDFS Data Transfer Protocol Date Node ${dfs.data.dir}/current/VERSION /blk_<id_1>,/blk_<id_1>.meta /... /subdir2/ HTTP/S Data Node ${dfs.data.dir}/current/VERSION /blk_<id_1>,/blk_<id_1>.meta /... /subdir2/ Data Node ${dfs.data.dir}/current/VERSION /blk_<id_1>,/blk_<id_1>.meta /... /subdir2/ Name Node ${dfs.name.dir}/current/VERSION /edits,/fsimage,/fs@me Secondary Name Node ${fs.checkpoint.dir}/current/VERSION /edits,/fsimage,/fs@me Hadoop CLI WebHDFS HDFS UI Data Nodes RPC HTTP RPC 4
  • 5. HDFS Config Files and Ports • Default configuraJon – core-­‐default.xml, hdfs-­‐default.xml • Site specific configuraJon – core-­‐site.xml, hdfs-­‐site.xml under conf • ConfiguraJon of daemon processes – hadoop-­‐env.sh under conf • List of slave/data nodes – “slaves” file under conf • Ports – Default NN UI port 50070 (HTTP), 50470 (HTTPS) – Default NN Port 8020/9000 – Default DN UI port 50075 (HTTP), 50475(HTTPS) 5
  • 6. HDFS -­‐ Write Flow Client Name Node Namespace MetaData Blockmap (Fsimage Edit files) Data Node Data Node Data Node 1 2 3 4 5 8 7 7 6 6 1. Client requests to open a file to write through fs.create() call. This will overwrite exisJng file. 2. Name node responds with a lease to the file path 3. Client writes to local and when data reaches block size, requests Name Node for write 4. Name Node responds with a new blockid and the desJnaJon data nodes for write and replicaJon 5. Client sends the first data node the data and the checksum generated on the data to be wriaen 6. First data node writes the data and checksum and in parallel pipelines the replicaJons to other DN 7. Each data node where the data is replicated responds back with success /failure to the first DN 8. First data node in turn informs to the Name node that the write request for the block is complete which in turn will update its block map Note: There can be only one write at a Jme on a file 6
  • 7. HDFS -­‐ Read Flow Client Name Node Namespace MetaData Blockmap (Fsimage Edit files) Data Node Data Node Data Node 1 2 3 4 5 6 7 1. Client requests to open a file to read through fs.open() call 2. Name node responds with a lease to the file path 3. Client requests for read the data in the file 4. Name Node responds with block ids in sequence and the corresponding data nodes 5. Client reaches out directly to the DNs for each block of data in the file 6. When DNs sends back data along with check sum, client performs a checksum verificaJon by generaJng a checksum 7. If the checksum verificaJon fails client reaches out to other DNs where the re is a replicaJon 7
  • 8. HDFS -­‐ Name Node Fsimage (MetaData) Namespace Ownership Permissions Create/mod/Access Jme, Is hidden EditFile (Journal) Changes to metadata BlockMap (In-­‐memory) Details on File blocks and where they are stored 1. Name node manages the HDFS file system using the fsimage/edifile and block-­‐map data structures 2. Fsimage and edifile data are stored on disk. When hdfs starts they are read, merged and stored in-­‐memory 3. Data nodes sends details about the blocks they are storing when it starts and also at regular intervals 4. Name node uses the block map send by data nodes to build the BlockMap data structure data 5. The BlockMap data is used when requests for reads on files comes to the FileSystem 6. Also the BlockMap data is used to idenJfy the under/over replicated files which requires correcJon 7. At no point Name node stores data locally or directly involved in transferring data from files to client 8. The client reading/wriJng data receives meta data details from NN and then directly works with DNs 9. Name nodes require large memory since it needs to hold all the in-­‐memory data structures 10. If the NN is lost the data in the file systems can’t be accessed 8
  • 9. FS Meta Data Change Management At Start-­‐up Periodically Fsimage (MetaData) EditFile (Journal) Secondary NameNode Fsimage_1 (MetaData) EditFile_1 (MetaData) Fsimage (MetaData) EditFile (Journal) NameNode Fsimage_1 (MetaData) EditFile_1 (MetaData) 1. When HDFS is up and running changes to file system metadata are stored in Edit files 2. When NN starts it looks for EditFiles in the system and merges the content with the fsimage on the disk 3. The merging process creates new fsimage and edifile. Also the process discards the old fsimage & edit files. 4. Since the edit files can be large for a very acJve HDFS cluster, the NN start-­‐up will take a long Jme 5. Secondary name node at regular interval or aier a certain edifile size, merges the edit file and fsimage file 6. The merge process creates a new fsimage file and an edit file. The secondary NN copies the new fsimage file back to NN 7. This will reduce the NN start-­‐up process and also the fsimage can be used if there is a failure in the NN server to restore 9
  • 10. HDFS -­‐ Data Node Name Node MetaData BlockMap Data Node Heart Beat / Block map Data Node Data Node 1. Data nodes stores blocks of data for each file stored in HDFS and the default clock size is 128 MB 2. Blocks of data is replicated n Jmes and by default it is 3 Jmes 3. Data node periodically sends a heartbeat to the name node to inform NN that it is alive 4. If NN doesn’t receive a heart beat , it will mark the DN as dead and stops sending further requests to the DN 5. Also in periodic intervals, data node sends out a block map which includes all the file blocks it stores 6. When a DN is dead, all the files for which blocks were stored in the DN will get marked as under replicated 7. NN will recJfy under replicaJon by replicaJng the blocks to other data nodes 10
  • 11. Ensuring Data Integrity • Through replicaJon/replicaJon assurance – First replica closer to client node – Second replica on a different rack – Third replica on the rack as the second replica • File system checks run manually • Block scanning over a period of Jme • Storing checksums along with block data 11
  • 12. Permission and Quotas • File and directories use much of POSIX model – Associated with an owner and a group – Permission for owner, group and others – r for read, w for append to files – r for lisJng files, w for delete/create files in dirs – x to access child directories – Stciky bit on dirs prevents deleJons by others – User idenJficaJon can be simple (OS) or Kerberos 12
  • 13. Permission and Quotas • Quota for number of files – Name quota – dfsadmin -­‐setQuota <N> <dir>...<dir> – dfsadmin -­‐clrSpaceQuota <dir>...<dir> • Quota on the size of data – Space quota can be set to restrict space usage – dfsadmin -­‐setSpaceQuota <N> <dir>...<dir> • Replicated data also consumes quota – dfsadmin -­‐clrSpaceQuota <dir>...<dir> • ReporJng – fs -­‐count -­‐q <dir>...<dir> 13
  • 14. HDFS snapshot • No copy of data blocks. Only the metadata (block list and file names) are copied • Allow snapshot on a directory – hdfs dfsadmin –allowSnapshot <path> • Create snapshot – hdfs dfs –createSnapshot <path> [<name>] – Default name is ‘s’+Jmestamp • Verify snapshot – hadoop fs –ls <path>/.snapshot • Directory with snapshot can’t be deleted or renamed • Disallow snapshot – hdfs dfsadmin –disallowSnapshot <path> – All exisJng snapshot need to be deleted before disallow • Delete snapshot – hdfs dfs –deleteSnapshot <path> <name> • Rename snapshot – hdfs dfs –renameSnapshot <path> <oldname> <newname> • Snapshot differences – hdfs snapshotDiff <path> <starJng snapshot name> <ending snapshot name> • List all snap shoaable directories – hdfs lsSnapshoaableDir 14
  • 15. HDFS back-­‐up using snapshot • Create a snapshot on the source cluster • Perform a “distcp” of the snapshot to backup cluster • Create a snapshot of the copy on the backup cluster • Cleanup any old back-­‐up copies to comply with the enterprise retenJon policy • The reverse can be followed to recover data from the backup – Data need to be removed on the producJon cluster before the restore – During deleJon –skipTrash opJon of “rm” will help reduce space usage 15
  • 16. distcp • Tool to perform inter and intra cluster copy of data • UJlizes mapreduce to perform the copy • It can be used to – Copy data with in a cluster – Copy data between clusters – Copy files or directories – Copy data from mulJple sources • Can be used to create a backup cluster • Starts up containers on both source and target • Consumes network traffic between clusters • Need to be scheduled at appropriate Jme • Can control resource uJlizaJon using parameters 16
  • 17. distcp • Hadoop distcp [opJons] <srcURL> … <srcURL> <destURL> – Source path need to be obsolute – DesJnaJon directory will be created if not present – “update” opJon will update only the changed files – “skipcrccheck” opJon to disable checksum – “overwrite” opJon is to overwrite exisJng files which is by default skipped if present – “delete” opJon to delete files in desJnaJon which are not in source – “hip” fs need to be used to copy between different versions of HDFS – “m” opJon to specify the number of mappers 17
• 18. distcp
  – "atomic" option commits all changes or none
  – "async" runs distcp asynchronously, i.e. non-blocking
  – "i" option ignores failures during the copy
  – "log" sets the directory on DFS where logs are to be saved
  – "p [rbugp]" preserves file status (replication, block size, user, group, permission) from the source
  – "strategy [static|dynamic]" selects the copy strategy
  – "bandwidth [MB]" limits the bandwidth per map in MB
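A representative invocation combining the options above (host names and paths are hypothetical):

  # Mirror a directory between clusters, copying only changed files,
  # with 20 mappers capped at 10 MB/s each
  hadoop distcp -update -delete -m 20 -bandwidth 10 \
    hdfs://nn1:8020/data/logs hdfs://nn2:8020/data/logs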
• 19. HDFS Java APIs
Function              API
Directory create      FileSystem.mkdirs(path, permission)
Directory rename/move FileSystem.rename(oldpath, newpath)
Directory delete      FileSystem.delete(path, true)
File create           FileSystem.createNewFile(path)
File open             FileSystem.open(path)
File read             FSDataInputStream.read*
File write            FSDataOutputStream.write*
File rename/move      FileSystem.rename(oldpath, newpath)
File delete           FileSystem.delete(path, false)
File append           FileSystem.append(path)
File seek             FSDataInputStream.seek(long)
File system handle    FileSystem.get(conf)
• 20. HDFS Federation
[Diagram source: hadoop.apache.org – JIRA HDFS-1052]
HDFS without Federation
  – Namespace management and block management together
  – Supports one namespace
  – Hinders scalability above 4000 nodes
  – Doesn't support some multi-tenancy requirements
HDFS with Federation
  – Namespace management and block management separated
  – Block management can be on a node of its own
  – Supports more than one namespace/NN
  – Scalable beyond 4000 nodes and millions of files
  – Can deploy multi-tenancy requirements, e.g. NNs for specific departments and isolation
  – A namespace and its block pool together are called a namespace volume
• 21. Enabling HDFS Federation
• Identify a unique cluster id
• Identify nameservice ids for the name nodes
• Add dfs.nameservices to hdfs-site.xml
  – Comma separated nameservice (ns) names
• Update hdfs-site.xml on all NNs and DNs with the per-nameservice addresses
  – dfs.namenode.rpc-address.ns
  – dfs.namenode.http-address.ns
  – dfs.namenode.servicerpc-address.ns
  – dfs.namenode.https-address.ns
  – dfs.namenode.secondary.http-address.ns
  – dfs.namenode.backup.address.ns
• Format all name nodes using the cluster id
  – hdfs namenode -format -clusterId <cluster id>
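A minimal sketch of formatting two federated name nodes (the cluster id is hypothetical; both NNs must use the same one):

  # On the first name node
  hdfs namenode -format -clusterId CID-demo-cluster

  # On the second name node, reuse the same cluster id
  hdfs namenode -format -clusterId CID-demo-cluster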
• 22. HDFS Rack Awareness
• Rack awareness enables efficient data placement
  – Data writes
  – Balancer
  – Decommissioning/commissioning of nodes
• Each node is assigned to a rack (rack id)
  – The rack id is used in the path names
• Data placement
  – The first replica of a block is placed near the client, or on a random node/rack
  – The second replica is placed on a node in a second rack
  – The third replica is placed on a different node in the second rack
  – If HDFS is not rack aware, the second and third replicas are placed on random nodes
• 23. Enabling HDFS Rack Awareness
• Update core-site.xml with the topology properties
  – topology.script.file.name
    • The script can be a shell script, Python, Java
  – topology.script.number.args
• Copy the script to the conf directory
• Distribute the script and core-site.xml
• Stop and start the name node
• Verify that the racks are recognized by HDFS
  – hdfs fsck / -racks
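A minimal topology script sketch, to be referenced by topology.script.file.name (the IP-to-rack mapping here is hypothetical):

  #!/bin/bash
  # topology.sh: print a rack path for each host/IP argument passed by the NN
  for host in "$@"; do
    case "$host" in
      10.1.1.*) echo "/dc1/rack1" ;;
      10.1.2.*) echo "/dc1/rack2" ;;
      *)        echo "/default-rack" ;;
    esac
  done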
• 24. HDFS NFS Gateway
[Diagram: HDFS client speaking NFSv3 to an NFS Gateway (DFSClient), which speaks RPC to the HDFS cluster (NN, DNs)]
• Allows HDFS to be mounted as part of the local FS
• A stateless daemon translates NFS to the HDFS access protocol
• DFSClient is part of the gateway daemon
  – Averages 30 MB/s for writes
• Multiple gateways can be used for scalability
• The gateway machine requires all the software and configs of an HDFS client
  – The gateway can be run on HDFS cluster nodes
• Random writes are not supported
• 25. HDFS NFS Gateway Configuration
• Consists of two daemons – portmap and nfs3
• Configuration
  – dfs.namenode.accesstime.precision; 3600000 (1 hr)
    • Changing it requires a name node restart
  – dfs.nfs3.dump.dir; dir to store out-of-sequence data
    • Needs enough space to store data for all concurrent file writes
    • Use NFS for smaller file transfers, in the order of 1 GB
  – dfs.nfs.exports.allowed.hosts; host access
    • client*.abc.com r;client*.xyc.com rw
  – Update the log4j.properties file
    • log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG
    • log4j.logger.org.apache.hadoop.oncrpc=DEBUG
• 26. HDFS NFS Gateway Configuration
• Stop the nfs & rpcbind services provided by the OS
  – service nfs stop
  – service rpcbind stop
• Start the hadoop portmap as root
  – hadoop-daemon.sh start portmap
  – To stop, use "stop" instead of "start" as the parameter
• Start mountd and nfsd as the user starting HDFS
  – hadoop-daemon.sh start nfs3
  – To stop, use "stop" instead of "start" as the parameter
• 27. HDFS NFS Gateway Configuration
• Validate the NFS services are running
  – rpcinfo -p $nfs_server_ip
  – Should see entries for mountd, portmapper & nfs
• Verify the HDFS namespace is exported for mount
  – showmount -e $nfs_server_ip
  – Should see the export list
• Mount HDFS on the client
  – Create a mount point as root
  – Change ownership of the mount point to the user running the HDFS cluster
  – mount -t nfs -o vers=3,proto=tcp,nolock $nfs_server:/ $mount_point
  – The client sends the UID of the user to NFS
  – NFS looks up the username for the UID and uses it to access HDFS
  – The user name and UID should be the same on the client and the NFS gateway
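Put together, a typical mount session looks like this (a sketch; the gateway host name and mount point are hypothetical):

  # Verify the gateway is up
  rpcinfo -p nfsgw.example.com
  showmount -e nfsgw.example.com

  # Create the mount point and mount HDFS (as root)
  mkdir -p /hdfs
  mount -t nfs -o vers=3,proto=tcp,nolock nfsgw.example.com:/ /hdfs
  ls /hdfs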
• 28. HDFS Name Node HA
[Diagram: Active and Passive Name Nodes with shared storage, a ZKFC beside each NN, a Zookeeper quorum (ZK x3) and Data Nodes heart-beating to both NNs]
• Zookeeper does failure detection and helps with active name node election
• ZKFC (ZooKeeper Failover Controller)
  – Monitors the health of its name node
  – Holds a session open on ZK, and a lock when its NN is active
  – If no other NN holds the lock, it tries to acquire it to make its NN active
• Shared storage can be an NFS mount or a quorum of journal nodes
• Fencing is defined to prevent the split-brain scenario of two NNs writing
• 29. HDFS NN HA Configuration
• Define dfs.nameservices
  – Nameservice id
• Define dfs.ha.namenodes.[nameservice id]
  – Comma separated list of name node ids
• Define dfs.namenode.rpc-address.[nameservice id].[name node id]
  – Fully qualified machine name and port
• Define dfs.namenode.http-address.[nameservice id].[name node id]
  – Fully qualified machine name and port
• Define dfs.namenode.shared.edits.dir
  – For NFS: file:///mnt/...
  – For journal nodes: qjournal://node1:8485;node2:8485;.../<nameservice id>
• Define dfs.client.failover.proxy.provider.[nameservice id]
  – org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
• Define dfs.ha.fencing.methods
  – sshfence; requires passwordless ssh between the name nodes
  – shell
• Define fs.defaultFS as the HA enabled logical URI
• For journal nodes
  – Define dfs.journalnode.edits.dir where edits and other local state used by the JNs will be stored
• 30. HDFS NN HA Configuration
• Define dfs.ha.automatic-failover.enabled
  – Set to true
• Define ha.zookeeper.quorum
  – Hosts and ports of the ZK ensemble
• To enable HA in an existing cluster
  – Run hdfs dfsadmin -safemode enter
  – Run hdfs dfsadmin -saveNamespace
  – Stop the HDFS cluster: stop-dfs.sh
  – Start the journal node daemons: hadoop-daemon.sh start journalnode
  – Run hdfs zkfc -formatZK on the existing NN
  – Run hdfs namenode -initializeSharedEdits on the existing NN
  – Run hdfs namenode -bootstrapStandby on the new NN
  – Delete the secondary name node
  – Start the HDFS cluster: start-dfs.sh
• 31. hdfs haadmin
• -ns <nameserviceId>
• -transitionToActive <serviceId>
• -transitionToStandby <serviceId>
• -failover <serviceId> <serviceId>
  – [--forcefence] [--forceactive]
• -getServiceState <serviceId>
• -checkHealth <serviceId>
• -help <command>
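For example (nn1 and nn2 are hypothetical name node ids):

  # Check which name node is active
  hdfs haadmin -getServiceState nn1

  # Manually fail over from nn1 to nn2
  hdfs haadmin -failover nn1 nn2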
• 32. hdfs dfsadmin
• -report
• -safemode [enter|leave|get|wait]
• -finalizeUpgrade
• -refreshNodes; uses the files defined in dfs.hosts & dfs.hosts.exclude
• -lsr
• -upgradeProgress status
• -metasave
• -setQuota <quota> / -clrQuota <dirname>...<dirname>
• -setRep [-w] <rep> <path/file>
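Typical usage (a sketch):

  # Cluster capacity, DN status and block counts
  hdfs dfsadmin -report

  # Check, then leave, safe mode
  hdfs dfsadmin -safemode get
  hdfs dfsadmin -safemode leave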
• 33. hdfs fsck
• hdfs fsck [options] path
  – -move
  – -delete
  – -openforwrite
  – -files
  – -blocks
  – -locations
  – -racks
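For example:

  # Full health report with block locations and rack placement
  hdfs fsck / -files -blocks -locations -racks

  # Move corrupted files to /lost+found
  hdfs fsck / -move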
• 34. Balancer
• start-balancer.sh
  – -policy datanode|blockpool
  – -threshold <percentage>; default 10%
  – dfs.datanode.balance.bandwidthPerSec, specified in bytes
    • Default 1 MB/sec
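For example:

  # Rebalance until every DN is within 5% of the cluster average utilization
  start-balancer.sh -threshold 5

  # Raise the per-DN balancing bandwidth to 10 MB/s without a restart
  hdfs dfsadmin -setBalancerBandwidth 10485760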
• 35. Adding New Nodes
• Add the node address to the dfs.hosts file
  – Update the mapred.hosts file if using mapred
• Update the namenode with the new set of nodes
  – hadoop dfsadmin -refreshNodes
• Update the jobtracker with the new set of nodes
  – hadoop mradmin -refreshNodes
• Update the "slaves" file with the new node names
• Start the new datanodes (and tasktrackers)
• Check the availability of the new nodes in the UI
• Run the balancer so that data gets redistributed
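A sketch of the sequence (the host name is hypothetical, and the include file lives wherever dfs.hosts points):

  # On the name node, after adding the host to the dfs.hosts file
  echo "dn4.example.com" >> /etc/hadoop/conf/dfs.hosts
  hadoop dfsadmin -refreshNodes

  # On the new node, start the datanode daemon
  hadoop-daemon.sh start datanode

  # Spread existing data onto the new node
  start-balancer.sh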
• 36. Decommissioning Nodes
• Add the node address to the exclude files
  – dfs.hosts.exclude
  – mapred.hosts.exclude
• Update the namenode (and jobtracker)
  – hadoop dfsadmin -refreshNodes
  – hadoop mradmin -refreshNodes
• Verify all the nodes are decommissioned (UI)
• Remove the nodes from the dfs.hosts (and mapred.hosts) files
• Update the namenode (and jobtracker) again
• Remove the nodes from the "slaves" file
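For example (the host name and exclude-file path are hypothetical):

  # Exclude the node and tell the NN to start decommissioning it
  echo "dn2.example.com" >> /etc/hadoop/conf/dfs.hosts.exclude
  hadoop dfsadmin -refreshNodes

  # Watch progress; the node moves from "Decommission in progress" to "Decommissioned"
  hdfs dfsadmin -report | grep -A 3 "dn2.example.com"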
• 37. HDFS Upgrade
• No file system layout change
  – Install the new version of HDFS (and MapReduce)
  – Stop the old daemons
  – Update the configuration files
  – Start the new daemons
  – Update clients to use the new libraries
  – Remove the old install and the old configuration files
  – Update application code for deprecated APIs
• 38. HDFS Upgrade
• With file system layout changes
  – When there is a layout change the NN will not start
  – Run fsck to make sure that the FS is healthy
  – Keep a copy of the fsck output for verification
  – Clear HDFS and map reduce temporary files
  – Make sure that any previous upgrade is finalized
  – Shut down map reduce and kill orphaned tasks
  – Shut down HDFS and make a copy of the NN directories
  – Install the new versions of HDFS and Map Reduce
  – Start HDFS with the -upgrade option
    • start-dfs.sh -upgrade
  – Once the upgrade is complete perform manual spot checks
    • hadoop dfsadmin -upgradeProgress status
  – Start Map Reduce
  – Roll back or finalize the upgrade
    • stop-dfs.sh; start-dfs.sh -rollback
    • hadoop dfsadmin -finalizeUpgrade
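A condensed sketch of the command sequence (note that -upgradeProgress belongs to the Hadoop 1.x-era CLI this deck uses; later versions report upgrade status differently):

  # Before the upgrade: health check, keeping a copy of the report
  hdfs fsck / -files -blocks > fsck-before-upgrade.log

  # Bring the cluster up on the new version with the layout upgrade
  start-dfs.sh -upgrade
  hadoop dfsadmin -upgradeProgress status

  # Only after validation: make the upgrade permanent
  hadoop dfsadmin -finalizeUpgrade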
• 39. Key Parameters
Parameter                        Description                                                       Default Value
dfs.blocksize                    File block size                                                   128 MB
dfs.replication                  File block replication count                                      3
dfs.datanode.numblocks           No. of blocks after which a new sub-directory gets created in DN
io.bytes.per.checksum            Number of data bytes for which a checksum is calculated           512
dfs.datanode.scan.period.hours   Timeframe in hours to complete block scanning                     504 (3 weeks)
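These defaults can be overridden per command with the generic -D option (a sketch; the file and path are hypothetical):

  # Write a file with a 256 MB block size and replication 2
  hadoop fs -Ddfs.blocksize=268435456 -Ddfs.replication=2 \
    -put bigfile.dat /user/proj/bigfile.dat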
• 40. bnair@asquareb.com blog.asquareb.com https://github.com/bijugs @gsbiju