Improving Hadoop Cluster Performance via Linux Configuration
2014 Hadoop Summit – San Jose, California
Alex Moundalexis
@technmsg
  
Tips from a Former SA
CC BY 2.0 / Richard Bumgardner
Been there, done that.
  
Tips from a Former SA Field Guy
CC BY 2.0 / Alex Moundalexis
Home sweet home.
  
Tips from a Former SA Field Guy
Easy steps to take… that most people don’t.
  
What This Talk Isn’t About
• Deploying
    • Puppet, Chef, Ansible, homegrown scripts, intern labor
• Sizing & Tuning
    • Depends heavily on data and workload
• Coding
    • Unless you count STDOUT redirection
• Algorithms
    • I suck at math, but we’ll try some multiplication later

“The answer to most Hadoop questions is it depends.”
  
So What ARE We Talking About?
• Seven simple things
    • Quick
    • Safe
    • Viable for most environments and use cases
• Identify issue, then offer solution
• Note: Commands run as root or sudo

1. Swapping
Bad news, best not to…
  
Swapping
• A form of memory management
• When OS runs low on memory…
    • write blocks to disk
    • use now-free memory for other things
    • read blocks back into memory from disk when needed
• Also known as paging

Swapping
• Problem: Disks are slow, especially to seek
• Hadoop is about maximizing IO
    • spend less time acquiring data
    • operate on data in place
    • large streaming reads/writes from disk
• Memory usage is limited within JVM
    • we should be able to manage our memory

Disable Swap in Kernel
• Well, as much as possible.
• Immediate:
    # echo 0 > /proc/sys/vm/swappiness
• Persist after reboot:
    # echo "vm.swappiness = 0" >> /etc/sysctl.conf
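
Appending with >> adds another copy of the line each time the command is re-run. A minimal sketch of an idempotent alternative, assuming a POSIX shell with mktemp available; the function name set_sysctl_conf and its optional file argument are hypothetical and not part of the talk:

```shell
# Hypothetical helper (not from the talk): write "key = value" into a
# sysctl-style file, replacing an existing entry instead of appending a duplicate.
set_sysctl_conf() {
    key="$1"; value="$2"; conf="${3:-/etc/sysctl.conf}"
    # Escape dots so "vm.swappiness" is matched literally in the regexes below.
    pattern="$(printf '%s' "$key" | sed 's/\./\\./g')"
    touch "$conf"
    if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$conf"; then
        tmp="$(mktemp)"
        sed -E "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key} = ${value}|" "$conf" > "$tmp"
        cat "$tmp" > "$conf"   # overwrite in place so owner and mode are kept
        rm -f "$tmp"
    else
        printf '%s = %s\n' "$key" "$value" >> "$conf"
    fi
}

# Example (as root): set_sysctl_conf vm.swappiness 0 && sysctl -p
```

Reading /proc/sys/vm/swappiness afterwards shows the value the running kernel is actually using.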
  
	
  
Swapping Peculiarities
• Behavior varies based on Linux kernel
    • CentOS 6.4+ / Ubuntu 10.10+
    • For you kernel gurus, that’s Linux 2.6.32-303+
• Prior
    • We don’t swap, except to avoid OOM condition.
• After
    • We don’t swap, ever.
• Details: http://tiny.cloudera.com/noswap
  
2. File Access Time
Disable this too.
  
File Access Time
• Linux tracks access time
    • writes to disk even if all you did was read
• Problem
    • more disk seeks
• HDFS is write-once, read-many
• NameNode tracks access information for HDFS

Don’t Track Access Time
• Mount volumes with noatime option
    • In /etc/fstab:
      /dev/sdc /data01 ext3 defaults,noatime 0
    • Note: noatime implies nodiratime as well
• What about relatime?
    • Faster than atime but slower than noatime
• No reboot required
    • # mount -o remount /data01
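
To confirm the option took effect after a remount, the mount table can be inspected. A small sketch, assuming a Linux host where /proc/mounts uses the same column layout as /etc/fstab; the function name has_noatime is hypothetical:

```shell
# Hypothetical helper: succeed if the given mount point is listed with the
# noatime option in an fstab-style table (default: /proc/mounts).
has_noatime() {
    mountpoint="$1"; table="${2:-/proc/mounts}"
    awk -v mp="$mountpoint" '
        /^[[:space:]]*#/ { next }              # skip comment lines
        $2 == mp {
            n = split($4, opts, ",")           # 4th column holds mount options
            for (i = 1; i <= n; i++) if (opts[i] == "noatime") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$table"
}

# Example: has_noatime /data01 && echo "/data01 is mounted noatime"
```

Passing /etc/fstab as the second argument checks the persistent configuration rather than the live mount.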
  
3. Root Reserved Space
Reclaim it, impress your bosses!
  
Root Reserved Space
• EXT3/4 reserve 5% of disk for root-owned files
    • On an OS disk, sure
    • System logs, kernel panics, etc

CC BY 2.0 / Alex Moundalexis
Disks used to be much smaller, right?
  
Do The Math
• Conservative
    • 5% of 1 TB disk = 46 GB
    • 5 data disks per server = 230 GB
    • 5 servers per rack = 1.15 TB
• Quasi-Aggressive
    • 5% of 4 TB disk = 186 GB
    • 12 data disks per server = 2.23 TB
    • 18 servers per rack = 40.1 TB
• That’s a LOT of unused storage!
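
The multiplication can be reproduced for other disk sizes and rack layouts. A sketch under one stated assumption: the per-disk figures above look like a decimal-TB disk reported in binary GiB and rounded down, which is an inference rather than something the slide says. The function names are hypothetical.

```shell
# Reserved space per disk in GiB; size is given in decimal TB (10^12 bytes),
# pct is the reserved percentage (ext3/4 default of 5 assumed).
reserved_gib() {
    size_tb="$1"; pct="${2:-5}"
    awk -v tb="$size_tb" -v pct="$pct" \
        'BEGIN { printf "%.1f\n", tb * 1e12 * pct / 100 / 2^30 }'
}

# Reserved space per rack in GiB: disk size (TB), disks per server,
# servers per rack, optional reserved percentage.
reserved_rack_gib() {
    awk -v tb="$1" -v disks="$2" -v servers="$3" -v pct="${4:-5}" \
        'BEGIN { printf "%.0f\n", tb * 1e12 * pct / 100 / 2^30 * disks * servers }'
}

# Example: reserved_gib 4          prints 186.3
#          reserved_rack_gib 4 12 18   prints 40233
```

The rack totals differ slightly from the slide because the slide rounds at each step and switches between GB and TB along the way.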
  
Root Reserved Space
• On a Hadoop data disk, no root-owned files
• When creating a partition
    # mkfs.ext3 -m 0 /dev/sdc
• On existing partitions
    # tune2fs -m 0 /dev/sdc
• 0 is safe, 1 is for the ultra-paranoid
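
To see what a filesystem currently reserves, the superblock listing from tune2fs -l can be summarized. A minimal sketch that relies on the "Block count:" and "Reserved block count:" fields of that listing; the helper name reserved_pct is hypothetical:

```shell
# Hypothetical helper: read `tune2fs -l` output on stdin and print the
# percentage of blocks reserved for root.
reserved_pct() {
    awk -F: '
        /^Block count:/          { total = $2 + 0 }
        /^Reserved block count:/ { reserved = $2 + 0 }
        END {
            if (total == 0) exit 1     # no usable superblock data on stdin
            printf "%.1f\n", 100 * reserved / total
        }
    '
}

# Example (as root): tune2fs -l /dev/sdc | reserved_pct
```

Running it before and after tune2fs -m 0 shows the change without having to compare raw block counts by hand.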
  
4. Name Service Cache Daemon
Turn it on, already!
  
Name Service Cache Daemon
• Daemon that caches name service requests
    • Passwords
    • Groups
    • Hosts
• Helps weather network hiccups
• Helps more with high latency LDAP, NIS, NIS+
• Small footprint
• Zero configuration required

Name Service Cache Daemon
• Hadoop nodes
    • largely a network-based application
    • on the network constantly
    • issue lots of DNS lookups, especially HBase & distcp
    • can thrash DNS servers
• Reducing latency of service requests? Smart.
• Reducing impact on shared infrastructure? Smart.

Name Service Cache Daemon
• Turn it on, let it work, leave it alone:
    # chkconfig --level 345 nscd on
    # service nscd start
• Check on it later:
    # nscd -g
• Unless using Red Hat SSSD; modify nscd config first!
    • Don’t use nscd to cache passwd, group, or netgroup
    • Red Hat, Using NSCD with SSSD. http://goo.gl/68HTMQ
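
On an SSSD host, the relevant switches are the enable-cache lines in /etc/nscd.conf. A small sketch for checking which caches are on, assuming the usual "enable-cache <service> <yes|no>" layout of that file; the function name nscd_cache_enabled is hypothetical:

```shell
# Hypothetical helper: succeed if nscd.conf enables caching for the service.
nscd_cache_enabled() {
    service="$1"; conf="${2:-/etc/nscd.conf}"
    awk -v svc="$service" '
        $1 == "enable-cache" && $2 == svc { state = $3 }   # last entry wins
        END { exit (state == "yes" ? 0 : 1) }
    ' "$conf"
}

# Example: warn about caches that should be off when SSSD is in use.
# for svc in passwd group netgroup; do
#     nscd_cache_enabled "$svc" && echo "disable nscd cache for: $svc"
# done
```

With SSSD in place, only the hosts cache would normally stay enabled.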
  
5. File Handle Limits
Not a problem, until they are.
  
File Handle Limits
• Kernel refers to files via a handle
    • Also called descriptors
• Linux is a multi-user system
• File handles protect the system from
    • Poor coding
    • Malicious users
    • Pictures of cats on the Internet

Microsoft Office EULA. Really.
java.io.FileNotFoundException: (Too many open files)
  
File Handle Limits
• Linux defaults usually not enough
• Increase maximum open files (default 1024)
    # echo hdfs - nofile 32768 >> /etc/security/limits.conf
    # echo mapred - nofile 32768 >> /etc/security/limits.conf
    # echo hbase - nofile 32768 >> /etc/security/limits.conf
• Bonus: Increase maximum processes too
    # echo hdfs - nproc 32768 >> /etc/security/limits.conf
    # echo mapred - nproc 32768 >> /etc/security/limits.conf
    # echo hbase - nproc 32768 >> /etc/security/limits.conf
• Note: Cloudera Manager will do this for you.
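
The six echo commands can be folded into a loop that is safe to re-run. A minimal sketch, assuming a POSIX shell; ensure_limit is a hypothetical helper, and it leaves an existing entry untouched even if its value differs:

```shell
# Hypothetical helper: append "<user> - <item> <value>" to a limits.conf-style
# file unless an entry for that user and item already exists.
ensure_limit() {
    user="$1"; item="$2"; value="$3"; conf="${4:-/etc/security/limits.conf}"
    touch "$conf"
    if awk -v u="$user" -v i="$item" \
        '$1 == u && $2 == "-" && $3 == i { found = 1 } END { exit (found ? 0 : 1) }' "$conf"
    then
        return 0    # already present, do not add a duplicate
    fi
    printf '%s - %s %s\n' "$user" "$item" "$value" >> "$conf"
}

# Example (as root):
# for user in hdfs mapred hbase; do
#     for item in nofile nproc; do ensure_limit "$user" "$item" 32768; done
# done
```

The new limits apply to sessions started after the change; running ulimit -n in a fresh shell for the service user shows the effective open-file limit.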
  
6. Dedicated Disk for OS and Logs
Don’t be tempted to share, even on monster disks.
  
The Situation in Easy Steps
1. Your new server has a dozen 1 TB disks
2. Eleven disks are used to store data
3. One disk is used for the OS
    • 20 GB for the OS
    • 980 GB sits unused
4. Someone asks “can we store data there too?”
5. Seems reasonable, lots of space… “OK, why not.”
Sound familiar?

Microsoft Office EULA. Really.
I don’t understand it, there’s no consistency to these run times!
  
No Love for Shared Disk
• Our quest for data gets interrupted a lot:
    • OS operations
    • OS logs
    • Hadoop logging, quite chatty
    • Hadoop execution
    • userspace execution
• Disk seeks are slow, remember?

Dedicated Disk for OS and Logs
• At install time
    • Disk 0, OS & logs
    • Disk 1-n, Hadoop data
• After install, more complicated effort, requires manual HDFS block rebalancing:
    1. Take down HDFS
        • If you can do it in under 10 minutes, just the DataNode
    2. Move or distribute blocks from disk0/dir to disk[1-n]/dir
    3. Remove dir from HDFS config (dfs.data.dir)
    4. Start HDFS
  
7. Name Resolution
Sane, both forward and reverse.
  
Name Resolution Options
1. Hosts file, if you must
2. DNS, much preferred

Name Resolution with Hosts File
• Set canonical names properly
• Right
    10.1.1.1    r01m01.cluster.org  r01m01  master1
    10.1.1.2    r01w01.cluster.org  r01w01  worker1
• Wrong
    10.1.1.1    r01m01  r01m01.cluster.org  master1
    10.1.1.2    r01w01  r01w01.cluster.org  worker1
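
The "wrong" ordering is easy to miss by eye on a long hosts file. A small sketch that flags entries whose first name after the address is not fully qualified, using "contains a dot" as a rough stand-in for FQDN; the function name bad_canonical_names is hypothetical:

```shell
# Hypothetical helper: print "address name" for every hosts-file entry whose
# canonical (first) name has no dot in it.
bad_canonical_names() {
    hosts="${1:-/etc/hosts}"
    awk '
        /^[[:space:]]*(#|$)/ { next }             # skip comments and blank lines
        $1 ~ /^127\./ || $1 == "::1" { next }     # loopback entries are exempt
        NF >= 2 && $2 !~ /\./ { print $1, $2 }
    ' "$hosts"
}

# Example: bad_canonical_names /etc/hosts
```

No output means every non-loopback entry lists a dotted name first.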
  
Name Resolution with Hosts File
• Set loopback address properly
    • Ensure 127.0.0.1 resolves to localhost, NOT hostname
• Right
    127.0.0.1   localhost
• Wrong
    127.0.0.1   r01m01
  
Name Resolution with DNS
• Forward
• Reverse
• Hostname should MATCH the FQDN in DNS

This Is What You Ought to See
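
One way to run the forward and reverse check on a node is sketched below. It is not a reconstruction of the slide's screenshot. It assumes hostname -f and getent are available, and note that getent follows nsswitch.conf, so it may answer from the hosts file as well as DNS. Both function names are hypothetical.

```shell
# Pure comparison step, kept separate so it can be exercised without a network:
# the FQDN must be non-empty and identical to the reverse-lookup name.
resolution_ok() {
    fqdn="$1"; reverse="$2"
    [ -n "$fqdn" ] && [ "$fqdn" = "$reverse" ]
}

# Hypothetical check: hostname -> address (forward), address -> name (reverse).
check_name_resolution() {
    fqdn="$(hostname -f)"
    ip="$(getent hosts "$fqdn" | awk 'NR == 1 { print $1 }')"
    reverse="$(getent hosts "$ip" | awk 'NR == 1 { print $2 }')"
    echo "hostname: $fqdn  forward: $ip  reverse: $reverse"
    resolution_ok "$fqdn" "$reverse"
}

# Example: check_name_resolution || echo "forward/reverse mismatch"
```

A node that passes prints the same fully qualified name at both ends of the line.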
  
Name Resolution Errata
• Mismatches? Expect odd results.
    • Problems starting DataNodes
    • Non-FQDN in Web UI links
    • Security features are extra sensitive to FQDN
• Errors so common that link to FAQ is included in logs!
    • http://wiki.apache.org/hadoop/UnknownHost
• Get name resolution working BEFORE enabling nscd!
  
Summary
Time to take out your camera phones…

Summary
1. disable vm.swappiness
2. data disks: mount with noatime option
3. data disks: disable root reserve space
4. enable nscd
5. increase file handle limits
6. use dedicated OS/logging disk
7. sane name resolution
http://tiny.cloudera.com/7steps
  
Recommended Reading
• Hadoop Operations
  http://amzn.to/1hDaN9B
  
Questions?
Preferably related to the talk…
  
Thank You!
Alex Moundalexis
@technmsg
We’re hiring, kids! Well, not kids.
  
8. Bonus Round
Because we had enough time…
  
Other Things to Check
• Disk IO
    • hdparm
        • # hdparm -Tt /dev/sdc
        • Looking for at least 70 MB/s from 7200 RPM disks
        • Slower could indicate a failing drive, disk controller, array, etc.
    • dd
        • http://romanrm.ru/en/dd-benchmark
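
Checking a dozen disks per node against the 70 MB/s floor is easier with a pass/fail wrapper. A sketch that assumes hdparm reports buffered reads on a line of the form "Timing buffered disk reads: ... = 116.28 MB/sec"; that format and the helper name buffered_read_ok are assumptions to verify against your hdparm version:

```shell
# Hypothetical helper: read `hdparm -Tt` output on stdin and succeed only if
# the buffered disk read rate is at least the given minimum (MB/s, default 70).
buffered_read_ok() {
    min="${1:-70}"
    awk -v min="$min" '
        /Timing buffered disk reads/ { rate = $(NF - 1); seen = 1 }
        END { exit ((seen && rate + 0 >= min + 0) ? 0 : 1) }
    '
}

# Example (as root):
# hdparm -Tt /dev/sdc | buffered_read_ok 70 || echo "/dev/sdc looks slow"
```

The check also fails when no buffered-read line is present, so a disk that hdparm cannot read at all is flagged too.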
  
Other Things to Check
• Disable Red Hat Transparent Huge Pages (RH6+ Only)
    • Can reduce elevated CPU usage
• In rc.local:
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
• Reference: Linux 6 Transparent Huge Pages and Hadoop Workloads, http://goo.gl/WSF2qC
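
The sysfs files mark the active setting in square brackets, for example "always madvise [never]". A minimal sketch for reporting the current state; it tries the Red Hat 6 path from the slide first and then /sys/kernel/mm/transparent_hugepage, the path used by mainline kernels. The function names are hypothetical.

```shell
# Print the active (bracketed) value from a transparent hugepage sysfs file.
thp_active() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Hypothetical report: show enabled/defrag for the first THP directory found.
thp_report() {
    for dir in /sys/kernel/mm/redhat_transparent_hugepage \
               /sys/kernel/mm/transparent_hugepage; do
        if [ -r "$dir/enabled" ]; then
            echo "$dir: enabled=$(thp_active "$dir/enabled") defrag=$(thp_active "$dir/defrag")"
            return 0
        fi
    done
    echo "transparent hugepage settings not found" >&2
    return 1
}

# Example: thp_report
```

After the rc.local change and a reboot, both values should read never.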
  
  
Other Things to Check
• Enable Jumbo Frames
    • Only if your network infrastructure supports it!
    • Can easily (and arguably) boost throughput by 10-20%
• Monitor Everything
    • How else will you know what’s happening?
    • Nagios
    • Ganglia
  

More Related Content

What's hot

Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersAmal G Jose
 
Learn Hadoop Administration
Learn Hadoop AdministrationLearn Hadoop Administration
Learn Hadoop AdministrationEdureka!
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaCloudera, Inc.
 
How to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterHow to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterAltoros
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best PracticesCloudera, Inc.
 
Hadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityHadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityEdureka!
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingImpetus Technologies
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingGreat Wide Open
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Adam Kawa
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configurationprabakaranbrick
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterEdureka!
 
Administer Hadoop Cluster
Administer Hadoop ClusterAdminister Hadoop Cluster
Administer Hadoop ClusterEdureka!
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorialawesomesos
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedInAllen Wittenauer
 

What's hot (20)

Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop Clusters
 
ha_module5
ha_module5ha_module5
ha_module5
 
Learn Hadoop Administration
Learn Hadoop AdministrationLearn Hadoop Administration
Learn Hadoop Administration
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
 
How to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterHow to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop Cluster
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best Practices
 
Hadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityHadoop Cluster With High Availability
Hadoop Cluster With High Availability
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed Debugging
 
Hadoop admin
Hadoop adminHadoop admin
Hadoop admin
 
Hadoop 24/7
Hadoop 24/7Hadoop 24/7
Hadoop 24/7
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
 
Hadoop 1.x vs 2
Hadoop 1.x vs 2Hadoop 1.x vs 2
Hadoop 1.x vs 2
 
Administer Hadoop Cluster
Administer Hadoop ClusterAdminister Hadoop Cluster
Administer Hadoop Cluster
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorial
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
 

Viewers also liked

Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuningVitthal Gogate
 
Interactive Analytics in Human Time
Interactive Analytics in Human TimeInteractive Analytics in Human Time
Interactive Analytics in Human TimeDataWorks Summit
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Adam Kawa
 
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices WorkshopHortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices WorkshopHortonworks
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failingSandy Ryza
 
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationHadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationYahoo Developer Network
 
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...David Chen
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingBart Vandewoestyne
 
Optimizing MapReduce Job performance
Optimizing MapReduce Job performanceOptimizing MapReduce Job performance
Optimizing MapReduce Job performanceDataWorks Summit
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceCloudera, Inc.
 
TEZ-8 UI Walkthrough
TEZ-8 UI WalkthroughTEZ-8 UI Walkthrough
TEZ-8 UI Walkthrought3rmin4t0r
 
TestDFSIO
TestDFSIOTestDFSIO
TestDFSIOhhyin
 
Implementação_cluster alto_desempenho_fernando_eduardo_20090726-Imetro_2013
Implementação_cluster alto_desempenho_fernando_eduardo_20090726-Imetro_2013Implementação_cluster alto_desempenho_fernando_eduardo_20090726-Imetro_2013
Implementação_cluster alto_desempenho_fernando_eduardo_20090726-Imetro_2013Kanda Kassobo
 
Performance Tuning para o mercado financeiro
Performance Tuning para o mercado financeiroPerformance Tuning para o mercado financeiro
Performance Tuning para o mercado financeiroRodrigo Missiaggia
 
Admin e suas gambiarras
Admin e suas gambiarrasAdmin e suas gambiarras
Admin e suas gambiarrasDaniel Lara
 
Na Jornada da Virtualização para as Nuvens, como mantemos o controle?
Na Jornada da Virtualização para as Nuvens, como mantemos o controle?Na Jornada da Virtualização para as Nuvens, como mantemos o controle?
Na Jornada da Virtualização para as Nuvens, como mantemos o controle?Rodrigo Missiaggia
 
An Introduction to Hadoop and Cloudera: Nashville Cloudera User Group, 10/23/14
An Introduction to Hadoop and Cloudera: Nashville Cloudera User Group, 10/23/14An Introduction to Hadoop and Cloudera: Nashville Cloudera User Group, 10/23/14
An Introduction to Hadoop and Cloudera: Nashville Cloudera User Group, 10/23/14iwrigley
 

Viewers also liked (20)

Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuning
 
Interactive Analytics in Human Time
Interactive Analytics in Human TimeInteractive Analytics in Human Time
Interactive Analytics in Human Time
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
 
Something about Kafka - Why Kafka is so fast
Something about Kafka - Why Kafka is so fastSomething about Kafka - Why Kafka is so fast
Something about Kafka - Why Kafka is so fast
 
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices WorkshopHortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices Workshop
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failing
 
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationHadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
 
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarking
 
Optimizing MapReduce Job performance
Optimizing MapReduce Job performanceOptimizing MapReduce Job performance
Optimizing MapReduce Job performance
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
 
TEZ-8 UI Walkthrough
TEZ-8 UI WalkthroughTEZ-8 UI Walkthrough
TEZ-8 UI Walkthrough
 
TestDFSIO
TestDFSIOTestDFSIO
TestDFSIO
 
Implementação_cluster alto_desempenho_fernando_eduardo_20090726-Imetro_2013
Implementação_cluster alto_desempenho_fernando_eduardo_20090726-Imetro_2013Implementação_cluster alto_desempenho_fernando_eduardo_20090726-Imetro_2013
Implementação_cluster alto_desempenho_fernando_eduardo_20090726-Imetro_2013
 
Performance Tuning para o mercado financeiro
Performance Tuning para o mercado financeiroPerformance Tuning para o mercado financeiro
Performance Tuning para o mercado financeiro
 
Admin e suas gambiarras
Admin e suas gambiarrasAdmin e suas gambiarras
Admin e suas gambiarras
 
Na Jornada da Virtualização para as Nuvens, como mantemos o controle?
Na Jornada da Virtualização para as Nuvens, como mantemos o controle?Na Jornada da Virtualização para as Nuvens, como mantemos o controle?
Na Jornada da Virtualização para as Nuvens, como mantemos o controle?
 
Performance tuning
Performance tuningPerformance tuning
Performance tuning
 
An Introduction to Hadoop and Cloudera: Nashville Cloudera User Group, 10/23/14
An Introduction to Hadoop and Cloudera: Nashville Cloudera User Group, 10/23/14An Introduction to Hadoop and Cloudera: Nashville Cloudera User Group, 10/23/14
An Introduction to Hadoop and Cloudera: Nashville Cloudera User Group, 10/23/14
 

Similar to Hadoop Performance Linux Configuration

How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...DataWorks Summit/Hadoop Summit
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)MongoDB
 
OSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin CharlesOSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin CharlesNETWAYS
 
Deployment Strategies
Deployment StrategiesDeployment Strategies
Deployment StrategiesMongoDB
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment StrategyMongoDB
 
Top 10 lessons learned from deploying hadoop in a private cloud
Top 10 lessons learned from deploying hadoop in a private cloudTop 10 lessons learned from deploying hadoop in a private cloud
Top 10 lessons learned from deploying hadoop in a private cloudRogue Wave Software
 
The Ultimate IBM and Lotus on Linux Workshop for Windows Admins
The Ultimate IBM and Lotus on Linux Workshop for Windows AdminsThe Ultimate IBM and Lotus on Linux Workshop for Windows Admins
The Ultimate IBM and Lotus on Linux Workshop for Windows AdminsBill Malchisky Jr.
 
Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016Colin Charles
 
Infrastructure Around Hadoop
Infrastructure Around HadoopInfrastructure Around Hadoop
Infrastructure Around HadoopDataWorks Summit
 
Managing growth in Production Hadoop Deployments
Managing growth in Production Hadoop DeploymentsManaging growth in Production Hadoop Deployments
Managing growth in Production Hadoop DeploymentsDataWorks Summit
 
Comparison of-foss-distributed-storage
Comparison of-foss-distributed-storageComparison of-foss-distributed-storage
Comparison of-foss-distributed-storageMarian Marinov
 
Hadoop Operations: Keeping the Elephant Running Smoothly
Hadoop Operations: Keeping the Elephant Running SmoothlyHadoop Operations: Keeping the Elephant Running Smoothly
Hadoop Operations: Keeping the Elephant Running SmoothlyMichael Arnold
 
High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2ScribbleLive
 
MySQL Performance Tuning
MySQL Performance TuningMySQL Performance Tuning
MySQL Performance TuningFromDual GmbH
 
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and HadoopIOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and HadoopLeons Petražickis
 
LXC Containers and AUFs
LXC Containers and AUFsLXC Containers and AUFs
LXC Containers and AUFsDocker, Inc.
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraJon Haddad
 

Hadoop Performance Linux Configuration

  • 1. Improving Hadoop Cluster Performance via Linux Configuration
      2014 Hadoop Summit – San Jose, California
      Alex Moundalexis
      @technmsg
  • 2. Tips from a Former SA
  • 3. Been there, done that.
      CC BY 2.0 / Richard Bumgardner
  • 4. Tips from a Former SA Field Guy
  • 5. Home sweet home.
      CC BY 2.0 / Alex Moundalexis
  • 6. Tips from a Former SA Field Guy
      Easy steps to take…
  • 7. Tips from a Former SA Field Guy
      Easy steps to take… that most people don’t.
  • 8. What This Talk Isn’t About
      • Deploying
          • Puppet, Chef, Ansible, homegrown scripts, intern labor
      • Sizing & Tuning
          • Depends heavily on data and workload
      • Coding
          • Unless you count STDOUT redirection
      • Algorithms
          • I suck at math, but we’ll try some multiplication later
  • 9. “The answer to most Hadoop questions is it depends.”
  • 10. So What ARE We Talking About?
      • Seven simple things
          • Quick
          • Safe
          • Viable for most environments and use cases
      • Identify issue, then offer solution
      • Note: Commands run as root or sudo
  • 11. 1. Swapping
      Bad news, best not to…
  • 12. Swapping
      • A form of memory management
      • When OS runs low on memory…
          • write blocks to disk
          • use now-free memory for other things
          • read blocks back into memory from disk when needed
      • Also known as paging
  • 13. Swapping
      • Problem: Disks are slow, especially to seek
      • Hadoop is about maximizing IO
          • spend less time acquiring data
          • operate on data in place
          • large streaming reads/writes from disk
      • Memory usage is limited within JVM
          • we should be able to manage our memory
  • 14. Disable Swap in Kernel
      • Well, as much as possible.
      • Immediate:
          # echo 0 > /proc/sys/vm/swappiness
      • Persist after reboot:
          # echo "vm.swappiness = 0" >> /etc/sysctl.conf
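The append on the slide above works once, but running it again leaves duplicate lines in `/etc/sysctl.conf`. A minimal sketch of an idempotent variant follows; `persist_sysctl` is a hypothetical helper name, not something from the talk, and it assumes the usual `key = value` layout of the file.

```shell
#!/bin/sh
# persist_sysctl FILE KEY VALUE  (hypothetical helper, not from the talk)
# Leaves exactly one "KEY = VALUE" line in FILE, replacing any older setting.
persist_sysctl() {
    file=$1; key=$2; value=$3
    tmp=$(mktemp) || return 1
    # Escape dots so that "vm.swappiness" is matched literally.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    # Copy everything except existing assignments of KEY.
    if [ -f "$file" ]; then
        grep -v -E "^[[:space:]]*${key_re}[[:space:]]*=" "$file" > "$tmp"
    fi
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and mode.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

As root, `persist_sysctl /etc/sysctl.conf vm.swappiness 0` followed by `sysctl -p` should apply the setting without a reboot; `cat /proc/sys/vm/swappiness` shows the live value.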
  • 15. Swapping Peculiarities
      • Behavior varies based on Linux kernel
          • CentOS 6.4+ / Ubuntu 10.10+
          • For you kernel gurus, that’s Linux 2.6.32-303+
      • Prior
          • We don’t swap, except to avoid OOM condition.
      • After
          • We don’t swap, ever.
      • Details: http://tiny.cloudera.com/noswap
  • 16. 2. File Access Time
      Disable this too.
  • 17. File Access Time
      • Linux tracks access time
          • writes to disk even if all you did was read
      • Problem
          • more disk seeks
          • HDFS is write-once, read-many
          • NameNode tracks access information for HDFS
  • 18. Don’t Track Access Time
      • Mount volumes with noatime option
          • In /etc/fstab:
              /dev/sdc /data01 ext3 defaults,noatime 0
      • Note: noatime implies nodiratime as well
      • What about relatime?
          • Faster than atime but slower than noatime
      • No reboot required
          # mount -o remount /data01
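To see which volumes still track access time, one option is to scan the mount table. The sketch below assumes the standard six-column layout of `/proc/mounts` and only looks at ext3/ext4 filesystems; `list_atime_mounts` is a made-up name for illustration.

```shell
#!/bin/sh
# list_atime_mounts [MOUNTS_FILE]  (hypothetical helper, not from the talk)
# Prints the mount point of every ext3/ext4 filesystem mounted without noatime.
list_atime_mounts() {
    awk '$3 ~ /^(ext3|ext4)$/ {
        # Column 4 holds the comma-separated mount options.
        n = split($4, opts, ",")
        found = 0
        for (i = 1; i <= n; i++) if (opts[i] == "noatime") found = 1
        if (!found) print $2
    }' "${1:-/proc/mounts}"
}
```

Run with no argument it reads `/proc/mounts`; any data disk it prints is a candidate for the `noatime` change and remount shown on the slide.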
  • 19. 3. Root Reserved Space
      Reclaim it, impress your bosses!
  • 20. Root Reserved Space
      • EXT3/4 reserve 5% of disk for root-owned files
      • On an OS disk, sure
          • System logs, kernel panics, etc.
  • 21. Disks used to be much smaller, right?
      CC BY 2.0 / Alex Moundalexis
  • 22. Do The Math
      • Conservative
          • 5% of 1 TB disk = 46 GB
          • 5 data disks per server = 230 GB
          • 5 servers per rack = 1.15 TB
      • Quasi-Aggressive
          • 5% of 4 TB disk = 186 GB
          • 12 data disks per server = 2.23 TB
          • 18 servers per rack = 40.1 TB
      • That’s a LOT of unused storage!
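The multiplication above can be reproduced with shell arithmetic. This sketch assumes a vendor terabyte of 10^12 bytes reported in binary gigabytes and rounds down per disk, which appears to be how the slide arrives at 46 and 186; `reserved_space` is a hypothetical name, and a shell with 64-bit arithmetic is assumed.

```shell
#!/bin/sh
# reserved_space DISK_TB PERCENT DISKS_PER_SERVER SERVERS_PER_RACK
# (hypothetical helper) Prints reserved space per disk, per server and per
# rack, all in whole binary gigabytes.
reserved_space() {
    disk_bytes=$(( $1 * 1000000000000 ))                 # vendor TB = 10^12 bytes
    per_disk=$(( disk_bytes * $2 / 100 / 1073741824 ))   # rounded down
    per_server=$(( per_disk * $3 ))
    per_rack=$(( per_server * $4 ))
    echo "$per_disk $per_server $per_rack"
}

reserved_space 1 5 5 5      # prints: 46 230 1150
reserved_space 4 5 12 18    # prints: 186 2232 40176
```

Dividing the larger figures by 1000 gives the 1.15 TB, 2.23 TB and roughly 40.1 TB quoted on the slide.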
  • 23. Root Reserved Space
      • On a Hadoop data disk, no root-owned files
      • When creating a partition
          # mkfs.ext3 -m 0 /dev/sdc
      • On existing partitions
          # tune2fs -m 0 /dev/sdc
      • 0 is safe, 1 is for the ultra-paranoid
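To confirm the change took effect, the reserved percentage can be derived from the `Block count` and `Reserved block count` fields that `tune2fs -l` prints. A small sketch, with `reserved_pct` as a hypothetical helper name:

```shell
#!/bin/sh
# reserved_pct  (hypothetical helper, not from the talk)
# Reads `tune2fs -l` output on stdin and prints the reserved-block percentage.
reserved_pct() {
    awk -F: '
        $1 == "Block count"          { total = $2 + 0 }
        $1 == "Reserved block count" { reserved = $2 + 0 }
        END { if (total > 0) printf "%.1f\n", 100 * reserved / total }
    '
}
```

For example, `tune2fs -l /dev/sdc | reserved_pct` should report 5.0 on an untouched default filesystem and 0.0 after `tune2fs -m 0`.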
  • 24. 4. Name Service Cache Daemon
      Turn it on, already!
  • 25. Name Service Cache Daemon
      • Daemon that caches name service requests
          • Passwords
          • Groups
          • Hosts
      • Helps weather network hiccups
      • Helps more with high latency LDAP, NIS, NIS+
      • Small footprint
      • Zero configuration required
  • 26. Name Service Cache Daemon
      • Hadoop nodes
          • largely a network-based application
          • on the network constantly
          • issue lots of DNS lookups, especially HBase & distcp
          • can thrash DNS servers
      • Reducing latency of service requests? Smart.
      • Reducing impact on shared infrastructure? Smart.
  • 27. Name Service Cache Daemon
      • Turn it on, let it work, leave it alone:
          # chkconfig --level 345 nscd on
          # service nscd start
      • Check on it later:
          # nscd -g
      • Unless using Red Hat SSSD; modify nscd config first!
          • Don’t use nscd to cache passwd, group, or netgroup
          • Red Hat, Using NSCD with SSSD. http://goo.gl/68HTMQ
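For the SSSD case, the usual approach is to switch the `enable-cache` lines for `passwd`, `group` and `netgroup` to `no` in the nscd configuration while leaving `hosts` cached. The sketch below assumes the stock `enable-cache <service> <yes|no>` syntax of `/etc/nscd.conf`; `disable_nscd_caches` is a hypothetical helper, so check the result against the Red Hat note before relying on it.

```shell
#!/bin/sh
# disable_nscd_caches FILE  (hypothetical helper, not from the talk)
# Sets enable-cache to "no" for passwd, group and netgroup; other lines,
# including the hosts cache, are left untouched.
disable_nscd_caches() {
    conf=$1
    out=$(mktemp) || return 1
    awk '
        $1 == "enable-cache" && $2 ~ /^(passwd|group|netgroup)$/ {
            printf "\tenable-cache\t\t%s\t\tno\n", $2
            next
        }
        { print }
    ' "$conf" > "$out"
    # Overwrite in place so the file keeps its owner and mode.
    cat "$out" > "$conf" && rm -f "$out"
}
```

After `disable_nscd_caches /etc/nscd.conf`, restart the daemon with `service nscd restart` so the new settings are read.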
  • 28. 5. File Handle Limits
      Not a problem, until they are.
  • 29. File Handle Limits
      • Kernel refers to files via a handle
          • Also called descriptors
      • Linux is a multi-user system
      • File handle limits protect the system from
          • Poor coding
          • Malicious users
          • Pictures of cats on the Internet
  • 30. java.io.FileNotFoundException: (Too many open files)
      Microsoft Office EULA. Really.
  • 31. File Handle Limits
      • Linux defaults usually not enough
      • Increase maximum open files (default 1024)
          # echo hdfs - nofile 32768 >> /etc/security/limits.conf
          # echo mapred - nofile 32768 >> /etc/security/limits.conf
          # echo hbase - nofile 32768 >> /etc/security/limits.conf
      • Bonus: Increase maximum processes too
          # echo hdfs - nproc 32768 >> /etc/security/limits.conf
          # echo mapred - nproc 32768 >> /etc/security/limits.conf
          # echo hbase - nproc 32768 >> /etc/security/limits.conf
      • Note: Cloudera Manager will do this for you.
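The six `echo` commands above can be generated by a loop, which makes it harder to miss a user or mistype the separator (in `limits.conf` the type column must be a plain ASCII `-`, meaning both soft and hard limits). A sketch, with `limits_lines` as a hypothetical helper name:

```shell
#!/bin/sh
# limits_lines VALUE USER...  (hypothetical helper, not from the talk)
# Prints one nofile and one nproc line per user in limits.conf format.
limits_lines() {
    value=$1; shift
    for user in "$@"; do
        for item in nofile nproc; do
            printf '%s - %s %s\n' "$user" "$item" "$value"
        done
    done
}
```

As root, `limits_lines 32768 hdfs mapred hbase >> /etc/security/limits.conf` appends the same entries as the slide. The limits apply to new sessions; running `ulimit -n` in a fresh login shell for one of those users should show the new value.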
  • 32. 6. Dedicated Disk for OS and Logs
      Don’t be tempted to share, even on monster disks.
  • 33. The Situation in Easy Steps
      1. Your new server has a dozen 1 TB disks
      2. Eleven disks are used to store data
      3. One disk is used for the OS
          • 20 GB for the OS
          • 980 GB sits unused
      4. Someone asks “can we store data there too?”
      5. Seems reasonable, lots of space… “OK, why not.”
      Sound familiar?
  • 34. I don’t understand it, there’s no consistency to these run times!
      Microsoft Office EULA. Really.
  • 35. No Love for Shared Disk
      • Our quest for data gets interrupted a lot:
          • OS operations
          • OS logs
          • Hadoop logging, quite chatty
          • Hadoop execution
          • userspace execution
      • Disk seeks are slow, remember?
  • 36. Dedicated Disk for OS and Logs
      • At install time
          • Disk 0, OS & logs
          • Disk 1-n, Hadoop data
      • After install, more complicated effort, requires manual HDFS block rebalancing:
          1. Take down HDFS
              • If you can do it in under 10 minutes, just the DataNode
          2. Move or distribute blocks from disk0/dir to disk[1-n]/dir
          3. Remove dir from HDFS config (dfs.data.dir)
          4. Start HDFS
  • 37. 7. Name Resolution
      Sane, both forward and reverse.
  • 38. Name Resolution Options
      1. Hosts file, if you must
      2. DNS, much preferred
  • 39. Name Resolution with Hosts File
      • Set canonical names properly
      • Right
          10.1.1.1 r01m01.cluster.org r01m01 master1
          10.1.1.2 r01w01.cluster.org r01w01 worker1
      • Wrong
          10.1.1.1 r01m01 r01m01.cluster.org master1
          10.1.1.2 r01w01 r01w01.cluster.org worker1
  • 40. Name Resolution with Hosts File
      • Set loopback address properly
          • Ensure 127.0.0.1 resolves to localhost, NOT hostname
      • Right
          127.0.0.1 localhost
      • Wrong
          127.0.0.1 r01m01
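Both hosts-file rules above can be checked mechanically. The sketch below flags IPv4 entries whose first name is not fully qualified (it simply looks for a dot, which is an assumption about how FQDNs appear in the file) and loopback entries that do not start with `localhost`; `check_hosts` is a hypothetical name.

```shell
#!/bin/sh
# check_hosts [HOSTS_FILE]  (hypothetical helper, not from the talk)
# Prints one line per suspicious entry; prints nothing if the file looks sane.
check_hosts() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }        # skip blank lines and comments
        $1 == "127.0.0.1" {
            if ($2 != "localhost") print "loopback: " $0
            next
        }
        # IPv4 entries only: the first name should be the FQDN.
        $1 ~ /^[0-9.]+$/ && $2 !~ /\./ { print "not FQDN first: " $0 }
    ' "${1:-/etc/hosts}"
}
```

Running `check_hosts` with no argument inspects `/etc/hosts`; the "Wrong" examples from the two slides would each be reported.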
  • 41. Name Resolution with DNS
      • Forward
      • Reverse
      • Hostname should MATCH the FQDN in DNS
  • 42. This Is What You Ought to See
  • 43. Name Resolution Errata
      • Mismatches? Expect odd results.
          • Problems starting DataNodes
          • Non-FQDN in Web UI links
          • Security features are extra sensitive to FQDN
      • Errors so common that link to FAQ is included in logs!
          • http://wiki.apache.org/hadoop/UnknownHost
      • Get name resolution working BEFORE enabling nscd!
  • 44. Summary
      Time to take out your camera phones…
  • 45. Summary
      1. disable swapping (vm.swappiness = 0)
      2. data disks: mount with noatime option
      3. data disks: disable root reserved space
      4. enable nscd
      5. increase file handle limits
      6. use dedicated OS/logging disk
      7. sane name resolution
      http://tiny.cloudera.com/7steps
  • 46. Recommended Reading
      • Hadoop Operations
          http://amzn.to/1hDaN9B
  • 47. Questions?
      Preferably related to the talk…
  • 48. Thank You!
      Alex Moundalexis
      @technmsg
      We’re hiring, kids! Well, not kids.
  • 49. 8. Bonus Round
      Because we had enough time…
  • 50. Other Things to Check
      • Disk IO
          • hdparm
              # hdparm -Tt /dev/sdc
              • Looking for at least 70 MB/s from 7200 RPM disks
              • Slower could indicate a failing drive, disk controller, array, etc.
          • dd
              • http://romanrm.ru/en/dd-benchmark
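When checking many disks it helps to compare the `hdparm` result against the 70 MB/s guideline automatically. This sketch parses the "Timing buffered disk reads" line that `hdparm -t` prints; `check_disk_rate` is a hypothetical helper and the threshold argument defaults to the figure from the slide.

```shell
#!/bin/sh
# check_disk_rate [MIN_MB_PER_SEC]  (hypothetical helper, not from the talk)
# Reads hdparm output on stdin and prints "OK <rate>" or "SLOW <rate>".
check_disk_rate() {
    awk -v min="${1:-70}" '
        /buffered disk reads/ {
            # The rate is the number after the last "=" on the line.
            n = split($0, parts, "=")
            rate = parts[n] + 0
            status = (rate >= min) ? "OK" : "SLOW"
            print status, rate
        }'
}
```

For example, `hdparm -t /dev/sdc | check_disk_rate 70` could be run in a loop over all data disks, with any `SLOW` result followed up by a closer look at the drive and controller.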
  • 51. Other Things to Check
      • Disable Red Hat Transparent Huge Pages (RH6+ Only)
          • Can reduce elevated CPU usage
          • In rc.local:
              echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
              echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
          • Reference: Linux 6 Transparent Huge Pages and Hadoop Workloads, http://goo.gl/WSF2qC
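The kernel reports the active Transparent Huge Pages mode by wrapping it in square brackets inside those same files, for example `always madvise [never]`. A small sketch for reading it back; `thp_setting` is a hypothetical helper name.

```shell
#!/bin/sh
# thp_setting FILE  (hypothetical helper, not from the talk)
# Prints the bracketed (active) value from a THP sysfs file.
thp_setting() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}
```

On Red Hat 6, `thp_setting /sys/kernel/mm/redhat_transparent_hugepage/enabled` should print `never` once the rc.local lines have run. On other distributions the directory is typically `/sys/kernel/mm/transparent_hugepage` instead.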
  • 52. Other Things to Check
      • Enable Jumbo Frames
          • Only if your network infrastructure supports it!
          • Can easily (and arguably) boost throughput by 10-20%
  • 53. Other Things to Check
      • Enable Jumbo Frames
          • Only if your network infrastructure supports it!
          • Can easily (and arguably) boost throughput by 10-20%
      • Monitor Everything
          • How else will you know what’s happening?
          • Nagios
          • Ganglia
  • 54. Thank You!
      Alex Moundalexis
      @technmsg
      We’re hiring, kids! Well, not kids.