Hadoop in Virtual Machines

Richard McDougall, VMware
Sanjay Radia, Hortonworks

Hadoop Summit, 2012

Part 1
  
Say What?

•  VMs will just add overhead, due to I/O virtualization
•  VMs run on SAN; we’re all about local disks
•  Hadoop does its own cluster management
•  It’ll do resource management in 2.0
•  And even HA is coming to Hadoop
•  And… what is the point, anyway?
  
But you’ve been asking…

•  Can I virtualize my Hadoop, so that it is easier and quicker to get a cluster up and running?
•  Is it possible to run Hadoop on those spare machine cycles I have on hundreds or thousands of nodes?
•  Can I make my system more available by using some of the standard HA features?
  
And the savvy are asking…

•  Can I avoid having to install special hardware for the master services, like the NameNode and JobTracker?
•  Can I dynamically change the size of the cluster to use more resources?
•  Can I use VM isolation to increase security or guard against resource-intensive neighbors?
•  Is it feasible to provision virtual clusters, giving out one each to a business unit?
  
Virtualization, in VMware’s vSphere

[Diagram: two guests, each with a TCP/IP stack and a file system, run on monitors that expose a virtual NIC and a virtual SCSI device. Beneath them, the VMkernel provides the scheduler, memory manager, virtual switch, file system, NIC drivers, and I/O drivers on top of the physical hardware.]

•  The monitor emulates physical devices: CPU, memory, I/O
•  CPU is controlled by the scheduler and virtualized by the monitor
•  Memory is allocated by the VMkernel and virtualized by the monitor
•  Network and I/O devices are emulated and proxied through native device drivers
  
Ok, so first what about the concerns?

•  Use your SAN? … if you want to.

                 SAN Storage          NAS Filers          Local Storage
Cost             $2 - $10/Gigabyte    $1 - $5/Gigabyte    $0.05/Gigabyte
$1M gets:
  Capacity       0.5 Petabytes        1 Petabyte          20 Petabytes
  IOPS           1,000,000 IOPS       400,000 IOPS        10,000,000 IOPS
  Bandwidth      1 Gbyte/sec          2 Gbyte/sec         800 Gbytes/sec
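
The capacity figures in the comparison follow from dividing the budget by the price per gigabyte. A minimal sketch of that arithmetic, assuming decimal units (1 Petabyte = 1,000,000 Gigabytes) and the low end of each quoted price range, which is what the slide's "$1M gets" numbers appear to use; the function name is illustrative only:

```python
# Capacity bought by a fixed budget at a given price per gigabyte.
GB_PER_PB = 1_000_000  # assumption: decimal units, matching the slide's round numbers


def petabytes_for_budget(budget_usd, usd_per_gb):
    """Return how many petabytes the budget buys at the given price."""
    return budget_usd / usd_per_gb / GB_PER_PB


budget = 1_000_000
# Prices are the low end of each range quoted in the comparison.
for name, price in [("SAN", 2.00), ("NAS", 1.00), ("Local", 0.05)]:
    print(f"{name}: {petabytes_for_budget(budget, price):.1f} PB")
```

At the high end of the SAN and NAS ranges ($10 and $5 per gigabyte), the same budget would buy proportionally less.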
  
                                                                                	
  
Hadoop Using Local Disks

[Diagram: a virtualization host runs other workloads next to a Hadoop virtual machine containing a Task Tracker and a Datanode. The diagram shows three Ext4 file systems, each on its own VMDK, plus an OS image VMDK and shared storage.]
  
Hadoop Perf in a VM
(Ratio is elapsed time to physical; lower is better)

[Chart: ratio to native, y-axis from 0 to 1.2, with series for 1 VM and 2 VMs.]
  
Evolution of Hadoop on VMs

[Diagram: three stages, from one VM with combined storage/compute (current Hadoop), to separate compute and storage VMs, to per-tenant compute VMs (T1, T2) over storage VMs.]

Hadoop in VM
-  VM lifecycle determined by Datanode
-  NOT elastic
-  Limited to Hadoop multi-tenancy

Separate Storage
-  Separate compute from data
-  Elastic compute
-  Enable shared workloads
-  Raise utilization

Separate Compute Clusters
-  Separate virtual clusters per tenant
-  Stronger VM-grade security and resource isolation
-  Enable deployment of multiple Hadoop runtime versions
  
1. Hadoop Task Tracker and Data Node in a VM

[Diagram: a virtualization host runs other workloads next to one virtual Hadoop node. The node contains a Task Tracker with slots (add/remove slots?) and a Datanode on a VMDK (grow/shrink by tens of GB?).]

Grow/Shrink of a VM is one approach
  
2. Add/remove Virtual Nodes

[Diagram: a virtualization host runs other workloads next to two virtual Hadoop nodes, each containing a Task Tracker with slots and a Datanode on its own VMDK.]

Just add/remove more virtual nodes?
  
But State makes it hard to power-off a node

[Diagram: a virtualization host runs other workloads next to one virtual Hadoop node containing a Task Tracker with slots and a Datanode on a VMDK.]

Powering off the Hadoop VM would lose the Datanode
  
Adding a node needs data…

[Diagram: a virtualization host runs other workloads next to two virtual Hadoop nodes, each containing a Task Tracker with slots and a Datanode on its own VMDK.]

Adding a node would require TBs of data replication
  
2. Separated Compute and Data

[Diagram: a virtualization host runs other workloads, several compute-only virtual Hadoop nodes (each a Task Tracker with slots), and one virtual Hadoop node running the Datanode on VMDKs.]

Truly Elastic Hadoop: scalable through virtual nodes
  
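Dataflow with separated Compute/Data

[Diagram: on one virtualization host, a virtual Hadoop node running a Task Tracker with slots and a virtual Hadoop node running a Datanode each have a virtual NIC. Both connect to the virtual switch, which sits above the NIC drivers. The Datanode stores data on a VMDK.]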
  
Performance Analysis of Split

Setup: 1 Datanode, 4 Compute nodes per Host
Workload: Terasort

[Diagram: four hosts, each running four NodeManagers and one Datanode.]
  
Demo: Shrink/Expand Cluster

Setup: 1 Datanode, 2 NodeManagers, and 2 web servers on each physical host

[Diagram: four hosts, each running two web servers, two NodeManagers, and one Datanode.]
  
Demo: Shrink/Expand Cluster

When web load is high in the daytime, we can suspend some NodeManagers and power on more web servers.

[Diagram: the same four hosts, each with web servers, NodeManagers, and one Datanode.]
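
The shrink/expand idea can be sketched as a per-host planning function. This is only an illustration under stated assumptions, not the mechanism used in the demo: the fixed budget of four compute VM slots per host, the `min_web_servers` floor, and the proportional policy are all hypothetical.

```python
import math


def plan_host(web_load, vm_slots=4, min_web_servers=1):
    """Return (web_servers, node_managers) for one host.

    web_load is the fraction (0.0 to 1.0) of the host's peak web traffic.
    The Datanode is deliberately not part of the plan: it stays powered on
    so HDFS data remains available while compute VMs come and go.
    """
    if not 0.0 <= web_load <= 1.0:
        raise ValueError("web_load must be between 0.0 and 1.0")
    # Hypothetical policy: web servers scale with load, NodeManagers get the rest.
    web_servers = max(min_web_servers, math.ceil(web_load * vm_slots))
    node_managers = vm_slots - web_servers
    return web_servers, node_managers


for hour, load in [(3, 0.1), (9, 0.5), (14, 0.9)]:
    web, nms = plan_host(load)
    print(f"{hour:02d}:00 load={load:.0%} -> {web} web servers, {nms} NodeManagers")
```

Keeping the Datanode out of the plan reflects the earlier point that state makes a node hard to power off; only the stateless compute VMs are suspended.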
  
Demo	
  
Tying it together: Elastic Hadoop

[Diagram: two tenants, Coke and Pepsi, sit above a runtime layer of virtual Hadoop instances, one of them labeled as a queue. Below it, the data layer holds data containers on a distributed file system (HDFS, KFS, MAPR, Isilon,…) spread across six hosts.]
  
Part 2
  
Expand Hadoop Ecosystem

•  Hortonworks goal
    –  Expand Hadoop ecosystem
    –  Provide first-class support for various platforms
•  Hadoop should run well on VMs
    –  VMs offer several advantages, as presented earlier
•  Take advantage of vSphere for HA



  
VMware-Hortonworks Joint Engineering

•  First-class support for VMs
    –  Topology plugins (HADOOP-8468)
        •  2 VMs can be on the same host
            –  Pick closer data
            –  Schedule tasks closer
            –  Don’t put two replicas on the same host
    –  MR-tmp on HDFS using block pools
        •  Elastic compute VMs will not need local disk
    –  Fast communications within VMs
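
The "don't put two replicas on the same host" rule can be illustrated with a small placement function. This is a simplified sketch and not the HADOOP-8468 implementation: the `vm_to_host` mapping, the VM and host names, and the random selection policy are assumptions made for the example.

```python
import random


def choose_replica_targets(vm_to_host, replicas, rng=random):
    """Pick `replicas` VMs such that no two share a physical host."""
    by_host = {}
    for vm, host in vm_to_host.items():
        by_host.setdefault(host, []).append(vm)
    if replicas > len(by_host):
        raise ValueError("not enough physical hosts for the requested replicas")
    # Choose distinct physical hosts first, then one VM on each of them.
    hosts = rng.sample(sorted(by_host), replicas)
    return [rng.choice(by_host[host]) for host in hosts]


# Hypothetical layout: five Datanode VMs spread over three physical hosts.
vm_to_host = {
    "vm1": "hostA", "vm2": "hostA",
    "vm3": "hostB", "vm4": "hostB",
    "vm5": "hostC",
}
print(choose_replica_targets(vm_to_host, replicas=2))
```

Without the host grouping, two VMs on the same physical machine would look like independent nodes, and a single hardware failure could take out both replicas.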
  	
  

  
Hadoop Total System Availability Architecture

[Diagram: jobs run on the slave nodes of the Hadoop cluster, and apps run outside the cluster. The master daemons (NN, JT) run on servers in an HA cluster for master daemons with N+K failover. Labels on the failover path: "Failover" and "JT into Safemode".]
  
HA is coming in 1.0
Using Total System Availability Architecture




© Hortonworks Inc. 2011
  
HA in Hadoop 1 with HDP1

•  Total System Availability Architecture
    –  Namenode
         •  Clients pause automatically
         •  JobTracker pauses automatically
    –  Other Hadoop master services (JT, …) coming

•  Use industry-proven HA framework
    –  VMware vSphere HA
         •  Failover, fencing, …
         •  Corner cases are tricky; if not addressed, corruption
    –  Additional benefits:
         •  N-N & N+K failover
         •  Migration for maintenance
  
Hadoop NN/JT HA with vSphere
  
NameNode HA – Failover Times

•  NameNode failover times with vSphere and LinuxHA
    –  Failure detection + failover: 0.5 to 2 minutes
    –  OS bootup needed for vSphere: 1 minute
    –  Namenode startup (exit safemode)
        •  Small/medium clusters: 1 to 2 minutes
        •  Large cluster: 5 to 15 minutes

•  NameNode startup time measurements
    –  60 nodes, 60K files, 6 million blocks, 300 TB raw storage: 40 sec
    –  180 nodes, 200K files, 18 million blocks, 900 TB raw storage: 120 sec

Cold failover is good enough for small/medium clusters
Failure detection and automatic failover dominate
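
Adding up the phases gives a rough end-to-end range for a cold failover. The sketch below assumes the three phases run strictly one after another and that the vSphere OS bootup applies; the constant and function names are illustrative only.

```python
# (min, max) minutes per phase, as quoted on the slide.
DETECT_AND_FAILOVER = (0.5, 2.0)
OS_BOOT = (1.0, 1.0)  # needed for vSphere
NN_STARTUP = {"small/medium": (1.0, 2.0), "large": (5.0, 15.0)}


def total_failover_minutes(cluster_size):
    """Sum the per-phase ranges, assuming the phases run sequentially."""
    phases = [DETECT_AND_FAILOVER, OS_BOOT, NN_STARTUP[cluster_size]]
    return (sum(p[0] for p in phases), sum(p[1] for p in phases))


for size in NN_STARTUP:
    low, high = total_failover_minutes(size)
    print(f"{size}: {low} to {high} minutes")
```

Under these assumptions a small/medium cluster is back in 2.5 to 5 minutes, while a large cluster takes 6.5 to 18 minutes, most of it NameNode startup.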
  	
  
  
Summary

•  Advantages of Hadoop on VMs
   –  Cluster management
   –  Cluster consolidation
   –  Greater elasticity in mixed environments
   –  Alternative multi-tenancy to the capacity scheduler’s offerings
•  HA for Hadoop master daemons
   –  vSphere-based HA for NN, JT, … in Hadoop 1
   –  Total System Availability Architecture
  

  
Backup	
  
Cluster Configuration

•  Hardware
     –  AMAX ClusterMax, 7 nodes
     –  2X X5650 2.67 GHz hex-core, 96 GB memory
     –  12X SATA 500 GB 7200 RPM (10 for Hadoop data), EXT4
     –  Mellanox ConnectX VPI (MT26418), 10 GbE
     –  Mellanox Vantage 6048, 10 GbE
•  OS/Hypervisor
     –  RHEL 6.1 x86_64 (native and guest)
     –  ESX 5.0 RTM with devel Mellanox driver
•  VMs (HT off/on)
     –  1 VM: 92000 MB, (12/24) vCPUs, 10 PRDM disks
     –  2 VMs: 46000 MB, (6/12) vCPUs, 5 PRDM disks
  
Hadoop Configuration

•  Distribution
       –  Based on Apache open-source 0.20.2
•  Parameters
       –  dfs.datanode.max.xcievers=4096
       –  dfs.replication=2
       –  dfs.block.size=134217728
       –  io.file.buffer.size=131072
       –  mapred.child.java.opts="-Xmx2048m -Xmn512m" (native)
       –  mapred.child.java.opts="-Xmx1900m -Xmn512m" (virtual)
•  Network topology
       –  Hadoop uses info for reliability and performance
       –  Multiple VMs/host: each host is a “rack”
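
One way to express "each host is a rack" is a topology script, which Hadoop calls with node addresses as arguments and which prints one rack path per address. The sketch below assumes that mechanism (configured through `topology.script.file.name` in Hadoop of this vintage); the addresses and host names in `VM_TO_HOST` are hypothetical, and the slides do not say which method was actually used.

```python
import sys

# Hypothetical mapping from guest VM address to the physical host it runs on.
VM_TO_HOST = {
    "10.0.0.11": "host1",
    "10.0.0.12": "host1",
    "10.0.0.21": "host2",
    "10.0.0.22": "host2",
}
DEFAULT_RACK = "/default-rack"


def rack_for(address):
    """Report the physical host as the VM's rack, so co-located VMs share one."""
    host = VM_TO_HOST.get(address)
    return f"/{host}" if host else DEFAULT_RACK


def resolve(addresses):
    """Return one rack path per address, in the order given."""
    return [rack_for(address) for address in addresses]


if __name__ == "__main__":
    # Addresses arrive as command-line arguments; rack paths go to stdout.
    print(" ".join(resolve(sys.argv[1:])))
```

Because HDFS avoids placing all replicas in one rack, treating each physical host as a rack keeps the two replicas (dfs.replication=2) on different machines.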
  
	
  
What about Performance?

[Diagram: test cluster connected through a Mellanox 10 GbE switch. Each node: AMAX ClusterMax, 2X X5650, 96 GB, 12X SATA 500 GB, Mellanox 10 GbE adapter.]
  
  
Resource Shifting using Virtualization

[Diagram: many “Other VM” workloads and Hadoop VMs share a virtualization platform that spans three hosts, each host running HDFS.]

While existing apps run during the day to support business operations, Hadoop batch jobs kick off at night to conduct deep analysis of data.
  
The cluster is the machine

[Diagram: vCenter and HP ProLiant DL380 G6 servers, contrasting an imbalanced cluster with a balanced cluster.]
                                                                                                                       ProLiant
                                                                                                                       DL380G6
                                                                                                                                                 Heavy	
  Load	
               POWER
                                                                                                                                                                               SUPPLY

                                                                                                                                                                               POWER CAP
                                                                                                                                                                                         1




                                                                                                                                                                              1A 3G 5E 7C 9i
                                                                                                                                                                                             POWER
                                                                                                                                                                                             SUPPLY
                                                                                                                                                                                                   1

                                                                                                                                                                                                   2
                                                                                                                                                                                                                 2
                                                                                                                                                                                                                 OVER
                                                                                                                                                                                                                 TEMP

                                                                                                                                                                                                                 INTER
                                                                                                                                                                                                                 LOCK
                                                                                                                                                                                                                DIMMS
                                                                                                                                                                                                        9i 7C 5E 3G 1A
                                                                                                                                                                                                                                       1       5
                                                                                                                                                                                                                                                         PL A Y ER
                                                                                                                                                                                                                                                                          HP
                                                                                                                                                                                                                                                                       ProLiant
                                                                                                                                                                                                                                                                       DL380G6




                                DIMMS
1A 3G 5E 7C 9i          9i 7C 5E 3G 1A                                                                                                                                                                                                 2       6

                                                                               2       6                                                                                          2D 4B 6H 8F               8F 6H 4B 2D
                                                                                                                                                                                                 ONLINE
                                                                                                                                                                                         1       SPARE                     2
 2D 4B 6H 8F               8F 6H 4B 2D
                  ONLINE                                                                                                                                                             PROC                             PROC
          1                                  2                                                                                                                                                    MIRROR
                  SPARE                                                                                                                                                       FANS
  PROC                               PROC
                                                                                                                                                                                                                                       3       7
                  MIRROR                                                                                                                                                             1       2      3        4         5       6
FANS
                                                                               3       7
   1          2     3       4            5        6



                                                                                                                                                                                                                                       4       8

                                                                               4       8




                                                                                                                                                 Lighter	
  Load	
  
SAN, NAS or Local Storage?
•  Shared Storage: SAN or NAS
   –  Easy to provision
   –  Automated cluster rebalancing
•  Hybrid Storage
   –  SAN/NAS for boot images, VMs, other workloads
   –  Local disk for Hadoop & HDFS
   –  Scalable Bandwidth, Lower Cost/GB
  
[Figure: Hosts, each running Hadoop VMs alongside Other VMs]
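
The "Scalable Bandwidth" point for local disks can be illustrated with a rough back-of-the-envelope sketch. It assumes, as a simplification not stated on the slide, that a shared SAN/NAS array has a fixed total throughput split evenly across hosts, while each host's local disks add to the cluster total. All figures and function names below are hypothetical and chosen only for illustration; they are not measurements from the talk.

```python
def aggregate_bandwidth_mb_s(hosts, disks_per_host, disk_mb_s):
    """Total sequential bandwidth when every host reads its own local disks."""
    return hosts * disks_per_host * disk_mb_s


def per_host_share_mb_s(array_mb_s, hosts):
    """Bandwidth each host gets when all hosts split one fixed-throughput array."""
    return array_mb_s / hosts


# Hypothetical figures, for illustration only (not from the slides).
DISKS_PER_HOST = 8        # local data disks per host
DISK_MB_S = 100           # sequential throughput of one local disk
SHARED_ARRAY_MB_S = 2000  # assumed fixed throughput of a shared array

for hosts in (4, 16, 64):
    local_total = aggregate_bandwidth_mb_s(hosts, DISKS_PER_HOST, DISK_MB_S)
    shared_share = per_host_share_mb_s(SHARED_ARRAY_MB_S, hosts)
    print(f"{hosts:3d} hosts: local total {local_total} MB/s, "
          f"shared-array share {shared_share:.1f} MB/s per host")
```

Under these assumptions the local-disk total grows linearly with the number of hosts, while the per-host share of a fixed shared array shrinks. Real arrays and networks behave less simply, so treat this as intuition rather than a sizing guide.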
  

Weitere ähnliche Inhalte

Was ist angesagt?

Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
 
Gpdb best practices v a01 20150313
Gpdb best practices v a01 20150313Gpdb best practices v a01 20150313
Gpdb best practices v a01 20150313Sanghee Lee
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOAltinity Ltd
 
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsIntroduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsSkillspeed
 
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxCon
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxConAnatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxCon
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxConJérôme Petazzoni
 
Boosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringBoosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringShapeBlue
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Alexey Lesovsky
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxVinay Shukla
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBLee Theobald
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...Altinity Ltd
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Managementsameerfaizan
 
오픈소스 공간통계분석 패키지 개발
오픈소스  공간통계분석 패키지 개발오픈소스  공간통계분석 패키지 개발
오픈소스 공간통계분석 패키지 개발MinPa Lee
 
Comparing Accumulo, Cassandra, and HBase
Comparing Accumulo, Cassandra, and HBaseComparing Accumulo, Cassandra, and HBase
Comparing Accumulo, Cassandra, and HBaseAccumulo Summit
 

Was ist angesagt? (20)

Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
Introduction to CloudStack: How to Deploy and Manage Infrastructure-as-a-Serv...
Introduction to CloudStack: How to Deploy and Manage Infrastructure-as-a-Serv...Introduction to CloudStack: How to Deploy and Manage Infrastructure-as-a-Serv...
Introduction to CloudStack: How to Deploy and Manage Infrastructure-as-a-Serv...
 
Gpdb best practices v a01 20150313
Gpdb best practices v a01 20150313Gpdb best practices v a01 20150313
Gpdb best practices v a01 20150313
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
 
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsIntroduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
 
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxCon
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxConAnatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxCon
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxCon
 
Apache Hadoop 3
Apache Hadoop 3Apache Hadoop 3
Apache Hadoop 3
 
Boosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringBoosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uring
 
Cassandra ppt 1
Cassandra ppt 1Cassandra ppt 1
Cassandra ppt 1
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDB
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Management
 
오픈소스 공간통계분석 패키지 개발
오픈소스  공간통계분석 패키지 개발오픈소스  공간통계분석 패키지 개발
오픈소스 공간통계분석 패키지 개발
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
Comparing Accumulo, Cassandra, and HBase
Comparing Accumulo, Cassandra, and HBaseComparing Accumulo, Cassandra, and HBase
Comparing Accumulo, Cassandra, and HBase
 

Andere mochten auch

Best Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopBest Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopHortonworks
 
How to Profit from Factoring 2015
How to Profit from Factoring 2015How to Profit from Factoring 2015
How to Profit from Factoring 2015Michael Ponomarew
 
Fish Sticks by Stephen C Lundin, John Christensen and Harry Paul
Fish Sticks by Stephen C Lundin, John Christensen and Harry PaulFish Sticks by Stephen C Lundin, John Christensen and Harry Paul
Fish Sticks by Stephen C Lundin, John Christensen and Harry Paulvandananicky
 
What is system level analysis
What is system level analysisWhat is system level analysis
What is system level analysisCAST
 
Rate zonal centrifugation and Its applications
Rate zonal centrifugation and Its applicationsRate zonal centrifugation and Its applications
Rate zonal centrifugation and Its applicationsPaul singh
 
Top 10 team coordinator interview questions and answers
Top 10 team coordinator interview questions and answersTop 10 team coordinator interview questions and answers
Top 10 team coordinator interview questions and answersjanritari
 
Financial aspects of marketing management
Financial aspects of marketing managementFinancial aspects of marketing management
Financial aspects of marketing managementBabasab Patil
 
Moving From a Selenium Grid to the Cloud - A Real Life Story
Moving From a Selenium Grid to the Cloud - A Real Life StoryMoving From a Selenium Grid to the Cloud - A Real Life Story
Moving From a Selenium Grid to the Cloud - A Real Life StorySauce Labs
 
Introduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data ApplicationsIntroduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data ApplicationsCloudera, Inc.
 
IT Strategic Planning (Case Studies)
IT Strategic Planning (Case Studies)IT Strategic Planning (Case Studies)
IT Strategic Planning (Case Studies)Nurhazman Abdul Aziz
 
The purpose and Benefits of setting high standards for your work
The purpose and Benefits of setting high standards for your work The purpose and Benefits of setting high standards for your work
The purpose and Benefits of setting high standards for your work Cav1234
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data Geoffrey Fox
 
GRE Computer Raw Conversion Table
GRE Computer Raw Conversion TableGRE Computer Raw Conversion Table
GRE Computer Raw Conversion TableSuccess Prep
 
Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...
Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...
Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...CA Technologies
 
성공적인 AWS Cloud 마이그레이션 전략 및 사례 - 방희란 매니저:: AWS Cloud Track 1 Intro
성공적인 AWS Cloud 마이그레이션 전략 및 사례 - 방희란 매니저:: AWS Cloud Track 1 Intro성공적인 AWS Cloud 마이그레이션 전략 및 사례 - 방희란 매니저:: AWS Cloud Track 1 Intro
성공적인 AWS Cloud 마이그레이션 전략 및 사례 - 방희란 매니저:: AWS Cloud Track 1 IntroAmazon Web Services Korea
 

Andere mochten auch (19)

Best Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopBest Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache Hadoop
 
How to Profit from Factoring 2015
How to Profit from Factoring 2015How to Profit from Factoring 2015
How to Profit from Factoring 2015
 
Fish Sticks by Stephen C Lundin, John Christensen and Harry Paul
Fish Sticks by Stephen C Lundin, John Christensen and Harry PaulFish Sticks by Stephen C Lundin, John Christensen and Harry Paul
Fish Sticks by Stephen C Lundin, John Christensen and Harry Paul
 
What is system level analysis
What is system level analysisWhat is system level analysis
What is system level analysis
 
Rate zonal centrifugation and Its applications
Rate zonal centrifugation and Its applicationsRate zonal centrifugation and Its applications
Rate zonal centrifugation and Its applications
 
Top 10 team coordinator interview questions and answers
Top 10 team coordinator interview questions and answersTop 10 team coordinator interview questions and answers
Top 10 team coordinator interview questions and answers
 
HW09 Hadoop Vaidya
HW09 Hadoop VaidyaHW09 Hadoop Vaidya
HW09 Hadoop Vaidya
 
Financial aspects of marketing management
Financial aspects of marketing managementFinancial aspects of marketing management
Financial aspects of marketing management
 
Moving From a Selenium Grid to the Cloud - A Real Life Story
Moving From a Selenium Grid to the Cloud - A Real Life StoryMoving From a Selenium Grid to the Cloud - A Real Life Story
Moving From a Selenium Grid to the Cloud - A Real Life Story
 
Introduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data ApplicationsIntroduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data Applications
 
Progeny LIMS
Progeny LIMSProgeny LIMS
Progeny LIMS
 
Getting Past No
Getting Past NoGetting Past No
Getting Past No
 
IT Strategic Planning (Case Studies)
IT Strategic Planning (Case Studies)IT Strategic Planning (Case Studies)
IT Strategic Planning (Case Studies)
 
Matrix Effect
Matrix EffectMatrix Effect
Matrix Effect
 
The purpose and Benefits of setting high standards for your work
The purpose and Benefits of setting high standards for your work The purpose and Benefits of setting high standards for your work
The purpose and Benefits of setting high standards for your work
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
GRE Computer Raw Conversion Table
GRE Computer Raw Conversion TableGRE Computer Raw Conversion Table
GRE Computer Raw Conversion Table
 
Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...
Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...
Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...
 
성공적인 AWS Cloud 마이그레이션 전략 및 사례 - 방희란 매니저:: AWS Cloud Track 1 Intro
성공적인 AWS Cloud 마이그레이션 전략 및 사례 - 방희란 매니저:: AWS Cloud Track 1 Intro성공적인 AWS Cloud 마이그레이션 전략 및 사례 - 방희란 매니저:: AWS Cloud Track 1 Intro
성공적인 AWS Cloud 마이그레이션 전략 및 사례 - 방희란 매니저:: AWS Cloud Track 1 Intro
 

Ähnlich wie Apache Hadoop on Virtual Machines

Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastuctureDataWorks Summit
 
VMware Performance for Gurus - A Tutorial
VMware Performance for Gurus - A TutorialVMware Performance for Gurus - A Tutorial
VMware Performance for Gurus - A TutorialRichard McDougall
 
Hadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in CloudHadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in CloudCloudera, Inc.
 
SAP Virtualization Week 2012 - The Lego Cloud
SAP Virtualization Week 2012 - The Lego CloudSAP Virtualization Week 2012 - The Lego Cloud
SAP Virtualization Week 2012 - The Lego Cloudaidanshribman
 
Best Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopBest Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopDataWorks Summit
 
16 August 2012 - SWUG - Hyper-V in Windows 2012
16 August 2012 - SWUG - Hyper-V in Windows 201216 August 2012 - SWUG - Hyper-V in Windows 2012
16 August 2012 - SWUG - Hyper-V in Windows 2012Daniel Mar
 
Windsor: Domain 0 Disaggregation for XenServer and XCP
	Windsor: Domain 0 Disaggregation for XenServer and XCP	Windsor: Domain 0 Disaggregation for XenServer and XCP
Windsor: Domain 0 Disaggregation for XenServer and XCPThe Linux Foundation
 
Hyper V R2 Deep Dive
Hyper V R2 Deep DiveHyper V R2 Deep Dive
Hyper V R2 Deep DiveAidan Finn
 
Hyper V And Scvmm Best Practis
Hyper V And Scvmm Best PractisHyper V And Scvmm Best Practis
Hyper V And Scvmm Best PractisBlauge
 
Building Business Continuity Solutions With Hyper V
Building Business Continuity Solutions With Hyper VBuilding Business Continuity Solutions With Hyper V
Building Business Continuity Solutions With Hyper Vrsnarayanan
 
Windows server 2012 failover clustering improvements
Windows server 2012   failover clustering improvementsWindows server 2012   failover clustering improvements
Windows server 2012 failover clustering improvementsSusantha Silva
 
CloudStack Architecture Future
CloudStack Architecture FutureCloudStack Architecture Future
CloudStack Architecture FutureKimihiko Kitase
 
21.10.09 Microsoft Event, Microsoft Presentation
21.10.09 Microsoft Event, Microsoft Presentation21.10.09 Microsoft Event, Microsoft Presentation
21.10.09 Microsoft Event, Microsoft Presentationdataplex systems limited
 
Lego Cloud SAP Virtualization Week 2012
Lego Cloud SAP Virtualization Week 2012Lego Cloud SAP Virtualization Week 2012
Lego Cloud SAP Virtualization Week 2012Benoit Hudzia
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)outstanding59
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldRichard McDougall
 

Ähnlich wie Apache Hadoop on Virtual Machines (20)

Hadoop on Virtual Machines
Hadoop on Virtual MachinesHadoop on Virtual Machines
Hadoop on Virtual Machines
 
Big data on virtualized infrastucture
Apache Hadoop on Virtual Machines

  • 1. Hadoop in Virtual Machines
      Richard McDougall, VMware
      Sanjay Radia, Hortonworks
      Hadoop Summit, 2012
  • 3. Say What?
      • VMs will just add overhead, due to I/O virt
      • VMs run on SAN, we're all about local disks
      • Hadoop does its own cluster management
      • It'll do resource management in 2.0
      • And even HA is coming to Hadoop
      • And… what is the point, anyway?
  • 4. But you've been asking…
      • Can I virtualize my Hadoop, so that it is easier and quicker to get a cluster up and running?
      • Is it possible to run Hadoop on those spare machine cycles I have on hundreds/thousands of nodes?
      • Can I make my system more available by using some of the standard HA features?
  • 5. And the savvy are asking…
      • Can I avoid having to install special hardware for the master services, like name-node, job-tracker?
      • Can I dynamically change the size of the cluster to use more resources?
      • Can I use VM isolation to increase security or guard against resource-intensive neighbors?
      • Is it feasible to provision virtual-clusters, giving out one each to a business unit?
  • 6. Virtualization, in VMware's vSphere
      [Diagram: layers, top to bottom]
      Guest: TCP/IP, File System
      Monitor: Virtual NIC, Virtual SCSI
      VMkernel: Scheduler, Memory Manager, Virtual Switch, File System, NIC Drivers, I/O Drivers
      Physical Hardware
      • Monitor emulates physical devices: CPU, memory, I/O
      • CPU is controlled by the scheduler and virtualized by the monitor
      • Memory is allocated by the VMkernel and virtualized by the monitor
      • Network and I/O devices are emulated and proxied through native device drivers
  • 7. Ok, so first what about the concerns?
      • Use your SAN? … if you want to.

                       SAN Storage        NAS Filers        Local Storage
      Cost             $2 - $10/Gigabyte  $1 - $5/Gigabyte  $0.05/Gigabyte
      $1M gets:
        Capacity       0.5 Petabytes      1 Petabyte        20 Petabytes
        IOPS           1,000,000          400,000           10,000,000
        Throughput     1 Gbyte/sec        2 Gbyte/sec       800 Gbytes/sec
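The "$1M gets" capacities on slide 7 follow directly from the per-gigabyte prices. A minimal Python sketch of that arithmetic, assuming the low end of each quoted price range and decimal units (1 PB = 1,000,000 GB); the function and constant names are illustrative and do not come from the talk:

```python
GB_PER_PB = 1_000_000  # assumption: decimal units


def petabytes_for_budget(budget_usd, usd_per_gb):
    """Return the raw capacity in PB that a budget buys at a given $/GB."""
    return budget_usd / usd_per_gb / GB_PER_PB


# Low end of each price range quoted on the slide.
PRICES_USD_PER_GB = {"SAN": 2.00, "NAS": 1.00, "Local": 0.05}

for tier, price in PRICES_USD_PER_GB.items():
    print(f"{tier}: {petabytes_for_budget(1_000_000, price):g} PB")
```

At the high end of the SAN and NAS ranges the same budget buys five times less capacity, which widens the gap to local disks further.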
  • 8. Hadoop Using Local Disks
      [Diagram: a virtualization host runs a Hadoop virtual machine (Task Tracker, Datanode) next to other workloads. The guest has three Ext4 file systems backed by three VMDKs; the OS image VMDK is on shared storage.]
  • 9. Hadoop Perf in a VM
      (Ratio is elapsed time to physical, lower is better)
      [Chart: y-axis "Ratio to Native", 0 to 1.2; series: 1 VM, 2 VMs]
  • 10. Evolution of Hadoop on VMs
      [Diagram: three stages, left to right]
      • Current Hadoop: combined storage/compute ("Hadoop in VM")
          – VM lifecycle determined by Datanode
          – NOT elastic
          – Limited to Hadoop multi-tenancy
      • Separate storage
          – Separate compute from data
          – Elastic compute
          – Enable shared workloads
          – Raise utilization
      • Separate compute clusters (tenants T1, T2)
          – Separate virtual clusters per tenant
          – Stronger VM-grade security and resource isolation
          – Enable deployment of multiple Hadoop runtime versions
  • 11. 1. Hadoop Task Tracker and Data Node in a VM
      [Diagram: one virtual Hadoop node (Task Tracker with slots, Datanode) and other workloads on a virtualization host, backed by a VMDK]
      • Add/remove slots?
      • Grow/shrink by tens of GB?
      • Grow/shrink of a VM is one approach
  • 12. 2. Add/remove Virtual Nodes
      [Diagram: two virtual Hadoop nodes (each with Task Tracker slots and a Datanode) and other workloads on a virtualization host, each node backed by its own VMDK]
      • Just add/remove more virtual nodes?
  • 13. But State makes it hard to power-off a node
      [Diagram: one virtual Hadoop node (Task Tracker with slots, Datanode) and other workloads on a virtualization host, backed by a VMDK]
      • Powering off the Hadoop VM would lose the Datanode
  • 14. Adding a node needs data…
      [Diagram: two virtual Hadoop nodes (each with Task Tracker slots and a Datanode) and other workloads on a virtualization host, each node backed by its own VMDK]
      • Adding a node would require TBs of data replication
  • 15. 2. Separated Compute and Data
      [Diagram: several virtual Hadoop nodes running only a Task Tracker with slots, one virtual Hadoop node running the Datanode, and other workloads, all on a virtualization host with VMDKs]
      • Truly elastic Hadoop: scalable through virtual nodes
  • 16. Dataflow with separated Compute/Data
      [Diagram: a Task Tracker virtual node and a Datanode virtual node on the same virtualization host, each with a virtual NIC attached to the virtual switch; the VMDK and the NIC drivers sit below]
  • 17. Performance Analysis of Split
      • Setup: 1 Datanode, 4 compute nodes per host
      • Workload: Terasort
      [Diagram: four hosts, each with four NodeManagers and one Datanode]
  • 19. Demo: Shrink/Expand Cluster
      • Setup: 1 Datanode, 2 NodeManagers and 2 web servers on each physical host
      [Diagram: four hosts, each with two web servers, two NodeManagers and one Datanode]
  • 20. Demo: Shrink/Expand Cluster
      • When web load is high in daytime, we can suspend some NodeManagers and power on more web servers.
      [Diagram: the same hosts with web servers, NodeManagers and Datanodes]
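The shrink/expand idea on slides 19 and 20 can be sketched as a per-host planning function. This is a hypothetical illustration only: the four elastic VM slots per host come from the demo setup (2 NodeManagers plus 2 web servers), but the load threshold, the rule of always keeping one NodeManager, and all names are assumptions. Actually suspending or powering on VMs would go through the virtualization platform and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostPlan:
    datanodes: int
    nodemanagers: int
    web_servers: int


def plan_host(web_load, slots=4, high_load=0.7):
    """Split a host's elastic VM slots between NodeManagers and web servers.

    web_load is a utilization fraction in [0, 1]. The Datanode is never
    suspended because it holds HDFS blocks; only compute VMs are elastic.
    The 0.7 threshold is an assumed value, not one from the talk.
    """
    if not 0.0 <= web_load <= 1.0:
        raise ValueError("web_load must be between 0 and 1")
    if web_load >= high_load:
        web = slots - 1   # assumption: keep one NodeManager running
    else:
        web = slots // 2  # baseline from the demo setup: 2 and 2
    return HostPlan(datanodes=1, nodemanagers=slots - web, web_servers=web)


print(plan_host(0.2))  # night: baseline split
print(plan_host(0.9))  # day: more web servers, fewer NodeManagers
```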
  • 22. Tying it together: Elastic Hadoop
      [Diagram labels]
      Tenants: Coke, Pepsi
      Runtime layer: four Hadoop virtual nodes, queue
      Data layer: three data containers
      Distributed File System (HDFS, KFS, MAPR, Isilon, …)
      Six hosts
  • 24. Expand Hadoop Ecosystem
      • Hortonworks goal
          – Expand Hadoop ecosystem
          – Provide first class support of various platforms
      • Hadoop should run well on VMs
      • VMs offer several advantages as presented earlier
      • Take advantage of vSphere for HA
  • 25. VMware-Hortonworks Joint Engineering
      • First class support for VMs
          – Topology plugins (HADOOP-8468)
              • 2 VMs can be on same host
                  – Pick closer data
                  – Schedule tasks closer
                  – Don't put two replicas on same host
          – MR-tmp on HDFS using block pools
              • Elastic Compute-VMs will not need local disk
          – Fast communications within VMs
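The rule "don't put two replicas on same host" from slide 25 can be illustrated with a small selection function. This is a simplified sketch under stated assumptions, not HDFS's actual block placement code or the HADOOP-8468 implementation: the function name, the VM-to-host mapping, and the assumption that candidates arrive sorted closest-first are all hypothetical.

```python
def choose_replica_targets(candidates, vm_to_host, replicas):
    """Pick up to `replicas` VMs so that no two share a physical host.

    candidates: VM names in preference order (closest first).
    vm_to_host: mapping of VM name -> physical host name.
    """
    chosen = []
    used_hosts = set()
    for vm in candidates:
        if len(chosen) >= replicas:
            break
        host = vm_to_host[vm]
        if host in used_hosts:
            continue  # a replica already lives on this physical host
        chosen.append(vm)
        used_hosts.add(host)
    return chosen


# Hypothetical layout: vm1 and vm2 share hostA.
layout = {"vm1": "hostA", "vm2": "hostA", "vm3": "hostB", "vm4": "hostC"}
print(choose_replica_targets(["vm1", "vm2", "vm3", "vm4"], layout, replicas=2))
```

Without the host check, both replicas could land on vm1 and vm2, and a single physical host failure would take out every copy of the block.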
  • 26. Hadoop Total System Availability Architecture
      [Diagram labels]
      Slave nodes of Hadoop cluster, running jobs
      Apps running outside
      Failover; JT into safemode
      HA cluster for master daemons: NN, JT and NN on three servers, with N+K failover
  • 27. HA is coming in 1.0
      Using Total System Availability Architecture
      © Hortonworks Inc. 2011
  • 28. HA in Hadoop 1 with HDP1
      • Total System Availability Architecture
          – Namenode
              • Clients pause automatically
              • JobTracker pauses automatically
          – Other Hadoop master services (JT, …) coming
      • Use industry proven HA framework
          – VMware vSphere-HA
              • Failover, fencing, …
              • Corner cases are tricky – if not addressed, corruption
          – Additional benefits:
              • N-N & N+K failover
              • Migration for maintenance
  • 29. Hadoop NN/JT HA with vSphere
  • 30. NameNode HA – Failover Times
      • NameNode failover times with vSphere and LinuxHA
          – Failure detection + failover – 0.5 to 2 minutes
          – OS bootup needed for vSphere – 1 minute
          – Namenode startup (exit safemode)
              • Small/medium clusters – 1 to 2 minutes
              • Large cluster – 5 to 15 minutes
      • NameNode startup time measurements
          – 60 nodes, 60K files, 6 million blocks, 300 TB raw storage – 40 sec
          – 180 nodes, 200K files, 18 million blocks, 900 TB raw storage – 120 sec
      Cold failover is good enough for small/medium clusters
      Failure detection and automatic failover dominates
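The phase timings on slide 30 can be added up to get an end-to-end estimate for a cold failover with vSphere. A minimal sketch, assuming the three phases run strictly one after another (the slide does not state this explicitly); the constant and function names are illustrative:

```python
# (low, high) bounds in minutes, taken from the slide.
DETECT_AND_FAILOVER = (0.5, 2.0)
OS_BOOT_VSPHERE = (1.0, 1.0)
NAMENODE_STARTUP = {
    "small_medium": (1.0, 2.0),
    "large": (5.0, 15.0),
}


def cold_failover_minutes(cluster_size):
    """Sum the three phases; assumes they do not overlap."""
    phases = (DETECT_AND_FAILOVER, OS_BOOT_VSPHERE, NAMENODE_STARTUP[cluster_size])
    low = sum(p[0] for p in phases)
    high = sum(p[1] for p in phases)
    return low, high


for size in NAMENODE_STARTUP:
    low, high = cold_failover_minutes(size)
    print(f"{size}: {low} to {high} minutes")
```

Under that assumption a small or medium cluster is back in 2.5 to 5 minutes and a large one in 6.5 to 18 minutes. The two startup measurements on the slide also work out to the same rate: 6 million blocks in 40 seconds and 18 million blocks in 120 seconds are both 150,000 blocks per second.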
  • 31. Summary
      • Advantages of Hadoop on VMs
          – Cluster management
          – Cluster consolidation
          – Greater elasticity in mixed environment
          – Alternate multi-tenancy to capacity scheduler's offerings
      • HA for Hadoop master daemons
          – vSphere based HA for NN, JT, … in Hadoop 1
          – Total System Availability Architecture
  • 33. Cluster Configuration
      • Hardware
          – AMAX ClusterMax, 7 nodes
          – 2X X5650 2.67 GHz hex-core, 96 GB memory
          – 12X SATA 500 GB 7200 RPM (10 for Hadoop data), EXT4
          – Mellanox ConnectX VPI (MT26418), 10 GbE
          – Mellanox Vantage 6048, 10 GbE
      • OS/Hypervisor
          – RHEL 6.1 x86_64 (native and guest)
          – ESX 5.0 RTM with devel Mellanox driver
      • VMs (HT off/on)
          – 1 VM: 92000 MB, (12/24) vCPUs, 10 PRDM disks
          – 2 VMs: 46000 MB, (6/12) vCPUs, 5 PRDM disks
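The per-VM sizes on slide 33 are an even split of one host's resources. A short sketch of that split, assuming 92000 MB of the host's 96 GB is handed to guests and that vCPU count tracks physical cores (12) or hardware threads (24 with hyper-threading on); the function name and defaults are illustrative:

```python
def per_vm_resources(num_vms, hyperthreading=False,
                     total_mem_mb=92000, cores=12, data_disks=10):
    """Divide one host's memory, CPUs and data disks evenly across VMs."""
    if num_vms <= 0:
        raise ValueError("num_vms must be positive")
    threads = cores * (2 if hyperthreading else 1)
    if threads % num_vms or data_disks % num_vms:
        raise ValueError("resources do not divide evenly across VMs")
    return {
        "memory_mb": total_mem_mb // num_vms,
        "vcpus": threads // num_vms,
        "data_disks": data_disks // num_vms,
    }


print(per_vm_resources(1))                       # 1 VM, HT off
print(per_vm_resources(2, hyperthreading=True))  # 2 VMs, HT on
```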
  • 34. Hadoop Configuration
      •  Distribution
         –  Based on Apache open-source 0.20.2
      •  Parameters
         –  dfs.datanode.max.xcievers=4096
         –  dfs.replication=2
         –  dfs.block.size=134217728
         –  io.file.buffer.size=131072
         –  mapred.child.java.opts="-Xmx2048m -Xmn512m" (native)
         –  mapred.child.java.opts="-Xmx1900m -Xmn512m" (virtual)
      •  Network topology
         –  Hadoop uses topology info for reliability and performance
         –  Multiple VMs/host: each host is a "rack"
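One way to make each physical host look like a "rack" is a rack-awareness script, which Hadoop 0.20 locates through the `topology.script.file.name` property: Hadoop passes node addresses as arguments and reads one rack path per address from standard output. The sketch below is illustrative only. The addresses and host names in `VM_TO_HOST` are made up, and the slides do not say how the authors actually configured their topology.

```python
#!/usr/bin/env python3
"""Rack-awareness script sketch: report each VM's physical host as its rack."""
import sys

# HYPOTHETICAL mapping from VM address to the physical host it runs on.
# In a real deployment this would come from the virtualization inventory.
VM_TO_HOST = {
    "10.0.1.11": "esx-host-1",
    "10.0.1.12": "esx-host-1",
    "10.0.1.21": "esx-host-2",
    "10.0.1.22": "esx-host-2",
}

# Hadoop's own fallback rack name for nodes it cannot place.
DEFAULT_RACK = "/default-rack"


def rack_for(node):
    """Return the rack path for one node: '/<physical host>' or the default."""
    host = VM_TO_HOST.get(node)
    return f"/{host}" if host else DEFAULT_RACK


def resolve(nodes):
    """Map a list of node addresses to rack paths, preserving order."""
    return [rack_for(node) for node in nodes]


if __name__ == "__main__":
    # Hadoop may pass several addresses in a single invocation.
    print(" ".join(resolve(sys.argv[1:])))
```

With a mapping like this, two VMs on the same host share a rack, so HDFS block placement avoids putting both replicas (dfs.replication=2 here) on the same physical machine.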
  • 35. What about Performance?
      [Diagram labels: Mellanox 10 GbE switch; AMAX ClusterMax; 2X X5650, 96 GB; 12X SATA 500 GB; Mellanox 10 GbE adapter]
  • 36. Tying it together: Elastic Hadoop
      [Diagram labels: tenants "Coke" and "Pepsi"; Queue; Hadoop Virtual Runtime Layer; Data Layer with Data Containers; Distributed File System (HDFS, KFS, MAPR, Isilon, …); Hosts]
  • 37. Resource Shifting using Virtualization
      [Diagram labels: Other VMs and Hadoop VMs on a Virtualization Platform; three Hosts, each with HDFS]
      While existing apps run during the day to support business operations, Hadoop batch jobs kick off at night to conduct deep analysis of data.
  • 38. The cluster is the machine
      [Diagram labels: vCenter; Imbalanced Cluster (Heavy Load); Balanced Cluster (Lighter Load)]
  • 39. SAN, NAS or Local Storage?
      •  Shared Storage: SAN or NAS
         –  Easy to provision
         –  Automated cluster rebalancing
      •  Hybrid Storage
         –  SAN/NAS for boot images, VMs, other workloads
         –  Local disk for Hadoop & HDFS
         –  Scalable bandwidth, lower cost/GB
      [Diagram labels: Other VMs and Hadoop VMs across Hosts]