Architect's View of Hadoop I/O

I/O analysis using vProbes

Richard McDougall
V1.0
April 2012
Architect's Questions
•  Does Hadoop really need compute and data to be local to each other?
•  How much ephemeral data, and at what I/O rates, do we need to design for?
•  What I/O patterns do we need to support for HDFS?
•  What is the I/O pattern of M-R tasks?
•  Are there opportunities for caching: map input, output or ephemeral data?
Controlled Small Study
•  Focus on developing tooling
•  Using vProbes + Perl + R
•  Hadoop 0.20.204
•  Terasort @ 1GB
•  One Namenode, Tasktracker, Datanode
Terasort
[Figure: Terasort data flow. The input file is divided into input splits (x16); each map task sorts its chunk of key-values; the map output is shuffled to the reducers; the reduce (sort) step combines and sorts; the result is written to the output file.]
Log of the sort 'Job'
$ log.pl job_201201261301_0005_1327649126255_rmc_TeraSort
            Item   Time Jobname            Taskname Phase Start-Time End-Time Elapsed
             Job   0.000 201201261301_0005
             Job         201201261301_0005
             Job   0.475 201201261301_0005 PREP
            Task   1.932 201201261301_0005 m_000017 SETUP
      MapAttempt   3.066 201201261301_0005 m_000017 SETUP
      MapAttempt  10.409 201201261301_0005 m_000017 SETUP SUCCESS 1.932 10.409 8.477 "setup"
            Task  10.966 201201261301_0005 m_000017 SETUP SUCCESS 1.932 10.966 9.034
             Job         201201261301_0005 RUNNING
            Task  10.970 201201261301_0005 m_000000 MAP
            Task  10.972 201201261301_0005 m_000001 MAP
      MapAttempt  10.981 201201261301_0005 m_000000 MAP
      MapAttempt  65.819 201201261301_0005 m_000000 MAP SUCCESS 10.970 65.819 54.849 ""
            Task  68.063 201201261301_0005 m_000000 MAP SUCCESS 10.970 68.063 57.093
      MapAttempt  10.998 201201261301_0005 m_000001 MAP
      MapAttempt  65.363 201201261301_0005 m_000001 MAP SUCCESS 10.972 65.363 54.391 ""
            Task  68.065 201201261301_0005 m_000001 MAP SUCCESS 10.972 68.065 57.093
            Task  68.066 201201261301_0005 m_000002 MAP
            Task  68.067 201201261301_0005 m_000003 MAP
            Task  68.068 201201261301_0005 r_000000 REDUCE
      MapAttempt  68.075 201201261301_0005 m_000002 MAP
      MapAttempt 139.789 201201261301_0005 m_000002 MAP SUCCESS 68.066 139.789 71.723 ""
            Task 140.193 201201261301_0005 m_000002 MAP SUCCESS 68.066 140.193 72.127
      MapAttempt  68.076 201201261301_0005 m_000003 MAP
      MapAttempt 139.927 201201261301_0005 m_000003 MAP SUCCESS 68.067 139.927 71.860 ""
            Task 140.198 201201261301_0005 m_000003 MAP SUCCESS 68.067 140.198 72.131
…
   ReduceAttempt  68.112 201201261301_0005 r_000000 REDUCE
   ReduceAttempt 795.299 201201261301_0005 r_000000 REDUCE SUCCESS 68.068 795.299 727.231 "reduce > reduce"
            Task 798.223 201201261301_0005 r_000000 REDUCE SUCCESS 68.068 798.223 730.155
            Task 798.226 201201261301_0005 m_000016 CLEANUP
      MapAttempt 798.241 201201261301_0005 m_000016 CLEANUP
      MapAttempt 806.113 201201261301_0005 m_000016 CLEANUP SUCCESS 798.226 806.113 7.887 "cleanup"
            Task 807.252 201201261301_0005 m_000016 CLEANUP SUCCESS 798.226 807.252 9.026
             Job 807.253 201201261301_0005 SUCCESS 0.000 807.253 807.253
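
The log above can be summarized per phase with a few lines of scripting. The following is a minimal Python sketch, not the author's log.pl: it assumes completed task rows have the whitespace-separated layout seen above (Item, Time, Jobname, Taskname, Phase, SUCCESS, Start, End, Elapsed), and the function name elapsed_by_phase is made up for illustration.

```python
from collections import defaultdict

# A few completed "Task" rows copied from the log above.
SAMPLE = """\
Task  10.966 201201261301_0005 m_000017 SETUP SUCCESS 1.932 10.966 9.034
Task  68.063 201201261301_0005 m_000000 MAP SUCCESS 10.970 68.063 57.093
Task  68.065 201201261301_0005 m_000001 MAP SUCCESS 10.972 68.065 57.093
Task 798.223 201201261301_0005 r_000000 REDUCE SUCCESS 68.068 798.223 730.155
"""


def elapsed_by_phase(text):
    """Sum the Elapsed column of completed Task rows, grouped by phase."""
    totals = defaultdict(float)
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: Item Time Job Task Phase SUCCESS Start End Elapsed
        if len(fields) >= 9 and fields[0] == "Task" and fields[5] == "SUCCESS":
            totals[fields[4]] += float(fields[8])
    return dict(totals)


if __name__ == "__main__":
    for phase, seconds in elapsed_by_phase(SAMPLE).items():
        print(f"{phase:8s} {seconds:10.3f} s")
```

Attempt rows and Job rows are skipped on purpose so that each task is counted once.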
Terasort: Map and Reduce Phases
[Figure: timeline of task phases against elapsed time in seconds, showing the setup map, the mappers, the reducer and the cleanup map.]
[Figure: the same timeline with two callouts: a zoom in on map task I/O and a zoom in on reduce task I/O.]
VMware vProbes
•  Dynamic instrumentation
•  Probe multiple VMs
•  Probe the virtualization layer
•  VMware Fusion and Workstation
vProbes

GUEST:ENTER:system_call {
    string path;
    comm = curprocname();
    tid = curtid();
    pid = curpid();
    ppid = curppid();
    syscall_num = sysnum;

    if(syscall_num == NR_open) {
       path = guestloadstr(sys_arg0);
       syscall_name = "open";
       sprintf(syscall_args, "\"%s\", %x, %x", path, sys_arg1, sys_arg2);
    …
}

GUEST:OFFSET:ret_from_sys_call:0 {
    printf("%s/%d/%d/%d %s(%s) = %d <0>\n", comm, pid, rtid, ppid, syscall_name,
                                            syscall_args, getgpr(REG_RAX));
}


java/14774/15467/1 open("/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta", 0, 1b6) = 144 <0>
java/14774/15467/1 stat("/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta", 7f0b80a4e590) = 0 <0>
java/14774/15467/1 read(144, 7f0b80a4c470, 4096) = 167 <0>

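
The trace lines emitted by the return probe follow the shape comm/pid/tid/ppid syscall(args) = ret <latency>. As a rough illustration of how such a line can be split into fields, here is a Python sketch. It is an assumption-laden stand-in for the deck's own tooling: the regular expression is inferred from the three sample lines only, and the names Syscall and parse_trace_line are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample output lines; real traces may need more cases.
TRACE_LINE = re.compile(
    r'^(?P<comm>.+?)/(?P<pid>\d+)/(?P<tid>\d+)/(?P<ppid>\d+) '
    r'(?P<syscall>\w+)\((?P<args>.*)\) = (?P<ret>-?\d+) <(?P<lat>\d+)>$'
)


class Syscall(NamedTuple):
    comm: str
    pid: int
    tid: int
    ppid: int
    name: str
    args: str
    ret: int
    latency: int


def parse_trace_line(line: str) -> Optional[Syscall]:
    """Return the parsed fields of one trace line, or None if it does not match."""
    m = TRACE_LINE.match(line.strip())
    if m is None:
        return None
    return Syscall(m["comm"], int(m["pid"]), int(m["tid"]), int(m["ppid"]),
                   m["syscall"], m["args"], int(m["ret"]), int(m["lat"]))


if __name__ == "__main__":
    print(parse_trace_line(
        "java/14774/15467/1 read(144, 7f0b80a4c470, 4096) = 167 <0>"))
```

Returning None for unmatched lines keeps the parser tolerant of wrapped or partial lines in a captured trace.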
Pathname Resolution
filetracevp.pl:

if ($syscall =~ m/open/) {
                $path1 = $line;
                $path1 =~ s/[A-z\/0-9]+[ ]+[a-z]+\("([^"]+)".*\n/\1/;
                $fd1 = $line;
                if ($fd1 =~ s/.* ([0-9]+) <.*>\n/\1/) {
                        $fds{$pid,$fd1} = $path1;

if ($syscall =~ m/write/) {
                $params = $line;
                if ($params =~ s/^[A-z\/0-9]+[ ]+[a-z]+\(([0-9]+),.* ([0-9]+)\) = ([0-9]+) <(.*)>\n/\1,\2,\3,\4/) {
                        ($fd1, $size, $bytes, $lat) = split(',', $params);
                        $path1 = $fds{$pid, $fd1};
…


java,14774,15467,,open,0,0,0,0,144,/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta,0,
java,14774,15467,,stat,0,0,0,0,0,/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta,0,
java,14774,15467,,read,4096,167,0,0,144,/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta,0,

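
The idea behind the excerpt is that read and write calls carry only a file descriptor, so the script remembers the path seen at open time, keyed by process and descriptor. The Python sketch below restates that idea under stated assumptions; it is not a port of filetracevp.pl. The line pattern is inferred from the sample trace, close and descriptor reuse are not handled, and resolve_paths is a hypothetical name.

```python
import re

# Same line shape as the sample trace: comm/pid/tid/ppid name(args) = ret <lat>
LINE = re.compile(r'^.+?/(\d+)/\d+/\d+ (\w+)\((.*)\) = (-?\d+) <\d+>$')
QUOTED = re.compile(r'"([^"]+)"')


def resolve_paths(lines):
    """Yield (pid, syscall, requested, transferred, path) for reads and writes."""
    fds = {}  # (pid, fd) -> path, playing the role of the Perl %fds hash
    rows = []
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue
        pid, name, args, ret = int(m[1]), m[2], m[3], int(m[4])
        if name == "open" and ret >= 0:
            quoted = QUOTED.search(args)
            if quoted:
                fds[(pid, ret)] = quoted.group(1)  # a successful open returns the fd
        elif name in ("read", "write"):
            parts = [a.strip() for a in args.split(",")]
            fd, size = int(parts[0]), int(parts[-1])
            rows.append((pid, name, size, ret, fds.get((pid, fd), "")))
    return rows


if __name__ == "__main__":
    trace = [
        'java/14774/15467/1 open("/host/hadoop/hdfs/data/current/subdir0/'
        'blk_1719908349220085071_1649.meta", 0, 1b6) = 144 <0>',
        'java/14774/15467/1 read(144, 7f0b80a4c470, 4096) = 167 <0>',
    ]
    for row in resolve_paths(trace):
        print(",".join(str(field) for field in row))
```

Keying on (pid, fd) rather than fd alone matters because descriptor numbers are only unique within a process.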
Controlled Small-Scale Study
  $ hadoop jar hadoop-examples-0.20.204.0.jar teragen 10000000 teradata
  <begin trace>
  $ hadoop jar hadoop-examples-0.20.204.0.jar terasort teradata teraout

    Job Counters
      Launched reduce tasks=1
      SLOTS_MILLIS_MAPS=1146887
      Launched map tasks=16
      Data-local map tasks=16
      SLOTS_MILLIS_REDUCES=766823
    File Input Format Counters
      Bytes Read=1000057358
    File Output Format Counters
      Bytes Written=1000000000
    FileSystemCounters
      FILE_BYTES_READ=2382257412
      HDFS_BYTES_READ=1000059070
      FILE_BYTES_WRITTEN=3402627838
      HDFS_BYTES_WRITTEN=1000000000
    Map-Reduce Framework
      Map output materialized bytes=1020000096
      Map input records=10000000
      Reduce shuffle bytes=1020000096
      Spilled Records=33355441
      Map output bytes=1000000000
      Map input bytes=1000000000
      Combine input records=0
      SPLIT_RAW_BYTES=1712
      Reduce input records=10000000
      Reduce input groups=10000000
      Combine output records=0
      Reduce output records=10000000
      Map output records=10000000

  Category                                      MB
  Hadoop Distro                                236
  Hadoop Logs                                  132
  Hadoop clienttmp unjar                         1
  Mappers files jobcache - spills             1753
  Mappers files jobcache - output             1777
  Reducer Intermediate                         764
  Reducers Shuffle and Intermediate           1744
  Jobcache class files and shell scripts         1
  Hadoop Datanode                             1690
  JVM - /usr/lib/jvm…                           98
  Total MB                                    7987

[Figure: horizontal bar chart by category, x-axis from 0 to 2000 in steps of 200. Bars, top to bottom: JVM - /usr/lib/jvm…; Hadoop Datanode; Jobcache class files and shell scripts; Reducers files jobcache - output; Reducer intermediate file; Mappers files jobcache - map output; Mappers files jobcache - spills; Hadoop clienttmp unjar; Hadoop Logs; Hadoop Distro.]
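
As a back-of-the-envelope check, the four FileSystemCounters values above can be turned into shares of total bytes moved. The Python sketch below does only that arithmetic; the function name io_shares is made up, and treating FILE_* as local temporary I/O and HDFS_* as DFS I/O is a simplifying assumption rather than something the slides state.

```python
# Counter values copied from the job output above.
counters = {
    "FILE_BYTES_READ": 2382257412,
    "FILE_BYTES_WRITTEN": 3402627838,
    "HDFS_BYTES_READ": 1000059070,
    "HDFS_BYTES_WRITTEN": 1000000000,
}


def io_shares(c):
    """Return each counter as a fraction of the sum of all counters."""
    total = sum(c.values())
    return {name: value / total for name, value in c.items()}


shares = io_shares(counters)
local_share = shares["FILE_BYTES_READ"] + shares["FILE_BYTES_WRITTEN"]

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name:20s} {share:6.1%}")
    print(f"{'local (FILE_*)':20s} {local_share:6.1%}")
```

With these counters the FILE_* share comes out at roughly three quarters and each HDFS_* counter at roughly 13%, which is in the same range as the 75% and 12% labels on the I/O model slide, although the deck does not say how those labels were derived.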
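Hadoop I/O Model
(With some data from early observations)
[Figure: Hadoop I/O model. A job reads DFS input data (labeled "12% of bandwidth") into the map tasks, which write spills and logs (spill*.out) and map output (file.out*). The shuffle copies map output (Map_*.out*) to the reduce side, where the combine step (Intermediate.out*), the sort and further spills take place, and the reduce writes DFS output data (labeled "12% of bandwidth") to HDFS. The middle of the diagram, between the spill files and the combine step, is labeled "75% of disk bandwidth".]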
One Mapper Task: Temp Data
path                                                                                                                                        bytes
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/file.out       67586124
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/spill1.out     52762519
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/spill0.out     52508540
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/spill2.out     29698564
/usr/lib/jvm/java-6-openjdk/jre/lib/rt.jar                                                                                                     5057763
/home/rmc/untars/hadoop-0.20.204.0/hadoop-core-0.20.204.0.jar                                                                                   895582
/home/rmc/untars/hadoop-0.20.204.0/lib/log4j-1.2.15.jar                                                                                           82522
/home/rmc/untars/hadoop-0.20.204.0/lib/commons-lang-2.4.jar                                                                                       70477
/home/rmc/untars/hadoop-0.20.204.0/lib/commons-configuration-1.6.jar                                                                              61007
/usr/lib/x86_64-linux-gnu/gconv/gconv-modules                                                                                                     51772
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/job.xml                                                        44420
/home/rmc/untars/hadoop-0.20.204.0/lib/commons-collections-3.2.1.jar                                                                              29974
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/job.xml                   21695
/usr/lib/jvm/java-6-openjdk/jre/lib/amd64/libnio.so                                                                                               15946
/home/rmc/untars/hadoop-0.20.204.0/conf/core-site.xml                                                                                             11024
/usr/lib/jvm/java-6-openjdk/jre/lib/security/java.security                                                                                        10081
/proc/self/maps                                                                                                                                    7523
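
A per-path listing like this one becomes easier to read once paths are rolled up into a handful of categories. The Python sketch below shows one way to do that. The category names and matching rules are illustrative assumptions, not the rules used to produce the deck's charts, and only a few rows of the table are included.

```python
from collections import Counter

PREFIX = ("/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/"
          "job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/")

# (path, bytes) rows copied from the table above.
ROWS = [
    (PREFIX + "file.out", 67586124),
    (PREFIX + "spill1.out", 52762519),
    (PREFIX + "spill0.out", 52508540),
    (PREFIX + "spill2.out", 29698564),
    ("/usr/lib/jvm/java-6-openjdk/jre/lib/rt.jar", 5057763),
    ("/home/rmc/untars/hadoop-0.20.204.0/hadoop-core-0.20.204.0.jar", 895582),
]


def categorize(path):
    """Map a path to a coarse category (hypothetical rules for illustration)."""
    if "/jobcache/" in path:
        name = path.rsplit("/", 1)[-1]
        if name.startswith("spill"):
            return "mapper spills"
        if name.startswith("file.out"):
            return "mapper output"
        return "jobcache other"
    if path.startswith("/usr/lib/jvm/"):
        return "jvm"
    if "/hadoop-0.20.204.0/" in path:
        return "hadoop distro"
    return "other"


def bytes_by_category(rows):
    totals = Counter()
    for path, nbytes in rows:
        totals[categorize(path)] += nbytes
    return totals


if __name__ == "__main__":
    for category, nbytes in bytes_by_category(ROWS).most_common():
        print(f"{category:16s} {nbytes:12d}")
```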
One Mapper Task: Temp I/O Counts
[Figure: three histograms of the number of I/Os per I/O size bucket, with I/O measured at syscall. One panel covers all I/Os, one covers writes (write I/O size bucket) and one covers reads (read I/O size bucket). Buckets are powers of two from 1 to 131072.]
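
The histograms group I/O sizes into power-of-two buckets. How the boundaries were drawn is not shown in the slides, so the following Python sketch makes an explicit assumption: each size is assigned to the largest power of two that does not exceed it. The names size_bucket and histogram are hypothetical.

```python
from collections import Counter


def size_bucket(nbytes: int) -> int:
    """Largest power of two <= nbytes (assumed bucketing rule)."""
    if nbytes < 1:
        raise ValueError("size must be at least 1")
    return 1 << (nbytes.bit_length() - 1)


def histogram(sizes):
    """Count I/Os per bucket, ignoring zero-length transfers."""
    return Counter(size_bucket(n) for n in sizes if n > 0)


if __name__ == "__main__":
    # 167 is the byte count returned by the sample read() in the trace.
    for bucket, count in sorted(histogram([167, 4096, 4097, 1]).items()):
        print(bucket, count)
```

Rounding up instead of down would shift every non-power-of-two size one bucket to the right, so the rule should be fixed before comparing plots.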
One Mapper Task: Tmp Bytes Transferred
[Figure: two histograms of bytes transferred per I/O size bucket. Left panel: I/O measured at syscall, y-axis from 0 to 2.5e+08 bytes, buckets from 1 to 131072. Right panel: logical I/O (sequential grouping of syscalls), y-axis from 0 to 6e+07 bytes, buckets from 1 to 134217728.]
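
The second panel groups sequential syscalls into larger logical I/Os. The slides do not define the grouping rule, so the Python sketch below is one plausible reading rather than the author's method: consecutive reads (or writes) on the same file are merged into one logical I/O until the operation changes or a seek intervenes. The event format and the name logical_ios are assumptions.

```python
def logical_ios(events):
    """Merge runs of same-direction I/O per file into logical I/Os.

    events: iterable of (path, op, nbytes), in trace order, where op is
    "read", "write" or "lseek" (assumed event shape).
    Returns a list of (path, op, total_bytes).
    """
    open_runs = {}  # path -> [op, running byte total]
    done = []
    for path, op, nbytes in events:
        run = open_runs.get(path)
        if op in ("read", "write") and run is not None and run[0] == op:
            run[1] += nbytes  # still sequential: extend the current run
            continue
        if run is not None:
            done.append((path, run[0], run[1]))  # run broken by seek or op change
            del open_runs[path]
        if op in ("read", "write"):
            open_runs[path] = [op, nbytes]
    for path, (op, total) in open_runs.items():
        done.append((path, op, total))  # flush runs still open at end of trace
    return done


if __name__ == "__main__":
    trace = [
        ("spill0.out", "write", 4096),
        ("spill0.out", "write", 4096),
        ("spill0.out", "lseek", 0),
        ("spill0.out", "read", 8192),
    ]
    print(logical_ios(trace))
```

Under this rule many small syscalls collapse into a few large transfers, which would be consistent with the logical I/O panel extending to much larger buckets than the syscall panel.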
Reducer Task: Temp Data

Reducer Task: Temp I/O Counts
[Figure: three histograms of the number of I/Os per I/O size bucket, with I/O measured at syscall. One panel covers all I/Os, one covers writes (write I/O size bucket) and one covers reads (read I/O size bucket). Buckets are powers of two starting at 1.]
                        8192
                                                                                                                                     16384
                 16384
                                                                                                                                     32768
                 32768
                                                                                                                                     65536
                 65536
                                                                                                                          131072
     131072
Reducer Task: Tmp Bytes Transferred
[Figure: bytes transferred per I/O size bucket, two panels. Left: I/O measured at syscall (buckets 1 to 131072). Right: logical I/O, a sequential grouping of syscalls (buckets 1 to 8388608).]
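The distinction between I/O measured at the syscall and logical I/O (sequential grouping of syscalls) can be illustrated with a minimal sketch. It is not the tooling used for the study (that was vProbes + Perl + R); the trace format, the function names, and the choice to round sizes up to the next power of two are all assumptions made for illustration.

```python
from collections import Counter

def bucket(size):
    """Round a size up to the next power of two (assumed bucketing rule)."""
    b = 1
    while b < size:
        b *= 2
    return b

def logical_ios(syscalls):
    """Merge runs of sequential syscalls into logical I/Os.

    syscalls: iterable of (op, fd, offset, length) in issue order
    (a hypothetical trace format). A syscall extends the current run
    when it has the same op and fd and starts exactly where the
    previous one ended. Interleaved streams are not tracked separately.
    """
    runs = []
    cur = None  # [op, fd, start_offset, total_length]
    for op, fd, offset, length in syscalls:
        if cur and cur[0] == op and cur[1] == fd and cur[2] + cur[3] == offset:
            cur[3] += length          # sequential: grow the current run
        else:
            if cur:
                runs.append(tuple(cur))
            cur = [op, fd, offset, length]
    if cur:
        runs.append(tuple(cur))
    return runs

def bytes_by_bucket(ios):
    """Sum bytes transferred per I/O size bucket."""
    hist = Counter()
    for _op, _fd, _offset, length in ios:
        hist[bucket(length)] += length
    return dict(hist)

# Made-up trace: three back-to-back 4 KB reads, one read after a seek,
# and one 64 KB write.
trace = [
    ("read", 3, 0, 4096),
    ("read", 3, 4096, 4096),
    ("read", 3, 8192, 4096),
    ("read", 3, 1048576, 512),
    ("write", 4, 0, 65536),
]
per_syscall = bytes_by_bucket(trace)
per_logical = bytes_by_bucket(logical_ios(trace))
print("syscall:", per_syscall)
print("logical:", per_logical)
```

In this made-up trace the three 4 KB reads land in the 4096 bucket when measured per syscall, but merge into a single 12 KB logical read that lands in the 16384 bucket, which is why the logical view extends to much larger size buckets.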
Datanode – Bytes Transferred
[Figure: bytes transferred per I/O size bucket (1 to 131072), panels labeled I/O Size Bucket, Read I/O Size Bucket, and Write I/O Size Bucket]
Datanode – Actual vs. Logical I/O Size
[Figure: bytes transferred per I/O size bucket, two panels. Left: I/O measured at syscall (buckets 1 to 131072). Right: logical I/O, a sequential grouping of syscalls (buckets 1 to 134217728).]
Number of I/Os




                                                               0
                                                                                    5000
                                                                                                               10000
                                                                                                                                                        15000
                                                                                                                                                                           20000
                                                                                                                                                                                                       25000
                                                                                                                               Bytes




0e+00
                                                                   1e+08
                                                                                                       2e+08
                                                                                                                                                            3e+08
                                                                                                                                                                                      4e+08
                                                                                                                                                                                                                    5e+08
                                                          1
                                                          2
                                                          4
                                                          8
                                                          16
                                                          32
                                                          64
                                                      128
                                                      256
                                                      512
                                                      1024




                                I/O Size Bucket
                                                      2048
                                                      4096
                                                      8192
                                            16384
                                            32768
                                            65536
                                   131072


                                                                              Number of I/Os                                                                                          Number of I/Os




                                                      0
                                                                       5000
                                                                                               10000
                                                                                                                       15000
                                                                                                                                                               0
                                                                                                                                                                    2000
                                                                                                                                                                               4000
                                                                                                                                                                                              6000
                                                                                                                                                                                                          8000
Datanode – IOPS
[Figure: histograms of the number of I/Os per I/O size bucket (1 to 131072), one panel for writes and one for reads.]
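
The plots group I/Os into power-of-two size buckets. A minimal sketch of that bucketing follows. It assumes each I/O falls into the largest power of two not exceeding its size; the slides do not state the rounding rule, and the sample sizes are hypothetical, not measured data.

```python
from collections import Counter


def size_bucket(nbytes: int) -> int:
    """Return the largest power of two <= nbytes (assumed rounding rule)."""
    if nbytes < 1:
        raise ValueError("I/O size must be at least 1 byte")
    return 1 << (nbytes.bit_length() - 1)


def histogram(io_sizes):
    """Count I/Os and sum bytes transferred per size bucket."""
    counts = Counter()
    total_bytes = Counter()
    for n in io_sizes:
        bucket = size_bucket(n)
        counts[bucket] += 1
        total_bytes[bucket] += n
    return counts, total_bytes


# Hypothetical I/O sizes in bytes, for illustration only.
sample_sizes = [167, 4096, 4096, 65536, 100000]
counts, total_bytes = histogram(sample_sizes)

for bucket in sorted(counts):
    print(bucket, counts[bucket], total_bytes[bucket])
```

Counting I/Os per bucket corresponds to the IOPS-style plots, and summing bytes per bucket corresponds to the bytes-transferred plots.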
Back of the Envelope Modeling
•  How much bandwidth does Terasort need?
    –  10 seconds of CPU/core time per task
    –  128 MB of HDFS data per task
    –  ~3x that, 384 MB, of temporary data per task

I/O Component   Per-task   Per-task Bandwidth   Per-host (24 cores)
HDFS I/O        128 MB     ~13 MB/s             312 MB/s
Temp            384 MB     ~38 MB/s             912 MB/s
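
The arithmetic behind this table can be sketched as follows. It assumes one task per core with all 24 cores busy at once, and it rounds the per-task rate before scaling to the host, which appears to be how the slide arrives at 312 and 912. The function names are illustrative.

```python
TASK_SECONDS = 10       # CPU/core time per task, from the slide
CORES_PER_HOST = 24     # from the slide
HDFS_MB_PER_TASK = 128  # from the slide
TEMP_MB_PER_TASK = 384  # ~3x the HDFS data, from the slide


def per_task_mb_s(mb_per_task, task_seconds=TASK_SECONDS):
    """Average bandwidth one task needs over its run time."""
    return mb_per_task / task_seconds


def per_host_mb_s(mb_per_task, cores=CORES_PER_HOST):
    """Per-host bandwidth, rounding the per-task rate first (assumption)."""
    return round(per_task_mb_s(mb_per_task)) * cores


print(per_task_mb_s(HDFS_MB_PER_TASK), per_host_mb_s(HDFS_MB_PER_TASK))
print(per_task_mb_s(TEMP_MB_PER_TASK), per_host_mb_s(TEMP_MB_PER_TASK))
```

Without the intermediate rounding the per-host figures would be 307.2 and 921.6 MB/s, which is close enough for a back-of-the-envelope model.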
Do we need locality?
•  Main issue is cross-sectional bandwidth
    –  Secondary issue is per-host link speed
    –  Just look at storage I/O now, consider shuffle next

I/O Component   Per-host (24 cores)   Network Bandwidth w/ 0% locality   Rack Bandwidth w/ 40 hosts
HDFS I/O        312 MB/s              2.5 Gbit/s                         100 Gbit/s
Temp            912 MB/s              7.3 Gbit/s                         300 Gbit/s

•  Possible Conclusion
   –  Must have locality w/ 1 Gbit host link
   –  Feasible to have remote data w/ 10 Gbit, keeping temp local only
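
The conversion from per-host storage bandwidth to network demand can be sketched as below. It assumes decimal units (1 Gbit = 1000 Mbit) and that with 0% locality every byte of the I/O stream crosses the host link. The helper names are illustrative.

```python
HOSTS_PER_RACK = 40     # from the slide
HDFS_MB_S = 312         # per-host HDFS I/O, from the slide
TEMP_MB_S = 912         # per-host temp I/O, from the slide


def mb_s_to_gbit_s(mb_s):
    """Convert megabytes/s to gigabits/s using decimal units (assumption)."""
    return mb_s * 8 / 1000


def exceeds_link(demand_gbit_s, link_gbit_s):
    """True if remote I/O at this rate would not fit on the host link."""
    return demand_gbit_s > link_gbit_s


hdfs_gbit = mb_s_to_gbit_s(HDFS_MB_S)   # about 2.5 Gbit/s
temp_gbit = mb_s_to_gbit_s(TEMP_MB_S)   # about 7.3 Gbit/s

print(round(hdfs_gbit, 1), round(hdfs_gbit * HOSTS_PER_RACK))
print(round(temp_gbit, 1), round(temp_gbit * HOSTS_PER_RACK))
print(exceeds_link(hdfs_gbit, 1), exceeds_link(hdfs_gbit, 10))
```

Under these assumptions HDFS traffic alone exceeds a 1 Gbit link but fits on a 10 Gbit link, while HDFS plus temp traffic together come to roughly 9.8 Gbit/s and would leave almost no headroom. That is consistent with the slide's suggestion to keep temp data local. The rack figure for temp works out to about 292 Gbit/s, which the slide rounds to 300.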
