Distributed Computing

Varun Thacker

Linux User's Group Manipal

April 8, 2010

Outline I

1  Introduction
      LUG Manipal
      Points To Remember
2  Distributed Computing
      Distributed Computing
      Technologies to be covered
      Idea
      Data !!
      Why Distributed Computing is Hard
      Why Distributed Computing is Important
      Three Common Distributed Architectures
3  Distributed File System
      GFS
      What a Distributed File System Does
      Google File System Architecture
      GFS Architecture: Chunks

Outline II

      GFS Architecture: Master
      GFS: Life of a Read
      GFS: Life of a Write
      GFS: Master Failure
4  MapReduce
      MapReduce
      Do We Need It?
      Bad News!
      MapReduce
      Map Reduce Paradigm
      MapReduce Paradigm
      Working
      Working
      Under the hood: Scheduling
      Robustness
5  Hadoop

Outline III

      Hadoop
      What is Hadoop
      Who uses Hadoop?
      Mapper
      Combiners
      Reducer
      Some Terminology
      Job Distribution
6  Contact Information
7  Attribution
8  Copying

Who are we?

   Linux User's Group Manipal
   Life, the Universe and FOSS!!
   Believers in knowledge sharing
   The most technologically focused "group" in the University
   LUG Manipal is a non-profit "group", alive only through voluntary work!!
   http://lugmanipal.org

Points To Remember!!!

   If you have problem(s), don't hesitate to ask.
   The slides are based on documentation, so the discussions are what really
   matter; the slides are for later reference!!
   Please don't treat the sessions as classes (classes are boring!!).
   The speaker is just like any person sitting next to you.
   Documentation is really important.
   Google is your friend.
   If you have questions after this workshop, mail me or come to LUG
   Manipal's forums: http://forums.lugmanipal.org

Distributed Computing

Technologies to be covered

   Distributed computing refers to the use of distributed systems to
   solve computational problems.
   A distributed system consists of multiple computers that
   communicate through a network.
   MapReduce is a framework which implements the idea of distributed
   computing.
   GFS is the distributed file system on which distributed programs at
   Google store and process data. Its free implementation is HDFS.
   Hadoop is an open-source framework written in Java which
   implements the MapReduce technology.

Idea

   While the storage capacities of hard drives have increased massively
   over the years, access speeds (the rate at which data can be read
   from drives) have not kept up.
   One-terabyte drives are the norm, but the transfer speed is around
   100 MB/s, so it takes more than two and a half hours to read all the
   data off the disk.
   The obvious way to reduce the time is to read from multiple disks at
   once. Imagine we had 100 drives, each holding one hundredth of
   the data. Working in parallel, we could read the data in under two
   minutes. (A quick back-of-the-envelope check follows below.)

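The arithmetic behind the numbers above, as a small illustrative Java snippet; the 1 TB size, 100 MB/s transfer rate and 100-drive count are the slide's assumptions, not measurements:

   // ReadTimeEstimate.java - back-of-the-envelope check of the slide's numbers.
   public class ReadTimeEstimate {
       public static void main(String[] args) {
           double bytes = 1e12;        // 1 TB of data (assumed)
           double mbPerSec = 100;      // ~100 MB/s sequential transfer (assumed)
           double oneDiskSeconds = bytes / (mbPerSec * 1e6);
           // ~2.8 hours on a single disk
           System.out.printf("One disk:  %.1f hours%n", oneDiskSeconds / 3600);
           // ~1.7 minutes when split evenly across 100 disks read in parallel
           System.out.printf("100 disks: %.1f minutes%n", oneDiskSeconds / 100 / 60);
       }
   }
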
Data

   We live in the data age. An IDC estimate put the size of the "digital
   universe" at 0.18 zettabytes in 2006.
   And by 2011 that will have grown tenfold, to 1.8 zettabytes.
   1 zettabyte is one million petabytes, or one billion terabytes.
   The New York Stock Exchange generates about one terabyte of new
   trade data per day.
   Facebook hosts approximately 10 billion photos, taking up one
   petabyte of storage.
   The Large Hadron Collider near Geneva produces about 15 petabytes
   of data per year.

Why Distributed Computing is Hard

   Computers crash.
   Network links crash.
   Talking is slow (even Ethernet has around 300 microseconds of latency,
   during which time your 2 GHz PC can do 600,000 cycles).
   Bandwidth is finite.
   Internet scale: the computers and network are heterogeneous,
   untrustworthy, and subject to change at any time.

Why Distributed Computing is Important

   Can be more reliable.
   Can be faster.
   Can be cheaper (a $30 million Cray versus 100 $1,000 PCs).

Three Common Distributed Architectures

   Hope: have N computers do separate pieces of work. Speed-up < N.
   Probability of failure = 1 − (1 − p)^N ≈ Np (p = probability of an
   individual crash).
   Replication: have N computers do the same thing. Speed-up < 1.
   Probability of failure = p^N.
   Master-servant: have 1 computer hand out pieces of work to N − 1
   servants, and re-hand out pieces of work if servants fail. Speed-up
   < N − 1. Probability of failure ≈ p. (These formulas are evaluated
   numerically below.)

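A worked numerical check of these formulas, with an assumed per-machine crash probability p = 0.001 and N = 100 machines (an illustrative sketch, not part of the original deck):

   // FailureProbability.java - evaluates the three failure models from the slide.
   public class FailureProbability {
       public static void main(String[] args) {
           double p = 0.001;  // assumed probability of an individual crash
           int n = 100;       // assumed number of machines

           double hope = 1 - Math.pow(1 - p, n);  // any single failure kills the job
           double replication = Math.pow(p, n);   // all N replicas must fail
           double masterServant = p;              // roughly: only the master is critical

           System.out.printf("Hope:           %.4f (Np approximation = %.4f)%n", hope, n * p);
           System.out.printf("Replication:    %.3e%n", replication);
           System.out.printf("Master-servant: %.4f%n", masterServant);
       }
   }
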
GFS

What a Distributed File System Does

   1. The usual file system stuff: create, read, move and find files.
   2. Allow distributed access to files.
   3. Store the files themselves in a distributed fashion.
   If you just do #1 and #2, you are a network file system.
   To do #3, it's a good idea to also provide fault tolerance.

GFS Architecture

   [Figure: GFS architecture diagram]

GFS Architecture: Chunks

   Files are divided into 64 MB chunks (the last chunk of a file may be
   smaller).
   Each chunk is identified by a unique 64-bit id.
   Chunks are stored as regular files on local disks.
   By default, each chunk is stored three times, preferably on more than one
   rack.
   To protect data integrity, each 64 KB block gets a 32-bit checksum
   that is checked on all reads (the overhead is worked out below).
   When idle, a chunkserver scans inactive chunks for corruption.

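A small worked example of the checksum bookkeeping implied above; the sizes are the slide's defaults and the code is purely illustrative, not GFS source:

   // ChunkOverhead.java - checksum bookkeeping for one 64 MB chunk.
   public class ChunkOverhead {
       public static void main(String[] args) {
           long chunkBytes = 64L * 1024 * 1024;  // 64 MB chunk (slide default)
           long blockBytes = 64L * 1024;         // 64 KB checksum block
           long checksumBits = 32;               // 32-bit checksum per block

           long blocksPerChunk = chunkBytes / blockBytes;           // 1024 blocks
           long overheadBytes = blocksPerChunk * checksumBits / 8;  // 4096 bytes

           System.out.println("Checksum blocks per chunk:   " + blocksPerChunk);
           System.out.println("Checksum overhead per chunk: " + overheadBytes + " bytes");
       }
   }
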
GFS Architecture: Master

   Stores all metadata (namespace, access control).
   Stores the (file -> chunks) and (chunk -> locations) mappings.
   Clients get chunk locations for a file from the master, and then talk
   directly to the chunkservers for the data.
   Advantage of a single master: simplicity.
   Disadvantages of a single master:
       Metadata operations are bottlenecked.
       The maximum number of files is limited by the master's memory.

GFS: Life of a Read

   The client program asks for 1 TB of file "A", starting at the 200 millionth
   byte.
   The client GFS library asks the master for chunks 3, ... 16387 of file "A"
   (the chunk arithmetic is sketched after this list).
   The master responds with all of the locations of chunks 2, ... 20000 of file
   "A".
   The client caches all of these locations (with their cache time-outs).
   The client reads chunk 3 from the closest location.
   The client reads chunk 4 from the closest location.
   ...

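A sketch of how a client library might turn a byte range into chunk numbers, assuming 64 MB chunks and 1-based chunk numbering as on the slide (illustrative only, not actual GFS client code):

   // ChunkRange.java - map an (offset, length) read onto 64 MB chunk numbers.
   public class ChunkRange {
       static final long CHUNK_SIZE = 64L * 1024 * 1024;  // 64 MB

       public static void main(String[] args) {
           long offset = 200_000_000L;  // start at the 200 millionth byte
           long length = 1L << 40;      // read 1 TB

           long firstChunk = offset / CHUNK_SIZE + 1;                 // 1-based numbering
           long lastChunk  = (offset + length - 1) / CHUNK_SIZE + 1;

           // With these numbers: chunks 3 ... 16387, matching the slide.
           System.out.println("Chunks " + firstChunk + " ... " + lastChunk);
       }
   }
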
GFS: Life of a Write

   The client gets the locations of the chunk replicas as before.
   For each chunk, the client sends the write data to the nearest replica.
   That replica sends the data on to the replica nearest to it that has not
   yet received the data.
   When all of the replicas have received the data, it is safe for
   them to actually write it.
   Tricky details:
       The master hands out a short-term (about 1 minute) lease for a particular
       replica to be the primary one.
       This primary replica assigns a serial number to each mutation so that
       every replica performs the mutations in the same order.

GFS: Master Failure

   The master stores its state via periodic checkpoints and a mutation
   log.
   Both are replicated.
   Master election and notification are implemented using an external lock
   server.
   The new master restores its state from the checkpoint and the log.

MapReduce

Do We Need It?

   Yes: otherwise some problems are too big.
   Example: 20+ billion web pages x 20 KB = 400+ terabytes.
   One computer can read 30-35 MB/sec from disk,
   so it would take about four months to read the web.
   The same problem spread over 1000 machines takes less than 3 hours.

Bad News!

   Bad news I: you now have to worry about
       communication and coordination
       recovering from machine failure (all the time!)
       debugging
       optimization
       locality
   Bad news II: repeat all of this for every problem you want to solve.
   Good news I and II: MapReduce and Hadoop!

MapReduce

   A simple programming model that applies to many large-scale
   computing problems.
   It hides the messy details in the MapReduce runtime library:
       automatic parallelization
       load balancing
       network and disk transfer optimization
       handling of machine failures
       robustness
   Therefore we can write application-level programs and let MapReduce
   insulate us from many of these concerns.

Map Reduce Paradigm

   Read a lot of data.
   Map: extract something you care about from each record.
   Shuffle and sort.
   Reduce: aggregate, summarize, filter, or transform.
   Write the results.

MapReduce Paradigm

   Basic data type: the key-value pair (k, v).
   For example: key = URL, value = HTML of the web page.
   The programmer specifies two primary methods:
       Map: (k, v) -> <(k1, v1), (k2, v2), (k3, v3), ..., (kn, vn)>
       Reduce: (k', <v'1, v'2, ..., v'n>) -> <(k', v''1), (k', v''2), ..., (k', v''m)>
   All v' with the same k' are reduced together.
   (Remember the invisible "Shuffle and Sort" step.)
   A minimal word-count sketch of these signatures follows.

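The sketch below illustrates the Map and Reduce signatures with word count, using plain in-memory Java for clarity; the class and method names (WordCountSketch, map, reduce) are invented for this example and are not the Hadoop API:

   // WordCountSketch.java - the MapReduce paradigm on a single machine.
   import java.util.*;

   public class WordCountSketch {
       // Map: (docId, text) -> list of (word, 1) pairs
       static List<Map.Entry<String, Integer>> map(String docId, String text) {
           List<Map.Entry<String, Integer>> out = new ArrayList<>();
           for (String w : text.split("\\s+"))
               out.add(new AbstractMap.SimpleEntry<>(w, 1));
           return out;
       }

       // Reduce: (word, [1, 1, ...]) -> (word, count)
       static Map.Entry<String, Integer> reduce(String word, List<Integer> counts) {
           int sum = 0;
           for (int c : counts) sum += c;
           return new AbstractMap.SimpleEntry<>(word, sum);
       }

       public static void main(String[] args) {
           // "Shuffle and sort": group all intermediate values by key.
           Map<String, List<Integer>> groups = new TreeMap<>();
           for (Map.Entry<String, Integer> kv : map("doc1", "to be or not to be"))
               groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
           for (Map.Entry<String, List<Integer>> g : groups.entrySet())
               System.out.println(reduce(g.getKey(), g.getValue()));  // be=2, not=1, or=1, to=2
       }
   }
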
Working

   [Figure]

Working

   [Figure]

Under the hood: Scheduling

   One master, many workers.
   Input data is split into M map tasks (typically 64 MB in size).
   The reduce phase is partitioned into R reduce tasks (= # of output files).
   Tasks are assigned to workers dynamically.
   The master assigns each map task to a free worker.
   It considers the locality of data to the worker when assigning a task.
   The worker reads the task input (often from local disk!).
   The worker produces R local files containing intermediate (k, v) pairs.
   The master assigns each reduce task to a free worker.
   That worker reads the intermediate (k, v) pairs from the map workers.
   It then sorts and applies the user's Reduce op to produce the output.
   The user may specify a Partition function: which intermediate keys go to
   which reducer.

Robustness

   One master, many workers.
   Detect failure via periodic heartbeats.
   Re-execute completed and in-progress map tasks.
   Re-execute in-progress reduce tasks.
   The master re-assigns these tasks to free workers.
   Master failure:
       State is checkpointed to a replicated file system.
       The new master recovers and continues.
   Very robust: Google once lost 1600 of 1800 machines, but the job still
   finished fine.

Hadoop

What is Hadoop

   Apache Hadoop is a Java software framework that supports
   data-intensive distributed applications under a free license.
   Hadoop was inspired by Google's MapReduce and Google File System
   (GFS) papers.
   A Map/Reduce job usually splits the input data-set into independent
   chunks which are processed by the map tasks in a completely parallel
   manner.
   The map output is then made the input to the reduce tasks.
   The framework takes care of scheduling tasks, monitoring them and
   re-executing the failed tasks. (A minimal job-driver sketch follows.)

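A minimal job driver, sketched against the classic org.apache.hadoop.mapred API that the later slides (JobConf, Reporter) refer to; WordCountMapper and WordCountReducer are placeholder names for user-supplied classes (a mapper along these lines is shown at the end of this section):

   // WordCount.java - minimal driver for a classic (org.apache.hadoop.mapred) job.
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.io.IntWritable;
   import org.apache.hadoop.io.Text;
   import org.apache.hadoop.mapred.*;

   public class WordCount {
       public static void main(String[] args) throws Exception {
           JobConf conf = new JobConf(WordCount.class);
           conf.setJobName("wordcount");

           conf.setOutputKeyClass(Text.class);          // key type of the job output
           conf.setOutputValueClass(IntWritable.class); // value type of the job output

           conf.setMapperClass(WordCountMapper.class);    // user-supplied Mapper (placeholder)
           conf.setCombinerClass(WordCountReducer.class); // optional local aggregation (placeholder)
           conf.setReducerClass(WordCountReducer.class);  // user-supplied Reducer (placeholder)
           conf.setNumReduceTasks(2);                     // R reduce tasks = number of output files

           FileInputFormat.setInputPaths(conf, new Path(args[0]));
           FileOutputFormat.setOutputPath(conf, new Path(args[1]));

           // The framework schedules the tasks, monitors them and re-runs failures.
           JobClient.runJob(conf);
       }
   }
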
Who uses Hadoop?

   Adobe
   AOL
   Baidu - the leading Chinese-language search engine
   Cloudera, Inc. - provides commercial support and professional training
   for Hadoop
   Facebook
   Google
   IBM
   Twitter
   Yahoo!
   The New York Times, Last.fm, Hulu, LinkedIn

Mapper
     Mapper maps input key/value pairs to a set of intermediate key/value
     pairs.
     The Hadoop Map/Reduce framework spawns one map task for each
     InputSplit generated by the InputFormat.
     Output pairs do not need to be of the same types as input pairs.
     Mapper implementations are passed the JobConf for the job.
     The framework then calls map method for each key/value pair.
     Applications can use the Reporter to report progress.
     All intermediate values associated with a given output key are
     subsequently grouped by the framework, and passed to the
     Reducer(s) to determine the final output.
     The intermediate, sorted outputs are always stored in a simple
     (key-len, key, value-len, value) format.
     The number of maps is usually driven by the total size of the inputs,
     that is, the total number of blocks of the input files.
     Users can optionally specify a combiner to perform local aggregation
     of the intermediate outputs.
 Varun Thacker (LUG Manipal)   Distributed Computing        April 8, 2010   35 / 42
Mapper
     Mapper maps input key/value pairs to a set of intermediate key/value
     pairs.
     The Hadoop Map/Reduce framework spawns one map task for each
     InputSplit generated by the InputFormat.
     Output pairs do not need to be of the same types as input pairs.
     Mapper implementations are passed the JobConf for the job.
     The framework then calls the map method for each key/value pair (a
     mapper sketch follows this slide).
     Applications can use the Reporter to report progress.
     All intermediate values associated with a given output key are
     subsequently grouped by the framework, and passed to the
     Reducer(s) to determine the final output.
     The intermediate, sorted outputs are always stored in a simple
     (key-len, key, value-len, value) format.
     The number of maps is usually driven by the total size of the inputs,
     that is, the total number of blocks of the input files.
     Users can optionally specify a combiner to perform local aggregation
     of the intermediate outputs.
 Varun Thacker (LUG Manipal)   Distributed Computing        April 8, 2010   35 / 42
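
A minimal mapper sketch, using the older org.apache.hadoop.mapred API that these slides reference (JobConf, OutputCollector, Reporter). The class name WordCountMapper and the word-count logic are illustrative assumptions, not part of the original deck:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Illustrative mapper: input is (byte offset, line of text),
// output is an intermediate (word, 1) pair for every token in the line.
public class WordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, ONE);   // emit the intermediate (word, 1) pair
    }
    reporter.progress();           // optional: report progress to the framework
  }
}

Each input record here is one line of text; the mapper only tokenizes and emits, leaving all aggregation to the combiner and reducer.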
Combiners



     When the map operation outputs its pairs, they are already available
     in memory.
     If a combiner is used, the map key-value pairs are not
     immediately written to the output.
     Instead, they are collected in lists, one list per key.
     When a certain number of key-value pairs have been buffered, the
     buffer is flushed: all the values of each key are passed to the
     combiner’s reduce method, and the resulting key-value pairs are written
     out as if the original map operation had produced them (a toy sketch of
     this buffering follows this slide).




 Varun Thacker (LUG Manipal)    Distributed Computing            April 8, 2010   36 / 42
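
A toy, self-contained sketch of the buffering described above, assuming a word-count style combine step that sums integer values. This is illustrative Java, not Hadoop's internal implementation; the threshold, class name, and printing stand in for the real buffer size and map-output writer:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of the in-memory buffering the slide describes:
// map output is collected per key and flushed through a combine step
// once the buffer grows past a threshold.
public class CombinerBufferSketch {

  private static final int FLUSH_THRESHOLD = 4;   // assumption: tiny, for demo only
  private final Map<String, List<Integer>> buffer = new HashMap<>();
  private int buffered = 0;

  // called for every (key, value) pair the map function emits
  void collect(String key, int value) {
    buffer.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    if (++buffered >= FLUSH_THRESHOLD) {
      flush();
    }
  }

  // combine step: here, summation, as in word count
  private void flush() {
    for (Map.Entry<String, List<Integer>> e : buffer.entrySet()) {
      int sum = e.getValue().stream().mapToInt(Integer::intValue).sum();
      System.out.println(e.getKey() + "\t" + sum);  // stands in for writing map output
    }
    buffer.clear();
    buffered = 0;
  }

  public static void main(String[] args) {
    CombinerBufferSketch sketch = new CombinerBufferSketch();
    for (String w : "to be or not to be".split(" ")) {
      sketch.collect(w, 1);
    }
    sketch.flush();   // flush whatever is still buffered at the end of the map task
  }
}

Note that a key may appear in more than one flush; the combiner only reduces local data volume, and the reducer still performs the global merge.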
Reducer
     Reducer reduces a set of intermediate values which share a key to a
     smaller set of values.
     Reducer implementations are passed the JobConf for the job.
     The framework then calls the reduce(WritableComparable, Iterator,
     OutputCollector, Reporter) method for each <key, (list of values)> pair
     in the grouped inputs.
     The reducer has 3 primary phases:
     Shuffle: Input to the Reducer is the sorted output of the mappers. In
     this phase the framework fetches the relevant partition of the output
     of all the mappers, via HTTP.
     Sort: The framework groups Reducer inputs by keys (since different
     mappers may have output the same key) in this stage.
     Reduce: In this phase the reduce method is called for each <key, (list
     of values)> pair in the grouped inputs (a reducer sketch follows this
     slide).
     The generated output is a new set of values.
 Varun Thacker (LUG Manipal)   Distributed Computing         April 8, 2010   37 / 42
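
A minimal reducer sketch in the same older org.apache.hadoop.mapred API; the class name WordCountReducer is an illustrative assumption that pairs with the mapper sketch earlier:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Illustrative reducer: receives (word, [1, 1, ...]) after shuffle and sort
// and emits (word, total count).
public class WordCountReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();   // aggregate all values that share this key
    }
    output.collect(key, new IntWritable(sum));
  }
}

Because summing is associative and commutative, this same class could also be registered as the combiner.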
Some Terminology




     Job – A “full program” - an execution of a Mapper and Reducer
     across a data set.
     Task – An execution of a Mapper or a Reducer on a slice of data.
     Task Attempt – A particular instance of an attempt to execute a task
     on a machine.




 Varun Thacker (LUG Manipal)   Distributed Computing       April 8, 2010   38 / 42
Job Distribution




     MapReduce programs are contained in a Java “jar” file + an XML file
     containing serialized program configuration options.
     Running a MapReduce job places these files into HDFS and
     notifies TaskTrackers where to retrieve the relevant program code
     (a job-submission sketch follows this slide).
     Data Distribution: implicit in the design of MapReduce!




 Varun Thacker (LUG Manipal)   Distributed Computing       April 8, 2010   39 / 42
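
A job-driver sketch showing how the configuration and code get packaged and submitted, again using the older mapred API. WordCount, WordCountMapper, and WordCountReducer are the illustrative classes from the earlier sketches, and the input/output paths are assumed to come from the command line:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

// Illustrative driver: builds the JobConf (the serialized configuration the
// slide mentions) and submits the job to the cluster.
public class WordCount {

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);   // the jar containing this class is shipped
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(WordCountMapper.class);    // from the mapper sketch above
    conf.setCombinerClass(WordCountReducer.class); // optional local aggregation
    conf.setReducerClass(WordCountReducer.class);  // from the reducer sketch above

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);                        // blocks until the job completes
  }
}

A typical (hypothetical) invocation would be: hadoop jar wordcount.jar WordCount /input /output. The framework then copies the jar and the serialized configuration into HDFS and points the TaskTrackers at them, as the slide describes.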
Contact Information




         Varun Thacker
         varunthacker1989@gmail.com
         http://varunthacker.wordpress.com

         Linux User’s Group Manipal
         http://lugmanipal.org
         http://forums.lugmanipal.org




 Varun Thacker (LUG Manipal)   Distributed Computing         April 8, 2010   40 / 42
Attribution




                              Google
    Under the Creative Commons Attribution-Share Alike 2.5 Generic.




 Varun Thacker (LUG Manipal)   Distributed Computing     April 8, 2010   41 / 42
Copying




         Creative Commons Attribution-Share Alike 2.5 India License
        http://creativecommons.org/licenses/by-sa/2.5/in/




 Varun Thacker (LUG Manipal)   Distributed Computing       April 8, 2010   42 / 42


Distributed Computing

  • 1. Distributed Computing Varun Thacker Linux User’s Group Manipal April 8, 2010 Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 1 / 42
  • 2. Outline I 1 Introduction LUG Manipal Points To Remember 2 Distributed Computing Distributed Computing Technologies to be covered Idea Data !! Why Distributed Computing is Hard Why Distributed Computing is Important Three Common Distributed Architectures 3 Distributed File System GFS What a Distributed File System Does Google File System Architecture GFS Architecture: Chunks Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 2 / 42
  • 3. Outline II GFS Architecture: Master GFS: Life of a Read GFS: Life of a Write GFS: Master Failure 4 MapReduce MapReduce Do We Need It? Bad News! MapReduce Map Reduce Paradigm MapReduce Paradigm Working Working Under the hood: Scheduling Robustness 5 Hadoop Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 3 / 42
  • 4. Outline III Hadoop What is Hadoop Who uses Hadoop? Mapper Combiners Reducer Some Terminology Job Distribution 6 Contact Information 7 Attribution 8 Copying Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 4 / 42
  • 5. Who are we? Linux User’s Group Manipal Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 5 / 42
  • 6. Who are we? Linux User’s Group Manipal Life, Universe and FOSS!! Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 5 / 42
  • 7. Who are we? Linux User’s Group Manipal Life, Universe and FOSS!! Believers of Knowledge Sharing Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 5 / 42
  • 8. Who are we? Linux User’s Group Manipal Life, Universe and FOSS!! Believers of Knowledge Sharing Most technologically focused “group” in University Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 5 / 42
  • 9. Who are we? Linux User’s Group Manipal Life, Universe and FOSS!! Believers of Knowledge Sharing Most technologically focused “group” in University LUG Manipal is a non profit “Group” alive only on voluntary work!! Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 5 / 42
  • 10. Who are we? Linux User’s Group Manipal Life, Universe and FOSS!! Believers of Knowledge Sharing Most technologically focused “group” in University LUG Manipal is a non profit “Group” alive only on voluntary work!! http://lugmanipal.org Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 5 / 42
  • 11. Points To Remember!!! If you have problem(s) don’t hesitate to ask Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 6 / 42
  • 12. Points To Remember!!! If you have problem(s) don’t hesitate to ask Slides are based on Documentation so discussions are really important, slides are for later reference!! Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 6 / 42
  • 13. Points To Remember!!! If you have problem(s) don’t hesitate to ask Slides are based on Documentation so discussions are really important, slides are for later reference!! Please dont consider sessions as Class( Classes are boring !! ) Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 6 / 42
  • 14. Points To Remember!!! If you have problem(s) don’t hesitate to ask Slides are based on Documentation so discussions are really important, slides are for later reference!! Please dont consider sessions as Class( Classes are boring !! ) Speaker is just like any person sitting next to you Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 6 / 42
  • 15. Points To Remember!!! If you have problem(s) don’t hesitate to ask Slides are based on Documentation so discussions are really important, slides are for later reference!! Please dont consider sessions as Class( Classes are boring !! ) Speaker is just like any person sitting next to you Documentation is really important Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 6 / 42
  • 16. Points To Remember!!! If you have problem(s) don’t hesitate to ask Slides are based on Documentation so discussions are really important, slides are for later reference!! Please dont consider sessions as Class( Classes are boring !! ) Speaker is just like any person sitting next to you Documentation is really important Google is your friend Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 6 / 42
  • 17. Points To Remember!!! If you have problem(s) don’t hesitate to ask Slides are based on Documentation so discussions are really important, slides are for later reference!! Please dont consider sessions as Class( Classes are boring !! ) Speaker is just like any person sitting next to you Documentation is really important Google is your friend If you have questions after this workshop mail me or come to LUG Manipal’s forums http://forums.lugmanipal.org Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 6 / 42
  • 18. Distributed Computing Distributed Computing Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 7 / 42
  • 19. Technologies to be covered Distributed computing refers to the use of distributed systems to solve computational problems on the distributed system. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 8 / 42
  • 20. Technologies to be covered Distributed computing refers to the use of distributed systems to solve computational problems on the distributed system. A distributed system consists of multiple computers that communicate through a network. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 8 / 42
  • 21. Technologies to be covered Distributed computing refers to the use of distributed systems to solve computational problems on the distributed system. A distributed system consists of multiple computers that communicate through a network. MapReduce is a framework which implements the idea of a distributed computing. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 8 / 42
  • 22. Technologies to be covered Distributed computing refers to the use of distributed systems to solve computational problems on the distributed system. A distributed system consists of multiple computers that communicate through a network. MapReduce is a framework which implements the idea of a distributed computing. GFS is the distributed file system on which distributed programs store and process data in Google. It’s free implementation is HDFS. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 8 / 42
  • 23. Technologies to be covered Distributed computing refers to the use of distributed systems to solve computational problems on the distributed system. A distributed system consists of multiple computers that communicate through a network. MapReduce is a framework which implements the idea of a distributed computing. GFS is the distributed file system on which distributed programs store and process data in Google. It’s free implementation is HDFS. Hadoop is an open source framework written in Java which implements the MapReduce technology. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 8 / 42
  • 24. Idea While the storage capacities of hard drives have increased massively over the years, access speeds—the rate at which data can be read from drives have not kept up. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 9 / 42
  • 25. Idea While the storage capacities of hard drives have increased massively over the years, access speeds—the rate at which data can be read from drives have not kept up. One terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 9 / 42
  • 26. Idea While the storage capacities of hard drives have increased massively over the years, access speeds—the rate at which data can be read from drives have not kept up. One terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk. The obvious way to reduce the time is to read from multiple disks at once. Imagine if we had 100 drives, each holding one hundredth of the data. Working in parallel, we could read the data in under two minutes. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 9 / 42
  • 27. Data We live in the data age.An IDC estimate put the size of the “digital universe” at 0.18 zettabytes(?) in 2006. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 10 / 42
  • 28. Data We live in the data age.An IDC estimate put the size of the “digital universe” at 0.18 zettabytes(?) in 2006. And by 2011 there will be a tenfold growth to 1.8 zettabytes. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 10 / 42
  • 29. Data We live in the data age.An IDC estimate put the size of the “digital universe” at 0.18 zettabytes(?) in 2006. And by 2011 there will be a tenfold growth to 1.8 zettabytes. 1 zetabyte is one million petabytes, or one billion terabytes. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 10 / 42
  • 30. Data We live in the data age.An IDC estimate put the size of the “digital universe” at 0.18 zettabytes(?) in 2006. And by 2011 there will be a tenfold growth to 1.8 zettabytes. 1 zetabyte is one million petabytes, or one billion terabytes. The New York Stock Exchange generates about one terabyte of new trade data per day. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 10 / 42
  • 31. Data We live in the data age.An IDC estimate put the size of the “digital universe” at 0.18 zettabytes(?) in 2006. And by 2011 there will be a tenfold growth to 1.8 zettabytes. 1 zetabyte is one million petabytes, or one billion terabytes. The New York Stock Exchange generates about one terabyte of new trade data per day. Facebook hosts approximately 10 billion photos, taking up one petabyte of storage. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 10 / 42
  • 32. Data We live in the data age.An IDC estimate put the size of the “digital universe” at 0.18 zettabytes(?) in 2006. And by 2011 there will be a tenfold growth to 1.8 zettabytes. 1 zetabyte is one million petabytes, or one billion terabytes. The New York Stock Exchange generates about one terabyte of new trade data per day. Facebook hosts approximately 10 billion photos, taking up one petabyte of storage. The Large Hadron Collider near Geneva produces about 15 petabytes of data per year. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 10 / 42
  • 33. Why Distributed Computing is Hard Computers crash. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 11 / 42
  • 34. Why Distributed Computing is Hard Computers crash. Network links crash. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 11 / 42
  • 35. Why Distributed Computing is Hard Computers crash. Network links crash. Talking is slow(even ethernet has 300 microsecond latency, during which time your 2Ghz PC can do 600,000 cycles). Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 11 / 42
  • 36. Why Distributed Computing is Hard Computers crash. Network links crash. Talking is slow(even ethernet has 300 microsecond latency, during which time your 2Ghz PC can do 600,000 cycles). Bandwidth is finite. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 11 / 42
  • 37. Why Distributed Computing is Hard Computers crash. Network links crash. Talking is slow(even ethernet has 300 microsecond latency, during which time your 2Ghz PC can do 600,000 cycles). Bandwidth is finite. Internet scale: the computers and network are heterogeneous,untrustworthy, and subject to change at any time. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 11 / 42
  • 38. Why Distributed Computing is Important Can be more reliable. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 12 / 42
  • 39. Why Distributed Computing is Important Can be more reliable. Can be faster. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 12 / 42
  • 40. Why Distributed Computing is Important Can be more reliable. Can be faster. Can be cheaper ($30 million Cray versus 100 $1000 PC’s). Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 12 / 42
  • 41. Three Common Distributed Architectures Hope: have N computers do separate pieces of work. Speed-up < N. Probability of failure = 1–(1 − p)N ≈ Np. (p = probability of individual crash). Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 13 / 42
  • 42. Three Common Distributed Architectures Hope: have N computers do separate pieces of work. Speed-up < N. Probability of failure = 1–(1 − p)N ≈ Np. (p = probability of individual crash). Replication: have N computers do the same thing. Speed-up < 1. Probability of failure = p N . Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 13 / 42
  • 43. Three Common Distributed Architectures Hope: have N computers do separate pieces of work. Speed-up < N. Probability of failure = 1–(1 − p)N ≈ Np. (p = probability of individual crash). Replication: have N computers do the same thing. Speed-up < 1. Probability of failure = p N . Master-servant: have 1 computer hand out pieces of work to N-1 servants, and re-hand out pieces of work if servants fail. Speed-up < N − 1. Probability of failure ≈ p. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 13 / 42
  • 44. GFS GFS Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 14 / 42
  • 45. What a Distributed File System Does Usual file system stuff: create, read, move & find files. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 15 / 42
  • 46. What a Distributed File System Does Usual file system stuff: create, read, move & find files. Allow distributed access to files. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 15 / 42
  • 47. What a Distributed File System Does Usual file system stuff: create, read, move & find files. Allow distributed access to files. Files are stored distributedly. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 15 / 42
  • 48. What a Distributed File System Does Usual file system stuff: create, read, move & find files. Allow distributed access to files. Files are stored distributedly. If you just do #1 and #2, you are a network file system. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 15 / 42
  • 49. What a Distributed File System Does Usual file system stuff: create, read, move & find files. Allow distributed access to files. Files are stored distributedly. If you just do #1 and #2, you are a network file system. To do #3, it’s a good idea to also provide fault tolerance. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 15 / 42
  • 50. GFS Architecture Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 16 / 42
  • 51. GFS Architecture: Chunks Files are divided into 64 MB chunks (last chunk of a file may be smaller). Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 17 / 42
  • 52. GFS Architecture: Chunks Files are divided into 64 MB chunks (last chunk of a file may be smaller). Each chunk is identified by an unique 64-bit id. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 17 / 42
  • 53. GFS Architecture: Chunks Files are divided into 64 MB chunks (last chunk of a file may be smaller). Each chunk is identified by an unique 64-bit id. Chunks are stored as regular files on local disks. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 17 / 42
  • 54. GFS Architecture: Chunks Files are divided into 64 MB chunks (last chunk of a file may be smaller). Each chunk is identified by an unique 64-bit id. Chunks are stored as regular files on local disks. By default, each chunk is stored thrice, preferably on more than one rack. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 17 / 42
  • 55. GFS Architecture: Chunks Files are divided into 64 MB chunks (last chunk of a file may be smaller). Each chunk is identified by an unique 64-bit id. Chunks are stored as regular files on local disks. By default, each chunk is stored thrice, preferably on more than one rack. To protect data integrity, each 64 KB block gets a 32 bit checksum that is checked on all reads. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 17 / 42
  • 56. GFS Architecture: Chunks Files are divided into 64 MB chunks (last chunk of a file may be smaller). Each chunk is identified by an unique 64-bit id. Chunks are stored as regular files on local disks. By default, each chunk is stored thrice, preferably on more than one rack. To protect data integrity, each 64 KB block gets a 32 bit checksum that is checked on all reads. When idle, a chunkserver scans inactive chunks for corruption. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 17 / 42
  • 57. GFS Architecture: Master Stores all metadata (namespace, access control). Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 18 / 42
  • 58. GFS Architecture: Master Stores all metadata (namespace, access control). Stores (file − > chunks) and (chunk − > location) mappings. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 18 / 42
  • 59. GFS Architecture: Master Stores all metadata (namespace, access control). Stores (file − > chunks) and (chunk − > location) mappings. Clients get chunk locations for a file from the master, and then talk directly to the chunkservers for the data. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 18 / 42
  • 60. GFS Architecture: Master Stores all metadata (namespace, access control). Stores (file − > chunks) and (chunk − > location) mappings. Clients get chunk locations for a file from the master, and then talk directly to the chunkservers for the data. Advantage of single master simplicity. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 18 / 42
  • 61. GFS Architecture: Master Stores all metadata (namespace, access control). Stores (file − > chunks) and (chunk − > location) mappings. Clients get chunk locations for a file from the master, and then talk directly to the chunkservers for the data. Advantage of single master simplicity. Disadvantages of single master: Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 18 / 42
  • 62. GFS Architecture: Master Stores all metadata (namespace, access control). Stores (file − > chunks) and (chunk − > location) mappings. Clients get chunk locations for a file from the master, and then talk directly to the chunkservers for the data. Advantage of single master simplicity. Disadvantages of single master: Metadata operations are bottlenecked. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 18 / 42
  • 63. GFS Architecture: Master Stores all metadata (namespace, access control). Stores (file − > chunks) and (chunk − > location) mappings. Clients get chunk locations for a file from the master, and then talk directly to the chunkservers for the data. Advantage of single master simplicity. Disadvantages of single master: Metadata operations are bottlenecked. Maximum Number of files limited by master’s memory. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 18 / 42
  • 64. GFS: Life of a Read Client program asks for 1 Gb of file “A” starting at the 200 millionth byte. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 19 / 42
  • 65. GFS: Life of a Read Client program asks for 1 Gb of file “A” starting at the 200 millionth byte. Client GFS library asks master for chunks 3, ... 16387 of file “A”. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 19 / 42
  • 66. GFS: Life of a Read Client program asks for 1 Gb of file “A” starting at the 200 millionth byte. Client GFS library asks master for chunks 3, ... 16387 of file “A”. Master responds with all of the locations of chunks 2, ... 20000 of file “A”. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 19 / 42
  • 67. GFS: Life of a Read Client program asks for 1 Gb of file “A” starting at the 200 millionth byte. Client GFS library asks master for chunks 3, ... 16387 of file “A”. Master responds with all of the locations of chunks 2, ... 20000 of file “A”. Client caches all of these locations (with their cache time-outs) Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 19 / 42
  • 68. GFS: Life of a Read Client program asks for 1 Gb of file “A” starting at the 200 millionth byte. Client GFS library asks master for chunks 3, ... 16387 of file “A”. Master responds with all of the locations of chunks 2, ... 20000 of file “A”. Client caches all of these locations (with their cache time-outs) Client reads chunk 2 from the closest location. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 19 / 42
  • 69. GFS: Life of a Read Client program asks for 1 Gb of file “A” starting at the 200 millionth byte. Client GFS library asks master for chunks 3, ... 16387 of file “A”. Master responds with all of the locations of chunks 2, ... 20000 of file “A”. Client caches all of these locations (with their cache time-outs) Client reads chunk 2 from the closest location. Client reads chunk 3 from the closest location. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 19 / 42
  • 70. GFS: Life of a Read Client program asks for 1 Gb of file “A” starting at the 200 millionth byte. Client GFS library asks master for chunks 3, ... 16387 of file “A”. Master responds with all of the locations of chunks 2, ... 20000 of file “A”. Client caches all of these locations (with their cache time-outs) Client reads chunk 2 from the closest location. Client reads chunk 3 from the closest location. ... Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 19 / 42
  • 71. GFS: Life of a Write Client gets locations of chunk replicas as before. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 20 / 42
  • 72. GFS: Life of a Write Client gets locations of chunk replicas as before. For each chunk, client sends the write data to nearest replica. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 20 / 42
  • 73. GFS: Life of a Write Client gets locations of chunk replicas as before. For each chunk, client sends the write data to nearest replica. This replica sends the data to the nearest replica to it that has not yet received the data. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 20 / 42
  • 74. GFS: Life of a Write Client gets locations of chunk replicas as before. For each chunk, client sends the write data to nearest replica. This replica sends the data to the nearest replica to it that has not yet received the data. When all of the replicas have received the data, then it is safe for them to actually write it. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 20 / 42
  • 75. GFS: Life of a Write Client gets locations of chunk replicas as before. For each chunk, client sends the write data to nearest replica. This replica sends the data to the nearest replica to it that has not yet received the data. When all of the replicas have received the data, then it is safe for them to actually write it. Tricky Details: Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 20 / 42
  • 76. GFS: Life of a Write Client gets locations of chunk replicas as before. For each chunk, client sends the write data to nearest replica. This replica sends the data to the nearest replica to it that has not yet received the data. When all of the replicas have received the data, then it is safe for them to actually write it. Tricky Details: Master hands out a short term ( 1 minute) lease for a particular replica to be the primary one. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 20 / 42
  • 77. GFS: Life of a Write Client gets locations of chunk replicas as before. For each chunk, client sends the write data to nearest replica. This replica sends the data to the nearest replica to it that has not yet received the data. When all of the replicas have received the data, then it is safe for them to actually write it. Tricky Details: Master hands out a short term ( 1 minute) lease for a particular replica to be the primary one. This primary replica assigns a serial number to each mutation so that every replica performs the mutations in the same order. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 20 / 42
  • 78. GFS: Master Failure The Master stores its state via periodic checkpoints and a mutation log. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 21 / 42
  • 79. GFS: Master Failure The Master stores its state via periodic checkpoints and a mutation log. Both are replicated. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 21 / 42
  • 80. GFS: Master Failure The Master stores its state via periodic checkpoints and a mutation log. Both are replicated. Master election and notification is implemented using an external lock server. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 21 / 42
  • 81. GFS: Master Failure The Master stores its state via periodic checkpoints and a mutation log. Both are replicated. Master election and notification is implemented using an external lock server. New master restores state from checkpoint and log. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 21 / 42
  • 82. MapReduce MapReduce Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 22 / 42
  • 83. Do We Need It? Yes: Otherwise some problems are too big. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 23 / 42
  • 84. Do We Need It? Yes: Otherwise some problems are too big. Example: 20+ billion web pages x 20KB = 400+ terabytes Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 23 / 42
  • 85. Do We Need It? Yes: Otherwise some problems are too big. Example: 20+ billion web pages x 20KB = 400+ terabytes One computer can read 30-35 MB/sec from disk Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 23 / 42
  • 86. Do We Need It? Yes: Otherwise some problems are too big. Example: 20+ billion web pages x 20KB = 400+ terabytes One computer can read 30-35 MB/sec from disk four months to read the web Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 23 / 42
  • 87. Do We Need It? Yes: Otherwise some problems are too big. Example: 20+ billion web pages x 20KB = 400+ terabytes One computer can read 30-35 MB/sec from disk four months to read the web Same problem with 1000 machines, < 3 hours Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 23 / 42
  • 88. Bad News! Bad News!! Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 24 / 42
  • 89. Bad News! Bad News!! communication and coordination Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 24 / 42
  • 90. Bad News! Bad News!! communication and coordination recovering from machine failure (all the time!) Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 24 / 42
  • 91. Bad News! Bad News!! communication and coordination recovering from machine failure (all the time!) debugging Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 24 / 42
  • 92. Bad News! Bad News!! communication and coordination recovering from machine failure (all the time!) debugging optimization Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 24 / 42
  • 93. Bad News! Bad News!! communication and coordination recovering from machine failure (all the time!) debugging optimization locality Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 24 / 42
  • 94. Bad News! Bad News!! communication and coordination recovering from machine failure (all the time!) debugging optimization locality Bad news II: repeat for every problem you want to solve Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 24 / 42
  • 95. Bad News! Bad News!! communication and coordination recovering from machine failure (all the time!) debugging optimization locality Bad news II: repeat for every problem you want to solve Good News I and II: MapReduce and Hadoop! Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 24 / 42
  • 96. Bad News! Bad News!! communication and coordination recovering from machine failure (all the time!) debugging optimization locality Bad news II: repeat for every problem you want to solve Good News I and II: MapReduce and Hadoop! Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 24 / 42
  • 97. MapReduce A simple programming model that applies to many large-scale computing problems Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 25 / 42
  • 98. MapReduce A simple programming model that applies to many large-scale computing problems Hide messy details in MapReduce runtime library: Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 25 / 42
  • 99. MapReduce A simple programming model that applies to many large-scale computing problems Hide messy details in MapReduce runtime library: automatic parallelization Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 25 / 42
  • 100. MapReduce A simple programming model that applies to many large-scale computing problems Hide messy details in MapReduce runtime library: automatic parallelization load balancing Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 25 / 42
  • 101. MapReduce A simple programming model that applies to many large-scale computing problems Hide messy details in MapReduce runtime library: automatic parallelization load balancing network and disk transfer optimization Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 25 / 42
  • 102. MapReduce A simple programming model that applies to many large-scale computing problems Hide messy details in MapReduce runtime library: automatic parallelization load balancing network and disk transfer optimization handling of machine failures Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 25 / 42
  • 103. MapReduce A simple programming model that applies to many large-scale computing problems Hide messy details in MapReduce runtime library: automatic parallelization load balancing network and disk transfer optimization handling of machine failures robustness Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 25 / 42
  • 104. MapReduce A simple programming model that applies to many large-scale computing problems Hide messy details in MapReduce runtime library: automatic parallelization load balancing network and disk transfer optimization handling of machine failures robustness Therfore we can write application level programs and let MapReduce insulate us from many concerns. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 25 / 42
  • 105. MapReduce A simple programming model that applies to many large-scale computing problems Hide messy details in MapReduce runtime library: automatic parallelization load balancing network and disk transfer optimization handling of machine failures robustness Therfore we can write application level programs and let MapReduce insulate us from many concerns. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 25 / 42
  • 106. Map Reduce Paradigm Read a lot of data Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 26 / 42
  • 107. Map Reduce Paradigm Read a lot of data Map: extract something you care about from each record. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 26 / 42
  • 108. Map Reduce Paradigm Read a lot of data Map: extract something you care about from each record. Shuffle and Sort. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 26 / 42
  • 109. Map Reduce Paradigm Read a lot of data Map: extract something you care about from each record. Shuffle and Sort. Reduce: aggregate, summarize, filter, or transform Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 26 / 42
  • 110. Map Reduce Paradigm Read a lot of data Map: extract something you care about from each record. Shuffle and Sort. Reduce: aggregate, summarize, filter, or transform Write the results. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 26 / 42
  • 111. MapReduce Paradigm Basic data type: the key-value pair (k,v). Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 27 / 42
  • 112. MapReduce Paradigm Basic data type: the key-value pair (k,v). For example, key = URL, value = HTML of the web page. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 27 / 42
  • 113. MapReduce Paradigm Basic data type: the key-value pair (k,v). For example, key = URL, value = HTML of the web page. Programmer specifies two primary methods: Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 27 / 42
  • 114. MapReduce Paradigm Basic data type: the key-value pair (k,v). For example, key = URL, value = HTML of the web page. Programmer specifies two primary methods: Map: (k, v) − > <(k1,v1), (k2,v2), (k3,v3),...,(kn,vn)> Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 27 / 42
  • 115. MapReduce Paradigm Basic data type: the key-value pair (k,v). For example, key = URL, value = HTML of the web page. Programmer specifies two primary methods: Map: (k, v) − > <(k1,v1), (k2,v2), (k3,v3),...,(kn,vn)> Reduce: (k’, <v’1, v’2,...,v’n’>) − > <(k’, v”1), (k’, v”2),...,(k’, v”n”)> Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 27 / 42
  • 116. MapReduce Paradigm Basic data type: the key-value pair (k,v). For example, key = URL, value = HTML of the web page. Programmer specifies two primary methods: Map: (k, v) − > <(k1,v1), (k2,v2), (k3,v3),...,(kn,vn)> Reduce: (k’, <v’1, v’2,...,v’n’>) − > <(k’, v”1), (k’, v”2),...,(k’, v”n”)> All v’ with same k’ are reduced together. Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 27 / 42
  • 117. MapReduce Paradigm Basic data type: the key-value pair (k,v). For example, key = URL, value = HTML of the web page. Programmer specifies two primary methods: Map: (k, v) − > <(k1,v1), (k2,v2), (k3,v3),...,(kn,vn)> Reduce: (k’, <v’1, v’2,...,v’n’>) − > <(k’, v”1), (k’, v”2),...,(k’, v”n”)> All v’ with same k’ are reduced together. (Remember the invisible “Shuffle and Sort” step.) Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 27 / 42
  • 118. MapReduce Paradigm Basic data type: the key-value pair (k,v). For example, key = URL, value = HTML of the web page. Programmer specifies two primary methods: Map: (k, v) − > <(k1,v1), (k2,v2), (k3,v3),...,(kn,vn)> Reduce: (k’, <v’1, v’2,...,v’n’>) − > <(k’, v”1), (k’, v”2),...,(k’, v”n”)> All v’ with same k’ are reduced together. (Remember the invisible “Shuffle and Sort” step.) Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 27 / 42
  • 119. Working Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 28 / 42
  • 120. Working Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 29 / 42
• 132. Under the hood: Scheduling
  One master, many workers.
  Input data is split into M map tasks (typically 64 MB in size).
  The reduce phase is partitioned into R reduce tasks (R = the number of output files).
  Tasks are assigned to workers dynamically:
    The master assigns each map task to a free worker, considering the locality of the data to the worker when assigning the task.
    The worker reads the task input (often from local disk!) and produces R local files containing intermediate (k, v) pairs.
    The master assigns each reduce task to a free worker.
    The worker reads the intermediate (k, v) pairs from the map workers, sorts them, and applies the user's Reduce operation to produce the output.
  The user may specify a Partition function: which intermediate keys go to which Reducer (see the sketch after this slide).
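As an illustration of that partition step, a custom partitioner in Hadoop's old org.apache.hadoop.mapred API could look like the sketch below. The class name WordPartitioner is made up for this example; the body simply reproduces the usual hash(key) mod R routing.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Decides which of the R reduce tasks receives a given intermediate key.
public class WordPartitioner implements Partitioner<Text, IntWritable> {

    public void configure(JobConf job) {
        // no job-specific configuration needed for this sketch
    }

    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // mask the sign bit so the result is non-negative, then take mod R
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

It would be plugged into a job with conf.setPartitionerClass(WordPartitioner.class); if no partitioner is set, Hadoop applies an equivalent hash-based default.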
• 141. Robustness
  One master, many workers.
  Worker failure:
    Failures are detected via periodic heartbeats.
    Completed and in-progress map tasks of the failed worker are re-executed (their intermediate output lived on that worker's local disk).
    In-progress reduce tasks of the failed worker are re-executed.
    The master assigns each such task to another free worker.
  Master failure:
    State is checkpointed to a replicated file system.
    A new master recovers from the checkpoint and continues.
  Very robust: Google reports a run that lost 1,600 of its 1,800 machines and still finished fine.
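The failure-detection side is simple enough to sketch. The toy Java class below is not taken from MapReduce or Hadoop; every name in it (WorkerTracker, TIMEOUT_MS, and so on) is invented for illustration. It only shows the idea of a master re-queuing the tasks of any worker whose heartbeat has gone quiet.

import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// Toy failure detector: workers report heartbeats; tasks owned by a worker
// that has been silent for too long are put back on the pending queue.
public class WorkerTracker {
    private static final long TIMEOUT_MS = 10_000;

    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();
    private final Map<String, List<String>> tasksByWorker = new ConcurrentHashMap<>();
    private final Queue<String> pendingTasks = new ConcurrentLinkedQueue<>();

    public void heartbeat(String workerId) {
        lastHeartbeat.put(workerId, System.currentTimeMillis());
    }

    public void assign(String workerId, String taskId) {
        tasksByWorker.computeIfAbsent(workerId, w -> new CopyOnWriteArrayList<>()).add(taskId);
    }

    // Called periodically by the master's scheduling loop.
    public void reapDeadWorkers() {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, Long> entry : lastHeartbeat.entrySet()) {
            if (now - entry.getValue() > TIMEOUT_MS) {
                List<String> lost = tasksByWorker.remove(entry.getKey());
                if (lost != null) {
                    pendingTasks.addAll(lost);   // schedule for re-execution elsewhere
                }
                lastHeartbeat.remove(entry.getKey());
            }
        }
    }

    public Queue<String> pendingTasks() {
        return pendingTasks;
    }
}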
• 142. Hadoop
• 147. What is Hadoop
  Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license.
  Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
  A Map/Reduce job usually splits the input data set into independent chunks, which the map tasks process in a completely parallel manner.
  The map output is then fed as input to the reduce tasks.
  The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.
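A minimal job driver in Hadoop's old org.apache.hadoop.mapred API could look roughly like the sketch below: it wires a mapper and a reducer into a JobConf and submits the job. The class name WordCount, the input/output paths taken from args, and the WordCountMapper/WordCountReducer classes (sketched on the Mapper and Reducer slides further on) are assumptions for this example, not code from the talk.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        // types of the job's final (reducer) output
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // user-supplied map and reduce implementations (sketched on later slides)
        conf.setMapperClass(WordCountMapper.class);
        conf.setReducerClass(WordCountReducer.class);

        // plain text in, plain text out
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);   // blocks until the job completes
    }
}

Once packaged into a jar, such a job is typically submitted with the hadoop jar command, for example hadoop jar wordcount.jar WordCount <input dir> <output dir> (paths illustrative).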
• 158. Who uses Hadoop?
  Adobe
  AOL
  Baidu - the leading Chinese-language search engine
  Cloudera, Inc. - provides commercial support and professional training for Hadoop
  Facebook
  Google
  IBM
  Twitter
  Yahoo!
  The New York Times, Last.fm, Hulu, LinkedIn
• 169. Mapper
  Mapper maps input key/value pairs to a set of intermediate key/value pairs.
  The Hadoop Map/Reduce framework spawns one map task for each InputSplit generated by the InputFormat.
  Output pairs do not need to be of the same types as the input pairs.
  Mapper implementations are passed the JobConf for the job.
  The framework then calls the map method for each key/value pair.
  Applications can use the Reporter to report progress.
  All intermediate values associated with a given output key are subsequently grouped by the framework and passed to the Reducer(s) to determine the final output.
  The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format.
  The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.
  Users can optionally specify a combiner to perform local aggregation of the intermediate outputs.
  (A word-count mapper sketch follows this slide.)
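A word-count mapper in the old API might look like the sketch below; the class name WordCountMapper is illustrative and pairs with the driver shown earlier.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Emits an intermediate (word, 1) pair for every token in its input split.
public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, ONE);   // intermediate key/value pair
        }
    }
}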
• 173. Combiners
  When the map operation outputs its pairs, they are already available in memory.
  If a combiner is used, the map key-value pairs are not immediately written to the output.
  Instead they are collected in lists, one list per key value.
  When a certain number of key-value pairs has been buffered, the buffer is flushed by passing all the values of each key to the combiner's reduce method and outputting the resulting key-value pairs as if they had been created by the original map operation.
  (A configuration snippet follows this slide.)
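For word count, the reducer itself can serve as the combiner, because summing partial counts is associative and commutative. The fragment below is an assumed addition to the WordCount driver sketched earlier (it is not standalone code); WordCountReducer is the illustrative reducer class sketched on the Reducer slides.

// In the WordCount driver, one extra line enables local aggregation of the
// map output before it is shuffled across the network:
conf.setCombinerClass(WordCountReducer.class);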
• 181. Reducer
  Reducer reduces a set of intermediate values which share a key to a smaller set of values.
  Reducer implementations are passed the JobConf for the job.
  The framework then calls the reduce(WritableComparable, Iterator, OutputCollector, Reporter) method for each <key, (list of values)> pair in the grouped inputs.
  The reducer has 3 primary phases:
    Shuffle: the input to the Reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers via HTTP.
    Sort: the framework groups Reducer inputs by key in this stage (since different mappers may have output the same key).
    Reduce: in this phase the reduce method is called for each <key, (list of values)> pair in the grouped inputs.
  The generated output is the new, reduced set of values.
  (A word-count reducer sketch follows this slide.)
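A matching word-count reducer in the old API, again with an illustrative class name (WordCountReducer) that ties back to the driver and combiner snippets above:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Sums the counts emitted for each word by the mappers (and combiners).
public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));   // final (word, count) pair
    }
}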
• 184. Some Terminology
  Job – a "full program": an execution of a Mapper and Reducer across a data set.
  Task – an execution of a Mapper or a Reducer on a slice of data.
  Task Attempt – a particular instance of an attempt to execute a task on a machine.
• 187. Job Distribution
  MapReduce programs are packaged as a Java "jar" file plus an XML file containing serialized program configuration options.
  Running a MapReduce job places these files into HDFS and notifies the TaskTrackers where to retrieve the relevant program code.
  Data distribution is implicit in the design of MapReduce!
• 188. Contact Information
  Varun Thacker
  Linux User's Group Manipal
  varunthacker1989@gmail.com
  http://lugmanipal.org
  http://forums.lugmanipal.org
  http://varunthacker.wordpress.com
• 189. Attribution
  Google, under the Creative Commons Attribution-Share Alike 2.5 Generic license.
• 190. Copying
  Creative Commons Attribution-Share Alike 2.5 India License
  http://creativecommons.org/licenses/by-sa/2.5/in/