MAP/REDUCE

        HPC4 Seminar
            IPM
       December 2011




                               Omid Djoudi
                        od90125@yahoo.com


2011       IPM - HPC4                    1
Map Reduce
Paradigm for divide and conquer

The input list is divided into n splits, which are processed by m blades
Aggregation is performed on the results

Close integration between middleware and infrastructure

Software and hardware interact to achieve a very specific target
-> Diverges from the “service oriented” paradigm, where the infrastructure is
    abstracted from the program logic


Map Reduce

[Figure: two diagrams, each built from the blocks Program, Data management and INFRASTRUCTURE]
Map Reduce
Programming model

Based on functional programming

Map : λx . x²
Reduce : λx . λy . x+y
-> Σ(xi)²

A number of algorithms can be implemented as a sequence of
  map/reduce steps.
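
The two lambda terms above can be tried directly in Python. This is only a minimal sketch of the functional idea on a single machine, not a distributed job; the function name `sum_of_squares` is made up for the illustration.

```python
from functools import reduce

def sum_of_squares(xs):
    # Map: λx . x²  (applied to every element independently)
    squared = map(lambda x: x * x, xs)
    # Reduce: λx . λy . x+y  (folds the mapped values into one result)
    return reduce(lambda x, y: x + y, squared, 0)

print(sum_of_squares([2, 4, 3, 2, 5, 3]))  # 67
```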

Map Reduce
Programming model

Input: 2 4 3 2 5 3
Map -> intermediate values
<2,4> <4,16> <3,9> <2,4> <5,25> <3,9>
Reduce -> final results
<2,8> <3,18> <4,16> <5,25>

Reduce phase cannot begin until Map has finished
-> No streaming


Map Reduce
MAPPER :
Responsible for the data processing step




Map Reduce
REDUCE :

The data output from map is sorted
and grouped by key

Map output: <k22,v22> <k21,v21> <k21,v20>
Sorted:     <k21,v21> <k21,v20> <k22,v22>
Grouped:    <k21,[v21,v20]> <k22,v22>


The reducer iterates over this list
and combines the values for
each key
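
The sort and group steps can be sketched with the Python standard library. This is an illustration only: `sort_and_group` is a hypothetical helper, and a real framework does this work across nodes rather than in one list.

```python
from itertools import groupby
from operator import itemgetter

def sort_and_group(pairs):
    """Sort (key, value) pairs by key, then collect the values of each key."""
    # sorted() is stable, so values keep their arrival order within a key.
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]

map_output = [("k22", "v22"), ("k21", "v21"), ("k21", "v20")]
print(sort_and_group(map_output))
# [('k21', ['v21', 'v20']), ('k22', ['v22'])]
```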

Map Reduce
Map functions run in parallel
Transform independent input data into independent
intermediate data

Reduce functions run in parallel
Aggregate independent output keys

No data sharing
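
Because the splits are independent, the map calls can run concurrently without locks. The sketch below uses threads on one machine as a stand-in for separate nodes, which is an assumption of the example; `map_split` and `run_maps_in_parallel` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def map_split(split):
    """Map one split on its own: nothing is shared with the other splits."""
    return [(x, x * x) for x in split]

def run_maps_in_parallel(splits, workers=3):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns the results in the order of the input splits.
        return list(pool.map(map_split, splits))

splits = [[2, 3, 5, 2], [2, 3], [5, 2, 5]]
for intermediate in run_maps_in_parallel(splits):
    print(intermediate)
```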



Map Reduce

map (key, file-contents):
  for each number in file-contents:
      emit (number, number²)


reduce (key, values):
  sum = 0
  for each value in values:
      sum = sum + value
  emit (key, sum)
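
A runnable version of this pseudocode, as a single-process sketch. The driver `run_job` is hypothetical and stands in for the framework; `emit` is modelled with `yield`. The input is the list from the programming-model slide.

```python
from collections import defaultdict

def map_fn(key, file_contents):
    """Emit (number, number squared) for every number in the split."""
    for number in file_contents:
        yield number, number * number

def reduce_fn(key, values):
    """Emit the sum of all values collected for one key."""
    yield key, sum(values)

def run_job(file_contents):
    # Map phase: collect every intermediate pair, grouped by key.
    groups = defaultdict(list)
    for k, v in map_fn(None, file_contents):
        groups[k].append(v)
    # Reduce phase: starts only once the map phase has finished.
    results = {}
    for k in sorted(groups):
        for out_key, out_value in reduce_fn(k, groups[k]):
            results[out_key] = out_value
    return results

print(run_job([2, 4, 3, 2, 5, 3]))
# {2: 8, 3: 18, 4: 16, 5: 25}
```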




Map Reduce
PARTITIONER :

• We use multiple reducers

• After shuffling, the icons of the
  same shape (pairs with the same
  key) will be in the same reducer.

• The partitioner decides
  which keys go where.
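
A minimal partitioner sketch. The slides do not say which scheme is used; hashing the key modulo the number of reducers is a common choice and is assumed here. `partition` and `shuffle` are hypothetical names.

```python
import zlib

def partition(key, num_reducers):
    """Return the index of the reducer that receives this key."""
    # crc32 is used instead of hash() so the result is the same on every run.
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers

def shuffle(pairs, num_reducers):
    """Send every (key, value) pair to the bucket chosen by the partitioner."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets

pairs = [(2, 4), (3, 9), (5, 25), (2, 4), (5, 25)]
for index, bucket in enumerate(shuffle(pairs, 2)):
    print("reducer", index, bucket)
```

Whatever function is chosen, it must depend only on the key, so that all pairs with the same key reach the same reducer.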


Map Reduce
  NODE 1                      NODE 2                NODE 3

  2,3,5,2                     2,3                   5,2,5

  MAPPER                      MAPPER                MAPPER

  <2,4><3,9><5,25><2,4>       <2,4><3,9>            <5,25><2,4><5,25>

  PARTITIONER                 PARTITIONER           PARTITIONER

Output generated by map is “shuffled” (transferred) to the corresponding
reduce node

  NODE 5                      NODE 6

  <2,4><2,4><2,4><2,4>        <3,9><3,9>
  <5,25><5,25><5,25>

  REDUCER                     REDUCER

  <2,16>                      <3,18>
  <5,75>




Map Reduce
COMBINER :

• Intermediate step between
  Map and Reduce
• Runs on the mapper nodes
• Saves bandwidth before
  sending data to the reducer
• The operation must be associative:
       (a op b) op c = a op (b op c)
  and commutative:
       a op b = b op a
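
A small sketch of a combiner for the sum-of-squares job, assuming the operation is addition. `combine` is a hypothetical helper, and the two inputs are the mapper outputs of nodes 1 and 3 from the node diagram.

```python
from collections import Counter

def combine(pairs):
    """Pre-aggregate one mapper's output by summing the values of each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

node1 = [(2, 4), (3, 9), (5, 25), (2, 4)]
node3 = [(5, 25), (2, 4), (5, 25)]
print(combine(node1))  # [(2, 8), (3, 9), (5, 25)]
print(combine(node3))  # [(2, 4), (5, 50)]
```

Node 1 now sends three pairs instead of four, and node 3 two instead of three. Addition qualifies because partial sums can be regrouped and reordered freely. An average does not: averaging 1 and 2 first and then averaging the result with 3 gives 2.25, while the average of 1, 2 and 3 is 2.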

Map Reduce
The data presented to reduce() is sorted by key on each node

-> The output of the sort is written to a file, so reduce() can read the file
    sequentially and do the processing

The sorting is actually performed during the map phase, and the sorted outputs are
   merged during the shuffle phase on the reduce node.
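
Merging runs that are already sorted is cheap, because it never needs a full re-sort. The sketch below shows the idea with in-memory lists; the use of `heapq.merge` and the name `merge_sorted_runs` are assumptions of the example, and a real framework merges files on disk.

```python
import heapq
from operator import itemgetter

def merge_sorted_runs(runs):
    """Merge several key-sorted lists of (key, value) pairs into one sorted list."""
    return list(heapq.merge(*runs, key=itemgetter(0)))

# Each run is the key-sorted output of one mapper.
run_a = [(2, 4), (2, 4), (3, 9), (5, 25)]
run_b = [(2, 4), (3, 9)]
run_c = [(2, 4), (5, 25), (5, 25)]

merged = merge_sorted_runs([run_a, run_b, run_c])
print([key for key, _ in merged])  # [2, 2, 2, 2, 3, 3, 5, 5, 5]
```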

+ Fair distribution of processing, with the optimisation done in the middleware

- No clear separation of responsibilities, which makes capacity planning
   more difficult.



  • 15. Map Reduce The data presented to reduce() is sorted by key on each node -> The output of sort is put to a file, so reduce() can read the file sequentially and do the processing The sorting is actually performed during the map phase, and merged during shuffle phase on the reduce node. + Fair distribution of processing – optimisation in the middleware - No clear separation of responsibilities, more difficult to perform capacity planning. 2011 IPM - HPC4 15