2. Map Reduce
A paradigm for divide and conquer
The input list is divided into n splits, processed by m blades
Aggregation is performed on the results
Close integration between middleware and infrastructure
Software and hardware interact to achieve a very specific target
-> Diverges from the "service-oriented" paradigm, where the infrastructure is abstracted away from the program logic
3. Map Reduce
[Figure: two side-by-side stacks, each with the layers Program / Data management / INFRASTRUCTURE, contrasting layered separation with MapReduce's integration of the layers]
4. Map Reduce
Programming model
Based on functional programming
Map : λx . x²
Reduce : λx . λy . x+y
-> Σᵢ xᵢ² (the sum of the squares of the inputs)
A number of algorithms can be implemented as a sequence of map/reduce steps.
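A minimal sketch of this composition in Python, using the built-in map and functools.reduce (the sample input is illustrative):

from functools import reduce

# Map : λx . x² (square each element independently)
# Reduce : λx . λy . x + y (fold the squares into a single sum)
xs = [2, 4, 3, 2, 5, 3]
squares = map(lambda x: x * x, xs)
total = reduce(lambda x, y: x + y, squares)  # Σᵢ xᵢ² = 67
print(total)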
5. Map Reduce
Programming model
Input: 2 4 3 2 5 3
Map -> intermediate values
<2,4> <4,16> <3,9> <2,4> <5,25> <3,9>
Reduce -> final results
<2,8> <3,18> <4,16> <5,25>
The Reduce phase cannot begin until the Map phase has finished
-> No streaming
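An end-to-end sketch of this example in Python, with an explicit shuffle step between map and reduce (the function names are illustrative):

from collections import defaultdict

def map_phase(numbers):
    # Emit <x, x²> for every input number
    return [(x, x * x) for x in numbers]

def shuffle(pairs):
    # Group the intermediate values by key; reduce cannot start
    # until all of the map output has been collected
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the values of each key
    return {key: sum(values) for key, values in sorted(groups.items())}

intermediate = map_phase([2, 4, 3, 2, 5, 3])
# [(2, 4), (4, 16), (3, 9), (2, 4), (5, 25), (3, 9)]
print(reduce_phase(shuffle(intermediate)))
# {2: 8, 3: 18, 4: 16, 5: 25}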
7. Map Reduce
REDUCE :
The data output by map is sorted and grouped by key:
Map output : <k22,v22> <k21,v21> <k21,v20>
Sorted : <k21,v21> <k21,v20> <k22,v22>
Grouped : <k21,[v21,v20]> <k22,v22>
The reducer iterates over this list and combines the values for each key.
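A small sketch of this sort-then-group step using Python's itertools.groupby (the key/value names mirror the slide):

from itertools import groupby
from operator import itemgetter

map_output = [("k22", "v22"), ("k21", "v21"), ("k21", "v20")]

# Sorting makes equal keys adjacent ...
sorted_output = sorted(map_output, key=itemgetter(0))

# ... so grouping adjacent pairs yields <key, [values]> for the reducer
for key, group in groupby(sorted_output, key=itemgetter(0)):
    print(key, [value for _, value in group])
# k21 ['v21', 'v20']
# k22 ['v22']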
8. Map Reduce
Map functions run in parallel
They transform independent input data into independent intermediate data
Reduce functions run in parallel
They aggregate the values of independent output keys
No data is shared between tasks (a sketch follows below)
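A sketch of this shared-nothing parallelism with Python's multiprocessing.Pool (the pool size and helper names are illustrative assumptions):

from multiprocessing import Pool
from collections import defaultdict

def square(x):
    # Map task: depends only on its own input element
    return (x, x * x)

def sum_values(item):
    # Reduce task: aggregates one independent key
    key, values = item
    return (key, sum(values))

if __name__ == "__main__":
    with Pool(4) as pool:
        intermediate = pool.map(square, [2, 4, 3, 2, 5, 3])
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    with Pool(4) as pool:
        results = pool.map(sum_values, list(groups.items()))
    print(dict(results))  # {2: 8, 4: 16, 3: 18, 5: 25}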
9. Map Reduce
map (key, file-contents):
    for each number in file-contents:
        emit (number, number²)
reduce (key, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit (key, sum)
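A runnable Python rendering of this pseudocode, assuming the map input value is a whitespace-separated string of numbers (the input format is an assumption, not stated on the slide):

def map_fn(key, file_contents):
    # key could be e.g. the file name; it is unused here
    for token in file_contents.split():
        number = int(token)
        yield (number, number * number)

def reduce_fn(key, values):
    total = 0
    for value in values:
        total = total + value
    yield (key, total)

pairs = list(map_fn("input.txt", "2 4 3 2 5 3"))
# [(2, 4), (4, 16), (3, 9), (2, 4), (5, 25), (3, 9)]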
11. Map Reduce
PARTITIONER :
• We use multiple reducers
• After shuffling, records with the same key (the icons of the same shape in the figure) end up at the same reducer
• The partitioner decides which key goes to which reducer
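A minimal sketch of hash-style partitioning in Python, in the spirit of Hadoop's default HashPartitioner (the function name is illustrative):

def partition(key, num_reducers):
    # Every occurrence of a key maps to the same reducer,
    # so all of its values are aggregated in one place
    return hash(key) % num_reducers

for key in [2, 4, 3, 2, 5, 3]:
    print(key, "-> reducer", partition(key, 2))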
13. Map Reduce
COMBINER :
• An intermediate step between Map and Reduce
• Runs on the mapper nodes
• Saves bandwidth before sending data to the reducers
• The combining operation must be associative:
(a op b) op c = a op (b op c)
and commutative:
(a op b) = (b op a)
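A sketch of a combiner for the sum-of-squares example: addition is associative and commutative, so pre-aggregating on the mapper is safe (the names are illustrative):

from collections import defaultdict

def combine(map_output):
    # Pre-aggregate locally on the mapper node so that fewer
    # pairs cross the network to the reducers
    partial = defaultdict(int)
    for key, value in map_output:
        partial[key] += value  # op = '+', associative and commutative
    return list(partial.items())

map_output = [(2, 4), (4, 16), (3, 9), (2, 4), (5, 25), (3, 9)]
print(combine(map_output))  # [(2, 8), (4, 16), (3, 18), (5, 25)]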
15. Map Reduce
The data presented to reduce() is sorted by key on each node
-> The output of the sort is written to a file, so reduce() can read the file
sequentially and do its processing
The sorting is actually performed during the map phase and merged, during the
shuffle phase, on the reduce node
+ Fair distribution of the processing: the optimisation is done in the middleware
- No clear separation of responsibilities: capacity planning is more difficult
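A sketch of that merge step: each mapper ships its output pre-sorted by key, and the reduce node only performs a streaming merge of the sorted runs (Python's heapq.merge stands in for the middleware's merge; the data is illustrative):

import heapq

# Each mapper's output arrives already sorted by key
run_from_mapper_1 = [(2, 4), (3, 9), (5, 25)]
run_from_mapper_2 = [(2, 4), (3, 9), (4, 16)]

# The reduce node merges the sorted runs sequentially,
# holding only one record per run in memory at a time
merged = heapq.merge(run_from_mapper_1, run_from_mapper_2)
print(list(merged))
# [(2, 4), (2, 4), (3, 9), (3, 9), (4, 16), (5, 25)]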