Lec2 Mapred

Distributed Computing Seminar Lecture 2: MapReduce Theory and Implementation Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet Summer 2007 Except as otherwise noted, the contents of this presentation are © Copyright 2007 University of Washington and licensed under the Creative Commons Attribution 2.5 License.
Outline
Functional Programming Review
Functional Programming Review
Functional Updates Do Not Modify Structures
The append() function reverses a list, adds a new element to the front, and returns the result reversed again, which has the effect of appending the item. It never modifies lst!
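A minimal sketch of this reverse, add-to-front, reverse pattern, written in Python with tuples standing in for immutable lists. The helper names `cons` and `reverse` are illustrative assumptions, not taken from the lecture:

```python
def cons(x, lst):
    """Return a new tuple with x at the front; lst is left untouched."""
    return (x,) + lst

def reverse(lst):
    """Return a reversed copy of lst."""
    return lst[::-1]

def append(x, lst):
    """Append x by reversing, adding to the front, and reversing again."""
    return reverse(cons(x, reverse(lst)))

lst = (1, 2, 3)
new_lst = append(4, lst)
print(new_lst)  # (1, 2, 3, 4)
print(lst)      # (1, 2, 3)
```

Every step builds a new tuple, so the caller's `lst` still holds its original contents after the call.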
Functions Can Be Used As Arguments
It does not matter what f does to its argument; DoDouble() will do it twice. What is the type of this function?
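One possible reading of DoDouble(), sketched in Python. It assumes DoDouble takes a function `f` and a value `x` and returns `f(f(x))`; the name `do_double` and the type hints are illustrative:

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def do_double(f: Callable[[T], T], x: T) -> T:
    """Apply f to x, then apply f again to the result."""
    return f(f(x))

print(do_double(lambda n: n + 3, 10))      # 16
print(do_double(lambda s: s + "!", "hi"))  # hi!!
```

Under that assumption the type hints suggest an answer to the question: because the output of `f` is fed back into `f`, its argument type and result type must match.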
Map
Fold
fold left vs. fold right
SML Implementation:
  fun foldl f a []      = a
    | foldl f a (x::xs) = foldl f (f(x, a)) xs
  fun foldr f a []      = a
    | foldr f a (x::xs) = f(x, (foldr f a xs))
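The two SML definitions can be transliterated into Python as a rough sketch. The combining function takes `(element, accumulator)` in that order, matching `f(x, a)` in the SML; the `cons` and `add` helpers are illustrative:

```python
def foldl(f, a, xs):
    """Left fold: walk xs front to back, threading the accumulator."""
    for x in xs:
        a = f(x, a)
    return a

def foldr(f, a, xs):
    """Right fold: combine each element with the fold of the rest."""
    if not xs:
        return a
    return f(xs[0], foldr(f, a, xs[1:]))

cons = lambda x, acc: [x] + acc
add = lambda x, acc: x + acc

print(foldl(cons, [], [1, 2, 3]))  # [3, 2, 1]
print(foldr(cons, [], [1, 2, 3]))  # [1, 2, 3]
print(foldl(add, 0, [1, 2, 3]))    # 6
print(foldr(add, 0, [1, 2, 3]))    # 6
```

With an order-sensitive function such as `cons` the two folds differ (the left fold reverses the list), while an associative and commutative function such as `add` gives the same result either way.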
Example
Example (Solved)
A More Complicated Fold Problem
A More Complicated Map Problem
map Implementation
  fun map f []      = []
    | map f (x::xs) = (f x) :: (map f xs)
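The same recursion, sketched in Python for comparison. The name `my_map` is chosen only to avoid shadowing Python's built-in `map`:

```python
def my_map(f, xs):
    """Return a new list with f applied to each element of xs."""
    if not xs:                             # map f [] = []
        return []
    return [f(xs[0])] + my_map(f, xs[1:])  # (f x) :: (map f xs)

print(my_map(lambda n: n * n, [1, 2, 3]))  # [1, 4, 9]
```

Each output element depends only on the matching input element, never on the result for a neighbouring element.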
Implicit Parallelism In map
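A small sketch of why map parallelizes naturally, assuming the mapped function is pure (its result depends only on its argument). It uses a thread pool from Python's standard library as a stand-in for a cluster of workers; `square` and `data` are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    """A pure function: the result depends only on n."""
    return n * n

data = [1, 2, 3, 4, 5]

sequential = list(map(square, data))
with ThreadPoolExecutor(max_workers=4) as pool:
    # Elements may be processed in any order; results come back in input order.
    parallel = list(pool.map(square, data))

print(sequential == parallel)  # True
```

Because no element's result depends on another's, the order in which the workers run does not change the output.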
MapReduce
Motivation: Large Scale Data Processing
MapReduce
Programming Model
map
reduce
 
Parallelism
Example: Count word occurrences
map(String input_key, String input_value):
  // input_key: document name
  // input_value: document contents
  for each word w in input_value:
    EmitIntermediate(w, "1");

reduce(String output_key, Iterator intermediate_values):
  // output_key: a word
  // intermediate_values: a list of counts
  int result = 0;
  for each v in intermediate_values:
    result += ParseInt(v);
  Emit(AsString(result));
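The word-count pseudocode can be exercised with a small single-process Python sketch. The driver `run_job` is a hypothetical stand-in for the framework: it runs every map call, groups the intermediate values by key, and then calls reduce once per key. The sample documents are made up for illustration:

```python
from collections import defaultdict

def map_fn(input_key, input_value):
    """input_key: document name; input_value: document contents."""
    for word in input_value.split():
        yield (word, "1")

def reduce_fn(output_key, intermediate_values):
    """output_key: a word; intermediate_values: an iterable of counts."""
    result = 0
    for v in intermediate_values:
        result += int(v)
    return str(result)

def run_job(documents):
    """Hypothetical driver: map every document, group by key, then reduce."""
    groups = defaultdict(list)
    for name, contents in documents.items():
        for key, value in map_fn(name, contents):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
counts = run_job(docs)
print(counts["the"])  # 3
```

As in the pseudocode, counts travel as strings between the map and reduce steps. A real deployment would spread the map calls and the per-key reduce calls across many machines.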
Example vs. Actual Source Code
Locality
Fault Tolerance
Optimizations
Why is it safe to redundantly execute map tasks? Wouldn’t this mess up the total computation?
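One way to think about the question, sketched in Python under the assumption that map tasks are deterministic and free of side effects. The function `map_task`, the split name `split-0`, and the bookkeeping dictionary are all hypothetical:

```python
def map_task(split):
    """A deterministic, side-effect-free map task over one input split."""
    return sorted((word, 1) for word in split.split())

split = "to be or not to be"
first_attempt = map_task(split)   # original worker
backup_attempt = map_task(split)  # redundant copy on another worker

# Both attempts agree, so the driver can keep one and discard the other.
completed = {}
for attempt in (first_attempt, backup_attempt):
    completed.setdefault("split-0", attempt)

print(first_attempt == backup_attempt)  # True
print(len(completed))                   # 1
```

If a map function wrote to shared state or depended on randomness, the two attempts could disagree and this reasoning would no longer hold.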
Optimizations
Under what conditions is it sound to use a combiner?
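A hedged illustration for word counting, where the reduce operation is addition. Because addition is associative and commutative, summing partial counts on each mapper before the shuffle does not change the final totals. The functions `combine` and `reduce_counts` and the sample mapper outputs are illustrative assumptions:

```python
from collections import Counter

def combine(pairs):
    """Hypothetical combiner: pre-sum counts for each word on one mapper."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())

def reduce_counts(pairs):
    """Sum counts per word across all mapper outputs."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

mapper_a = [("the", 1), ("fox", 1), ("the", 1)]
mapper_b = [("the", 1), ("dog", 1)]

without_combiner = reduce_counts(mapper_a + mapper_b)
with_combiner = reduce_counts(combine(mapper_a) + combine(mapper_b))
print(without_combiner == with_combiner)  # True
```

An operation such as a plain average would not survive this treatment unchanged, since the average of partial averages is generally not the overall average.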
MapReduce Conclusions
Next Time...