Embarrassingly Parallel
Problems
CS5225 Parallel and Concurrent Programming
Dilum Bandara
Dilum.Bandara@uom.lk
Some slides adapted from Dr. Srinath Perera
Embarrassingly Parallel Problems
• A.k.a. delightfully parallel problems
• Can be parallelized easily
• Usually use simple communication patterns
• Subtasks usually run with little communication among each other
• The Map-Reduce programming model provides a powerful abstraction for handling embarrassingly parallel problems
Map-Reduce
• Common pattern for solving parallel problems
• Based on 2 constructs from functional programming, map & reduce
• Introduced by Google
  - Dean & Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” OSDI, 2004
• Extensible to different applications
• Scales to a very large number of nodes
• Hides details like failures from users
Higher-Order Functions
• In many programming languages (e.g., Java before lambdas), functions take only data as parameters & return only data as results
• Higher-order functions also take functions as parameters or return them as results
  - Supported in, e.g., Python, Ruby, JavaScript
• For example
def f(x):
    return x + 3

def g(function, x):
    return function(x) * function(x)

print(g(f, 7))  # f(7) is 10, so this prints 100
Map-Reduce
• Accepts 2 functions as inputs
1. Map function
  - Y fn1(X)
  - Accepts an input X & outputs a value Y
2. Reduce function
  - Z fn2(List<Y>)
  - Accepts an array of Y’s & returns a single output Z
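The two signatures above can be tried out with Python's built-in `map` and `functools.reduce`. This is only a minimal sketch: squaring and summing are arbitrary stand-ins chosen for illustration, and the names `fn1` and `fn2` simply mirror the slide.

```python
from functools import reduce

def fn1(x):
    """Map function: takes one input X and returns one output Y."""
    return x * x

def fn2(ys):
    """Reduce function: takes a list of Y values and returns a single Z."""
    return reduce(lambda acc, y: acc + y, ys, 0)

xs = [1, 2, 3, 4]
ys = list(map(fn1, xs))   # apply the map function to every input
z = fn2(ys)               # fold the mapped values into one result
print(ys, z)
```

Each call to `fn1` depends only on its own input, which is what lets the map step run in parallel.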
Map-Reduce (Contd.)
• Map-reduce support is provided by a function like the following
  - Z map-reduce(mapfn, reducefn, List<X>)
• A map-reduce implementation takes a list of inputs (List<X>) & does the following
  - Applies the map function to each entry in the list, which emits (key, value) pairs
  - Collects the results, groups them by key, & then passes each group to the reduce function as an array
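The steps above can be sketched as a single-process driver. This is an illustration under assumptions, not how any real framework is written: `map_reduce`, `length_mapper` and `count_reducer` are hypothetical names, and the reduce function is assumed to receive the key along with its list of values.

```python
from collections import defaultdict

def map_reduce(mapfn, reducefn, inputs):
    """Hypothetical in-memory map-reduce driver (illustration only)."""
    groups = defaultdict(list)
    for item in inputs:
        # Apply the map function to each entry; it emits (key, value) pairs.
        for key, value in mapfn(item):
            groups[key].append(value)      # group the values by key
    # Pass each key's values to the reduce function as a list.
    return {key: reducefn(key, values) for key, values in groups.items()}

def length_mapper(word):
    yield len(word), word                  # key = word length, value = word

def count_reducer(key, values):
    return len(values)                     # how many words share this length

words = ["map", "reduce", "key", "value", "pair"]
result = map_reduce(length_mapper, count_reducer, words)
print(result)
```

A distributed implementation would keep the same three phases but run the map calls and the per-key reduce calls on different machines.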
Map-Reduce (Contd.)
Source: www.datasciencecentral.com/profiles/blogs/practical-illustration-of-map-reduce-hadoop-style-on-real-data
Map-Reduce for Word Counting
Source: http://xiaochongzhang.me/blog/?p=338
How to do this for a large dataset using a distributed system?
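One possible answer, sketched on a single machine: split the input into chunks, count the words in each chunk independently (map), then merge the partial counts (reduce). The thread pool here only stands in for the worker nodes of a distributed system, and `count_words`, `merge_counts`, `word_count` and the chunk size are assumptions made for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def count_words(chunk):
    """Map step: count the words in one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def merge_counts(a, b):
    """Reduce step: merge two partial counts."""
    return a + b

def word_count(lines, chunk_size=2, workers=4):
    # Split the input so that each worker gets its own independent chunk.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    return reduce(merge_counts, partials, Counter())

lines = ["deer bear river", "car car river", "deer car bear"]
totals = word_count(lines)
print(dict(totals))
```

Because no chunk needs data from another chunk, the map step needs no communication; only the final merge brings results together.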
In Class Activity
1. Card sorting
2. Card sorting with 2 rounds
3. Identify missing cards
Inspired by Marcio Silva's “The MapReduce Card Game” at
http://blog.marciosilva.com/2012/10/the-mapreduce-card-game.html
Why Map-Reduce?
• Implementing the same pattern in a distributed system isn’t easy
  - Need to worry about communication, failures, initialization, etc.
• MapReduce frameworks take care of all of those
  - You write the map & reduce functions & call the framework
• It forces you to think in parallel at design time
• It gives you a higher level of abstraction to think in
• It’s very generic, & covers a lot of use cases
  - See http://wiki.apache.org/hadoop/PoweredBy
Map-Reduce Implementations
• Can be implemented in many ways
  - In-memory implementation
  - Distributed implementation
    - Communication by messages
    - Communication by file system
    - Communication by databases
• Communication requirements
  - Need broadcast & reduce operations only
Map-Reduce with Hadoop
• Apache Hadoop is an implementation of Map-Reduce
• It handles all the details of distributed execution
• You just have to provide the Map & Reduce functions
Map-Reduce Data Model
Source: http://slides.com/bearrito/pittsburgh-nosql-_-mapreduce#/
Map-Reduce Data Model (Cont.)
• For text input, Hadoop by default breaks the input data into data items at newlines & runs the map function once for each data item
• When executed, the map function outputs (key, value) pairs
• Hadoop collects all (key, value) pairs generated by the map function, sorts them by key, & groups values with the same key together
• For each distinct key, Hadoop runs the reduce function once, passing the key & the list of values for that key as input
• The reduce function outputs (key, value) pairs, & Hadoop writes them to a file as the final result
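The data flow described above can be imitated locally to see why sorting matters: once the pairs are sorted by key, equal keys sit next to each other and can be grouped in one pass. This sketch does not use Hadoop at all; `mapper`, `reducer` and `run_job` are hypothetical names, and the result is returned as a list instead of being written to a file.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Called once per input line; emits (key, value) pairs."""
    for word in line.split():
        yield word, 1

def reducer(key, values):
    """Called once per distinct key with all of that key's values."""
    yield key, sum(values)

def run_job(text):
    # 1. Break the input into data items at newlines.
    items = text.splitlines()
    # 2. Run the map function once per item and collect every pair.
    pairs = [pair for item in items for pair in mapper(item)]
    # 3. Sort by key so that equal keys become adjacent, then group them.
    pairs.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        # 4. Run the reduce function once per distinct key.
        output.extend(reducer(key, [value for _, value in group]))
    return output

result = run_job("a b a\nc b a")
print(result)
```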
Execution on a Cluster/Cloud
Source: www.cbsolution.net/techniques/ontarget/mapreduce_vs_data_warehouse
MapReduce Execution
Source: Dean & Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” OSDI, 2004
Designing Map-Reduce Applications
• You control task granularity by changing the number of map & reduce tasks
  - How many map tasks?
  - How many reduce tasks?
• Finer granularity → more parallelism → more communication overhead, & vice versa
• Frameworks usually handle load balancing & failures
• If there are a large number of map tasks, you need a combine function as well
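A small sketch of what a combine function buys: it pre-aggregates the output of each map task locally, so fewer (key, value) pairs have to travel to the reducers while the final result stays the same. The function names and the two tiny input splits are assumptions made for illustration.

```python
from collections import Counter

def map_task(lines):
    """Emit one (word, 1) pair per word, as a plain map task would."""
    return [(word, 1) for line in lines for word in line.split()]

def combine(pairs):
    """Combine function: pre-aggregate one map task's output locally."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def reduce_all(pair_lists):
    """Reduce step: sum the counts for each word across all map tasks."""
    totals = Counter()
    for pairs in pair_lists:
        for word, count in pairs:
            totals[word] += count
    return dict(totals)

splits = [["a a a b"], ["a b b b"]]           # one input split per map task
raw = [map_task(split) for split in splits]
combined = [combine(pairs) for pairs in raw]

sent_without = sum(len(p) for p in raw)       # pairs shuffled without a combiner
sent_with = sum(len(p) for p in combined)     # pairs shuffled with a combiner
print(sent_without, sent_with, reduce_all(combined))
```

This works for word counting because addition is associative and commutative; a reduce function without those properties cannot simply be reused as the combiner.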
Examples
• Sorting
  - How to sort an array of 1 million integers using MapReduce?
• Inverted Index
  - A normal index is a mapping from documents to terms
  - An inverted index is a mapping from terms to documents
  - If we have a million documents, how do we build an inverted index using MapReduce?
• Frequency Distribution of Word Occurrences
  - Count the number of occurrences & build a histogram
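One way to approach the inverted-index question, sketched in memory: the map function emits a (term, document ID) pair for every distinct term in a document, and the reduce function collects the document IDs for each term. The names `map_doc`, `reduce_term` and `build_inverted_index`, and the three toy documents, are assumptions for this example.

```python
from collections import defaultdict

def map_doc(doc_id, text):
    """Map: emit (term, doc_id) once for each distinct term in a document."""
    for term in set(text.lower().split()):
        yield term, doc_id

def reduce_term(term, doc_ids):
    """Reduce: collect the sorted list of documents that contain the term."""
    return sorted(set(doc_ids))

def build_inverted_index(docs):
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for term, d in map_doc(doc_id, text):
            grouped[term].append(d)        # group document IDs by term
    return {term: reduce_term(term, ids) for term, ids in grouped.items()}

docs = {"d1": "map reduce", "d2": "map tasks", "d3": "reduce tasks"}
index = build_inverted_index(docs)
print(index)
```

Each document is mapped independently of the others, so with a million documents the map step can be spread across as many workers as are available.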
Examples (Cont.)
• Stitch Imagery
  - For Google Maps, Google needs to combine many pieces of map data into a single data set
• Business Intelligence
  - A business wants to create a graph of the income generated by each region & the marketing money spent on each region
Examples (Cont.)
• K-Means
  - Assume you are given a list of coordinates of the earthquakes that happened in the world in the last 50 years.
  - You are asked to use the K-Means clustering algorithm to find 10 locations around which those earthquakes were located.
  - K-Means starts with 10 random cluster locations.
  - It proceeds iteratively, & at each iteration, it assigns each data point (earthquake) to the closest cluster location
  - At the end of each iteration, it recalculates each cluster location using the mean of all data point coordinates assigned to that location
  - It stops when the cluster locations don’t change after recalculation
K-Means Algorithm
List kmeans(datapointsList, initialClustersList) {
    oldLocations = null;
    newLocations = initialClustersList;
    while (oldLocations != newLocations) {
        oldLocations = newLocations;
        for (d in datapointsList) {
            // assign d to closest location in newLocations
        }
        newLocations = // recalculate each location as the mean of its assigned points
    }
    return newLocations;
}
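The pseudocode above maps naturally onto map-reduce: within one iteration, assigning a point to its closest location is a map step (key = cluster index, value = point), and recomputing each location as a mean is a reduce step per key. The following is a runnable sketch under assumptions: it uses plain Euclidean distance on 2-D tuples, fixed initial locations instead of random ones so the run is repeatable, and a hypothetical `max_iters` safety cap.

```python
from collections import defaultdict
import math

def closest(point, centers):
    """Index of the center nearest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, centers, max_iters=100):
    for _ in range(max_iters):
        # Map: emit (cluster index, point) for every data point.
        groups = defaultdict(list)
        for p in points:
            groups[closest(p, centers)].append(p)
        # Reduce: the new center of each cluster is the mean of its points.
        new_centers = []
        for i, c in enumerate(centers):
            members = groups.get(i)
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(c)      # keep a center that attracted no points
        if new_centers == centers:         # stop when no center moved
            break
        centers = new_centers
    return centers

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
result = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(result)
```

Only the work inside one iteration is embarrassingly parallel; the iterations themselves must run one after another because each depends on the previous locations.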



Editor's notes

  1. Shovel example