Parallel Computing with R

Literature Seminar
Abhirup Mallik
malli066@umn.edu
School of Statistics
University of Minnesota

November 15, 2013

Why Parallel?

R does not take advantage of multiple cores by default
Does not support passing by reference
Cannot read files dynamically, etc.

What is Parallel?

'Parallel': doing more than one task at the same time.
Use different cores of the same CPU for different tasks.
Use different computers in a cluster for different tasks.

How to go Parallel?

Using Multicore (Implicit Parallelism)
The main process forks child processes, which run in parallel on different cores.

    library(parallel)
    mclapply(X, FUN, ...)

Or use

    library(parallel)
    # ... setup ...
    for (isplit in 1:nsplit) {
      mcparallel(<some R expression involving isplit>)
    }
    out <- mccollect()  # called collect() in the older multicore package
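
As a concrete sketch of the two multicore interfaces above, the following runs a toy task with `mclapply` and then with `mcparallel`/`mccollect`. The function `slow_square` and the choice of two cores are assumptions made for illustration only. Forking is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)

# Hypothetical stand-in for an expensive task.
slow_square <- function(i) {
  Sys.sleep(0.1)
  i^2
}

# Forking is not available on Windows, so use a single core there.
on_windows <- .Platform$OS.type == "windows"
n_cores <- if (on_windows) 1L else 2L

# High-level interface: a parallel drop-in for lapply().
res_apply <- unlist(mclapply(1:4, slow_square, mc.cores = n_cores))

# Low-level interface: start the jobs one by one, then collect the results.
if (!on_windows) {
  jobs <- lapply(1:4, function(i) mcparallel(slow_square(i)))
  res_jobs <- unlist(mccollect(jobs), use.names = FALSE)
} else {
  res_jobs <- res_apply
}

print(res_apply)
print(res_jobs)
```

`mclapply` is usually the simpler choice; `mcparallel` is useful when the jobs are not a plain loop over one vector.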

Warnings:
All child processes compete for memory.
Closing the terminal or any graphical window only kills the parent.
'Ctrl + C' kills the parent, not the children.
Kill the children if they are unresponsive.

Using SNOW (Explicit Parallelism)
Make a cluster with any one of these options:

    cl <- makeCluster(spec, type, ...)
    cl <- makePSOCKcluster(names, ...)
    cl <- makeForkCluster(nnodes, ...)

Export essential objects to the cluster (by name):

    clusterExport(cl, c("var1", "fun1", ...))

Evaluate on the cluster:

    clusterEvalQ(cl, expr)
    parLapply(cl = NULL, X, fun, ...)
    parSapply(cl = NULL, X, FUN, ...)

Stop the cluster:

    stopCluster(cl)
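
The four steps above (make, export, evaluate, stop) can be put together in a minimal sketch. The objects `offset` and `add_offset` and the cluster size of two are hypothetical choices for illustration; `envir = environment()` is passed so the export also works when the code is not run at top level.

```r
library(parallel)

# Hypothetical objects that the workers need.
offset <- 10
add_offset <- function(x) x + offset

# 1. Make a cluster of two local worker processes.
cl <- makeCluster(2)

# 2. Export the objects by name (a character vector of names).
clusterExport(cl, c("offset", "add_offset"), envir = environment())

# 3. Evaluate on the cluster.
loaded   <- clusterEvalQ(cl, library(stats))  # run an expression on every worker
res_list <- parLapply(cl, 1:5, add_offset)    # returns a list
res_vec  <- parSapply(cl, 1:5, add_offset)    # simplifies to a vector

# 4. Stop the cluster to release the worker processes.
stopCluster(cl)

print(res_vec)
```

Unlike forked children, socket workers start as fresh R sessions, which is why objects and packages have to be sent or loaded explicitly.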

Demonstration

Using the Swiss fertility data from 1888 (included with base R).

    > str(swiss)
    'data.frame':	47 obs. of  6 variables:
     $ Fertility       : num  80.2 83.1 92.5 85.8 76.9 76.1 ...
     $ Agriculture     : num  17 45.1 39.7 36.5 43.5 35.3 ...
     $ Examination     : int  15 6 5 12 17 9 16 14 12 16 ...
     $ Education       : int  12 9 5 7 15 7 7 8 7 13 ...
     $ Catholic        : num  9.96 84.84 93.4 33.77 5.16 ...
     $ Infant.Mortality: num  22.2 22.2 20.2 20.3 20.6 26.6 ...

10-fold cross validation

    # Assign each observation to one of 10 folds at random.
    fold <- sample(seq(1, 10), size = nrow(swiss), replace = TRUE)

Cross validation for the i-th fold

    library(randomForest)

    # Fit on all folds except i and return the sum of squared errors on fold i.
    fold.cv <- function(i) {
      train <- swiss[fold != i, ]
      test <- swiss[fold == i, ]
      swiss.rf <- randomForest(sqrt(Fertility) ~ . - Catholic + I(Catholic < 50),
                               data = train)
      predict.test <- predict(swiss.rf, test, type = "response")
      actual.test <- sqrt(test$Fertility)
      err <- predict.test - actual.test
      sum(err * err)
    }
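
To show how the per-fold sums of squared errors combine into one cross-validation error, here is a serial sketch. As a loud assumption, `lm()` stands in for `randomForest()` so that the example needs no extra packages; the names `fold.cv.lm`, `sse` and `cv_mse`, the seed, and the guard for an empty fold are also additions for illustration.

```r
set.seed(1)
data(swiss)

# Same fold assignment scheme as on the slide.
fold <- sample(seq(1, 10), size = nrow(swiss), replace = TRUE)

# Assumption: lm() replaces randomForest() to keep the sketch dependency-free.
fold.cv.lm <- function(i) {
  train <- swiss[fold != i, ]
  test <- swiss[fold == i, ]
  # Random assignment with replacement can leave a fold empty.
  if (nrow(test) == 0) return(0)
  fit <- lm(sqrt(Fertility) ~ . - Catholic + I(Catholic < 50), data = train)
  err <- predict(fit, test) - sqrt(test$Fertility)
  sum(err * err)
}

# Sum of squared errors per fold, then the mean squared error over all rows.
sse <- vapply(1:10, fold.cv.lm, numeric(1))
cv_mse <- sum(sse) / nrow(swiss)

print(cv_mse)
```

Because every observation falls in exactly one fold, dividing the total by `nrow(swiss)` gives the cross-validated mean squared error on the square-root scale.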

How to create a cluster?

Create a local cluster of size 4 (parallel socket):

    cl <- makePSOCKcluster(4)

Create a local cluster on different cores of the CPU (8 cores):

    cl <- makeForkCluster(8)
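
A small sketch of sizing a local socket cluster from the machine rather than hard-coding it. The cap of two workers and the variable names are assumptions for illustration; `makeForkCluster` would work the same way on Unix-like systems but is not available on Windows.

```r
library(parallel)

# detectCores() may return NA on some platforms, so guard against that.
n_avail <- detectCores()
n_workers <- if (is.na(n_avail)) 1L else min(2L, n_avail)

cl <- makePSOCKcluster(n_workers)

# Each worker is a separate R process with its own process ID.
worker_pids <- unlist(clusterCall(cl, Sys.getpid))

stopCluster(cl)

print(n_workers)
print(worker_pids)
```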

How to create a cluster in our LAB?
Set up password-less login using ssh-keygen (from the shell):

    ssh-keygen -t dsa
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Check which computers are running:

    grephosts LAB
    # Then ssh once to each computer you want to connect to;
    # it will be remembered for the session.

Now we are ready to make a cluster:

    library(parallel)
    machines <- c("crab", "sugar", "strike", "hyland", "lovejoy", "driller")
    # Resolve each host name to an IP address.
    address <- rapply(lapply(machines, nsl), c)
    cl <- makePSOCKcluster(address)

If you are connecting to stat.umn.edu from your own computer, to create a password-less ssh session:

    ssh-keygen -t dsa
    # Then use scp to copy id_dsa.pub to ~/.ssh/authorized_keys

Comparison
On cluster:

    > system.time({
    +   garbage <- clusterEvalQ(cl, data(swiss))
    +   garbage <- clusterEvalQ(cl, library(randomForest))
    +   clusterExport(cl, c("fold", "fold.cv"))
    +   clusterSetRNGStream(cl, 123)
    +   res3 <- do.call(c, parLapply(cl, 1:10, fold.cv))
    +   stopCluster(cl)
    + })
       user  system elapsed
      0.008   0.000   0.838

On Multicore:

    > system.time({
    +   res1 <- do.call(c, mclapply(1:10, fold.cv, mc.cores = 8))
    + })
       user  system elapsed
      0.386   0.162   0.120

Using Fork cluster:

    > system.time({
    +   cl <- makeForkCluster(8)
    +   garbage <- clusterEvalQ(cl, data(swiss))
    +   garbage <- clusterEvalQ(cl, library(randomForest))
    +   clusterExport(cl, c("fold", "fold.cv"))
    +   clusterSetRNGStream(cl, 123)
    +   res3 <- do.call(c, parLapply(cl, 1:10, fold.cv))
    +   stopCluster(cl)
    + })
       user  system elapsed
      0.010   0.054   0.153

Without any parallelization:

    > system.time({
    +   res2 <- do.call(c, lapply(1:10, fold.cv))
    + })
       user  system elapsed
      0.233   0.000   0.235
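
The elapsed times reported above can be turned into speedups relative to the serial run. The sketch below only re-uses the four elapsed values from the slides; the vector names are labels chosen for illustration. With these numbers, `mclapply` is roughly twice as fast as the serial run, the fork cluster about 1.5 times as fast, and the socket cluster is slower than no parallelization at all.

```r
# Elapsed times in seconds, as reported on the slides.
elapsed <- c(serial = 0.235, mclapply = 0.120, fork_cluster = 0.153, cluster = 0.838)

# Speedup relative to the serial run: values above 1 are faster than serial.
speedup <- round(elapsed[["serial"]] / elapsed, 2)

print(speedup)
```

For a task this small, the cost of setting up workers and moving data can outweigh the gain, which is what the socket cluster timing suggests.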

When to go Parallel?

When the gain from parallelization is much larger than the cost of data transfer, network delays, etc.
If the problem is embarrassingly parallel: no dependency between the parallel tasks.
Cross validation and bootstrapping are examples where going parallel works well.
For iterative numerical methods like coordinate descent or Newton-Raphson, going parallel may not be possible.
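
The contrast can be sketched in a few lines. The bootstrap of the mean of `swiss$Fertility` is embarrassingly parallel because no replicate depends on another, while a Newton-Raphson iteration (here for the square root of 2, a toy problem chosen for illustration) needs the previous iterate at every step. The function `boot_mean`, the 200 replicates and the two workers are assumptions for the example.

```r
library(parallel)

data(swiss)

# One bootstrap replicate; it does not depend on any other replicate.
boot_mean <- function(b, values) mean(sample(values, replace = TRUE))

cl <- makeCluster(2)
clusterSetRNGStream(cl, 123)  # reproducible random numbers on the workers
boot_means <- parSapply(cl, 1:200, boot_mean, values = swiss$Fertility)
stopCluster(cl)

boot_se <- sd(boot_means)

# Newton-Raphson for sqrt(2): each step uses the result of the previous step,
# so the iterations themselves cannot be handed to different workers.
root <- 1
for (k in 1:6) {
  root <- root - (root^2 - 2) / (2 * root)
}

print(boot_se)
print(root)
```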

What is beyond the wall?

Parallelization in a big data framework: RHadoop
Other, related implementations of parallelization: MPI, NWS, etc.
Other cool libraries: foreach, snowfall, etc.
GPUs!

Where to get the code?

All the code in this presentation is available at:
https://github.com/abhirupkgp/parallelseminar/blob/master/cv.R

Acknowledgements and References

Sincere thanks to Charles Geyer
Resourceful slides by Ryan Rosario.
Some other and more resourceful slides.
Parallel R Book

Thank You !!
