Example Parallel Overview snow fork Summary
Parallel Computing with R
Péter Sólymos
Edmonton R User Group meeting, April 26, 2013
Ovenbird example from 'detect' package
> str(oven)
'data.frame': 891 obs. of 11 variables:
$ count : int 1 0 0 1 0 0 0 0 0 0 ...
$ route : int 2 2 2 2 2 2 2 2 2 2 ...
$ stop : int 2 4 6 8 10 12 14 16 18 20 ...
$ pforest: num 0.947 0.903 0.814 0.89 0.542 ...
$ pdecid : num 0.575 0.562 0.549 0.679 0.344 ...
$ pagri : num 0 0 0 0 0.414 ...
$ long : num 609343 608556 607738 607680 607944 ...
$ lat : num 5949071 5947735 5946301 5944720 5943088 ...
$ observ : Factor w/ 4 levels "ARS","DW","RDW",..: 4 4 4 4 4 4 4 4 4 4 ...
$ julian : int 181 181 181 181 181 181 181 181 181 181 ...
$ timeday: int 2 4 6 8 10 12 14 16 18 20 ...
NegBin GLM with bootstrap
> library(MASS)
> m <- glm.nb(count ~ pforest, oven)
> fun1 <- function(i) {
+ id <- sample.int(nrow(oven), nrow(oven), replace = TRUE)
+ coef(glm.nb(count ~ pforest, oven[id, ]))
+ }
> B <- 199
> system.time(bm <- sapply(1:B, fun1))
user system elapsed
26.79 0.02 27.11
> bm <- cbind(coef(m), bm)
> cbind(coef(summary(m))[, 1:2], `Boot. SE` = apply(bm, 1, sd))
            Estimate Std. Error Boot. SE
(Intercept)   -2.177     0.1277   0.1229
pforest        2.674     0.1709   0.1553
Parallel bootstrap
> library(parallel)
> (cl <- makePSOCKcluster(3))
socket cluster with 3 nodes on host 'localhost'
> clusterExport(cl, "oven")
> tmp <- clusterEvalQ(cl, library(MASS))
> t0 <- proc.time()
> bm2 <- parSapply(cl, 1:B, fun1)
> proc.time() - t0
user system elapsed
0.00 0.00 11.06
> stopCluster(cl)
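A self-contained variant of the parallel bootstrap, as a minimal sketch. It swaps in the built-in mtcars data and lm() for the oven data and glm.nb(), so it runs without the 'detect' package. The names boot_fun and boot_se, the two-worker cluster, and the seed are illustrative assumptions, not part of the original slides.

```r
library(parallel)

# Hypothetical stand-in for fun1: refit a linear model on one bootstrap resample
boot_fun <- function(i) {
  id <- sample.int(nrow(mtcars), nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[id, ]))
}

B <- 199
cl <- makePSOCKcluster(2)               # two local worker processes (assumed count)
clusterSetRNGStream(cl, iseed = 1234)   # separate, reproducible RNG stream per worker
bm <- parSapply(cl, 1:B, boot_fun)      # 2 x B matrix of resampled coefficients
stopCluster(cl)

# Bootstrap standard errors, one per coefficient
boot_se <- apply(bm, 1, sd)
boot_se
```

No clusterExport() call is needed here because mtcars ships with the 'datasets' package, which every worker attaches by default. A user-created object such as oven still has to be exported.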
High performance computing (HPC)
• Parallel computing,
• large memory and out-of-memory data,
• interfaces for compiled code,
• profiling tools,
• batch scheduling.
CRAN Task View: High-Performance and Parallel Computing with R
Parallel computing
Embarrassingly parallel problems:
• bootstrap,
• MCMC,
• simulations.
Can be broken down into independent pieces. [1]
[1] Schmidberger et al. 2009, JSS: State of the Art in Parallel Computing with R
Parallel computing
• explicit (distributed memory),
• implicit (shared memory),
• grid,
• Hadoop,
• GPUs.
Starting a cluster
> library(snow)
> cl <- makeCluster(3, type = "SOCK")
Cluster types:
• SOCK, multicore
• PVM, Parallel Virtual Machine
• MPI, Message Passing Interface
• NWS, NetWorkSpaces (multicore and grid)
Error: invalid connection
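For comparison, a socket cluster can also be started with the 'parallel' package that ships with R. The sketch below assumes two local workers and uses the hypothetical name n_workers. It only starts the cluster, checks its size, and shuts it down again.

```r
library(parallel)

# "PSOCK" is the socket cluster type in 'parallel'
cl <- makeCluster(2, type = "PSOCK")    # worker count of 2 is an assumption
n_workers <- length(cl)                 # a cluster is a list with one element per node
print(cl)

# Always release the worker processes when done
stopCluster(cl)
```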
Distribute stuff, evaluate expressions
> clusterExport(cl, "oven")
> clusterEvalQ(cl, library(MASS))
[[1]]
[1] "MASS"      "methods"   "stats"     "graphics"
[5] "grDevices" "utils"     "datasets"  "base"
[[2]]
[1] "MASS"      "methods"   "stats"     "graphics"
[5] "grDevices" "utils"     "datasets"  "base"
[[3]]
[1] "MASS"      "methods"   "stats"     "graphics"
[5] "grDevices" "utils"     "datasets"  "base"
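A minimal sketch of the same two steps with a toy object instead of oven, assuming a two-worker cluster from the 'parallel' package. The variable threshold is hypothetical. It is copied to each worker by name, and the expression is then evaluated once per worker.

```r
library(parallel)

cl <- makePSOCKcluster(2)

threshold <- 10                          # hypothetical object living in the master session
clusterExport(cl, "threshold")           # copy it to every worker's global environment

# clusterEvalQ() evaluates the expression on each worker and returns one result per node
seen <- clusterEvalQ(cl, threshold * 2)

stopCluster(cl)
unlist(seen)
```

clusterExport() takes object names as a character vector, which is why the quotes around "oven" matter.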
Random Number Generation (RNG)
> library(rlecuyer)
> tmp <- clusterEvalQ(cl, set.seed(1234))
> clusterEvalQ(cl, rnorm(5))
[[1]]
[1] -1.2071 0.2774 1.0844 -2.3457 0.4291
[[2]]
[1] -1.2071 0.2774 1.0844 -2.3457 0.4291
> snow:::clusterSetupRNG(cl)
[1] "RNGstream"
> clusterEvalQ(cl, rnorm(5))
[[1]]
[1] -1.14063 -0.49816 -0.76670 -0.04821 -1.09852
[[2]]
[1] 0.7050 0.4821 -1.2848 0.7198 0.7386
Important when drawing resampling indices or doing simulations: with the same seed on every worker, each worker repeats the same draws.
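The 'parallel' package offers clusterSetRNGStream() for the same purpose. The sketch below assumes two workers and an arbitrary seed. It checks that the workers draw different numbers and that resetting the seed reproduces the run.

```r
library(parallel)

cl <- makePSOCKcluster(2)

# Give each worker its own L'Ecuyer-CMRG stream derived from one seed
clusterSetRNGStream(cl, iseed = 1234)
a <- clusterEvalQ(cl, rnorm(3))

# Resetting with the same seed reproduces the same per-worker draws
clusterSetRNGStream(cl, iseed = 1234)
b <- clusterEvalQ(cl, rnorm(3))

stopCluster(cl)

workers_differ <- !identical(a[[1]], a[[2]])   # streams are distinct across workers
reproducible   <- identical(a, b)              # and repeatable across runs
```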
Apply operations: split
> parallel:::parLapply
function (cl = NULL, X, fun, ...)
{
    cl <- defaultCluster(cl)
    do.call(c, clusterApply(cl, x = splitList(X, length(cl)),
        fun = lapply, fun, ...), quote = TRUE)
}
<bytecode: 0x04c1eba8>
<environment: namespace:parallel>
> snow:::splitList(1:10, length(cl))
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] 6 7 8 9 10
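The split-apply-combine pattern inside parLapply() can be traced without a cluster. This sketch uses the exported splitIndices() helper from 'parallel' and a plain lapply() as a sequential stand-in for clusterApply(). The worker count of 3 and the names X, chunks, and pieces are assumptions for illustration.

```r
library(parallel)

X <- 1:10
n_workers <- 3                                   # assumed number of workers
chunks <- splitIndices(length(X), n_workers)     # one vector of indices per worker

f <- function(i) i * 2

# Sequential stand-in for clusterApply(): each chunk is processed with lapply()
pieces <- lapply(chunks, function(idx) lapply(X[idx], f))

# Combine the per-chunk lists back into one list, in the original order
res <- do.call(c, pieces)
unlist(res)
```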
Apply operations: evaluate and combine
> f <- function(i) i * 2
> (res <- clusterApply(cl, snow:::splitList(1:10, length(cl)),
+ f))
[[1]]
[1] 2 4 6
[[2]]
[1] 8 10 12 14
[[3]]
[1] 16 18 20
> do.call(c, res)
[1] 2 4 6 8 10 12 14 16 18 20
Apply operations: load balancing
> f <- function(i) i * 2
> unlist(parallel:::parLapplyLB(cl, 1:10, f))
[1] 2 4 6 8 10 12 14 16 18 20
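Load balancing pays off when tasks take unequal time. In this sketch the hypothetical slow_double() sleeps longer for larger inputs. clusterApply() hands tasks to workers in a fixed round-robin order, while clusterApplyLB() gives the next task to whichever worker becomes free first. Both return results in input order, so only the scheduling differs. The two-worker cluster is an assumption.

```r
library(parallel)

cl <- makePSOCKcluster(2)

# Hypothetical task with uneven run time: larger i sleeps longer
slow_double <- function(i) {
  Sys.sleep(i / 100)
  i * 2
}

static  <- unlist(clusterApply(cl, 1:6, slow_double))     # fixed assignment of tasks
dynamic <- unlist(clusterApplyLB(cl, 1:6, slow_double))   # next free worker takes next task

stopCluster(cl)

same_result <- identical(static, dynamic)
```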
Implicit parallelism
No need to distribute stuff, only evaluate on child processes.
> mclapply(X, FUN, mc.cores)
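A minimal sketch of forked evaluation, assuming two cores on Unix-alikes. Forking is not available on Windows, where mclapply() only accepts mc.cores = 1, so the sketch falls back to one core there. The names scale_by and n_cores are hypothetical. The point is that the forked children see scale_by without any export step.

```r
library(parallel)

scale_by <- 2                            # hypothetical object in the parent session

# Forked children inherit the parent's workspace, so no clusterExport() is needed.
# Windows cannot fork; use a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

res <- mclapply(1:10, function(i) i * scale_by, mc.cores = n_cores)
unlist(res)
```

Passing mc.cores by name avoids it being forwarded to FUN as an ordinary argument.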
Summary
Parallel computing is not hard on a single computer.
Difficulty comes in when using large, shared, and heterogeneous
resources.
> stopCluster(cl)
