SlideShare ist ein Scribd-Unternehmen logo
1 von 49
[object Object],[object Object],[object Object],[object Object]
Introduction ,[object Object],[object Object],[object Object],[object Object]
RevoScaleR package ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Overview of my talk ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Scalability ,[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Keys to scalability ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Huge data sets are becoming common ,[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Huge benefits to huge data ,[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Two huge problems: capacity and speed ,[object Object],[object Object],[object Object],[object Object],[object Object]
We are currently incapable of analyzing a lot of the data we have ,[object Object],[object Object],[object Object],Revolution R Enterprise
Requires software solutions ,[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
High Performance Analytics: HPA ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Revolution’s approach to HPA and scalability in the RevoScaleR package ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Some interface design goals ,[object Object],[object Object],[object Object],Revolution R Enterprise
Sample code for logit on laptop ,[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Sample code for logit on a cluster ,[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Sample data transformation code ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Chunking ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
The basis for a solution for capacity, speed, distributed  and streaming data – PEMA’s ,[object Object],[object Object],[object Object]
External memory algorithms are widely available ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Parallel algorithms are widely available  ,[object Object],[object Object]
Not much literature on PEMA’s ,[object Object],[object Object],Revolution R Enterprise
Parallelizing external memory algorithms ,[object Object],[object Object],[object Object],[object Object]
Parallel Computations with Synchronization and Communication
Embarrassingly Parallel Computations
Sequential External Memory Computations
Parallel External Memory Computations
Example external memory algorithm for the mean of a variable ,[object Object],[object Object],[object Object],[object Object]
PEMA’s in RevoScaleR ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
PEMA’s in RevoScaleR (2) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Storing and reading data ,[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
The XDF file format ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
The XDF file format (2) ,[object Object],[object Object],[object Object],Revolution R Enterprise
Overview of data path on a computer ,[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Handling data in memory ,[object Object],[object Object],[object Object],Revolution R Enterprise
Use of multiple cores per computer ,[object Object],[object Object],[object Object],Revolution R Enterprise
Core 0 (Thread 0) Core n (Thread n) Core 2 (Thread 2) Core 1 (Thread 1) Multicore Processor (4, 8, 16+ cores) Data Data Data Disk RevoScaleR Shared Memory ,[object Object],[object Object],[object Object],[object Object],RevoScaleR – Multi-Threaded Processing
Use of multiple computers ,[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Compute Node (RevoScaleR) Compute Node (RevoScaleR) Master Node (RevoScaleR) Data Partition Data Partition Compute Node (RevoScaleR) Compute Node (RevoScaleR) Data Partition Data Partition ,[object Object],[object Object],[object Object],[object Object],RevoScaleR – Distributed Computing
Data management capabilities ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Analysis algorithms  ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Benchmarks using the airline data ,[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Distributed Import of Airline Data ,[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Scalability of RevoScaleR with Rows Regression, 1 million - 1.1 billion rows, 443 betas (4 core laptop)  Revolution R Enterprise
Scalability of RevoScaleR with Nodes Regression, 1 billion rows, 443 betas (1 to 5 nodes, 4 cores per node) Revolution R Enterprise
Scalability of RevoScaleR with Nodes Logit, 123.5 million rows, 443 betas, 5 iterations   (1 to 5 nodes, 4 cores per node) Revolution R Enterprise
Scalability of RevoScaleR with Nodes Logit, 1 billion rows, 443 betas, 5 iterations   (1 to 5 nodes, 4 cores per node) Revolution R Enterprise
Comparative benchmarks ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Revolution R Enterprise
Contact information ,[object Object],[object Object],Revolution R Enterprise

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

R program
R programR program
R program
 
R programming groundup-basic-section-i
R programming groundup-basic-section-iR programming groundup-basic-section-i
R programming groundup-basic-section-i
 
Why R? A Brief Introduction to the Open Source Statistics Platform
Why R? A Brief Introduction to the Open Source Statistics PlatformWhy R? A Brief Introduction to the Open Source Statistics Platform
Why R? A Brief Introduction to the Open Source Statistics Platform
 
R language
R languageR language
R language
 
R programming Fundamentals
R programming  FundamentalsR programming  Fundamentals
R programming Fundamentals
 
A brief introduction to 'R' statistical package
A brief introduction to 'R' statistical packageA brief introduction to 'R' statistical package
A brief introduction to 'R' statistical package
 
R programming Language , Rahul Singh
R programming Language , Rahul SinghR programming Language , Rahul Singh
R programming Language , Rahul Singh
 
R programming language
R programming languageR programming language
R programming language
 
An Intoduction to R
An Intoduction to RAn Intoduction to R
An Intoduction to R
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine Learning
 
R programming slides
R  programming slidesR  programming slides
R programming slides
 
Getting Started with R
Getting Started with RGetting Started with R
Getting Started with R
 
LSESU a Taste of R Language Workshop
LSESU a Taste of R Language WorkshopLSESU a Taste of R Language Workshop
LSESU a Taste of R Language Workshop
 
R Programming Language
R Programming LanguageR Programming Language
R Programming Language
 
R programming for data science
R programming for data scienceR programming for data science
R programming for data science
 
Introduction to the R Statistical Computing Environment
Introduction to the R Statistical Computing EnvironmentIntroduction to the R Statistical Computing Environment
Introduction to the R Statistical Computing Environment
 
A short tutorial on r
A short tutorial on rA short tutorial on r
A short tutorial on r
 
1.3 introduction to R language, importing dataset in r, data exploration in r
1.3 introduction to R language, importing dataset in r, data exploration in r1.3 introduction to R language, importing dataset in r, data exploration in r
1.3 introduction to R language, importing dataset in r, data exploration in r
 
Training in Analytics, R and Social Media Analytics
Training in Analytics, R and Social Media AnalyticsTraining in Analytics, R and Social Media Analytics
Training in Analytics, R and Social Media Analytics
 
useR! 2012 Talk
useR! 2012 TalkuseR! 2012 Talk
useR! 2012 Talk
 

Andere mochten auch

Analysis of massive data using R (CAEPIA2015)
Analysis of massive data using R (CAEPIA2015)Analysis of massive data using R (CAEPIA2015)
Analysis of massive data using R (CAEPIA2015)
AMIDST Toolbox
 

Andere mochten auch (7)

Analysis of massive data using R (CAEPIA2015)
Analysis of massive data using R (CAEPIA2015)Analysis of massive data using R (CAEPIA2015)
Analysis of massive data using R (CAEPIA2015)
 
Using R for Analyzing Loans, Portfolios and Risk: From Academic Theory to Fi...
Using R for Analyzing Loans, Portfolios and Risk:  From Academic Theory to Fi...Using R for Analyzing Loans, Portfolios and Risk:  From Academic Theory to Fi...
Using R for Analyzing Loans, Portfolios and Risk: From Academic Theory to Fi...
 
R Spatial Analysis using SP
R Spatial Analysis using SPR Spatial Analysis using SP
R Spatial Analysis using SP
 
Facebook data analysis using r
Facebook data analysis using rFacebook data analysis using r
Facebook data analysis using r
 
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin RSelf-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
 
Iris data analysis example in R
Iris data analysis example in RIris data analysis example in R
Iris data analysis example in R
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter Data
 

Ähnlich wie Scalable Data Analysis in R -- Lee Edlefsen

useR2011 - Edlefsen
useR2011 - EdlefsenuseR2011 - Edlefsen
useR2011 - Edlefsen
rusersla
 
Scalable Data Analysis in R Webinar Presentation
Scalable Data Analysis in R Webinar PresentationScalable Data Analysis in R Webinar Presentation
Scalable Data Analysis in R Webinar Presentation
Revolution Analytics
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
Revolution Analytics
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
Bhupesh Bansal
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
MLconf
 

Ähnlich wie Scalable Data Analysis in R -- Lee Edlefsen (20)

useR2011 - Edlefsen
useR2011 - EdlefsenuseR2011 - Edlefsen
useR2011 - Edlefsen
 
Scalable Data Analysis in R Webinar Presentation
Scalable Data Analysis in R Webinar PresentationScalable Data Analysis in R Webinar Presentation
Scalable Data Analysis in R Webinar Presentation
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
Decision trees in hadoop
Decision trees in hadoopDecision trees in hadoop
Decision trees in hadoop
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
 
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using R
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache Hadoop
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data Applications
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
 
Model Building with RevoScaleR: Using R and Hadoop for Statistical Computation
Model Building with RevoScaleR: Using R and Hadoop for Statistical ComputationModel Building with RevoScaleR: Using R and Hadoop for Statistical Computation
Model Building with RevoScaleR: Using R and Hadoop for Statistical Computation
 
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloudHive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
How can Hadoop & SAP be integrated
How can Hadoop & SAP be integratedHow can Hadoop & SAP be integrated
How can Hadoop & SAP be integrated
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy
 
Big data concepts
Big data conceptsBig data concepts
Big data concepts
 
Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?
 

Mehr von Revolution Analytics

The network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalThe network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 final
Revolution Analytics
 

Mehr von Revolution Analytics (20)

Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the Cloud
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to Azure
 
R in Minecraft
R in Minecraft R in Minecraft
R in Minecraft
 
The case for R for AI developers
The case for R for AI developersThe case for R for AI developers
The case for R for AI developers
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
 
R Then and Now
R Then and NowR Then and Now
R Then and Now
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per Second
 
Reproducible Data Science with R
Reproducible Data Science with RReproducible Data Science with R
Reproducible Data Science with R
 
The Value of Open Source Communities
The Value of Open Source CommunitiesThe Value of Open Source Communities
The Value of Open Source Communities
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
 
R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with R
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data Science
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the Cloud
 
The Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorThe Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductor
 
The network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalThe network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 final
 
Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint package
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Kürzlich hochgeladen (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 

Scalable Data Analysis in R -- Lee Edlefsen