Revolution Confidential




Revolution R Enterprise 6
100% R and More




Presented by:
David Smith
VP Marketing and Community
Sue Ranney
VP Product Management




Poll Question
    Which stats package do you use most?

In today’s webcast:
• About Open-Source R and Revolution R Enterprise
• What’s New in Revolution R Enterprise 6
• Resources, Q&A

What is R?
Download the white paper "R is Hot": bit.ly/r-is-hot

• Data analysis software
• A programming language
    - Development platform designed by and for statisticians
• An environment
    - Huge library of algorithms for data access, data manipulation, analysis and graphics
• An open-source software project
    - Free, open, and active
• A community
    - Thousands of contributors, 2 million users
    - Resources and help in every domain

R User Community
From: The R Ecosystem (bit.ly/R-ecosystem)

Revolution R Enterprise is

R Productivity Environment (Windows)

[Screenshot callouts:]
• Script with type-ahead and code snippets
• Solutions window for organizing code and data
• Sophisticated debugging with breakpoints, variable values, etc.
• Objects loaded in the R environment
• Packages installed and loaded
• Object details

http://www.revolutionanalytics.com/demos/revolution-productivity-environment/demo.htm

Performance: Multi-threaded Math

Computation (4-core laptop)            Open Source R   Revolution R   Speedup
Linear Algebra [1]
  Matrix Multiply                      176 sec         9.3 sec        18x
  Cholesky Factorization               25.5 sec        1.3 sec        19x
  Linear Discriminant Analysis         189 sec         74 sec         3x
General R Benchmarks [2]
  R Benchmarks (Matrix Functions)      22 sec          3.5 sec        5x
  R Benchmarks (Program Control)       5.6 sec         5.4 sec        Not appreciable

[1] http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
[2] http://r.research.att.com/benchmarks/

A common analytic platform across big data architectures

• Hadoop
• File-Based Cluster
• In-database

RevoScaleR on Distributed Computing Clusters (Windows HPC Server, Platform LSF)

[Diagram: Big Data is split into Data Partitions; each partition is handled by a Compute Node; the Compute Nodes connect to a Master Node.]

• Data Step, Statistical Summary, Tables/Cubes, Covariance, Linear & Logistic Regression, GLM, K-means clustering, …

Scalable distributed computing with Revolution R Enterprise and Hadoop

[Diagram: Map-Reduce]

RHadoop: http://bit.ly/RHadoop

In-Database Execution with IBM Netezza

More info: http://bit.ly/R-Netezza

Enterprise-Wide Deployment

[Diagram panels:]
• Production: Revolution R Enterprise Server + Hadoop + IBM Netezza + Server cluster
• Research and Development: Data Scientists / Modelers
• Management Console, RevoDeployR Server, Web Services API
• End-User Deployment: Excel, Web App, BI (Analysts / Corporate Users)

• On-Call Technical Support
• Consulting
    - Migration | Analytics | Applications | Validation
• Training
    - R | Revolution R | Statistical Topics
• Systems Integration
    - BI | ERP | Databases | Cloud

www.revolutionanalytics.com/services

Why Revolution R?

Feature                                   Open-Source R   RRE6 Workstation   RRE6 Server
Interface with multiple data sources      ✓               ✓✓                 ✓✓
Exploratory data analysis                 ✓✓              ✓✓                 ✓✓
Wide range of statistical methods         ✓✓              ✓✓                 ✓✓
Parallel Programming                      ✓               ✓                  ✓✓
Multi-threaded performance                ✘               ✓                  ✓✓
Big Data Analytics                        ✘               ✓                  ✓✓
Distributed Analytics (Grid / Cluster)    ✘               Client             ✓✓
Cloud Computing                           ✘               ✘                  ✓✓
Hadoop Integration                        ✘               Client             ✓✓
IBM Netezza Integration                   ✘               Client             ✓✓
Multi-user support                        ✘               ✘                  ✓✓
Scheduled, monitored batch production     ✘               ✘                  ✓✓
Secure code deployment, management        ✘               ✘                  ✓✓
Integration into Data Apps                ✘               ✘                  ✓✓

http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php

Poll Question
    What’s most important to you about Revolution R Enterprise?

What’s new in Revolution R Enterprise 6

Presented by:
Sue Ranney
VP Product Development

Revolution R Enterprise 6

• Key Areas of Enhancements
    - Latest stable release of open-source R (2.14.2)
    - High Performance Analytics: Fast, scalable, distributable, full-featured analysis of huge data sets
    - High Performance Computing: Run arbitrary R functions in parallel across cores or nodes of a cluster

R 2.14.2

• Incorporation of ‘parallel’ as a base package
    - ‘foreach’ users can use the doParallel backend
    - Users of RevoScaleR’s ‘rxExec’ HPC function can use new compute contexts to run arbitrary R functions in parallel
        - Compute context for the ‘parallel’ package
        - Compute context for any ‘foreach’ backend
• Standard functions and packages in R are pre-compiled into byte code using the ‘compiler’ package
    - The speed benefit depends on the specific function, but code performance can improve by a factor of 2 or more.

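The idea of running an arbitrary function in parallel over a set of inputs can be sketched outside R as well. The Python snippet below is only an illustration of that pattern, not the ‘parallel’, ‘foreach’, or ‘rxExec’ API; the names simulate_mean and run_in_parallel are hypothetical. It uses a thread pool so that it runs anywhere; for CPU-bound work in Python a process pool would be the usual choice.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def simulate_mean(seed, n=10_000):
    """One independent task: mean of n uniform draws from a seeded generator."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n


def run_in_parallel(func, inputs, max_workers=4):
    """Apply func to every input concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, inputs))


# Eight independent tasks, each with its own seed (hypothetical workload).
results = run_in_parallel(simulate_mean, range(8))
print(len(results))
```

Because each task owns its seeded generator, the parallel results are identical to running the same tasks one after another.
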
High Performance Analytics (HPA) in RevoScaleR

• High Performance Computing + Data
• Full-featured, fast, and scalable analysis functions
• Same code works on small and big data
• Same code works on a variety of compute contexts: a laptop, server, cluster, or the cloud
• Scales approximately linearly with the number of observations, without increasing memory requirements

Directly Analyze External Data Sets with RevoScaleR HPA Functions (NEW)

• The RevoScaleR package provides easy ways to directly access and analyze external data sets (data sources)
    - Delimited ASCII
    - Fixed format ASCII
    - SAS data sets (.sas7bdat)
    - SPSS data sets (.sav)
    - ODBC connections
• No need to have SAS or SPSS installed to access data in SAS or SPSS file formats.
• Get started on analyses without first importing data
• Still have the option of importing into the efficient .xdf file format

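As a rough illustration of analyzing a text source directly, without importing it first, the Python sketch below streams records from a delimited source and from a fixed-format source and computes a mean in one pass. It is not RevoScaleR code; the sample data, the column layout, and all function names are made up for the example.

```python
import csv
import io

# Tiny in-memory stand-ins for external files (hypothetical data).
DELIMITED = "age,income\n34,52000\n51,61000\n29,48000\n"
FIXED = "34 52000\n51 61000\n29 48000\n"
# Hypothetical fixed-width layout: (name, start offset, end offset).
LAYOUT = [("age", 0, 2), ("income", 3, 8)]


def rows_from_delimited(handle):
    """Yield one dict per record from a delimited text source."""
    for record in csv.DictReader(handle):
        yield {name: float(value) for name, value in record.items()}


def rows_from_fixed(handle, layout):
    """Yield one dict per record from a fixed-width text source."""
    for line in handle:
        if line.strip():
            yield {name: float(line[start:end]) for name, start, end in layout}


def mean_of(rows, column):
    """Stream over the rows once, keeping only a running sum and count."""
    total, count = 0.0, 0
    for row in rows:
        total += row[column]
        count += 1
    return total / count


mean_delimited = mean_of(rows_from_delimited(io.StringIO(DELIMITED)), "income")
mean_fixed = mean_of(rows_from_fixed(io.StringIO(FIXED), LAYOUT), "income")
print(mean_delimited, mean_fixed)
```

The analysis function only sees an iterator of rows, so the same code works whichever format the data arrives in.
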
RevoScaleR: HPA Algorithms

• Descriptive statistics (rxSummary)
• Tables and cubes (rxCube, rxCrossTabs)
• Correlations/covariances (rxCovCor, rxCor, rxCov, rxSSCP)
• K-means clustering (rxKmeans)
• Linear regressions (rxLinMod)
• Logistic regressions (rxLogit)
• Generalized Linear Models (rxGlm) NEW!
• Predictions (scoring) (rxPredict)

Tips for Handling Big Data in R

• Use algorithms that process data in chunks.
    - The functions provided with RevoScaleR are scalable because they process data in ‘chunks.’
    - If the number of observations doubles, you can still perform the same data analyses with the same amount of memory; it will just take longer
• Use functions optimized for big data
    - The implementations of RevoScaleR analysis algorithms are all optimized for handling big data.
    - RevoScaleR analysis functions provide significant speed improvements over alternatives, even if you can fit all of your data in memory.

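To make the chunking idea concrete, here is a minimal Python sketch (not RevoScaleR's implementation) that computes a count, mean, and sample variance while holding only one chunk in memory at a time. The function names and the chunk size are assumptions chosen for the example; the per-chunk results are merged with the standard pairwise update for means and sums of squares.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        block = list(islice(iterator, size))
        if not block:
            return
        yield block


def summarize(stream, chunk_size=1000):
    """Return (count, mean, sample variance) using one chunk of memory at a time."""
    n, mean, m2 = 0, 0.0, 0.0
    for block in chunks(stream, chunk_size):
        k = len(block)
        block_mean = sum(block) / k
        block_m2 = sum((x - block_mean) ** 2 for x in block)
        delta = block_mean - mean
        total = n + k
        # Merge this chunk's statistics into the running totals.
        m2 += block_m2 + delta * delta * n * k / total
        mean += delta * k / total
        n = total
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance


# A generator stands in for a data set too large to hold in memory.
count, mean, variance = summarize((float(i) for i in range(1, 100_001)), chunk_size=5000)
print(count, mean, variance)
```

Doubling the number of observations doubles the number of chunks visited, but the memory in use stays at one chunk plus three running numbers.
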
Beyond In-Memory Data Analysis

• RevoScaleR functions can read from data sets on disk in chunks, so you can increase the number of observations in the data set beyond what can be analyzed in memory all at once
• RevoScaleR analysis functions process chunks of data in parallel, taking greater advantage of your computing resources (Parallel External Memory Algorithms)
    - Multiple cores on a desktop/server
    - Clusters/grids have the added advantage of more hard drives for storing and accessing data
        - Windows HPC Server Cluster
        - “Burst” computations to Azure in the cloud (NEW)
        - IBM Platform LSF Grid (NEW)

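The pattern of processing chunks in parallel and then combining the results can be sketched for a simple linear regression: each chunk is reduced to a handful of sums, the sums are added together, and the coefficients are solved from the totals. This Python example is an assumption-laden illustration of that general approach, not the RevoScaleR algorithm; the function names are hypothetical, and threads are used only so the snippet runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_stats(chunk):
    """Sufficient statistics for y = a + b*x from one chunk of (x, y) pairs."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return n, sx, sy, sxx, sxy


def combine(stats):
    """Add per-chunk statistics element-wise; chunk order does not matter."""
    return tuple(sum(values) for values in zip(*stats))


def fit_line(chunked_data, max_workers=4):
    """Least-squares intercept and slope from chunks processed concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        n, sx, sy, sxx, sxy = combine(pool.map(chunk_stats, chunked_data))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Toy data on the exact line y = 2 + 3x, split into four chunks of 25 rows.
data = [(float(x), 2.0 + 3.0 * x) for x in range(100)]
chunked = [data[i:i + 25] for i in range(0, 100, 25)]
intercept, slope = fit_line(chunked)
print(intercept, slope)
```

Only the five sums per chunk travel between workers, which is why this style of algorithm suits clusters where each node reads its own partition of the data.
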
‘Big Data’ Generalized Linear Models (NEW)

• Relaxes the assumptions for a standard linear model.
• Used in insurance, finance, biotech, and other industries.
• Example 1: Count data (Poisson)
    - Number of vehicles an auto policy holder owns
    - Number of credit cards a person holds
    - Number of bacterial colonies in a Petri dish

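For readers who want to see what fitting a Poisson model for count data involves, the following Python sketch fits log(E[y]) = b0 + b1*x by Newton-Raphson (Fisher scoring) on simulated counts. It is a small teaching example under assumed parameters (0.5 and 0.8), not the rxGlm implementation; fit_poisson and poisson_draw are hypothetical names.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson count with Knuth's multiplication method (small means)."""
    limit, k, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def fit_poisson(xs, ys, iterations=25):
    """Fit log(E[y]) = b0 + b1*x by Newton-Raphson (Fisher scoring)."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # start at the intercept-only fit
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector X'(y - mu).
        g0 = sum(y - mu for y, mu in zip(ys, mus))
        g1 = sum(x * (y - mu) for x, y, mu in zip(xs, ys, mus))
        # Fisher information X'WX with W = diag(mu).
        h00 = sum(mus)
        h01 = sum(x * mu for x, mu in zip(xs, mus))
        h11 = sum(x * x * mu for x, mu in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1


# Simulated counts with assumed true coefficients 0.5 and 0.8.
rng = random.Random(42)
xs = [rng.uniform(0.0, 2.0) for _ in range(2000)]
ys = [poisson_draw(rng, math.exp(0.5 + 0.8 * x)) for x in xs]
b0, b1 = fit_poisson(xs, ys)
print(b0, b1)
```

The sums inside each iteration can be accumulated chunk by chunk, which is what makes this kind of model a natural fit for data that does not fit in memory.
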
GLM: Other Examples

• Example 2: Positive values with positive skew (Gamma)
    - Value of auto insurance claims for claims filed
• Example 3: Positive data that also contains exact zeros (Tweedie Model)
    - Data on insured vehicles (claims amount is zero for many vehicles; range of positive claims values for others)
    - Rainfall data

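The three example families differ mainly in how the variance is assumed to grow with the mean. As a brief reminder in standard GLM notation (the symbols g, φ, V, and p are the usual textbook ones, not taken from the slides):

```latex
\begin{aligned}
g(\mu_i) &= x_i^{\top}\beta, \qquad \mu_i = \mathrm{E}[Y_i], \qquad \operatorname{Var}(Y_i) = \phi\, V(\mu_i) \\
\text{Poisson:}\quad V(\mu) &= \mu \quad (\phi = 1) \\
\text{Gamma:}\quad V(\mu) &= \mu^{2} \\
\text{Tweedie:}\quad V(\mu) &= \mu^{p}, \quad 1 < p < 2
\end{aligned}
```

For a variance power p strictly between 1 and 2, the Tweedie distribution is a compound Poisson-gamma: it puts positive probability on exactly zero and is continuous on the positive values, which matches the claims and rainfall examples.
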
Quick Demo Incorporating rxGlm

• Use a 5% sample of the U.S. 2000 Census to look at annual property insurance premiums
• Data manipulations: sub-sample data and modify categorical data
• Perform summary statistics; draw histogram
• Estimate a Tweedie model using rxGlm
• Estimate predictions for targeted demographic characteristics
• Visualize the results
• Analyze a bigger model using a cluster

Cloud Computing with Azure Burst (NEW)

• Windows Azure is a cloud platform that enables you to manage computations across a global network of Microsoft-managed datacenters
• Revolution R Enterprise 6.0 can burst computations to Windows Azure from Windows HPC Server
• Particularly suited to parallel HPC such as simulations

A Simple Simulation Example

• For each run:
    - Generate data with a known distribution (using code that accompanies the article "Pure Premium Regression with the Tweedie Model" by Glenn Meyers, Actuarial Review, May 2009)
    - Estimate the model using rxGlm
• Compare the means of the estimated coefficients with the known parameters of the underlying distribution
• Do a small number of runs locally
• Do a large number of runs ‘bursting’ to the Azure cloud (monitor jobs with HPC Job Scheduler, just as with on-premises nodes)

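The structure of such a simulation study can be sketched in a few lines of Python. To keep the example short and self-contained, an ordinary least-squares line with Gaussian noise is swapped in for the Tweedie model and rxGlm; the true parameter values, sample sizes, and function names are all assumptions made for illustration.

```python
import random

TRUE_INTERCEPT, TRUE_SLOPE = 1.0, 2.0  # known parameters, chosen for illustration


def one_run(seed, n=200):
    """Generate one data set from the known model and return its OLS estimates."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0.0, 1.0) for x in xs]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


def simulate(runs):
    """Average the estimated coefficients over independent runs, one seed per run."""
    estimates = [one_run(seed) for seed in range(runs)]
    mean_intercept = sum(e[0] for e in estimates) / runs
    mean_slope = sum(e[1] for e in estimates) / runs
    return mean_intercept, mean_slope


print(simulate(20))    # a small number of runs, as one might try locally
print(simulate(500))   # a larger batch, the kind of job one would send to a cluster
```

Each run depends only on its seed, so the runs are independent and can be distributed across workers without any coordination beyond collecting the estimates.
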
Poll Question
    What new feature of Revolution R Enterprise 6 is most interesting to you?

Thank You!

• Download slides and replay from today’s webinar
    - http://bit.ly/z9xUG9
• Learn more about Revolution R Enterprise
    - Overview: revolutionanalytics.com/products
    - New feature videos: http://www.revolutionanalytics.com/products/new-features.php
• Contact Revolution Analytics
    - http://bit.ly/hey-revo

June 28: Achieving High-Performing, Simulation-Based Operational Risk Measurement with RevoScaleR
David Humke, Vice President, The Northern Trust Company
www.revolutionanalytics.com/news-events/free-webinars

The leading commercial provider of software and support for the popular open source R statistics language.

www.revolutionanalytics.com
+1 (650) 646 9545
Twitter: @RevolutionR

Weitere ähnliche Inhalte

Ähnlich wie 100% R and More: Plus What's New in Revolution R Enterprise 6.0

Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...
Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...
Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...Revolution Analytics
 
New Features in Revolution R Enterprise 5.0 to Support Scalable Data Analysis
New Features in Revolution R Enterprise 5.0 to Support Scalable Data AnalysisNew Features in Revolution R Enterprise 5.0 to Support Scalable Data Analysis
New Features in Revolution R Enterprise 5.0 to Support Scalable Data AnalysisRevolution Analytics
 
Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A S...
Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A S...Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A S...
Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A S...Revolution Analytics
 
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to ProductionReal-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to ProductionRevolution Analytics
 
Revolution R Enterprise: 100% R and More (14 Mar 2013)
Revolution R Enterprise: 100% R and More (14 Mar 2013)Revolution R Enterprise: 100% R and More (14 Mar 2013)
Revolution R Enterprise: 100% R and More (14 Mar 2013)Revolution Analytics
 
Scalable Data Analysis in R Webinar Presentation
Scalable Data Analysis in R Webinar PresentationScalable Data Analysis in R Webinar Presentation
Scalable Data Analysis in R Webinar PresentationRevolution Analytics
 
Are You Ready for Big Data Big Analytics?
Are You Ready for Big Data Big Analytics? Are You Ready for Big Data Big Analytics?
Are You Ready for Big Data Big Analytics? Revolution Analytics
 
Big data analytics on teradata with revolution r enterprise bill jacobs
Big data analytics on teradata with revolution r enterprise   bill jacobsBig data analytics on teradata with revolution r enterprise   bill jacobs
Big data analytics on teradata with revolution r enterprise bill jacobsBill Jacobs
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopRevolution Analytics
 
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Revolution Analytics
 
useR2011 - Edlefsen
useR2011 - EdlefsenuseR2011 - Edlefsen
useR2011 - Edlefsenrusersla
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopDataWorks Summit
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopRevolution Analytics
 

Ähnlich wie 100% R and More: Plus What's New in Revolution R Enterprise 6.0 (20)

Big Data Analysis Starts with R
Big Data Analysis Starts with RBig Data Analysis Starts with R
Big Data Analysis Starts with R
 
Using R with Hadoop
Using R with HadoopUsing R with Hadoop
Using R with Hadoop
 
Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...
Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...
Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...
 
New Features in Revolution R Enterprise 5.0 to Support Scalable Data Analysis
New Features in Revolution R Enterprise 5.0 to Support Scalable Data AnalysisNew Features in Revolution R Enterprise 5.0 to Support Scalable Data Analysis
New Features in Revolution R Enterprise 5.0 to Support Scalable Data Analysis
 
Introduction to R for Data Mining
Introduction to R for Data MiningIntroduction to R for Data Mining
Introduction to R for Data Mining
 
Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A S...
Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A S...Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A S...
Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A S...
 
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to ProductionReal-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
 
Revolution R: 100% R and more
Revolution R: 100% R and moreRevolution R: 100% R and more
Revolution R: 100% R and more
 
Revolution R Enterprise: 100% R and More (14 Mar 2013)
Revolution R Enterprise: 100% R and More (14 Mar 2013)Revolution R Enterprise: 100% R and More (14 Mar 2013)
Revolution R Enterprise: 100% R and More (14 Mar 2013)
 
Scalable Data Analysis in R Webinar Presentation
Scalable Data Analysis in R Webinar PresentationScalable Data Analysis in R Webinar Presentation
Scalable Data Analysis in R Webinar Presentation
 
Are You Ready for Big Data Big Analytics?
Are You Ready for Big Data Big Analytics? Are You Ready for Big Data Big Analytics?
Are You Ready for Big Data Big Analytics?
 
Big data analytics on teradata with revolution r enterprise bill jacobs
Big data analytics on teradata with revolution r enterprise   bill jacobsBig data analytics on teradata with revolution r enterprise   bill jacobs
Big data analytics on teradata with revolution r enterprise bill jacobs
 
Revolution R - 100% R and More
Revolution R - 100% R and MoreRevolution R - 100% R and More
Revolution R - 100% R and More
 
Revolution Analytics Podcast
Revolution Analytics PodcastRevolution Analytics Podcast
Revolution Analytics Podcast
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
 
useR2011 - Edlefsen
useR2011 - EdlefsenuseR2011 - Edlefsen
useR2011 - Edlefsen
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
Decision trees in hadoop
Decision trees in hadoopDecision trees in hadoop
Decision trees in hadoop
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 

Mehr von Revolution Analytics

Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudRevolution Analytics
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureRevolution Analytics
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudRevolution Analytics
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondRevolution Analytics
 
The Value of Open Source Communities
The Value of Open Source CommunitiesThe Value of Open Source Communities
The Value of Open Source CommunitiesRevolution Analytics
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with RRevolution Analytics
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceRevolution Analytics
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudRevolution Analytics
 
The Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorThe Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorRevolution Analytics
 
The network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalThe network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalRevolution Analytics
 
Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint packageRevolution Analytics
 

Mehr von Revolution Analytics (20)

Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the Cloud
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to Azure
 
R in Minecraft
R in Minecraft R in Minecraft
R in Minecraft
 
The case for R for AI developers
The case for R for AI developersThe case for R for AI developers
The case for R for AI developers
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
 
R Then and Now
R Then and NowR Then and Now
R Then and Now
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per Second
 
Reproducible Data Science with R
Reproducible Data Science with RReproducible Data Science with R
Reproducible Data Science with R
 
The Value of Open Source Communities
The Value of Open Source CommunitiesThe Value of Open Source Communities
The Value of Open Source Communities
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
 
R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with R
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data Science
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the Cloud
 
The Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorThe Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductor
 
The network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalThe network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 final
 
Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint package
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 



100% R and More: Plus What's New in Revolution R Enterprise 6.0

  • 1. Revolution R Enterprise 6: 100% R and More. Presented by David Smith, VP Marketing and Community, and Sue Ranney, VP Product Management. Revolution Confidential.
  • 2. Poll Question: Which stats package do you use most?
  • 3. In today’s webcast:
      • About Open-Source R and Revolution R Enterprise
      • What’s New in Revolution R Enterprise 6
      • Resources, Q&A
  • 4. What is R? (Download the white paper "R is Hot": bit.ly/r-is-hot)
      • Data analysis software
      • A programming language
          • Development platform designed by and for statisticians
      • An environment
          • Huge library of algorithms for data access, data manipulation, analysis and graphics
      • An open-source software project
          • Free, open, and active
      • A community
          • Thousands of contributors, 2 million users
          • Resources and help in every domain
  • 5. R User Community. From: The R Ecosystem (bit.ly/R-ecosystem)
  • 6. Revolution R Enterprise is
  • 7. R Productivity Environment (Windows)
      • Script with type ahead and code snippets
      • Solutions window for organizing code and data
      • Sophisticated debugging with breakpoints, variable values etc.
      • Objects loaded in the R Environment
      • Packages installed and loaded
      • Object details
      • Demo: http://www.revolutionanalytics.com/demos/revolution-productivity-environment/demo.htm
  • 8. Performance: Multi-threaded Math
      Computation (4-core laptop)         Open Source R   Revolution R   Speedup
      Linear Algebra [1]
        Matrix Multiply                   176 sec         9.3 sec        18x
        Cholesky Factorization            25.5 sec        1.3 sec        19x
        Linear Discriminant Analysis      189 sec         74 sec         3x
      General R Benchmarks [2]
        R Benchmarks (Matrix Functions)   22 sec          3.5 sec        5x
        R Benchmarks (Program Control)    5.6 sec         5.4 sec        Not appreciable
      [1] http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
      [2] http://r.research.att.com/benchmarks/
  • 9. A common analytic platform across big data architectures: File Based, Hadoop, In-database, Cluster
  • 10. RevoScaleR on Distributed Computing Clusters (Windows HPC Server, Platform LSF)
      • (Diagram: a master node and several compute nodes, each paired with a data partition of the big data set.)
      • Data Step, Statistical Summary, Tables/Cubes, Covariance, Linear & Logistic Regression, GLM, K-means clustering, …
  • 11. Scalable distributed computing with Revolution R Enterprise and Hadoop
      • Map-Reduce
      • RHadoop: http://bit.ly/RHadoop
  • 12. In-Database Execution with IBM Netezza
      • More info: http://bit.ly/R-Netezza
  • 13. Enterprise-Wide Deployment
      • (Diagram labels: Research and Development (Data Scientists / Modelers); Production (Revolution R Enterprise Server + Hadoop + IBM Netezza + Server cluster); Management Console; RevoDeployR Server Web Services API; End-User Deployment (Excel, Web App, BI) for Analysts / Corporate Users.)
  • 14. Services
      • On-Call Technical Support
      • Consulting: Migration | Analytics | Applications | Validation
      • Training: R | Revolution R | Statistical Topics
      • Systems Integration: BI | ERP | Databases | Cloud
      • www.revolutionanalytics.com/services
  • 15. Why Revolution R?
      Capability                                Open-Source R   RRE6 Workstation   RRE6 Server
      Interface with multiple data sources      ✓               ✓✓                 ✓✓
      Exploratory data analysis                 ✓✓              ✓✓                 ✓✓
      Wide range of statistical methods         ✓✓              ✓✓                 ✓✓
      Parallel Programming                      ✓               ✓                  ✓✓
      Multi-threaded performance                ✘               ✓                  ✓✓
      Big Data Analytics                        ✘               ✓                  ✓✓
      Distributed Analytics (Grid / Cluster)    ✘               Client             ✓✓
      Cloud Computing                           ✘               ✘                  ✓✓
      Hadoop Integration                        ✘               Client             ✓✓
      IBM Netezza Integration                   ✘               Client             ✓✓
      Multi-user support                        ✘               ✘                  ✓✓
      Scheduled, monitored batch production     ✘               ✘                  ✓✓
      Secure code deployment, management        ✘               ✘                  ✓✓
      Integration into Data Apps                ✘               ✘                  ✓✓
      http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php
  • 16. Poll Question: What’s most important to you about Revolution R Enterprise?
  • 17. What’s new in Revolution R Enterprise 6. Presented by Sue Ranney, VP Product Development
  • 18. Revolution R Enterprise 6: Key Areas of Enhancements
      • Latest stable release of open-source R (2.14.2)
      • High Performance Analytics: Fast, scalable, distributable, full-featured analysis of huge data sets
      • High Performance Computing: Run arbitrary R functions in parallel across cores or nodes of a cluster
  • 19. R 2.14.2
      • Incorporation of ‘parallel’ as a base package
      • ‘foreach’ users can use the doParallel backend
      • Users of RevoScaleR’s ‘rxExec’ HPC function can use new compute contexts to run arbitrary R functions in parallel
          • Compute context for the ‘parallel’ package
          • Compute context for any ‘foreach’ backend
      • Standard functions and packages in R are pre-compiled into byte-code using the ‘compiler’ package
          • The speed benefit depends on the specific function, but code can run 2 or more times faster.
  • 20. High Performance Analytics (HPA) in RevoScaleR
      • High Performance Computing + Data
      • Full-featured, fast, and scalable analysis functions
      • Same code works on small and big data
      • Same code works on a variety of compute contexts: a laptop, server, cluster, or the cloud
      • Scales approximately linearly with the number of observations, without increasing memory requirements
  • 21. Directly Analyze External Data Sets with RevoScaleR HPA Functions (NEW)
      • The RevoScaleR package provides easy ways to directly access and analyze external data sets (data sources)
          • Delimited ASCII
          • Fixed format ASCII
          • SAS data sets (.sas7bdat)
          • SPSS data sets (.sav)
          • ODBC connections
      • No need to have SAS or SPSS installed to access data in SAS or SPSS file formats.
      • Get started on analyses without first importing data
      • Still have the option of importing into efficient .xdf file format
  • 22. RevoScaleR: HPA Algorithms
      • Descriptive statistics (rxSummary)
      • Tables and cubes (rxCube, rxCrossTabs)
      • Correlations/covariances (rxCovCor, rxCor, rxCov, rxSSCP)
      • K means clustering (rxKmeans)
      • Linear regressions (rxLinMod)
      • Logistic regressions (rxLogit)
      • Generalized Linear Models (rxGlm) NEW!
      • Predictions (scoring) (rxPredict)
  • 23. Tips for Handling Big Data in R
      • Use algorithms that process data in chunks.
          • The functions provided with RevoScaleR are scalable because they process data in ‘chunks.’
          • If the number of observations doubles, you can still perform the same data analyses with the same amount of memory; it will just take longer
      • Use functions optimized for big data
          • The implementations of RevoScaleR analysis algorithms are all optimized for handling big data.
          • RevoScaleR analysis functions provide significant speed improvements over alternatives, even if you can fit all of your data in memory.
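The chunking idea on slide 23 can be illustrated with a minimal sketch. This is plain Python, not RevoScaleR code, and it makes no claim about how RevoScaleR is implemented; the helper names `iter_chunks` and `chunked_mean_variance` are hypothetical. It computes a mean and variance one chunk at a time while keeping only three running numbers, so doubling the number of observations changes the run time but not the memory needed for the summary.

```python
import random
import statistics


def iter_chunks(values, chunk_size):
    """Yield successive fixed-size chunks, standing in for block reads from disk."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def chunked_mean_variance(chunks):
    """Combine per-chunk summaries into an overall mean and population variance.

    Only three running numbers are kept (count, mean, sum of squared
    deviations), so the summary's memory use does not grow with the data.
    """
    n, mean, m2 = 0, 0.0, 0.0
    for chunk in chunks:
        n_b = len(chunk)
        if n_b == 0:
            continue
        mean_b = sum(chunk) / n_b
        m2_b = sum((x - mean_b) ** 2 for x in chunk)
        # Merge the chunk summary into the running summary.
        delta = mean_b - mean
        total = n + n_b
        m2 += m2_b + delta * delta * n * n_b / total
        mean += delta * n_b / total
        n = total
    return n, mean, m2 / n


# Demo data held in a list for simplicity; a real job would stream from a file.
random.seed(0)
data = [random.gauss(10.0, 2.0) for _ in range(10_000)]
n, mean, variance = chunked_mean_variance(iter_chunks(data, chunk_size=1_000))
```

Because each chunk is reduced to a small summary before merging, the same merge step could also combine summaries computed on different cores or nodes, which is the general idea behind the parallel external memory algorithms mentioned on slide 25.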
  • 25. Beyond In-Memory Data Analysis
      • RevoScaleR functions can read from data sets on disk in chunks, so you can increase the number of observations in the data set beyond what can be analyzed in memory all at once
      • RevoScaleR analysis functions process chunks of data in parallel, taking greater advantage of your computing resources (Parallel External Memory Algorithms)
          • Multiple cores on a desktop/server
          • Cluster/grids have added advantage of more hard drives for storing & accessing data
              • Windows HPC Server Cluster
              • “Burst” computations to Azure in the cloud (NEW)
              • IBM Platform LSF Grid (NEW)
  • 26. ‘Big Data’ Generalized Linear Models (NEW)
      • Relaxes the assumptions for a standard linear model.
      • Used in insurance, finance, biotech, and other industries.
      • Example 1: Count data (Poisson)
          • Number of vehicles an auto policy holder owns
          • Number of credit cards a person holds
          • Number of bacterial colonies in a Petri dish
  • 27. GLM: Other Examples
      • Example 2: Positive values with positive skew (Gamma)
          • Value of auto insurance claims for claims filed
      • Example 3: Positive data that also contains exact zeros (Tweedie Model)
          • Data on insured vehicles (claims amount is zero for many vehicles; range of positive claims values for others)
          • Rainfall data
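For reference, the three examples on slides 26 and 27 differ mainly in the variance each family assumes. The notation below is the conventional GLM notation and is not taken from the slides: $\mu_i$ is the mean response, $g$ the link function, $\beta$ the coefficient vector, and $\phi$ a dispersion parameter.

```latex
\begin{align*}
  g(\mu_i) &= x_i^{\top}\beta, \qquad \mu_i = \mathbb{E}[\,y_i \mid x_i\,] \\
  \text{Poisson:}\quad \operatorname{Var}(y_i) &= \mu_i \\
  \text{Gamma:}\quad \operatorname{Var}(y_i) &= \phi\,\mu_i^{2} \\
  \text{Tweedie:}\quad \operatorname{Var}(y_i) &= \phi\,\mu_i^{p}, \qquad 1 < p < 2
\end{align*}
```

With a power $p$ strictly between 1 and 2, the Tweedie distribution is a compound Poisson-gamma: it puts positive probability on exact zeros and a continuous density on positive values, which matches the insurance-claims and rainfall data in Example 3.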
  • 28. Quick Demo Incorporating rxGlm
      • Use 5% Sample of the U.S. 2000 Census to look at annual property insurance premiums
      • Data manipulations: sub-sample data and modify categorical data
      • Perform summary statistics; draw histogram
      • Estimate a Tweedie model using rxGlm
      • Estimate predictions for targeted demographic characteristics
      • Visualize the results
      • Analyze bigger model using a cluster
  • 29. Cloud Computing with Azure Burst (NEW)
      • Windows Azure is a cloud platform that enables you to manage computations across a global network of Microsoft-managed datacenters
      • Revolution R Enterprise 6.0 can burst computations to Windows Azure from Windows HPC Server
      • Particularly suited to parallel HPC such as simulations
  • 30. A Simple Simulation Example
      • For each run:
          • Generate data with a known distribution (using code that accompanies the article "Pure Premium Regression with the Tweedie Model" by Glenn Meyers, Actuarial Review, May 2009)
          • Estimate the model using rxGlm
      • Compare the means of the estimated coefficients with the known parameters of the underlying distribution
      • Do a small number of runs locally
      • Do a large number of runs ‘bursting’ to the Azure cloud (monitor jobs with HPC Job Scheduler, just as with on-premises nodes)
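The shape of the simulation on slide 30 can be sketched in a few lines. This is a simplified stand-in written in plain Python, not the webinar's code: it swaps the Tweedie GLM fitted with rxGlm for a closed-form least-squares slope, and it uses a local thread pool where the slide uses a cluster or Azure burst. The constants `TRUE_INTERCEPT` and `TRUE_SLOPE` and the functions `one_run` and `simulate` are hypothetical names chosen for illustration.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

# Hypothetical known parameters of the data-generating process.
TRUE_INTERCEPT = 1.0
TRUE_SLOPE = 0.5


def one_run(seed, n_obs=200):
    """Generate one data set from the known model and return the estimated slope."""
    rng = random.Random(seed)  # independent, reproducible stream per run
    xs = [rng.uniform(0.0, 10.0) for _ in range(n_obs)]
    ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0.0, 1.0) for x in xs]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx  # closed-form least-squares slope


def simulate(n_runs, max_workers=4):
    """Dispatch independent runs to a worker pool and collect the estimates."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one_run, range(n_runs)))


small = simulate(10)    # a small number of runs as a quick local check
large = simulate(500)   # a larger batch; only the run count changes
mean_slope = statistics.fmean(large)
```

Each run depends only on its seed, so the runs are independent and can be handed to any pool of workers; that independence is what makes this kind of job a natural fit for bursting. Note that threads in CPython do not speed up CPU-bound work, so the pool here only illustrates the dispatch pattern.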
  • 31. Poll Question: What new feature of Revolution R Enterprise 6 is most interesting to you?
  • 32. Thank You!
      • Download slides, replay from today’s webinar
          • http://bit.ly/z9xUG9
      • Learn more about Revolution R Enterprise
          • Overview: revolutionanalytics.com/products
          • New feature videos: http://www.revolutionanalytics.com/products/new-features.php
      • Contact Revolution Analytics
          • http://bit.ly/hey-revo
      • June 28: Achieving High-Performing, Simulation-Based Operational Risk Measurement with RevoScaleR. David Humke, Vice President, The Northern Trust Company. www.revolutionanalytics.com/news-events/free-webinars
  • 33. Revolution Analytics: The leading commercial provider of software and support for the popular open source R statistics language. www.revolutionanalytics.com, +1 (650) 646 9545, Twitter: @RevolutionR