R users already know why the R language is the lingua franca of statisticians today: because it's the most powerful statistical language in the world. Revolution Analytics builds on the power of open source R, and adds performance, productivity and integration features to create Revolution R Enterprise. In this webinar, author and blogger David Smith will introduce the additional capabilities of Revolution R Enterprise.
Dr. Sue Ranney, VP of Product Development, will also provide an overview of the features introduced in Revolution R Enterprise 6.0, including:
1. Big Data Generalized Linear Models (rxGlm), a new RevoScaleR function that provides a fast, scalable, distributable implementation of generalized linear models and offers substantial speed-ups relative to glm on in-memory data frames
2. Platform LSF Cluster Support, which allows you to create a distributed compute context for the Platform LSF workload manager
3. Azure Burst support added to RxHpcServer
4. Updated R engine (R 2.14.2)
5. Ability to use RevoScaleR analysis functions with non-xdf data sources such as SAS, SPSS or text
6. New methods for RxXdfData data sources including head, tail, names, dim, colnames, length, str, and formula
7. New function rxRoc for generating ROC curves
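An ROC curve plots the true-positive rate against the false-positive rate as the classification threshold varies. The Python sketch below illustrates that computation in a language-neutral way. It is not rxRoc's interface, and `roc_points`, `auc`, and the toy scores are hypothetical. The sketch sweeps a threshold over predicted scores, records both rates at each cut, and integrates the curve with the trapezoid rule.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per distinct threshold.

    A case is predicted positive when its score is >= the threshold. Thresholds
    are swept from the highest score to the lowest, so the curve runs from
    (0, 0) to (1, 1).
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve, by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical model scores and true 0/1 outcomes
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(round(auc(pts), 4))  # 0.8889
```

The area equals the share of (positive, negative) pairs in which the positive case scores higher. Here that is 8 of 9 pairs.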
100% R and More: Plus What's New in Revolution R Enterprise 6.0
1. Revolution Confidential
Revolution R Enterprise 6
100% R and More
Presented by:
David Smith
VP Marketing and Community
Sue Ranney
VP Product Management
3. In today’s webcast:
About Open-Source R and Revolution R Enterprise
What’s New in Revolution R Enterprise 6
Resources, Q&A
4. What is R?
Download the White Paper: R is Hot (bit.ly/r-is-hot)
Data analysis software
A programming language
Development platform designed by and for statisticians
An environment
Huge library of algorithms for data access, data manipulation, analysis and graphics
An open-source software project
Free, open, and active
A community
Thousands of contributors, 2 million users
Resources and help in every domain
5. R User Community
From: The R Ecosystem (bit.ly/R-ecosystem)
6. Revolution R Enterprise is
7. R Productivity Environment (Windows)
[Screenshot with callouts: script editor with type-ahead and code snippets; Solutions window for organizing code and data; sophisticated debugging with breakpoints, variable values etc.; objects loaded in the R environment; object details; packages installed and loaded]
Demo: http://www.revolutionanalytics.com/demos/revolution-productivity-environment/demo.htm
8. Performance: Multi-threaded Math
Computation (4-core laptop) | Open Source R | Revolution R Enterprise | Speedup
Linear Algebra [1]
Matrix Multiply | 176 sec | 9.3 sec | 18x
Cholesky Factorization | 25.5 sec | 1.3 sec | 19x
Linear Discriminant Analysis | 189 sec | 74 sec | 3x
General R Benchmarks [2]
R Benchmarks (Matrix Functions) | 22 sec | 3.5 sec | 5x
R Benchmarks (Program Control) | 5.6 sec | 5.4 sec | Not appreciable
[1] http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
[2] http://r.research.att.com/benchmarks/
9. A common analytic platform across big data architectures
File-based | Cluster | Hadoop | In-database
10. RevoScaleR on Distributed Computing Clusters (Windows HPC Server, Platform LSF)
[Diagram: a master node divides big data into partitions, each data partition paired with its own compute node]
Data Step, Statistical Summary, Tables/Cubes, Covariance, Linear & Logistic Regression, GLM, K-means clustering, …
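Many of the algorithms listed above can be split across nodes because they depend on the data only through sums that each compute node can calculate for its own partition. Below is a minimal Python sketch, assuming a one-predictor linear regression. It is not RevoScaleR's implementation, and `partial_sums`, `combine_and_solve`, and the partitions are hypothetical. Each "node" returns X'X and X'y for its partition. The "master" adds these and solves the normal equations.

```python
def partial_sums(partition):
    """Compute X'X and X'y for rows (x, y), using the design row [1, x]."""
    n = sx = sxx = sy = sxy = 0.0
    for x, y in partition:
        n += 1
        sx += x
        sxx += x * x
        sy += y
        sxy += x * y
    # X'X = [[n, sx], [sx, sxx]] and X'y = [sy, sxy]
    return [[n, sx], [sx, sxx]], [sy, sxy]


def combine_and_solve(results):
    """Add the per-partition sums, then solve the 2x2 normal equations."""
    xtx = [[0.0, 0.0], [0.0, 0.0]]
    xty = [0.0, 0.0]
    for p_xtx, p_xty in results:
        for i in range(2):
            xty[i] += p_xty[i]
            for j in range(2):
                xtx[i][j] += p_xtx[i][j]
    (a, b), (c, d) = xtx
    det = a * d - b * c
    intercept = (d * xty[0] - b * xty[1]) / det
    slope = (a * xty[1] - c * xty[0]) / det
    return intercept, slope


# Three hypothetical partitions of data generated exactly from y = 2 + 3x
partitions = [
    [(0, 2), (1, 5)],
    [(2, 8), (3, 11)],
    [(4, 14), (5, 17)],
]
results = [partial_sums(p) for p in partitions]  # each call could run on a separate node
print(combine_and_solve(results))  # (2.0, 3.0)
```

Each node sends only a 2x2 matrix and a 2-vector to the master, whatever the size of its partition.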
11. Scalable distributed computing with Revolution R Enterprise and Hadoop
Map-Reduce
RHadoop: http://bit.ly/RHadoop
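Map-Reduce splits a job into two steps. The map step emits key/value pairs from each block of input, and the reduce step combines all values that share a key. The single-process Python sketch below imitates that flow to build a small count table. It is not RHadoop code, and the records and the `state`/`vehicles` fields are hypothetical.

```python
from collections import defaultdict


def mapper(record):
    """Emit (key, 1) for one record, keyed on (state, number of vehicles)."""
    yield (record["state"], record["vehicles"]), 1


def reducer(key, values):
    """Sum the counts emitted for one key."""
    return key, sum(values)


def map_reduce(blocks):
    grouped = defaultdict(list)
    # Map phase: on a real cluster, each block would be mapped on a different node.
    for block in blocks:
        for record in block:
            for key, value in mapper(record):
                grouped[key].append(value)  # "shuffle": group emitted values by key
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(k, vs) for k, vs in grouped.items())


blocks = [
    [{"state": "CA", "vehicles": 2}, {"state": "NY", "vehicles": 1}],
    [{"state": "CA", "vehicles": 2}, {"state": "CA", "vehicles": 1}],
]
print(map_reduce(blocks))
```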
12. In-Database Execution with IBM Netezza
More info: http://bit.ly/R-Netezza
13. Enterprise-Wide Deployment
[Diagram: Research and Development (Data Scientists / Modelers) and Production, built on Revolution R Enterprise Server + Hadoop + IBM Netezza + server cluster, with a Management Console; End-User Deployment through the RevoDeployR Server Web Services API to Excel, Web, BI, and apps used by Analysts / Corporate Users]
14.
On-Call Technical Support
Consulting
Migration | Analytics | Applications | Validation
Training
R | Revolution R | Statistical Topics
Systems Integration
BI | ERP | Databases | Cloud
www.revolutionanalytics.com/services
15. Why Revolution R?
Feature | Open-Source R | RRE6 Workstation | RRE6 Server
Interface with multiple data sources | ✓ | ✓✓ | ✓✓
Exploratory data analysis | ✓✓ | ✓✓ | ✓✓
Wide range of statistical methods | ✓✓ | ✓✓ | ✓✓
Parallel Programming | ✓ | ✓ | ✓✓
Multi-threaded performance | ✘ | ✓ | ✓✓
Big Data Analytics | ✘ | ✓ | ✓✓
Distributed Analytics (Grid / Cluster) | ✘ | Client | ✓✓
Cloud Computing | ✘ | ✘ | ✓✓
Hadoop Integration | ✘ | Client | ✓✓
IBM Netezza Integration | ✘ | Client | ✓✓
Multi-user support | ✘ | ✘ | ✓✓
Scheduled, monitored batch production | ✘ | ✘ | ✓✓
Secure code deployment, management | ✘ | ✘ | ✓✓
Integration into Data Apps | ✘ | ✘ | ✓✓
http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php
17. What’s new in Revolution R Enterprise 6
Presented by:
Sue Ranney
VP Product Development
18. Revolution R Enterprise 6
Key Areas of Enhancements
Latest stable release of open-source R (2.14.2)
High Performance Analytics: Fast, scalable, distributable, full-featured analysis of huge data sets
High Performance Computing: Run arbitrary R functions in parallel across cores or nodes of a cluster
19. R 2.14.2
Incorporation of ‘parallel’ as a base package
‘foreach’ users can use the doParallel backend
Users of RevoScaleR’s ‘rxExec’ HPC function can use new compute contexts to run arbitrary R functions in parallel:
Compute context for the ‘parallel’ package
Compute context for any ‘foreach’ backend
Standard functions and packages in R are pre-compiled into byte-code using the ‘compiler’ package
The speed benefit depends on the specific function, but performance can improve by a factor of 2 or more.
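A compute context lets the same user function run unchanged, whether its calls execute sequentially or in parallel. The sketch below is an analogy only: it uses Python's `concurrent.futures`, not `rxExec`, `parallel`, or `foreach`. It dispatches an arbitrary function over a list of inputs under a chosen "context", and `run_in_context` and `simulate_total` are hypothetical names. Threads keep the example self-contained. For CPU-bound Python code, a process pool would be the closer match to running on multiple cores.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_in_context(fun, arg_list, context="local"):
    """Apply fun to each element of arg_list under the chosen 'compute context'.

    'local' runs the calls one after another; 'threads' runs them concurrently
    in a thread pool. Both return results in the same order as arg_list.
    """
    if context == "local":
        return [fun(a) for a in arg_list]
    if context == "threads":
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(fun, arg_list))
    raise ValueError(f"unknown context: {context}")


def simulate_total(seed):
    """An arbitrary user function: the sum of 1000 draws from a seeded generator."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(1000))


seeds = list(range(8))
local = run_in_context(simulate_total, seeds, "local")
threaded = run_in_context(simulate_total, seeds, "threads")
print(local == threaded)  # True: same function, same results, different context
```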
20. High Performance Analytics (HPA) in RevoScaleR
High Performance Computing + Data
Full-featured, fast, and scalable analysis functions
Same code works on small and big data
Same code works on a variety of compute contexts: a laptop, server, cluster, or the cloud
Scales approximately linearly with the number of observations, without increasing memory requirements
Revolution R Enterprise 20
21. Directly Analyze External Data Sets with RevoScaleR HPA Functions NEW
The RevoScaleR package provides easy ways to directly access and analyze external data sets (data sources):
Delimited ASCII
Fixed format ASCII
SAS data sets (.sas7bdat)
SPSS data sets (.sav)
ODBC connections
No need to have SAS or SPSS installed to access data in SAS or SPSS file formats
Get started on analyses without first importing data
Still have the option of importing into the efficient .xdf file format
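The Python sketch below shows what it means to analyze a delimited file without importing it first. It streams a CSV in fixed-size chunks with the standard `csv` module and updates running totals, so only one chunk is held in memory at a time. It is an analogue of the idea, not RevoScaleR's data-source API, and the `premium` column, the chunk size, and the helper names are hypothetical.

```python
import csv
import io
from itertools import islice


def read_chunks(file_obj, chunk_size):
    """Yield lists of row dicts, at most chunk_size rows per list."""
    reader = csv.DictReader(file_obj)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def mean_of_column(file_obj, column, chunk_size=1000):
    """Mean of one numeric column, computed chunk by chunk."""
    total = 0.0
    count = 0
    for chunk in read_chunks(file_obj, chunk_size):
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count


# In-memory stand-in for a delimited ASCII file; with a real file you would
# pass open(path, newline="") instead.
data = io.StringIO("id,premium\n1,100\n2,250\n3,400\n4,50\n5,200\n")
print(mean_of_column(data, "premium", chunk_size=2))  # 200.0
```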
22. RevoScaleR: HPA Algorithms
Descriptive statistics (rxSummary)
Tables and cubes (rxCube, rxCrossTabs)
Correlations/covariances (rxCovCor, rxCor, rxCov, rxSSCP)
K-means clustering (rxKmeans)
Linear regressions (rxLinMod)
Logistic regressions (rxLogit)
Generalized Linear Models (rxGlm) NEW!
Predictions (scoring) (rxPredict)
23. Tips for Handling Big Data in R
Use algorithms that process data in chunks
The functions provided with RevoScaleR are scalable because they process data in ‘chunks.’ If the number of observations doubles, you can still perform the same data analyses with the same amount of memory; it will just take longer.
Use functions optimized for big data
The implementations of RevoScaleR analysis algorithms are all optimized for handling big data. RevoScaleR analysis functions provide significant speed improvements over alternatives, even if you can fit all of your data in memory.
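Chunk-wise processing keeps memory use fixed because the algorithm only needs a few running quantities, not the full data set. Below is a minimal Python sketch, assuming the statistics of interest are the mean and variance. It uses Welford's online update, not RevoScaleR's own algorithms. With this approach, doubling the number of chunks doubles the work but not the memory.

```python
def summarize_chunks(chunks):
    """Count, mean and sample variance over an iterable of chunks, in O(1) memory.

    Welford's online update keeps only three numbers, no matter how many
    observations (or chunks) pass through.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for chunk in chunks:
        for x in chunk:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance


def chunk_generator(num_chunks, chunk_size):
    """Hypothetical data source: yields chunks of 0, 1, 2, ... without storing them all."""
    start = 0
    for _ in range(num_chunks):
        yield range(start, start + chunk_size)
        start += chunk_size


print(summarize_chunks(chunk_generator(4, 5)))
```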
25. Beyond In-Memory Data Analysis
RevoScaleR functions can read data sets from disk in chunks, so the number of observations in a data set can grow beyond what can be analyzed in memory all at once
RevoScaleR analysis functions process chunks of data in parallel, taking greater advantage of your computing resources (Parallel External Memory Algorithms):
Multiple cores on a desktop/server
Clusters/grids have the added advantage of more hard drives for storing & accessing data
Windows HPC Server Cluster
“Burst” computations to Azure in the cloud NEW
IBM Platform LSF Grid NEW
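To process chunks in parallel, each chunk must produce a partial result that can be merged with the others in any grouping. The Python sketch below shows that pattern; it is not RevoScaleR's code, and the chunk contents are hypothetical. Each chunk produces (count, mean, sum of squared deviations). The pairwise merge formula of Chan, Golub, and LeVeque then combines these into the exact overall mean and variance.

```python
from functools import reduce


def chunk_stats(chunk):
    """Partial result for one chunk: (count, mean, sum of squared deviations)."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def merge(a, b):
    """Combine two partial results (pairwise update of Chan, Golub and LeVeque)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
partials = [chunk_stats(c) for c in chunks]  # each could run on its own core or node
n, mean, m2 = reduce(merge, partials)
print(n, mean, m2 / (n - 1))  # 9 5.0 7.5
```

The merge is associative, so the partial results can be combined in whatever order the workers finish.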
26. ‘Big Data’ Generalized Linear Models NEW
Relax the assumptions of the standard linear model: the response can follow a non-normal distribution (such as Poisson, Gamma, or Tweedie), its variance can depend on its mean, and its mean is related to the predictors through a link function such as the log.
Used in insurance, finance, biotech, and other industries.
Example 1: Count data (Poisson)
Number of vehicles an auto policy holder owns
Number of credit cards a person holds
Number of bacterial colonies in a Petri dish
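GLMs are usually fitted by iteratively reweighted least squares (IRLS), which is also the method used by R's `glm`. This sketch does not claim to reproduce rxGlm's internals. The Python below fits a Poisson regression with a log link and one predictor by IRLS, and `poisson_glm` and the data are hypothetical.

```python
import math


def poisson_glm(x, y, iterations=25, tol=1e-10):
    """Fit log(E[y]) = b0 + b1*x by iteratively reweighted least squares."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        # Accumulate the weighted sums for the 2x2 system (X'WX) beta = X'Wz
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = math.exp(eta)          # inverse of the log link
            z = eta + (yi - mu) / mu    # working response
            w = mu                      # working weight for Poisson with log link
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Hypothetical counts for two groups (x = 0 and x = 1)
b0, b1 = poisson_glm([0, 0, 0, 1, 1, 1], [1, 2, 3, 4, 6, 8])
print(round(math.exp(b0), 6), round(math.exp(b1), 6))  # 2.0 3.0
```

With a single binary predictor, the maximum-likelihood fit reproduces the group means (2 and 6), so exp(b1) is their ratio.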
27. GLM: Other Examples
Example 2: Positive values with positive skew (Gamma)
Value of auto insurance claims for claims filed
Example 3: Non-negative data that contains exact zeros (Tweedie model)
Data on insured vehicles (the claim amount is zero for many vehicles, with a range of positive claim values for others)
Rainfall data
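A Tweedie distribution with a power parameter between 1 and 2 can be generated as a compound Poisson-gamma sum. The number of claims is Poisson-distributed and each claim amount is gamma-distributed, so a policy with no claims contributes an exact zero. The Python sketch below simulates that mechanism with hypothetical parameters: claim rate 0.3, gamma shape 2, and scale 500. A gamma shape of 2 corresponds to a Tweedie power of (2 + 2)/(2 + 1), about 1.33.

```python
import math
import random


def compound_poisson_gamma(rng, lam, shape, scale):
    """One total claim amount: N ~ Poisson(lam) claims, each ~ Gamma(shape, scale)."""
    # Poisson draw by Knuth's multiplication method (adequate for small lam)
    limit = math.exp(-lam)
    n, prod = 0, rng.random()
    while prod > limit:
        n += 1
        prod *= rng.random()
    # With n == 0 the sum is exactly 0.0: the point mass at zero
    return sum((rng.gammavariate(shape, scale) for _ in range(n)), 0.0)


rng = random.Random(2012)
draws = [compound_poisson_gamma(rng, lam=0.3, shape=2.0, scale=500.0) for _ in range(10000)]
share_zero = sum(1 for d in draws if d == 0.0) / len(draws)
mean = sum(draws) / len(draws)
print(f"share of exact zeros: {share_zero:.3f} (theory: exp(-0.3) = {math.exp(-0.3):.3f})")
print(f"mean: {mean:.1f} (theory: 0.3 * 2.0 * 500.0 = 300.0)")
```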
28. Quick Demo Incorporating rxGlm
Use a 5% sample of the 2000 U.S. Census to look at annual property insurance premiums
Data manipulations: sub-sample data and modify categorical data
Compute summary statistics; draw a histogram
Estimate a Tweedie model using rxGlm
Estimate predictions for targeted demographic characteristics
Visualize the results
Analyze a bigger model using a cluster
29. Cloud Computing with Azure Burst NEW
Windows Azure is a cloud platform that enables you to manage computations across a global network of Microsoft-managed datacenters
Revolution R Enterprise 6.0 can burst computations to Windows Azure from Windows HPC Server
Particularly suited to parallel HPC workloads such as simulations
30. A Simple Simulation Example
For each run:
Generate data with a known distribution (using code that accompanies the article "Pure Premium Regression with the Tweedie Model" by Glenn Meyers, Actuarial Review, May 2009)
Estimate the model using rxGlm
Across runs, compare the means of the estimated coefficients with the known parameters of the underlying distribution
Do a small number of runs locally
Do a large number of runs ‘bursting’ to the Azure cloud (monitor jobs with HPC Job Scheduler, just as with on-premises nodes)
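The Python sketch below outlines the simulation loop with a simplified stand-in for the Tweedie model. It uses a Poisson regression on one binary predictor, whose maximum-likelihood estimates have a closed form, so no GLM solver is needed. The known parameters, sample sizes, and function names are hypothetical, and Meyers' data-generating code is not reproduced. Each run depends only on its own seed. The runs are therefore independent, which is what makes them easy to farm out to local cores or cloud nodes.

```python
import math
import random

TRUE_B0, TRUE_B1 = math.log(2.0), math.log(1.5)  # hypothetical "known" parameters


def poisson_draw(rng, lam):
    """Poisson variate by Knuth's multiplication method."""
    limit = math.exp(-lam)
    n, prod = 0, rng.random()
    while prod > limit:
        n += 1
        prod *= rng.random()
    return n


def one_run(seed, n_per_group=500):
    """Simulate y ~ Poisson(exp(b0 + b1*x)) for x in {0, 1}; return (b0_hat, b1_hat)."""
    rng = random.Random(seed)
    mean0 = sum(poisson_draw(rng, math.exp(TRUE_B0)) for _ in range(n_per_group)) / n_per_group
    mean1 = sum(poisson_draw(rng, math.exp(TRUE_B0 + TRUE_B1)) for _ in range(n_per_group)) / n_per_group
    # With one binary predictor, the Poisson MLE is closed-form:
    # b0_hat = log(group-0 mean), b1_hat = log(group-1 mean / group-0 mean)
    return math.log(mean0), math.log(mean1 / mean0)


runs = [one_run(seed) for seed in range(200)]  # independent runs: could be distributed
avg_b0 = sum(r[0] for r in runs) / len(runs)
avg_b1 = sum(r[1] for r in runs) / len(runs)
print(f"b0: true {TRUE_B0:.3f}, mean estimate {avg_b0:.3f}")
print(f"b1: true {TRUE_B1:.3f}, mean estimate {avg_b1:.3f}")
```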
32. Thank You!
Download slides and the replay of today’s webinar: http://bit.ly/z9xUG9
Learn more about Revolution R Enterprise
Overview: revolutionanalytics.com/products
New feature videos: http://www.revolutionanalytics.com/products/new-features.php
Contact Revolution Analytics: http://bit.ly/hey-revo
June 28: Achieving High-Performing, Simulation-Based Operational Risk Measurement with RevoScaleR
David Humke, Vice President, The Northern Trust Company
www.revolutionanalytics.com/news-events/free-webinars
33.
The leading commercial provider of software and support for the
popular open source R statistics language.
www.revolutionanalytics.com
+1 (650) 646 9545
Twitter: @RevolutionR