05Nov13 Webinar: Introducing Revolution R Enterprise 7 - The Big Data Big Analytics Platform
1. Announcing: Release 7
Revolution R Enterprise
Tuesday, November 5
Michele Chambers, Chief Strategy Officer and VP Product Management
Thomas W. Dinsmore, Director of Product Management
2. Agenda
Introduction
– Demystifying R
– Revolution Analytics at a Glance
– Revolution R Enterprise
– Revolution Analytics Partner Ecosystem
– Customer Testimonials
What’s New in RRE 7?
More Information
Questions
4. R is exploding in popularity & function
[Figure: four panels comparing R, SAS, SPSS, S-Plus and Stata. Internet Discussion: mean monthly traffic on email discussion list. Package Growth: number of R packages listed on CRAN (4,332 as of Feb 2013). Web Site Popularity: number of links to main web site. Scholarly Activity: Google Scholar hits (’05-’09 CAGR), with values ranging from -27% (SPSS) to 46%.]
5. Latest survey shows significant growth in R adoption
R Usage Growth (Rexer Data Miner Survey, 2007-2013)
70% of data miners report using R
24% use R as their primary tool
“I’ve been astonished by the rate at
which R has been adopted. Four years
ago, everyone in my economics
department [at the University of
Chicago] was using Stata; now, as far
as I can tell, R is the standard tool, and
students learn it first.”
- Deputy Editor for New Products at Forbes
“A key benefit of R is that it provides
near-instant availability of new and
experimental methods created by its
user base — without waiting for the
development/release cycle of
commercial software. SAS recognizes
the value of R to our customer base…”
- Product Marketing Manager, SAS Institute, Inc.
Source: www.rexeranalytics.com
6. Revolution Analytics at a Glance
Who We Are
The only provider of a commercial big data big analytics platform based on the open source R statistical computing language
Customers
200+ of the Global 2000
Global Presence
North America / EMEA / APAC
Our Software Delivers
Scalable Performance: Distributed & parallelized analytics
Cross Platform: Write once, deploy anywhere
Productivity: Easily build & deploy with the latest modern analytics
Our Services Deliver
Knowledge: Our experts enable you to be experts
Time-to-Value: Our Quickstart program gives you a jumpstart
Guidance: Our customer support team is here to help you
Global Industries Served
Financial Services
Digital Media
Government
Health & Life Sciences
High Tech
Manufacturing
Retail
Telco
7. Revolution R Enterprise
is the only big data big analytics platform based on open source R, the de facto statistical computing language for modern analytics
High Performance, Scalable Analytics
Portable Across Enterprise Platforms
Easier to Build & Deploy Analytics
8. R is open source and drives analytic innovation
but… has some limitations for enterprises
Area | Open source R | Revolution R Enterprise | Benefit
Big Data | In-memory bound | Hybrid memory & disk scalability | Operates on bigger volumes & factors
Speed of Analysis | Single threaded | Parallel threading | Shrinks analysis time
Enterprise Readiness | Community support | Commercial support | Delivers full-service production support
Analytic Breadth & Depth | 5000+ innovative analytic packages | Leverages open source packages plus Big Data-ready packages | Supercharges R
Commercial Viability | Risk of deploying open source | Commercial license | Eliminates the risk of deploying open source
9. Introducing Revolution R Enterprise (RRE)
The Big Data Big Analytics Platform
Big Data Big Analytics Ready
– Enterprise readiness
– High performance analytics
– Multi-platform architecture
– Data source integration
– Development tools
– Deployment tools
Components: DevelopR, ConnectR, ScaleR, DistributedR, DeployR
10. The Platform Step by Step:
R Capabilities
R+CRAN
• Open source R interpreter
• Freely-available R algorithms
• Algorithms callable by RevoR
• Embeddable in R scripts
• 100% compatible with existing R scripts, functions and packages
RevoR
• Performance enhanced R interpreter
• Based on open source R
• Adds high-performance math libraries
11. The Platform Step by Step:
Parallelization & Data Sourcing
ConnectR
• High-speed & direct connectors
ScaleR
• Ready-to-use high-performance big data big analytics
• Fully-parallelized analytics
• Data prep & data distillation
• Descriptive statistics & statistical tests
• Correlation & covariance matrices
• Predictive models – linear, logistic, GLM
• Machine learning
• Monte Carlo simulation
• Tools for distributing customized algorithms across nodes
DistributedR
• Distributed computing framework
• Delivers portability across platforms
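Parallelized correlation and covariance matrices rest on a simple property: the count, column sums and sums of cross-products (the SSCP matrix) of a data set are plain sums, so each chunk can be summarized independently and the summaries added afterwards. The Python sketch below illustrates only that general idea; the helper names (`chunk_stats`, `merge`, `covariance`) are hypothetical and this is not ScaleR's implementation or API.

```python
from functools import reduce

def chunk_stats(rows):
    """Summarize one chunk: row count, column sums, and cross-product sums (SSCP)."""
    p = len(rows[0])
    n = 0
    sums = [0.0] * p
    sscp = [[0.0] * p for _ in range(p)]
    for row in rows:
        n += 1
        for i in range(p):
            sums[i] += row[i]
            for j in range(p):
                sscp[i][j] += row[i] * row[j]
    return n, sums, sscp

def merge(a, b):
    """Combine two chunk summaries; every component simply adds."""
    n_a, s_a, x_a = a
    n_b, s_b, x_b = b
    return (n_a + n_b,
            [u + v for u, v in zip(s_a, s_b)],
            [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(x_a, x_b)])

def covariance(stats):
    """Sample covariance matrix computed from a (merged) summary."""
    n, sums, sscp = stats
    p = len(sums)
    return [[(sscp[i][j] - sums[i] * sums[j] / n) / (n - 1) for j in range(p)]
            for i in range(p)]

# Three chunks of (x, y) rows; each could be summarized on a different core or node.
chunks = [
    [(1.0, 2.0), (2.0, 4.1)],
    [(3.0, 6.2), (4.0, 7.9)],
    [(5.0, 10.1)],
]
total = reduce(merge, (chunk_stats(c) for c in chunks))
print(covariance(total))
```

Only the small summaries travel between workers, never the rows themselves. Computing covariance from raw sums is the textbook one-pass form; production code typically uses a numerically stabler centered update.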
12. The Platform Step by Step:
Tools & Deployment
DevelopR
• Integrated development environment for R
• Visual “step-into” debugger
DeployR
• Web services software development kit for integrating analytics via Java, JavaScript or .NET APIs
• Integrates R into application infrastructures
DeployR Capabilities:
• Invokes R scripts from web services calls
• RESTful interface for easy integration
• Works with web & mobile apps, leading BI & visualization tools and business rules engines
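The web-services pattern described for DeployR (an application calls a RESTful endpoint, an analytic script runs on the server, and the result comes back as JSON) can be sketched with Python's standard library. This shows the request/response shape only: the `/score` path, the `values`/`result` payload fields and the `score` function are hypothetical and are not DeployR's actual API.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def score(values):
    """Stand-in analytic (hypothetical): returns the mean of the inputs."""
    return sum(values) / len(values)

class AnalyticHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/score":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"result": score(payload["values"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example

def call_service(port, values):
    """Client side: what a web or mobile app would send over HTTP."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/score",
        data=json.dumps({"values": values}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]

server = HTTPServer(("127.0.0.1", 0), AnalyticHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
print(call_service(server.server_address[1], [1, 2, 3, 4]))
server.shutdown()
server.server_close()
```

Because the client only speaks HTTP and JSON, the same endpoint could be consumed from Java, JavaScript or .NET without any R on the client.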
13. Write Once. Deploy Anywhere.
Hadoop: Hortonworks, Cloudera
EDW: IBM Netezza, Teradata
Clustered Systems: IBM Platform LSF, Microsoft HPC
Workstations & Servers: Desktop, Server
In the Cloud: Microsoft Azure Burst, Amazon AWS
Components: DeployR, ConnectR, ScaleR, DistributedR
DESIGNED FOR SCALE, PORTABILITY & PERFORMANCE
14. The Power of Revolution R Enterprise
Performance & Scalability
[Chart: value versus performance & scalability. Open Source: leverage latest innovation. RevoR: 3-50X faster. DistributedR: maximizes computation, powerful divide & conquer, effective memory utilization. ScaleR: moves computation to data, leverage CRAN, labor saving power.]
16. Revolution R Enterprise RevoR
Performance Enhanced R
Customers report 3-50x performance improvements with Revolution R Enterprise compared to open source R, without changing any code.
Computation (4-core laptop) | Open Source R | Revolution R | Speedup
Linear Algebra (1)
Matrix Multiply | 176 sec | 9.3 sec | 18x
Cholesky Factorization | 25.5 sec | 1.3 sec | 19x
Linear Discriminant Analysis | 189 sec | 74 sec | 3x
General R Benchmarks (2)
R Benchmarks (Matrix Functions) | 22 sec | 3.5 sec | 5x
R Benchmarks (Program Control) | 5.6 sec | 5.4 sec | Not appreciable
1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
2. http://r.research.att.com/benchmarks/
17. RRE ScaleR outperforms SAS HPA – at a fraction of the cost
Logistic Regression | SAS HPA* | Revolution R Enterprise
Rows of data | 1 billion | 1 billion
Parameters | “just a few” | 7 (Double)
Time | 80 seconds | 44 seconds (45% less)
Data location | In memory | On disk
Nodes | 32 | 5 (1/6th)
Cores | 384 | 20 (5%)
RAM | 1,536 GB | 80 GB (5%)
Revolution R is faster on the same amount of data, despite using approximately a sixth as many nodes, a twentieth as many cores and a twentieth as much RAM, and without pre-loading data into RAM.
Bottom Line: Revolution R Enterprise Performance = Greatly Reduced TCO
*As published by SAS in HPC Wire, April 21, 2011
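The percentages in the logistic regression comparison follow directly from the published figures. As a quick arithmetic check (nothing beyond the slide's own numbers), each ratio below is Revolution R Enterprise's time or resource divided by SAS HPA's:

```python
# Figures from the slide: (SAS HPA, Revolution R Enterprise)
figures = {
    "time_sec": (80, 44),
    "nodes": (32, 5),
    "cores": (384, 20),
    "ram_gb": (1536, 80),
}

# Fraction of SAS HPA's figure that RRE needed.
ratios = {name: rre / sas for name, (sas, rre) in figures.items()}
for name, r in ratios.items():
    print(f"{name}: RRE uses {r:.1%} of SAS HPA's figure")
```

Time: 44/80 = 55%, i.e. 45% less. Nodes: 5/32 is about 15.6%, roughly one-sixth. Cores (20/384) and RAM (80/1,536) are both about 5.2%, roughly one-twentieth.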
18. R + Revolution R Enterprise
Unequaled Big Data Big Analytics
Deploy Analytics: Web, Mobile, Data Visualization, BI
Big Data Distributed Analytics
Performance Enhanced R
19. Revolution R Enterprise Ecosystem
Power of Integration
SI / Service
Deployment / Consumption
MSP / DSP
Advanced Analytics
ETL
Corios
Data / Infrastructure
20. Customers Revolutionize their Business
Power
4X performance
50M records scored daily
“We’ve combined Revolution R
Enterprise and Hadoop to build and
deploy customized exploratory data
analysis and GAM survival models for
our marketing performance
management and attribution platform.
Given that our data sets are already in
the terabytes and are growing rapidly,
we depend on Revolution R Enterprise’s
scalability and power – we saw about
a 4x performance improvement on 50
million records. It works brilliantly.”
- CEO, John Wallace, DataSong
Scalability
Performance
TBs of data from 200+ data sources
Tens of thousands of attributes
Hundreds of millions of scores daily
2X data, 2X attributes: no impact on performance
“We’ve been able to scale our solution to a
problem that’s so big that most companies could
not address it. If we had to go with a different
solution we wouldn’t be as efficient as we are
now.”
- SVP Analytics, Kevin Lyons, eXelate
“We need a high-performance analytics
infrastructure because marketing optimization is a
lot like financial trading. By watching the market
constantly for data or market condition updates,
we can now identify opportunities for our
clients that would otherwise be lost.”
- Chief Analytics Officer, Leon Zemel, [x+1]
31. Multi-Node Package Manager
[Diagram: Hadoop cluster with an HDFS Name Node, a MapReduce Job Tracker, and five Data Nodes, each running a Task Tracker]
32. ScaleR in Hadoop
[Diagram: ScaleR on a Hadoop cluster with an HDFS Name Node, a MapReduce Job Tracker, and five Data Nodes, each running a Task Tracker]
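The deck describes ScaleR on Hadoop as moving computation to the data. One common way to express that pattern: each data node computes a small partial result on its local block, and only those partial results travel to a coordinator. The single-process Python sketch below is a generic stand-in, not ScaleR's or Hadoop's API. It fits a logistic regression by batch gradient descent, where a hypothetical `map_gradient` runs per partition and `reduce_gradients` sums the partial gradients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def map_gradient(partition, w):
    """Runs 'on the data node': log-loss gradient over one local partition."""
    grad = [0.0] * len(w)
    for x, y in partition:
        xs = (1.0,) + tuple(x)  # prepend the intercept term
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, xs))) - y
        for i, xi in enumerate(xs):
            grad[i] += err * xi
    return grad, len(partition)

def reduce_gradients(parts):
    """Runs 'on the coordinator': only small gradient vectors are combined."""
    total = [sum(g[i] for g, _ in parts) for i in range(len(parts[0][0]))]
    n = sum(count for _, count in parts)
    return [g / n for g in total]

def fit(partitions, n_features, lr=0.5, iterations=500):
    w = [0.0] * (n_features + 1)
    for _ in range(iterations):
        partials = [map_gradient(p, w) for p in partitions]  # parallel on a real cluster
        grad = reduce_gradients(partials)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def predict_label(w, x):
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) >= 0.5

# Three hypothetical "data nodes", each holding (features, label) rows.
partitions = [
    [((-2.0,), 0), ((-1.5,), 0), ((0.5,), 1)],
    [((-1.0,), 0), ((1.0,), 1), ((-0.2,), 1)],
    [((2.0,), 1), ((1.5,), 1), ((0.3,), 0)],
]
w = fit(partitions, n_features=1)
print(w)
```

Because the gradient is a sum over rows, summing per-partition gradients gives exactly the whole-data gradient, which is what makes the map/reduce split valid.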
Enterprise readiness, Performance architecture, Big Data analytics, Data source integration, Development tools, Deployment tools
The RRE license combines the GPL v2 license (which permits commercial use of R) with a proprietary license for our proprietary components.
Enterprise readiness: Build assurance (continuous testing, custom validation); Implementation tools (validation utility); Technical support, documentation, training
Performance architecture: Fast math libraries; Better memory management; Multi-core processing; Distributed computing architecture
Big Data analytics: Descriptive Statistics; Cross Tabulation; Statistical Tests; Correlation, Covariance and SSCP Matrices; Linear Regression; Logistic Regression; Generalized Linear Models; Decision Trees; K-Means Clustering
Data source integration: ODBC; Teradata (high speed); Text Files (delimited & fixed format); SAS; SPSS; Hadoop (HDFS & HBase)
Development tools: Visual Debugger; Script Editor; R Snippets; Object Browser; Solution Explorer; Customizable Workspace; Version Control Plug-In
Deployment tools: R objects as JSON, XML; Supports Java, JavaScript, .NET; RESTful web services API; Security (LDAP, SSO); Built-In load balancing; Asynchronous scheduling; Management console; Accelerators (Jaspersoft, QlikView)
1) A Revolution R Enterprise ScaleR analytic is provided a data source as input.
2) The analytic loops over the data, reading one block at a time. Blocks of data are read by a separate worker thread (Thread 0).
3) Worker threads (Threads 1..n) process the data block from the previous iteration of the data loop and update intermediate results objects in memory.
4) When all of the data has been processed, a master results object is created from the intermediate results objects.
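The block-wise loop described above can be sketched with Python's standard `threading` and `queue` modules: a reader thread (Thread 0) keeps a few blocks ahead, worker threads each update their own intermediate result, and a master result is merged at the end. This illustrates the general pattern under stated assumptions only. `read_blocks` and `run_analytic` are hypothetical names, the analytic is a simple cross-tabulation rather than any specific ScaleR function, and a shared queue stands in for ScaleR's exact iteration scheme.

```python
import queue
import threading
from collections import Counter

def read_blocks(rows, block_size):
    """Hypothetical data source: yields the data one block at a time."""
    for start in range(0, len(rows), block_size):
        yield rows[start:start + block_size]

def run_analytic(rows, block_size=4, n_workers=3):
    blocks = queue.Queue(maxsize=n_workers)  # reader stays at most a few blocks ahead
    intermediate = [Counter() for _ in range(n_workers)]  # one results object per worker

    def reader():  # "Thread 0": reads blocks while workers process earlier ones
        for block in read_blocks(rows, block_size):
            blocks.put(block)
        for _ in range(n_workers):
            blocks.put(None)  # one stop signal per worker

    def worker(k):  # "Threads 1..n": update only this worker's intermediate result
        while (block := blocks.get()) is not None:
            intermediate[k].update((row["region"], row["product"]) for row in block)

    threads = [threading.Thread(target=reader)]
    threads += [threading.Thread(target=worker, args=(k,)) for k in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    master = Counter()  # master results object built from the intermediate ones
    for partial in intermediate:
        master.update(partial)
    return master

data = [{"region": "east" if i % 3 else "west", "product": "a" if i % 2 else "b"}
        for i in range(10)]
print(run_analytic(data))
```

Giving each worker its own intermediate object avoids locking during the loop. In CPython the global interpreter lock limits speedups for pure-Python work, so the sketch shows the structure rather than the performance.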
Most current stable release: ~150 new features; support for long vectors; ~100 bug fixes and performance improvements; 83 miscellaneous enhancements (installation, utilities, internationalization, etc.)
Semi-automatic modeling, ideal for variable selection. Methods: forward, backward, bidirectional. Selection criteria: AIC, BIC, Mallows’ Cp.
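As a hedged illustration of the forward method with the AIC criterion (not RRE's implementation), the pure-Python sketch below fits ordinary least squares through the normal equations and greedily adds whichever predictor lowers the Gaussian AIC, n·ln(RSS/n) + 2k, the most. It stops when no remaining candidate improves the score. The function names and the synthetic data are hypothetical.

```python
import math

def solve(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus `cols`."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

def aic(X, y, cols):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2*(number of coefficients)."""
    n = len(y)
    return n * math.log(rss(X, y, cols) / n) + 2 * (len(cols) + 1)

def forward_select(X, y):
    selected, remaining = [], list(range(len(X[0])))
    best = aic(X, y, selected)  # start from the intercept-only model
    while remaining:
        score, col = min((aic(X, y, selected + [c]), c) for c in remaining)
        if score >= best:
            break  # no candidate improves AIC: stop
        best = score
        selected.append(col)
        remaining.remove(col)
    return selected

# Synthetic data: y depends on columns 0 and 2; column 1 is unrelated.
X = [(math.sin(i), math.cos(3 * i), math.sin(2.5 * i + 1)) for i in range(40)]
y = [3 * a - 2 * c + 0.1 * math.sin(7 * i) for i, (a, _, c) in enumerate(X)]
selected = forward_select(X, y)
print(selected)
```

Replacing the penalty `2 * (len(cols) + 1)` with `(len(cols) + 1) * math.log(n)` gives BIC, which penalizes extra predictors more heavily. Backward selection instead starts from the full model and removes predictors. Solving the normal equations directly is fine for a toy example but less numerically stable than QR-based least squares.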
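“Random Forests™”: an ensemble learning method for classification and regression. Trains many trees; for classification, the output is the mode of the trees’ predicted classes (for regression, their mean prediction). Variety of use cases.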
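To make the two ideas above concrete (many trees trained on resampled data, and a classification output that is the mode of their votes), here is a deliberately tiny Python sketch. Each "tree" is a one-split decision stump trained on a bootstrap sample using a random subset of features. With only one split per tree, sampling features per tree is the same as sampling them per split. This is a toy stand-in, not RRE's random forest implementation, and all names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, features):
    """Best single-split 'tree' (a decision stump) over the given feature subset."""
    best = None
    for f in features:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            l_cls = Counter(left).most_common(1)[0][0]
            r_cls = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_cls for y in left) + sum(y != r_cls for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_cls, r_cls)
    if best is None:  # degenerate sample with no valid split: predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return lambda row: majority
    _, f, threshold, l_cls, r_cls = best
    return lambda row: l_cls if row[f] <= threshold else r_cls

def fit_forest(rows, labels, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        features = rng.sample(range(n_features), max_features)  # random feature subset
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return trees

def forest_predict(trees, row):
    """Classification output: the mode of the individual trees' votes."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]

# Hypothetical data: both features track i, and the class flips at i = 10.
rows = [(i, i + (1 if i % 2 else -1)) for i in range(20)]
labels = ["high" if i >= 10 else "low" for i in range(20)]
forest = fit_forest(rows, labels)
print(forest_predict(forest, (2, 2)), forest_predict(forest, (17, 17)))
```

Real random forests grow deep trees and resample features at every split. For regression, the vote is replaced by the average of the trees' predictions.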