Presented by David Smith, Chief Community Officer, Revolution Analytics, at the Gartner Business Intelligence and Analytics Summit, April 2014.
In this presentation, I'll introduce the open source R language — the modern standard for Data Science — and the enhanced performance, scalability and ease-of-use capabilities of Revolution R Enterprise. Customer case studies will illustrate Revolution R Enterprise as a component of the real-time analytics deployment process, via integration with Hadoop, database warehousing systems and Cloud platforms, to implement data-driven end-user applications.
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit 2014)
1. Big Data Predictive Analytics with Revolution R Enterprise
David Smith, Chief Community Officer
@revodavid
Gartner BI Conference, April 2014
2.
OUR COMPANY
The leading provider of advanced analytics software and services based on open source R, since 2007
OUR SOFTWARE
The only Big Data, Big Analytics software platform based on the data science language R
KUDOS
Visionary, Gartner Magic Quadrant for Advanced Analytics Platforms, 2014
3. What is R?
Most widely used data analysis software
• Used by 2M+ data scientists, statisticians and analysts
Most powerful statistical programming language
• Flexible, extensible and comprehensive for productivity
Create beautiful and unique data visualizations
• As seen in New York Times, Twitter and Flowing Data
Thriving open-source community
• Leading edge of analytics research
Fills the talent gap
• New graduates prefer R
White paper: R is Hot (bit.ly/r-is-hot)
4. Exploding growth and demand for R
• R is the highest paid IT skill
• R most-used data science language after SQL
• R is used by 70% of data miners
• R is #15 of all programming languages
• R growing faster than any other data science language
• R is the #1 Google Search for Advanced Analytics software
• R has more than 2 million users worldwide
R Usage Growth (Rexer Data Miner Survey, 2007-2013)
• 70% of data miners report using R
• R is the first choice of more data miners than any other software
Source: www.rexeranalytics.com
5. Technical Support for Open Source R
AdviseR™ from Revolution Analytics
Technical support for open source R, from the R experts.
24x7 email and phone support
On-line case management and knowledgebase
Access to technical resources, documentation and user forums
Exclusive on-line webinars from community experts
Guaranteed response times
Also available: expert hands-on and on-line training for R, from Revolution Analytics AcademyR.
www.revolutionanalytics.com/AdviseR
www.revolutionanalytics.com/AcademyR
6. Revolution R Enterprise is the only big data big analytics platform based on open source R
• High Performance, Scalable Analytics
• Portable Across Enterprise Platforms
• Easier to Build & Deploy Analytics
7. Enhancing Open Source R for the Enterprise
Capability | Open Source R | Revolution R Enterprise | Benefit
Big Data | In-memory bound | Hybrid memory & disk scalability | Operates on bigger volumes & factors
Speed of Analysis | Single threaded | Parallel threading | Shrinks analysis time
Enterprise Readiness | Community support | Commercial support | Delivers full service production support
Analytic Breadth & Depth | 5000+ innovative analytic packages | Leverage open source packages plus Big Data ready packages | Supercharges R
Commercial Viability | Risk of deployment of open source | GPL-compatible licensing | Eliminate risk with open source
9. Eliminates Performance and Capacity Limits of Open Source R and Legacy SAS
Unique PEMAs: Parallel, external-memory algorithms
• High-performance, scalable replacements for R/SAS analytic functions
• Parallel/distributed processing eliminates CPU bottleneck
• Data streaming eliminates memory size limitations
• Works with in-memory and disk-based architectures
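The idea behind a parallel, external-memory algorithm can be sketched in base R. This is only an illustration, not how Revolution R Enterprise implements its PEMAs: the helper name chunked_lm and the simulated data are assumptions made for this example. A linear regression is fitted by streaming over blocks of rows and accumulating the small cross-product matrices X'X and X'y, so the full data set never has to be in memory at once.

```r
# Hypothetical helper (not part of any Revolution Analytics package):
# fit y ~ x by streaming over row chunks. Each chunk contributes its
# own X'X and X'y; only these small matrices are kept between chunks.
chunked_lm <- function(chunks) {
  xtx <- NULL
  xty <- NULL
  for (chunk in chunks) {
    X <- cbind(1, chunk$x)               # design matrix with intercept column
    if (is.null(xtx)) {
      xtx <- crossprod(X)                # X'X for the first chunk
      xty <- crossprod(X, chunk$y)       # X'y for the first chunk
    } else {
      xtx <- xtx + crossprod(X)          # accumulate across chunks
      xty <- xty + crossprod(X, chunk$y)
    }
  }
  drop(solve(xtx, xty))                  # solve the normal equations
}

# Simulated data, standing in for a file too large to load at once
set.seed(1)
n <- 1000
dat <- data.frame(x = rnorm(n))
dat$y <- 2 + 3 * dat$x + rnorm(n, sd = 0.1)

# Pretend the data arrives in four blocks of rows
chunks <- split(dat, rep(1:4, each = n / 4))

coef_chunked <- chunked_lm(chunks)
coef_full <- unname(coef(lm(y ~ x, data = dat)))
print(rbind(chunked = coef_chunked, full = coef_full))
```

Because each chunk's contribution is independent of the others, the per-chunk cross-products could also be computed on separate cores or nodes and summed afterwards, which is where the "parallel" part of the name comes in.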
10. Revolution R Enterprise is the Big Data Big Analytics Platform
All of Open Source R plus:
• Big Data scalability
• High-performance analytics
• Development and deployment tools
• Data source connectivity
• Application integration framework
• Multi-platform architecture
• Support, Training and Services
11. DESIGNED FOR SCALE, PORTABILITY & PERFORMANCE
Components: DistributedR, ScaleR, ConnectR, DeployR
Write Once. Deploy Anywhere.
• In the Cloud: Amazon AWS
• Workstations & Servers: Windows; Red Hat and SUSE Linux
• Clustered Systems: IBM Platform LSF; Microsoft HPC
• EDW: IBM Netezza; Teradata
• Hadoop: Hortonworks; Cloudera
12. Write Once Deploy Anywhere
Set the desired compute context for code execution:
rxSetComputeContext("local") # Local system (default)
rxSetComputeContext(RxHadoopMR(<data, server environment arguments>))
rxSetComputeContext(RxTeradata(<data, server environment arguments>))
rxSetComputeContext(RxHpcServer(<data, server environment arguments>))
rxSetComputeContext(RxLsfCluster(<data, server environment arguments>))
The same code then runs anywhere:
# Summarize and calculate descriptive statistics from the airDS data set
adsSummary = rxSummary(~ArrDelay+CRSDepTime+DayOfWeek, data = airDS)
# Fit a linear regression model
arrDelayLm1 = rxLinMod(ArrDelay ~ DayOfWeek, data = airDS); summary(arrDelayLm1)
13. In-Hadoop Big Data Big Analytics
• Eliminate data movement latency
• Speed model development
• Use commodity Hadoop nodes as analytics engine
[Diagram: Hadoop cluster with a Name Node and Job Tracker, and five Data Nodes each running a Task Tracker; MapReduce layered over HDFS]
14. In-Database Analytics: Parallel R in-database for big data analytics on Teradata
Revolution Analytics coupled with the Teradata Unified Data Architecture accelerates big data analytics with the R language.
• Build parallel R models completely in R
• Use Teradata appliance as analytics engine
• No need to move data
Teradata 14.10 + Revolution R Enterprise V7
15. RRE7 in the Cloud
Revolution R Enterprise 7, on the industry-leading cloud platform
Pay as you go, priced by cores x hours
– No long-term commitment required
Launch Windows and Linux servers on demand
– Windows 2008 R2 with DevelopR
– RHEL 6 with RStudio Server Professional
– Server instances from 2 – 32 cores
– Analyze data sets up to 2 TB
Convenient, consistent and reliable
– Available globally, accessible anywhere
– Forum-based support with registration
Free 14-day trial available
Cloud servers: $0.70 per core/hour, plus AWS infrastructure costs
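The pay-as-you-go arithmetic (cores x hours at $0.70 per core/hour) can be sketched as a small R function. The function name rre_cloud_cost and the 8-core, 10-hour scenario are assumptions chosen for illustration, and the result covers only the software charge, since the slide gives no figures for AWS infrastructure costs.

```r
# Hypothetical helper: software charge only, priced by cores x hours.
# AWS infrastructure costs are billed separately and are not included.
rre_cloud_cost <- function(cores, hours, rate_per_core_hour = 0.70) {
  stopifnot(cores > 0, hours >= 0)
  cores * hours * rate_per_core_hour
}

# Illustrative scenario: an 8-core server running for 10 hours
cost <- rre_cloud_cost(cores = 8, hours = 10)
cat(sprintf("Software charge: $%.2f\n", cost))
```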
16. Revolution R Enterprise Ecosystem
Integration with the Big Data Analytics Stack
[Diagram: partner ecosystem grouped by category: Deployment / Consumption; Data / Infrastructure; Advanced Analytics; ETL; SI / Service; MSP / DSP]
17. How Customers Revolutionize their Business
Power
“We’ve combined Revolution R Enterprise and Hadoop to build and deploy customized exploratory data analysis and GAM survival models for our marketing performance management and attribution platform. Given that our data sets are already in the terabytes and are growing rapidly, we depend on Revolution R Enterprise’s scalability and power – we saw about a 4x performance improvement on 50 million records. It works brilliantly.”
- CEO, John Wallace, DataSong
• 4X performance
• 50M records scored daily
Scalability
“We’ve been able to scale our solution to a problem that’s so big that most companies could not address it. If we had to go with a different solution we wouldn’t be as efficient as we are now.”
- SVP Analytics, Kevin Lyons, eXelate
• TBs of data from 200+ data sources
• Tens of thousands of attributes
• Hundreds of millions of scores daily
• 2X data, 2X attributes, no impact on performance
Performance
“We need a high-performance analytics infrastructure because marketing optimization is a lot like a financial trading. By watching the market constantly for data or market condition updates, we can now identify opportunities for our clients that would otherwise be lost.”
- Chief Analytics Officer, Leon Zemel, [x+1]
18. Why Revolution R Enterprise?
• Platform Independence
• Take Big Cost Out of Big Data
• Supercharge R for Massive Data
• Power R for the Enterprise