This document provides an overview of Revolution R Enterprise for IBM Netezza, a high-performance in-database analytics platform. It discusses how Revolution R leverages the massively parallel processing of Netezza to deliver faster analytics. Key features highlighted include running R code and advanced statistical models directly on Netezza clusters, accessing over 2,500 R packages, and integrating with front-end applications through web services. The document also demonstrates how to deploy Revolution R on Netezza through examples of predictive modeling tasks like decision trees and Naive Bayes classification.
5. Revolution Confidential
Most advanced statistical analysis software available
Half the cost of commercial alternatives
The professor who invented analytic software for the experts now wants to take it to the masses
2M+ Users
2,500+ Applications
Power, Productivity, Enterprise Readiness
Statistics, Predictive Analytics, Data Mining, Visualization
Finance, Life Sciences, Manufacturing, Retail, Telecom, Social Media, Government
6. Revolution R Enterprise has the Open-Source R Engine at the core
2,500 community packages and growing exponentially
Multi-Threaded Math Libraries
Technology Partners
Web Services API
Big Data Analysis
Parallel Tools
Revolution Productivity Environment
Technical Support
Build Assurance
Open Source R Packages
R Engine Language Libraries
22. Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise
Presented by: Derek M Norton, Senior Sales Engineer
23. Use Case – Credit Risk
We have a dataset of individuals and their credit risk, stored on the Netezza appliance.
The goal is to model whether someone is “approvable” for a loan.
This use case follows a condensed modeling process from start to finish.
I will discuss each part, and at the end there will be a demo of the code.
24. Modeling Exercise
1. Learning more about the data
2. Prepare the data for modeling
3. Fit models to the data
4. Model Performance
25. 1. Learning more about the data
Connect to the IBM Netezza appliance
Summarize the data
Visualize the data
[Figure: two frequency plots with a Frequency axis running from 0 to 300. Left panel, "Continuous Variable": histogram of x over the range 0 to 25. Right panel, "Discrete Variable": bar chart over High School Diploma, Bachelors Degree, Masters Degree, Professional Degree, PhD.]
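The summarize and visualize steps above can be sketched as follows. This is a minimal illustration in plain Python, not the Revolution R code from the demo. The records, the column names `years_employed` and `education`, and both helper functions are hypothetical; a real run would pull the rows from the appliance instead of an in-memory list.

```python
import statistics
from collections import Counter

# Hypothetical records standing in for rows fetched from the appliance.
records = [
    {"years_employed": 2.0, "education": "High School Diploma"},
    {"years_employed": 5.5, "education": "Bachelors Degree"},
    {"years_employed": 10.0, "education": "Masters Degree"},
    {"years_employed": 1.0, "education": "High School Diploma"},
    {"years_employed": 7.5, "education": "Bachelors Degree"},
    {"years_employed": 4.0, "education": "PhD"},
]


def summarize_numeric(values):
    """Return basic descriptive statistics for a continuous column."""
    return {
        "n": len(values),
        "min": min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "max": max(values),
    }


def summarize_discrete(values):
    """Return the frequency of each level of a discrete column."""
    return Counter(values)


numeric_summary = summarize_numeric([r["years_employed"] for r in records])
education_counts = summarize_discrete([r["education"] for r in records])

print(numeric_summary)

# Crude text bar chart: one '#' per observation in each level.
for level, count in education_counts.most_common():
    print(f"{level:<22} {'#' * count}")
```

A continuous variable gets a five-number-style summary and a discrete variable gets level counts, which mirrors the two panels of the figure.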
26. 2. Prepare the data for modeling
Split the data into 70/30 Training/Test sets
Transform some variables
Discretize numeric variables for later use
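The three preparation steps above can be sketched as follows, in plain Python and under stated assumptions. The `income` column, the log transform, the bin edges, and the band labels are all hypothetical choices made for illustration; the slides do not say which variables were transformed or how they were binned.

```python
import math
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def discretize(value, edges, labels):
    """Return the label of the first bin whose upper edge is >= value.

    `labels` has one more entry than `edges`; values above every edge
    fall into the last label.
    """
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]


# Hypothetical data: 100 individuals with a single numeric column.
rows = [{"id": i, "income": 20000 + 1500 * i} for i in range(100)]

for row in rows:
    # Transform: log scale compresses the long right tail of income.
    row["log_income"] = math.log(row["income"])
    # Discretize: keep a categorical version for models that need one.
    row["income_band"] = discretize(
        row["income"], edges=[50000, 100000], labels=["low", "medium", "high"]
    )

train, test = train_test_split(rows)
print(len(train), len(test))
```

Splitting after the transforms means both sets carry identical derived columns, so a model fitted on the training set can be scored on the test set without further preparation.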
27. 3. Fit models to the data
Build two different models to predict if an
individual is “approvable”
Decision Tree
Naïve Bayes
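To make the second of these models concrete, here is a minimal sketch of a Naive Bayes classifier over discretized (categorical) predictors, written in plain Python with Laplace smoothing. It is an illustration of the technique, not the in-database implementation used in the demo. The training rows, the feature names `income_band` and `education`, and the target name `approvable` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(rows, features, target):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (feature, class) -> value counts
    levels = defaultdict(set)            # feature -> distinct observed values
    for row in rows:
        for feature in features:
            value_counts[(feature, row[target])][row[feature]] += 1
            levels[feature].add(row[feature])
    return {
        "class_counts": class_counts,
        "value_counts": value_counts,
        "levels": levels,
        "features": features,
        "n": len(rows),
    }


def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, count in model["class_counts"].items():
        score = math.log(count / model["n"])  # log prior
        for feature in model["features"]:
            seen = model["value_counts"][(feature, label)][row[feature]]
            n_levels = len(model["levels"][feature])
            # Laplace smoothing: unseen values get a small non-zero probability.
            score += math.log((seen + 1) / (count + n_levels))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training rows with discretized predictors.
training_rows = [
    {"income_band": "high", "education": "Masters Degree", "approvable": "yes"},
    {"income_band": "high", "education": "Bachelors Degree", "approvable": "yes"},
    {"income_band": "medium", "education": "Bachelors Degree", "approvable": "yes"},
    {"income_band": "low", "education": "High School Diploma", "approvable": "no"},
    {"income_band": "low", "education": "Bachelors Degree", "approvable": "no"},
    {"income_band": "medium", "education": "High School Diploma", "approvable": "no"},
]

model = fit_naive_bayes(training_rows, ["income_band", "education"], "approvable")
print(predict_naive_bayes(model, {"income_band": "high", "education": "Masters Degree"}))
```

Working on discretized inputs is why the preparation step bins the numeric variables: each conditional probability reduces to a ratio of counts.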
28. 4. Model Performance
Examine confusion matrices to determine:
Training performance
Test performance
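A confusion matrix simply counts each (actual, predicted) pair, and accuracy is the share of counts on the diagonal. The sketch below shows this in plain Python. The label vectors are made up for illustration and are not results from the demo.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Fraction of observations where the prediction matches the actual label."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total


def print_matrix(matrix, labels):
    """Print rows as actual labels and columns as predicted labels."""
    print("actual \\ predicted".ljust(20) + "".join(l.rjust(6) for l in labels))
    for a in labels:
        print(a.ljust(20) + "".join(str(matrix[(a, p)]).rjust(6) for p in labels))


# Hypothetical labels, not output from the demo.
train_actual = ["yes", "yes", "no", "no", "yes", "no"]
train_predicted = ["yes", "yes", "no", "no", "yes", "yes"]
test_actual = ["yes", "no", "no", "yes"]
test_predicted = ["yes", "no", "yes", "no"]

train_cm = confusion_matrix(train_actual, train_predicted)
test_cm = confusion_matrix(test_actual, test_predicted)

print_matrix(train_cm, ["yes", "no"])
print("training accuracy:", accuracy(train_cm))
print_matrix(test_cm, ["yes", "no"])
print("test accuracy:", accuracy(test_cm))
```

Comparing the two matrices side by side is the point of the exercise: training accuracy well above test accuracy suggests the model has overfit the training set.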