Big Data Analytics with R
Derek McCrae Norton, Senior Sales Engineer
April 2, 2014
Agenda
• Introduction
• Big Data
• Analytics
• R
• Revolution R Enterprise
• Synergy
• Conclusion
© 2013 Revolution Analytics
Who are you anyway?
• Statistician
– My degrees are all in statistics.
• Consultant
– My experience has been mostly in Marketing Analytics focusing on Predictive Analytics.
• Sales Engineer
– Still consulting, just with a much heavier emphasis on client interaction.
• Founder/Director, Atlanta R Users Group
– Shameless plug. Please join if interested.
– http://www.meetup.com/R-Users-Atlanta/
• Husband, Father, Outdoorsman, Serial Hobbyist, …
Big Data
Big Data and Big Opportunities
“Big data is data that exceeds the processing capability of conventional database systems”
Edd Dumbill, O’Reilly Radar*, Jan 2012
[Chart: worldwide data created and replicated, in zettabytes; data labels 1, 2, 35]
* radar.oreilly.com/2012/01/what-is-big-data.html
What is Big Data?
Big Data is a loosely defined term used to describe data sets so large and complex that they become awkward to work with using standard statistical software.
Snijders, Matzat, & Reips (2012)
Does Big Data Mean Hadoop?
• The short answer is no.
• The longer answer is maybe.
• Hadoop adoption is turning that maybe into a probably.
Analytics
What is Analytics?
Analytics is the combination of mathematical, statistical, and heuristic techniques to glean useful insights from data and to implement actions derived from those insights.
Derek McCrae Norton
Analytics
• The current buzzword is “Data Science,” but I don’t really agree with that nomenclature.
– What statistician, analyst, or “data scientist” actually follows the scientific method?
• That being said, the current definition of “Data Science” is a pretty good surrogate for what we are discussing.
• Whatever descriptors you use, one thing is clear: you must use something to help you carry out the actual work.
– R, Python, SAS, etc.
– RDBMS, Hadoop, etc.
What is the R language?
• A Platform…
– A Procedural Language for Stats, Math and Data Science
– A Complete Data Visualization Framework
– Provided as Open Source
• A Community…
– 2M+ Users with the Skill to Tackle Big Data Statistical and Numerical Analysis and Machine Learning Projects
– Active User Groups Across the World
• An Ecosystem…
– CRAN: 5000+ Freely Available Packages
– Applicable to Big Data if scaled
THE R USER COMMUNITY
A brief history of R
• 1993: Research project in Auckland, NZ
– Ross Ihaka and Robert Gentleman
• 1995: Released as open-source software
– Generally compatible with the “S” language
• 1997: R core group formed
• 2000: R 1.0.0 released
• 2004: First international user conference in Vienna
• 2013: R 3.0.0 released
R is Free
• Open Source, licensed under GPL (like Linux!)
– Free as in beer
– Free as in freedom
• Flexible
• Open for integration
– Data (SAS, SPSS, Excel, SQL Server, Oracle, …)
– Systems (applications, webservers, …)
• Broad user-base
– De facto standard for data analysis teaching
R is exploding in popularity & function
• Web Site Popularity: number of links to main web site [chart comparing R, SAS, SPSS, S-Plus, Stata]
• Scholarly Activity: Google Scholar hits (’05-’09 CAGR): R 46%, SAS -11%, SPSS -27%, S-Plus 0%, Stata 10%
• Internet Discussion: mean monthly traffic on email discussion list [chart comparing R, SAS, Stata, SPSS, S-Plus]
• Package Growth: number of R packages listed on CRAN: 4,332 as of Feb 2013
So why isn’t everyone using R?
“The best thing about R is that it was developed by statisticians. The worst thing about R is that it was developed by statisticians.”
Bo Cowgill, Google (at SF R Meetup)
Otherwise R is Great! Right?
• Who here has used R?
– Thoughts?
• Who has never seen this?
• Who here has more than 1 core/processor?
• Who has ever used r-help?
– ‘They’ did write documentation that told you that Perl was needed, but ‘they’ can’t read it for you. - Brian D. Ripley, R-help (February 2001)
– This is all documented in TFM. Those who WTFM don’t want to have to WTFM again on the mailing list. RTFM. - Barry Rowlingson, R-help (October 2003)
What is Revolution R Enterprise?
Motivators
Big Data | In-memory bound | Hybrid memory & disk scalability | Operates on bigger volumes & factors
Speed of Analysis | Single threaded | Parallel threading | Shrinks analysis time
Enterprise Readiness | Community support | Commercial support | Delivers full service production support
Analytic Breadth & Depth | 5000+ innovative analytic packages | Leverage open source packages plus Big Data ready packages | Supercharges R
Commercial Viability | Risk of deployment of open source | Commercial license | Eliminate risk with open source
Introducing Revolution R Enterprise (RRE)
The Big Data Big Analytics Platform
[Diagram: DevelopR, DeployR, ConnectR, ScaleR, DistributedR]
• Big Data Big Analytics Ready
– Enterprise readiness
– High performance analytics
– Multi-platform architecture
– Data source integration
– Development tools
– Deployment tools
The Platform Step by Step: R Capabilities
R+CRAN
• Open source R interpreter
• UPDATED R 3.0.2
• Freely-available R algorithms
• Algorithms callable by RevoR
• Embeddable in R scripts
• 100% Compatible with existing R scripts, functions and packages
RevoR
• Performance enhanced R interpreter
• Based on open source R
• Adds high-performance math
Available On:
• Platform™ LSF™ Linux®
• Microsoft® HPC Clusters
• Windows® & Linux Servers
• Windows & Linux Workstations
• IBM® Netezza®
• NEW Cloudera Hadoop®
• NEW Hortonworks Hadoop
• NEW Teradata® Database
• Intel® Hadoop
• IBM BigInsights™
The Platform Step by Step: Parallelization & Data Sourcing
ConnectR
• High-speed & direct connectors
Available for:
• High-performance XDF
• SAS, SPSS, delimited & fixed format text data files
• Hadoop HDFS (text & XDF)
• Teradata Database & Aster
• EDWs and ADWs
• ODBC
ScaleR
• Ready-to-Use high-performance big data big analytics
• Fully-parallelized analytics
• Data prep & data distillation
• Descriptive statistics & statistical tests
• Correlation & covariance matrices
• Predictive Models – linear, logistic, GLM
• Machine learning
• Monte Carlo simulation
• NEW Tools for distributing customized algorithms across nodes
DistributedR
• Distributed computing framework
• Delivers portability across platforms
Available on:
• Windows Servers
• Red Hat and NEW SuSE Linux Servers
• IBM Platform LSF Linux
• Microsoft HPC Clusters
• NEW Teradata Database
• NEW Cloudera Hadoop
• NEW Hortonworks Hadoop
A single package (RevoScaleR)
The Platform Step by Step: Tools & Deployment
DeployR
• Web services software development kit for integrating analytics via Java, JavaScript or .NET APIs
• Integrates R into application infrastructures
Capabilities:
• Invokes R scripts from web services calls
• RESTful interface for easy integration
• Works with web & mobile apps, leading BI & visualization tools and business rules engines
DevelopR
• Integrated development environment for R
• Visual ‘step-into’ debugger
Available on:
• Windows
Write Once. Deploy Anywhere.
DESIGNED FOR SCALE, PORTABILITY & PERFORMANCE
• In the Cloud: Amazon AWS
• Workstations & Servers: Desktop, Server
• Clustered Systems: IBM Platform LSF, Microsoft HPC
• EDW: Teradata
• Hadoop: Hortonworks, Cloudera
Synergy
Put it all together
• Talent fresh out of school knows R.
• RRE is R plus more.
• RRE provides a unified way of carrying out analytics (small or big).
• RRE code is portable…
Scale and Portability
• Set “compute context” to define hardware (one line of code)
– Native job-scheduler handles distribution, monitoring, failover etc.
• Same code runs on other supported architectures
– Just change compute context
42 seconds instead of 6 minutes on the local machine
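The compute-context idea on this slide can be sketched in plain R. The example below is only an analogy built on the base `parallel` package: it does not use RevoScaleR (where the switch is made with a call such as `rxSetComputeContext()`), and the helper names `make_context` and `chunk_means` are hypothetical, invented for illustration. It shows the analysis function written once while a single line decides whether it runs sequentially or across local worker processes.

```r
library(parallel)  # ships with base R; RevoScaleR is NOT used here

# Hypothetical helper (not part of any package): builds a "context" that
# knows how to map a function over chunks of data.
make_context <- function(type = c("local", "cluster"), workers = 2L) {
  type <- match.arg(type)
  if (type == "local") {
    list(
      type  = type,
      map   = function(x, f) lapply(x, f),        # sequential, in-process
      close = function() invisible(NULL)
    )
  } else {
    cl <- makeCluster(workers)                    # local worker processes
    list(
      type  = type,
      map   = function(x, f) parLapply(cl, x, f), # spread over the workers
      close = function() stopCluster(cl)
    )
  }
}

# Analysis code, written once. It never mentions where it runs.
chunk_means <- function(ctx, chunks) {
  unlist(ctx$map(chunks, mean))
}

chunks <- split(1:100, rep(1:4, each = 25))       # four chunks of 25 values

ctx <- make_context("local")                      # the one line that changes
local_result <- chunk_means(ctx, chunks)
ctx$close()

ctx <- make_context("cluster", workers = 2L)      # same analysis, new context
cluster_result <- chunk_means(ctx, chunks)
ctx$close()

# Both contexts should agree on the per-chunk means.
print(all.equal(local_result, cluster_result))
```

On data this small the cluster version will likely be slower because of process start-up cost; the sketch illustrates the portability of the analysis code, not the speed-up quoted on the slide.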
References
1. Snijders, C., Matzat, U., & Reips, U.-D. (2012). ‘Big Data’: Big gaps of knowledge in the field of Internet science. International Journal of Internet Science, 7, 1-5. http://www.ijis.net/ijis7_1/ijis7_1_editorial.html
2. Conway, D. The Data Science Venn Diagram.

Big Data Analytics with R

  • 1. Big Data Analytics with R Derek McCrae Norton, Senior Sales Engineer April 2, 2014
  • 2. Agenda  Introduction  Big Data  Analytics  R  Revolution R Enterprise  Synergy  Conclusion © 2013 Revolution Analytics
  • 3. Who are you anyway?  Statistician – My degrees are all in statistics.  Consultant – My experience has been mostly in Marketing Analytics focusing on Predictive Analytics.  Sales Engineer – Still consulting, just with a much heavier emphasis on client interaction.  Founder/Director Atlanta R Users Group. – Shameless plug. Please join if interested. – http://www.meetup.com/R-Users-Atlanta/  Husband, Father, Outdoorsman, Serial Hobbyist, … © 2013 Revolution Analytics
  • 4. Big Data © 2013 Revolution Analytics
  • 5. Big Data and Big Opportunities © 2013 Revolution Analytics “Big data is data that exceeds the processing capability of conventional database systems” Edd Dumbill O’Reilly Radar*, Jan 2012 Worldwide data created and replicated, Zettabytes 1 2 35 * radar.oreilly.com/2012/01/what-is-big-data.html
  • 6. What is Big Data? Big Data is a loosely defined term used to describe data sets so large and complex that they become awkward to work with using standard statistical software. © 2013 Revolution Analytics Snijders, Matzat, & Reips (2012)
  • 7. Does Big Data Mean Hadoop?  The short answer is no.  The longer answer is maybe.  Hadoop adoption is turning that maybe into a probably. © 2013 Revolution Analytics ?
  • 9. What is Analytics? Analytics is the combination of mathematical, statistical, and heuristic techniques to glean useful insights from data and to implement actions derived from those insights. © 2013 Revolution Analytics Derek McCrae Norton
  • 10. Analytics  The current buzzword is “Data Science,” but I don’t really agree with that nomenclature. – What statistician, analyst, (data scientist) actually follows the scientific method?  That being said, the current definition of “Data Science” is a pretty good surrogate for what we are discussing.  Whatever descriptors you use, one thing is clear… You must use something to help you carry out the actual work. – R, Python, SAS, etc. – RDBMS, Hadoop, etc. © 2013 Revolution Analytics
  • 11. © 2013 Revolution Analytics
  • 12. What is the R language?
      • A Platform…
          – A Procedural Language for Stats, Math and Data Science
          – A Complete Data Visualization Framework
          – Provided as Open Source
      • A Community…
          – 2M+ Users with the Skill to Tackle Big Data Statistical and Numerical Analysis and Machine Learning Projects
          – Active User Groups Across the World
      • An Ecosystem
          – CRAN: 5000+ Freely Available Packages
          – Applicable to Big Data if scaled
  • 13. THE R USER COMMUNITY
  • 14. A brief history of R
      • 1993: Research project in Auckland, NZ
          – Ross Ihaka and Robert Gentleman
      • 1995: Released as open-source software
          – Generally compatible with the “S” language
      • 1997: R core group formed
      • 2000: R 1.0.0 released
      • 2004: First international user conference in Vienna
      • 2013: R 3.0.0 released
  • 15. R is Free
      • Open Source, licensed under GPL (like Linux!)
          – Free as in beer
          – Free as in freedom
      • Flexible
      • Open for integration
          – Data (SAS, SPSS, Excel, SQL Server, Oracle, …)
          – Systems (applications, webservers, …)
      • Broad user-base
          – De facto standard for data analysis teaching
  • 16. R is exploding in popularity & function
      • Web Site Popularity: number of links to main web site [chart comparing R, SAS, SPSS, S-Plus, Stata]
      • Scholarly Activity: Google Scholar hits (’05-’09 CAGR): R 46%, SAS -11%, SPSS -27%, S-Plus 0%, Stata 10%
      • Internet Discussion: mean monthly traffic on email discussion list [chart comparing R, SAS, Stata, SPSS, S-Plus]
      • Package Growth: number of R packages listed on CRAN, 4,332 as of Feb 2013
  • 17. So why isn’t everyone using R?
      • “The best thing about R is that it was developed by statisticians. The worst thing about R is that it was developed by statisticians.” (Bo Cowgill, Google, at SF R Meetup)
  • 18. Otherwise R is Great! Right?
      • Who here has used R?
          – Thoughts?
      • Who has never seen this?
      • Who here has more than 1 core/processor?
      • Who has ever used r-help?
          – ‘They’ did write documentation that told you that Perl was needed, but ‘they’ can’t read it for you. (Brian D. Ripley, R-help, February 2001)
          – This is all documented in TFM. Those who WTFM don’t want to have to WTFM again on the mailing list. RTFM. (Barry Rowlingson, R-help, October 2003)
  • 19. What is Revolution R Enterprise?
  • 20. Motivators
      • Big Data: In-memory bound; Hybrid memory & disk scalability; Operates on bigger volumes & factors
      • Speed of Analysis: Single threaded; Parallel threading; Shrinks analysis time
      • Enterprise Readiness: Community support; Commercial support; Delivers full service production support
      • Analytic Breadth & Depth: 5000+ innovative analytic packages; Leverage open source packages plus Big Data ready packages; Supercharges R
      • Commercial Viability: Risk of deployment of open source; Commercial license; Eliminate risk with open source
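
The "in-memory bound" versus "hybrid memory & disk" contrast on this slide can be illustrated with a minimal sketch. It is written in Python purely as an illustration and is not how RRE is implemented. The function names `read_in_chunks` and `chunked_mean_variance` are hypothetical. The idea shown is that a summary statistic can be computed one chunk at a time and the per-chunk summaries merged, so the full data set never has to fit in memory.

```python
import math
from typing import Iterable, Iterator, List, Tuple


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive chunks so that at most chunk_size values are held at once."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def chunked_mean_variance(values: Iterable[float],
                          chunk_size: int = 1000) -> Tuple[int, float, float]:
    """Return (n, mean, population variance), computed one chunk at a time."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the running mean
    for chunk in read_in_chunks(values, chunk_size):
        c_n = len(chunk)
        c_mean = math.fsum(chunk) / c_n
        c_m2 = math.fsum((x - c_mean) ** 2 for x in chunk)
        # Merge the chunk summary into the running summary
        # (standard pairwise update for count, mean and squared deviations).
        delta = c_mean - mean
        total = n + c_n
        mean += delta * c_n / total
        m2 += c_m2 + delta * delta * n * c_n / total
        n = total
    if n == 0:
        raise ValueError("no data")
    return n, mean, m2 / n


# The input can be any iterable, for example a generator reading rows from disk.
n, mean, variance = chunked_mean_variance((float(i % 7) for i in range(10_000)),
                                          chunk_size=256)
print(n, mean, variance)
```

Because each chunk is reduced to a small summary (count, mean, squared deviations), the same merge step would also work if the chunks were processed by separate threads or machines, which is the link to the "parallel threading" row.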
  • 21. Introducing Revolution R Enterprise (RRE): The Big Data Big Analytics Platform
      • Components: DistributedR, DevelopR, DeployR, ScaleR, ConnectR
      • Big Data Big Analytics Ready
          – Enterprise readiness
          – High performance analytics
          – Multi-platform architecture
          – Data source integration
          – Development tools
          – Deployment tools
  • 22. The Platform Step by Step: R Capabilities
      • R+CRAN
          – Open source R interpreter
          – UPDATED: R 3.0.2
          – Freely-available R algorithms
          – Algorithms callable by RevoR
          – Embeddable in R scripts
          – 100% compatible with existing R scripts, functions and packages
      • RevoR
          – Performance enhanced R interpreter
          – Based on open source R
          – Adds high-performance math
      • Available on:
          – Platform™ LSF™ Linux®
          – Microsoft® HPC Clusters
          – Windows® & Linux Servers
          – Windows & Linux Workstations
          – IBM® Netezza®
          – NEW: Cloudera Hadoop®
          – NEW: Hortonworks Hadoop
          – NEW: Teradata® Database
          – Intel® Hadoop
          – IBM BigInsights™
  • 23. The Platform Step by Step: Parallelization & Data Sourcing
      • ConnectR: high-speed & direct connectors. Available for:
          – High-performance XDF
          – SAS, SPSS, delimited & fixed format text data files
          – Hadoop HDFS (text & XDF)
          – Teradata Database & Aster
          – EDWs and ADWs
          – ODBC
      • ScaleR: ready-to-use high-performance big data big analytics
          – Fully-parallelized analytics
          – Data prep & data distillation
          – Descriptive statistics & statistical tests
          – Correlation & covariance matrices
          – Predictive models: linear, logistic, GLM
          – Machine learning
          – Monte Carlo simulation
          – NEW: Tools for distributing customized algorithms across nodes
      • DistributedR: distributed computing framework that delivers portability across platforms. Available on:
          – Windows Servers
          – Red Hat and NEW SuSE Linux Servers
          – IBM Platform LSF Linux
          – Microsoft HPC Clusters
          – NEW: Teradata Database
          – NEW: Cloudera Hadoop
          – NEW: Hortonworks Hadoop
      • A single package (RevoScaleR)
  • 24. The Platform Step by Step: Tools & Deployment
      • DeployR
          – Web services software development kit for integrating analytics via Java, JavaScript or .NET APIs
          – Integrates R into application infrastructures
      • DeployR capabilities
          – Invokes R scripts from web services calls
          – RESTful interface for easy integration
          – Works with web & mobile apps, leading BI & visualization tools and business rules engines
      • DevelopR
          – Integrated development environment for R
          – Visual ‘step-into’ debugger
          – Available on: Windows
  • 25. Write Once. Deploy Anywhere.
      • DESIGNED FOR SCALE, PORTABILITY & PERFORMANCE
      • Components: DistributedR, ScaleR, ConnectR, DeployR
      • In the Cloud: Amazon AWS
      • Workstations & Servers: Desktop, Server
      • Clustered Systems: IBM Platform LSF, Microsoft HPC
      • EDW: Teradata
      • Hadoop: Hortonworks, Cloudera
  • 27. Put it all together
      • Talent fresh out of school knows R.
      • RRE is R plus more.
      • RRE provides a unified way of carrying out analytics (small or big).
      • RRE code is portable…
  • 28. Scale and Portability
      • Set “compute context” to define hardware (one line of code)
          – Native job-scheduler handles distribution, monitoring, failover etc.
      • Same code runs on other supported architectures
          – Just change compute context
      • 42 seconds instead of 6 minutes on the local machine
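
The "compute context" idea on this slide can be sketched as a design pattern: the analysis code stays the same and a single line selects where it runs. The sketch below is Python and is only an analogy. The classes `LocalSequential` and `LocalParallel` and the function `distributed_mean` are hypothetical stand-ins, not part of RRE or RevoScaleR. It uses threads on one machine and does not model a job scheduler, monitoring or failover.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Chunk = Sequence[float]
Partial = Tuple[float, int]


class LocalSequential:
    """Hypothetical context: run every task in the calling thread."""

    def map(self, fn: Callable[[Chunk], Partial], items: Sequence[Chunk]) -> List[Partial]:
        return [fn(item) for item in items]


class LocalParallel:
    """Hypothetical context: fan tasks out to a pool of worker threads."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def map(self, fn: Callable[[Chunk], Partial], items: Sequence[Chunk]) -> List[Partial]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(fn, items))  # results keep the input order


def chunk_sum_and_count(chunk: Chunk) -> Partial:
    """Per-chunk work: reduce a chunk to a small partial result."""
    return sum(chunk), len(chunk)


def distributed_mean(chunks: Sequence[Chunk], context) -> float:
    """Analysis code: identical regardless of which context is passed in."""
    partials = context.map(chunk_sum_and_count, chunks)
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

context = LocalSequential()            # the only line that changes
# context = LocalParallel(workers=2)   # same analysis, different execution

print(distributed_mean(chunks, context))
```

Keeping the execution strategy behind one small interface (`map` here) is what makes the analysis portable in this sketch: adding another back end means adding another context class, not rewriting `distributed_mean`.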
  • 30. References
      1. Snijders, C., Matzat, U., & Reips, U.-D. (2012). ‘Big Data’: Big gaps of knowledge in the field of Internet science. International Journal of Internet Science, 7, 1-5. http://www.ijis.net/ijis7_1/ijis7_1_editorial.html
      2. Conway, D. The Data Science Venn Diagram.