R is a popular open-source language and environment for statistical analysis and visualization. It allows users to perform a wide range of statistical and predictive modeling techniques on data. Many companies use R as their standard tool for analytics due to its extensive library of packages and ability to handle large datasets. R can interface with other languages and platforms, making it a versatile scripting language for data science tasks.
1. R as Supporting Tool for Analytics and Simulation
Alvaro Gil
Simulation & Optimization Consultant
http://agiltools.com
June 2016
2. Agenda
Introduction
What is R? Why use it?
What to Install
Example of Companies Using R
Some Facts About R
Interesting R Applications
R: The Generic Scripting Language
R + AnyLogic
Interfacing Programming Languages from R
R and IoT
Useful Links
3. Introduction
Pre- and post-processing of information are necessary steps for modeling and
simulation.
Information processing is part of the Analytics field.
◦ Analytics is a discipline which combines descriptive, predictive and prescriptive
techniques on all types of data (INFORMS).
Applying Analytics requires special skills as well as knowledge of specialized
software (SPSS, SAS, R, Python, JMP, Stata, etc.).
Several specialists are promoting the use of R as the standard language for
data analysis (reasons to come in the following slides)
This presentation is an overview of R and what we expect to achieve with it.
4. What is R? Why use it?
R is a high level matrix programming language for statistical and data analysis.
It runs on multiple platforms including Windows, MacOS and Linux.
R is an interpreted language, meaning that the user gets an immediate response from each command without
needing to compile a program.
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-
series analysis, classification, clustering, …) and graphical techniques, and is highly extensible.
Free and open source
R’s main selling point is the massive number of packages, which allow you to perform almost any statistical
procedure in a single command
◦ There are more than 8000 packages available on CRAN, all of which must pass CRAN's automated checks.
R is great for performing analysis on a dataset, and presenting findings in a static set of graphics
R is very useful for running distributed, automated data analysis processes
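As a minimal sketch of the "single command" idea, the example below fits a linear regression with lm() and draws a static graphic, using base R only. The mtcars dataset (which ships with R) and the choice of variables are assumptions made for illustration; they do not come from the original slides.

```r
# Fit a linear regression in one command using base R only.
# mtcars ships with R; the model mpg ~ wt is an illustrative choice.
fit <- lm(mpg ~ wt, data = mtcars)

# Inspect the estimated coefficients and the goodness of fit.
coefs <- coef(fit)
r2 <- summary(fit)$r.squared
print(coefs)
print(r2)

# A static graphic: the raw data with the fitted line on top.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit)
```

Because R is interpreted, each of these lines can be typed at the console and its result inspected immediately.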
6. What to Install?
R Language
◦ R CRAN or
◦ Microsoft Open R (MRO)
An IDE
◦ RStudio
◦ Red-R
◦ Rattle
◦ EMACS + Emacs Speaks Statistics (ESS)
◦ Eclipse (StatET)
◦ Visual Studio
A full set of packages
7. Microsoft R
MRO: Microsoft R Open (personal version link)
MRS: Microsoft R Server (professional version link)
Enhanced distribution of R from Microsoft Corporation.
It includes the R language plus additional capabilities for improved performance, reproducibility
and platform support.
◦ The installation includes all base and recommended R packages plus a set of specialized
packages released by Microsoft Corporation to further enhance the Microsoft R Open experience
◦ Multi-threaded math libraries (Math Kernel Library MKL)
◦ A high-performance default CRAN repository that provides a consistent and static set of packages to all
Microsoft R Open users.
◦ The checkpoint package, which makes it easy to share R code and replicate results using specific R package
versions.
◦ Platforms: Windows, Mac OS X, and Linux
◦ MRS also includes specialized packages for big data.
Visit https://mran.microsoft.com/open/ for more info
9. R Packages
More than 8,000 available packages
source('http://agiltools.com/R/rp.R')
10. Examples of Companies Using R
Sources:
http://www.revolutionanalytics.com/companies-using-r
http://www.r-bloggers.com/airbnb-uses-r-to-scale-data-science/
http://data-informed.com/companies-use-r-compete-data-driven-world/
http://www.r-bloggers.com/companies-using-open-source-r-in-2013/
11. Some Facts About R
R is the highest paid IT skill (Dice.com survey, January 2014)
R is the most-used data science language after SQL (O'Reilly survey, January 2014)
R is used as an analytics tool by 75% of professionals (Rexer survey, October 2015)
R is #13 of all programming languages (RedMonk language rankings, June 2015)
R is growing faster than any other data science language (KDNuggets survey, August 2014)
R is the #1 Google Search for Advanced Analytics software (Google Trends April 2016)
R has more than 2 million users worldwide (Oracle estimate, February 2012)
12. Interesting R Applications
Complete Libraries Specialized by Topic e.g.:
◦ Econometrics
◦ Finance (e.g. actuar, fPortfolio, financial, etc.)
◦ Machine Learning (e.g. nnet, neuralnet, RSNNS, deepnet, darch, h2o, etc.)
◦ Optimization (e.g. Rquadprog, optmix, etc.)
◦ Simulation (e.g. simmer)
◦ Social Sciences
◦ Spatial (e.g. maps)
◦ See more at https://cran.r-project.org/web/views/
Markdown (RStudio)
Shiny (RStudio)
Big Data (e.g. bigmemory, ff, RevoScaleR)
13. Interesting R Applications: Markdown
Markdown is a text-to-HTML conversion tool for reporting.
It allows users to share and/or present their work.
External examples:
◦ 1 (pdf): https://github.com/yihui/knitr/releases/download/doc/knitr-minimal.pdf
◦ 2 (html): https://rawgit.com/yihui/knitr-examples/master/003-minimal.html
◦ 3 (knitr + googleVis): https://cran.r-project.org/web/packages/googleVis/vignettes/Using_googleVis_with_knitr.html
◦ 4 (with Shiny): https://cpsievert.shinyapps.io/animintRmarkdown/
◦ 5 (combined with JavaScript): http://www.nytimes.com/interactive/2014/01/23/business/case-shiller-slider.html?_r=0
14. Interesting R Applications: Shiny
Web application framework for R.
Interactive visualization tool based on JavaScript
libraries like d3, Leaflet and Google Charts.
This reporting tool runs on all types of devices
It can be connected to R to perform any kind of data
analysis in real time (data mining, optimization, etc.)
See some examples at: Shiny User Showcase
Shiny + javascript
(https://frissdemo.shinyapps.io/FrissDashboard/)
Shiny can be embedded in individual servers to add
security and increase performance.
Shiny is available at Predix through cf-buildpack-r
(check link)
15. Interesting R Applications: Big Data
Specialized libraries to manipulate big data
◦ bigmemory + biganalytics (article)
◦ ff + ffbase (article)
R has proven to be very effective at manipulating
millions of rows in a short time (e.g. less than 30
seconds to fit a linear regression on a
sample of 10M rows).
Machine learning algorithms on millions of
rows can run in seconds with the right libraries
and configuration
MRS implements RevoScaleR to
manipulate big data and handle
parallelism
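A rough way to check a timing claim like the one above on your own machine is to time lm() on simulated data. The sketch below uses base R only. The sample size n (1 million rows here, smaller than the 10M mentioned on the slide so that it runs quickly), the seed and the true coefficients are all assumptions made for illustration, and timings will vary with hardware and the math library in use.

```r
# Simulate a large dataset with a known linear relationship.
# n, the seed and the coefficients (2 and 3) are illustrative assumptions.
set.seed(42)
n <- 1e6
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

# Time the regression fit; system.time() reports elapsed seconds.
elapsed <- system.time(fit <- lm(y ~ x))[["elapsed"]]

# The estimates should be close to the true values 2 and 3.
print(coef(fit))
cat("Elapsed seconds:", elapsed, "\n")
```

Increasing n to 1e7 reproduces the scale mentioned on the slide, at the cost of more memory; packages such as bigmemory or ff are aimed at cases where the data no longer fits comfortably in RAM.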
16. R: The Generic Scripting Language
Given the popularity and versatility of R, many companies are adapting their
services to be compatible with R
Oracle, Microsoft and GE, among others
Since 2016, SQL Server has been able to run R scripts directly in the database
using SQL Server R Services. This means the R code runs directly on the
server, as opposed to first extracting the data to a local R session.
In the words of Joseph Sirosh, corporate VP at Microsoft Data Group, “[Microsoft R Server
enables] enterprise customers to standardize advanced analytics on one core tool, regardless
of whether they are using Hadoop (Hortonworks, Cloudera and MapR), Linux (Red Hat and
SUSE) or Teradata. [We are committed to] building R and Revolution’s technology into our
broader database, big data and business intelligence offerings and to bring these benefits to
customers and students – on-premises, in the Azure cloud and to new platforms.”
Forbes January 2016 https://t.co/AJicDBqv47
17. R: The Generic Scripting Language
R and Azure
Microsoft is adapting services like Azure to include R as the scripting language for data analysis
18. Calling R from AnyLogic
AnyLogic can work with R by using the Java library RCaller.
RCaller is a software library developed to simplify calling R from Java (see link)
It wraps and simplifies type conversions and makes variables in each language
accessible from the other
Multiple R processes can be created and handled by multiple RCaller instances in Java
20. Interfacing Programming Languages from R
The R environment can interface with other programming languages, such as Fortran, C and
Java.
Examples of interfaces with C and Java can be found in:
C: http://adv-r.had.co.nz/C-interface.html
Java: http://rforge.net/rJava/
21. R and IoT
R can be executed inside Internet of Things (IoT)
platforms like Bluemix, Amazon Web Services, Azure and
Predix
Buildpacks like cf-buildpack-r allow users to execute
R scripts on Cloud Foundry-based platforms and even embed
Shiny applications.
On Microsoft platforms, R scripts are already embedded in
Azure
22. Useful Links
R Project
CRAN
R Packages
Books and Tutorials
http://www.statmethods.net/
R Bloggers
R Journal
R Graphical Manuals