Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Through the firewall with miniCRAN
1. Through the firewall with miniCRAN
Andrie de Vries
andrie@revolutionanalytics.com
@RevoAndrie
2. OUR COMPANY
The leading provider of advanced analytics software and services based on open source R, since 2007
OUR SOFTWARE
The only Big Data, Big Analytics software platform based on the data science language R
SOME KUDOS
Visionary, Gartner Magic Quadrant for Advanced Analytics Platforms, 2014
4. Overview
Situation
– CRAN and other package repositories are a wonderful source of innovation
– It is very easy to create a complete mirror of CRAN
Complication
– Organisations want to control what sits behind the firewall
– Rationale: licensing as well as security concerns
Critical question
– How do you manage an internally consistent set of packages in your organisation?
5. Enterprise requires CRAN behind the firewall
Security
– Separated from the Internet
– Virus and/or malicious code detection
License compliance
– Subset of approved packages
(Diagram: CRAN → mirror → local CRAN mirror → scan, virus check and quarantine → internally approved subset of CRAN mirror → publish internally → R users)
6. Solutions:
Use rsync to create a full mirror
– Described in the Revolution R installation manual
OR
Use the miniCRAN package
– Specify packages
– Download to local repository
– Create additional repository files
– To support available.packages() and install.packages()
– Repeat for each version of R
(Diagram: CRAN → mirror → partial CRAN mirror → scan, virus check and quarantine → internally approved subset of CRAN mirror → publish internally → R users)
8. Terminology
Repository
– Specific file structure with package source and/or binaries as well as PACKAGES metadata
– For example CRAN or Bioconductor
Library
– A folder on your machine containing installed packages
– Separate folder for each installed version of R
Package
– The unit of R code you download from a repository and install into a library
– For example ggplot2 or MASS
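To see the three terms side by side in a running session, base R exposes each of them: `getOption("repos")` holds the repository URL(s) that `install.packages()` downloads from, `.libPaths()` lists the library folders it installs into, and `installed.packages()` reports the packages sitting in a library. A minimal sketch; the printed values depend entirely on your installation.

```r
# Repositories: the URL(s) install.packages() downloads from
repos <- getOption("repos")
print(repos)

# Libraries: folders on this machine that packages are installed into
libs <- .libPaths()
print(libs)

# Packages: what is installed in the first library (name and version only)
inst <- installed.packages(lib.loc = libs[1])
head(inst[, c("Package", "Version"), drop = FALSE])
```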
9. Anatomy of a CRAN mirror
A repository contains packages in both source and binary format (for multiple versions of R)
Root
∟ src
   ∟ contrib                   Source packages
      ∟ PACKAGES               Index file (one in every contrib folder)
∟ bin                          Binary packages (multiple folders for each R version)
   ∟ windows/contrib/
   ∟ macosx/contrib/
   ∟ macosx/mavericks/contrib
   ∟ macosx/leopard/contrib
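Base R's `contrib.url()` shows how a repository root maps onto this layout: `install.packages()` and `available.packages()` use it by default to locate the contrib folder (and the PACKAGES index inside it) for a given package type. The root URL below is a made-up example.

```r
# A local repository root, given as a file:// URL (hypothetical path)
root <- "file:///tmp/repo"

# Source packages always live under src/contrib
src <- contrib.url(root, type = "source")
print(src)
# -> "file:///tmp/repo/src/contrib"

# Windows binaries live under bin/windows/contrib/<R major.minor>,
# which is why each R version gets its own binary folder
win <- contrib.url(root, type = "win.binary")
print(win)
```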
10. Step by step guide
List desired packages
Determine all dependencies (recursively)
Download source and binaries
– For every version of R you want to support
Create index file: PACKAGES
Make your local repo available in the organisation
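The recursive dependency step is the only one that needs an algorithm. The sketch below walks a hypothetical, hand-written dependency table breadth-first until no new packages turn up; `deps` and `resolve_deps()` are illustrative names, not part of miniCRAN. In practice `pkgDep()` does this for you against the real repository index, and base R's `tools::package_dependencies(..., recursive = TRUE)` offers something similar.

```r
# Hypothetical, simplified dependency table: each package maps to its direct
# (Depends/Imports) dependencies; base packages such as utils are left out
deps <- list(
  foreach   = c("codetools", "iterators"),
  codetools = character(0),
  iterators = character(0)
)

# Breadth-first walk: keep queueing dependencies until nothing new appears
resolve_deps <- function(pkgs, deps) {
  found <- character(0)
  queue <- pkgs
  while (length(queue) > 0) {
    p <- queue[1]
    queue <- queue[-1]
    if (p %in% found) next          # already collected, skip
    found <- c(found, p)
    queue <- c(queue, deps[[p]])    # NULL (nothing added) if p is not in the table
  }
  sort(found)
}

resolve_deps("foreach", deps)
# -> c("codetools", "foreach", "iterators")
```

Tracking `found` also protects the walk against cycles, which can occur through LinkingTo or Suggests fields in real repositories.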
12. Using miniCRAN
The miniCRAN package is available at:
– CRAN
– GitHub (development version)
library(miniCRAN)
vignette("miniCRAN")
13. Using miniCRAN
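library(miniCRAN)
# Specify list of packages to download
pkgs <- c("foreach")
# Specify CRAN mirror to use
revolution <- c(CRAN="http://cran.revolutionanalytics.com")
# Find the packages plus all their dependencies
pkgList <- pkgDep(pkgs, repos=revolution, type="source")
# Create a folder for the local repository
pth <- file.path(tempdir(), "miniCRAN")
dir.create(pth)
# Make repo for source and win.binary
makeRepo(pkgList, path=pth, repos=revolution,
         type="source")
makeRepo(pkgList, path=pth, repos=revolution,
         type="win.binary")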
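Besides downloading the packages, miniCRAN creates the PACKAGES index that `available.packages()` and `install.packages()` read. That index is a plain DCF text file with one record per package. The sketch below writes and reads a tiny hand-made one with base R's `write.dcf()` and `read.dcf()` purely to show the format; the folder and version numbers are made up, and a real index (generated by tooling such as `tools::write_PACKAGES()`) carries many more fields.

```r
# Hypothetical local repo: source packages go in <root>/src/contrib
root <- file.path(tempdir(), "demo-repo")
contrib <- file.path(root, "src", "contrib")
dir.create(contrib, recursive = TRUE, showWarnings = FALSE)

# One record per package; version numbers are illustrative only
index <- data.frame(
  Package = c("codetools", "foreach", "iterators"),
  Version = c("0.2-11", "1.4.2", "1.0.7"),
  stringsAsFactors = FALSE
)
packages_file <- file.path(contrib, "PACKAGES")
write.dcf(index, file = packages_file)

# Plain text: "Package: ..." / "Version: ..." blocks separated by blank lines
cat(readLines(packages_file), sep = "\n")

# read.dcf() parses it back into a character matrix, one row per package
parsed <- read.dcf(packages_file)
parsed[, "Package"]
```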
14. Referring to repo on local file system
Use file:///
install.packages("ggplot2",
                 repos="file:///path/to/repo/")
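A `file:///` URL is just `file://` followed by an absolute path, with an extra leading slash before a Windows drive letter. A small helper can build it from a local folder; `path_to_file_url()` is a hypothetical name for illustration, not a function in R or miniCRAN, and it assumes the folder exists so `normalizePath()` can make the path absolute.

```r
# Hypothetical helper: turn a local repository folder into a file:/// URL
path_to_file_url <- function(path) {
  # Absolute path with forward slashes, also on Windows (C:/...)
  p <- normalizePath(path, winslash = "/", mustWork = FALSE)
  # Windows paths start with a drive letter, so add the leading slash
  if (!startsWith(p, "/")) p <- paste0("/", p)
  paste0("file://", p)
}

path_to_file_url(tempdir())
```

Usage would then look like `install.packages("ggplot2", repos = path_to_file_url("/path/to/repo"))`, which avoids hand-counting slashes on each platform.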
15. Making it stick
Configure the Rprofile for every user
To set your repo permanently, add this line to the Rprofile:
options(repos=c(CRAN="file:///path/to/repo"))
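For every user of an R installation, the site-wide startup file `R_HOME/etc/Rprofile.site` is one place to put this (see `?Startup`); `~/.Rprofile` covers a single user. The sketch below is a variant of the one-liner that replaces only the `CRAN` entry and keeps any other repositories already configured, rather than overwriting the whole `repos` vector. `file:///path/to/repo` remains a placeholder.

```r
# In R_HOME/etc/Rprofile.site (all users) or ~/.Rprofile (one user).
# Only the CRAN entry is replaced; other configured repos are kept.
local({
  r <- getOption("repos")
  r["CRAN"] <- "file:///path/to/repo"   # placeholder for your internal repo
  options(repos = r)
})
```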
16. Example code
Find an example session at gist
– https://gist.github.com/andrie/d68834d68f4724432929
18. Package Hell
I heard you need to create a TPS Report. Here, I’ve got an R script that does that already.
Oh, you need to download these 5 packages first.
I did, and it still doesn’t work!
Well, it worked when I wrote it 3 weeks ago.
Grr. Package updates…
19. Sharing a script reproducibly … and simply
# Run with R 3.1.0
require(RRT)
checkpoint(snapshot="2014-06-27")
# find packages used in this project
# install packages in checkpoint folder
# set library path to use checkpointed packages
require(ggplot2)
require(data.table)
require(knitr)
...
20. The R reproducibility toolkit
Server-side solution: MRAN
Client-side R package: RRT
(Diagram: CRAN → MRAN daily snapshots → RRT package)
require(RRT)
checkpoint("2014-06-27")
21. MRAN and RRT are actively developed
Try the development version by installing from github:
install.packages("devtools")
library("devtools")
devtools::install_github("RevolutionAnalytics/RRT")
library("RRT")
23. Conclusion
To use a private version of CRAN in your organisation, either:
– Use rsync to create a full CRAN mirror
– Use the miniCRAN package to selectively create a mini version of CRAN
For reproducible research, look out for imminent announcements about
RRT and MRAN