5. What is R?
Language Platform
• The most popular statistical programming language
• A data visualization tool
• Open source
Community
• 3+ million users
• Taught in most universities
• Thriving user groups worldwide
• New and recent grads use it
Ecosystem
• 9000+ contributed packages
• Rich application & platform integration
7. • Any code/package that works today with R will work in R Server.
• Ideal for parameter sweeps, simulation, scoring.
• Transformations: rxDataStep(), Statistics: rxChiSquaredTest(), Algorithms: rxLinMod(), Parallelism: rxSetComputeContext()
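A minimal sketch of how the function families on this slide might be used together, assuming the RevoScaleR package that ships with R Server is installed. The built-in mtcars data set, the derived column wt_kg, and the factor columns are illustrative choices, not from the slides. The plain lm() call is there because ordinary R code is expected to run unchanged on R Server; the rx* section is skipped when RevoScaleR is not available.

```r
# Plain R: runs the same way in open-source R and in R Server.
fit_base <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(fit_base))

# RevoScaleR equivalents (only executed when the package is installed).
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  library(RevoScaleR)

  # Parallelism: choose where rx* functions execute; "local" is this machine.
  rxSetComputeContext("local")

  # Transformations: add a derived column (mtcars$wt is in units of 1000 lb).
  cars <- rxDataStep(inData = mtcars,
                     transforms = list(wt_kg = wt * 453.592))

  # Algorithms: the same linear model, fitted with the scalable implementation.
  fit_rx <- rxLinMod(mpg ~ wt + hp, data = cars)
  print(summary(fit_rx))

  # Statistics: chi-squared test of independence on a cross tabulation.
  # Factors are created in base R first (illustrative column names).
  cars$cyl_f <- factor(cars$cyl)
  cars$am_f <- factor(cars$am)
  tab <- rxCrossTabs(~ cyl_f:am_f, data = cars, returnXtabs = TRUE)
  print(rxChiSquaredTest(tab))
}
```

With small in-memory data the two fits should agree closely; the rx* versions are meant for data that does not fit in memory or that lives on a cluster.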
9. • Provisions Azure compute resources with Spark installed and configured.
• Data is stored in Azure Blob storage (wasb://) or Azure Data Lake Store (adl://).
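To make the two storage schemes concrete, here is a small sketch of what full URIs usually look like. The helper names wasb_uri and adl_uri, and the container, account, and path values, are hypothetical; only the wasb:// and adl:// schemes come from the slide, and the host suffixes are the standard Azure endpoints.

```r
# Build a URI for a file in Azure Blob storage.
# Shape: wasb://<container>@<account>.blob.core.windows.net/<path>
wasb_uri <- function(container, account, path) {
  sprintf("wasb://%s@%s.blob.core.windows.net/%s",
          container, account, sub("^/+", "", path))
}

# Build a URI for a file in Azure Data Lake Store.
# Shape: adl://<account>.azuredatalakestore.net/<path>
adl_uri <- function(account, path) {
  sprintf("adl://%s.azuredatalakestore.net/%s",
          account, sub("^/+", "", path))
}

# Hypothetical names, for illustration only.
blob_file <- wasb_uri("mycontainer", "myaccount", "/data/flights.csv")
lake_file <- adl_uri("mylake", "data/flights.csv")
print(blob_file)
print(lake_file)
```

On the cluster's default storage account, paths can typically be given without the scheme and host; the full form matters mainly when reading from an additional account.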
10. [Diagram] RStudio; HDInsight Gateway; R process on Edge Node (R Server); Data in Distributed Storage
11. [Diagram] RStudio; HDInsight Gateway; Master R process on Edge Node (R Server); Apache YARN and Spark; Worker R processes on Data Nodes; Data in Distributed Storage
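The master/worker layout in this diagram could be exercised roughly as follows. This is a sketch under stated assumptions: sim_mean, sizes, and the use_spark flag are hypothetical, and it assumes the installed RevoScaleR version supports rxExec() in the Spark compute context. Without RevoScaleR the same sweep falls back to a sequential lapply().

```r
# Hypothetical task for a parameter sweep: mean of n standard-normal draws.
sim_mean <- function(n) {
  mean(rnorm(n))
}
sizes <- c(100, 1000, 10000)

# Assumption: set to TRUE only when running on the HDInsight edge node.
use_spark <- FALSE

set.seed(42)
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  library(RevoScaleR)
  if (use_spark) {
    # The master R process on the edge node asks YARN/Spark to start
    # worker R processes on the data nodes.
    rxSetComputeContext(RxSpark())
  } else {
    rxSetComputeContext("local")
  }
  # Run sim_mean once per element of sizes in the active compute context.
  results <- rxExec(sim_mean, rxElemArg(sizes))
} else {
  # Fallback without R Server: same sweep, sequentially, in one R process.
  results <- lapply(sizes, sim_mean)
}
print(results)
```

The point of the design is that only the compute context changes between the single-process and the cluster case; the task function and the call that distributes it stay the same.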
14. R Server (single thread, local): 471 sec
R Server on HDInsight (4 nodes): 144 sec (-70%)
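The quoted reduction can be checked from the two timings, assuming both runs processed the same workload. The slide's -70% appears to be a rounding of roughly -69.4%, which corresponds to a speedup of about 3.3x:

```latex
\[
\frac{144 - 471}{471} \approx -0.694,
\qquad
\frac{471}{144} \approx 3.27
\]
```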