Disclaimer:
This is NOT an R, Python, Hadoop or Spark overview
• From its inception, R was designed to use only a
single thread (processor) at a time. Even today, R
works that way unless linked with multi-threaded
BLAS/LAPACK libraries.
• Microsoft R Open, which is linked with the multi-threaded
Intel Math Kernel Library, automatically uses all available
cores and processors to significantly reduce computation
times for math operations, without the need to change a
line of your R code.
broad range of scalable and distributed R & Python functions
Platforms & Data
Tools
Languages
Algorithms
Data Sources
Rattle Mrsdeploy
RESTful API
deployment
Real-Time
Scoring
Visualization
Tool
Integration
.csv Microsoft .XDF
In-database
deployment
Operationalization
Distributed Parallelized Algorithms:
•RevoScaleR library
•MicrosoftML library
•Custom parallelization frameworks
Open source R algorithms
& visualizations:
•CRAN
•bioconductor
Plus:
•Deep Learning
•Pretrained Models
•Prebuilt Featurizers
ODBC/JDBC
• Small data many models
• Hybrid
• Large scale machine learning
• Batch training and scoring
• Model deployment for LOB applications
Machine Learning
and Analytics
Big Data Stores
Information
Management
Big Data as a cornerstone for Azure Data Services
Transform data into intelligent actions and predictions
Action
People
Automated
Systems
Apps
Web
Mobile
Bots
Event Hubs
Data Catalog
Data Factory
HDInsight
(Hadoop and
Spark)
Stream Analytics
Intelligence
Data Lake
Analytics
Machine
Learning
SQL Data
Warehouse
Data Lake Store
Data
Sources
Apps
Sensors
and
devices
Data
Cosmos DB
Intelligence
Dashboards &
Visualizations
Cortana
Bot
Service
Cognitive
Services
Power BI
Azure Analysis
Services
SQL Database
Batch AI
Blob Store
Databricks
• Support:
• Inadequacy of Community Support
• Application Governance:
• Commonly prohibited in production systems without commercial vendor
• Support for Spark 1.6, 2.0, 2.1 and 2.2
• Support for all three distributions of Hadoop on all three flavors of Linux
• Inherits high-performance big-data processing characteristics from Spark
• Easily integrates with different data sources, e.g. Hive, Parquet, ORC, XDF, CSV
• E2E framework for running parallel workloads
• Interoperability with sparklyr and H2O
• Best-in-class operationalization
• Guaranteed backward compatibility
R User
Workstation
MLS Server for Hadoop
Data
Frames
YARN Resource
Management
Spark
Executor
Worker
Task
Spark
Executor
Worker
Task
Spark
Executor
Worker
Task
.csv, xdf, hive,
parquet, ORC
ScaleR
Master Task
Finalizer
Initiator
Edge Node
Spark
Driver
Ver. 1.6 or 2.0
RStudio Desktop/Server
Microsoft R Server
Data
Frames
Worker
Task
Worker
Task
Worker
Task
ScaleR
Master Task
Finalizer
Initiator
Remote Execution:
ssh
Web Services
mrsdeploy
R Tools (VS Code,
Visual Studio, RStudio, …)
BI Tools &
Applications
Jupyter Notebooks
Thin Client IDEs
https://
https://
Edge Node
RStudio
RStudio Server
Built-in remote execute
functions in R Client/R Server
Tools to reconcile local and
remote
Execute .R script or
interactive R commands
Return results to local
Generate working snapshots
for resume and reuse
IDE agnostic
MLS Server
configured to
(supports Windows Server, Linux
Server, Hadoop)
Remote Execute R Scripts
• Execute R Scripts
• Snapshot remote env.
• Logout remote server
• Login remote server
• Generate Diff report
• Reconcile Environment
DEMO:
Interacting with MLS Server
RStudio Desktop (Remote Context),
RStudio Server
### SETUP HADOOP ENVIRONMENT VARIABLES ###
myHadoopCluster <- RxSpark()
### HADOOP COMPUTE CONTEXT ###
rxSetComputeContext(myHadoopCluster)
### CREATE HDFS, DIRECTORY AND FILE OBJECTS ###
hdfsFS <- RxHdfsFileSystem()
### ANALYTICAL PROCESSING ###
### Statistical Summary of the data
rxSummary(~ ArrDelay + DayOfWeek, data = AirlineDataSet, reportProgress = 1)
### CrossTab the data
rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)
### Linear Model and plot
hdfsXdfArrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
plot(hdfsXdfArrLateLinMod$coefficients)
### LOCAL COMPUTE CONTEXT ###
rxSetComputeContext("local")
### CREATE, REFERENCE FILE OBJECTS ###
AirlineDataSet <- RxTextData(file.path(dataRead, "AirlineDemoSmall.csv"))
Local parallel processing (Linux or Windows), in-Hadoop (Spark / MapReduce) or in SQL Server
ScaleR models can be deployed from a server or edge node to run in Hadoop
without any functional R model re-coding for map-reduce
Functional model R
script – does not
need to change to
run in Hadoop
Compute
context R script
– sets where the
model will run
• The compute context defines where the processing happens
• The currently set compute context determines the processing
location
• Write Once Deploy Anywhere (WODA): only the compute context
changes
"/data/admin.csv"
"/data/admin.csv"
“admintable”
rxSetComputeContext(RxSpark(…))
mydata<-RxTextData("/data/admin.csv", fileSystem = RxHdfsFileSystem())
mylogIt <- rxLogit(admin~ gre + gpa + rank, data = mydata)
Using a Spark
compute context
Keep other code
unchanged
Rx…..Data
DEMO :
Switching Context
single_code_base.R
• Scales via parallel & distributed computation
• Scales via in-cluster & in-database execution
• Escapes R’s memory limitations
• Speeds analysis by reducing data
movement
• Reduces security risks with in-database &
in-Hadoop execution
• Deploys R to multiple platforms
f(x)
Task
Task
Task
RevoScaleR Algorithms & Functions
Task
Task
Task
Finalizer
Initiator
Data Larger
than RAM
Internal Flow:
1. Algorithm begins initiator process
2. Initiator distributes work to nodes
3. Finalizer collects results
4. Finalizer iterates or continues
5. Finalizer evaluates final model
6. Returns single model to calling script
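The initiator / worker task / finalizer flow above can be illustrated without RevoScaleR. The following base R sketch is only an analogy under stated assumptions, not RevoScaleR's implementation: the simulated data, the four-way chunking, and the helper name chunk_stats are all hypothetical. Each "worker" computes the cross products for its chunk, and the "finalizer" sums them and solves the normal equations, which gives the same coefficients as a single-threaded lm() fit.

```r
# Base R analogy of the initiator / worker / finalizer flow.
# All names and data here are illustrative assumptions.
set.seed(42)
n <- 1000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- 1 + 2 * df$x1 - 3 * df$x2 + rnorm(n)

# "Initiator": split the row indices into four chunks.
chunks <- split(seq_len(n), rep(1:4, length.out = n))

# "Worker task": compute X'X and X'y for one chunk only.
chunk_stats <- function(rows) {
  X <- cbind(1, df$x1[rows], df$x2[rows])
  y <- df$y[rows]
  list(xtx = crossprod(X), xty = crossprod(X, y))
}
partials <- lapply(chunks, chunk_stats)

# "Finalizer": add up the partial results and solve once.
xtx <- Reduce(`+`, lapply(partials, `[[`, "xtx"))
xty <- Reduce(`+`, lapply(partials, `[[`, "xty"))
beta_chunked <- drop(solve(xtx, xty))

# Reference: ordinary single-threaded fit on the full data.
beta_lm <- unname(coef(lm(y ~ x1 + x2, data = df)))

print(all.equal(beta_chunked, beta_lm))
```

Because only the small cross-product matrices travel between workers and finalizer, the full data set never has to sit in one process's memory, which is the idea behind "Data Larger than RAM".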
… load a large dataset
Model_obj <- rxLinMod(…)
… run a RevoScaleR algorithm
Typical Characteristics
• One call, one answer
• Arbitrarily large data sets
• Arbitrarily large worker task set
• Mathematically the same as single-threaded
• Platform independent
• Most are written in C++ for speed
Remote RevoScaleR Algorithm
Internal Flow:
1. Algorithm on local checks compute context
2. If set remote, packages and ships request
3. Local script blocks (by default) awaiting response
4. Remote unpacks and executes in parallel
5. Remote returns results to local interface
6. Local interface returns results to script
… set compute context to Spark
rxSetComputeContext(RxSpark(…))
Model_obj <- rxLinMod(…)
… run a RevoScaleR algorithm
Big
data
Remote Execution:
• Executes RevoScaleR algorithms on remote data & CPUs
• “rxSetComputeContext” redirects to remote
• Algorithms in the RevoScaleR library redirect as set
• Results are returned to the script as though local
Local
RevoScaleR
Algorithm
Algorithm
Interface
Remote
Local
Task
Task
Task
Finalizer
Initiator
SQL Server, Hadoop, Hadoop MapReduce, HDInsight
RevoScaleR Remote Execution
RevoScaleR Spark + sparklyr + H2O
DEMO :
Data manipulation using sparklyr,
Interoperability between sparklyr and RevoScaleR
Intelligent Apps
• Directly integrate intelligence into apps.
• Streamlined real-time scoring:
• Web Services operationalization
https://www.r-bloggers.com/deploying-a-car-price-model-using-r-and-azureml/
• Seamless integration
with authentication
solution:
LDAP/AD/AAD
• Secure connection:
HTTPS encrypted by
TLS 1.2/SSL
• Compliance with
Microsoft Security
Development
Lifecycle
R
Client
Load
Balancer
• Server level HA:
Introduce multiple
Web Nodes for
Active-Active backup
/ recovery, via load
balancer
• Data Store HA:
leverage Enterprise
grade DB, SQL Server
and Postgres’ HA
capabilities
• Easily create web services from R scripts & models
Build the model first Deploy as a web service instantly
# Run the following code in R
swagger <- api$swagger()
cat(swagger, file = "swagger.json", append = FALSE)
Generate Swagger
Docs for Web Services
Popular Swagger Tools:
AutoRest or Code Generator
AutoRest.exe -CodeGenerator CSharp -Modeler Swagger -Input swagger.json -Namespace Mynamespace
Run Swagger tools to
generate code
Write a few lines of code to
consume the service
Data Scientist / Developer / Developer
DEMO :
Publish Webservice
THANK YOU!
Choice • Open Source: Microsoft R Open:
• Microsoft innovations: Microsoft R Extensions:
Variable Selection
Stepwise Regression
Simulation
Simulation (e.g. Monte Carlo)
Parallel Random Number Generation
Cluster Analysis
K-Means
Classification
Decision Trees
Decision Forests
Gradient Boosted Decision Trees
Naïve Bayes
Custom Parallel
rxDataStep
rxExec
rxExecBy – Many-Models
PEMA-R API Custom Algorithms
Data Step
Data import – Delimited, Fixed, SAS, SPSS, ODBC
Variable creation & transformation
Recode variables
Factor variables
Missing value handling
Sort, Merge, Split
Aggregate by category (means, sums)
Descriptive Statistics
Min / Max, Mean, Median (approx.)
Quantiles (approx.)
Standard Deviation
Variance
Correlation
Covariance
Sum of Squares (cross product matrix for set variables)
Pairwise Cross tabs
Risk Ratio & Odds Ratio
Cross-Tabulation of Data (standard tables & long form)
Marginal Summaries of Cross Tabulations
Statistical Tests
Chi Square Test
Kendall Rank Correlation
Fisher’s Exact Test
Student’s t-Test
Sampling
Subsample (observations & variables)
Random Sampling
Predictive Models
Sum of Squares (cross product matrix for set variables)
Quantiles (approx.)
Generalized Linear Models (GLM) exponential family
distributions: binomial, Gaussian, inverse Gaussian, Poisson,
Tweedie. Standard link functions: cauchit, identity, log, logit,
probit. User defined distributions & link functions.
Covariance & Correlation Matrices
Logistic Regression
Linear regression
Predictions/scoring for models
Residuals for all models
Learner | Strength | Learning tasks supported | Sample applications
Deep Neural Network | Supports custom, multi-layer network topology with filtered, convolutional, and pooling bundles | Binary classification, multi-class classification, regression | Bing Ads click prediction ($50M per year revenue gain); image classification
Logistic regression | L1, L2 regularization | Binary classification, multi-class classification | Classifying user feedback
One-class SVM | Easy-to-train learner for anomaly detection | Anomaly detection | Fraud detection
Fast Tree | Boosted decision tree. Similar to XGBoost. Supports up to 1B features | Binary classification, regression | One of the most popular and best-performing learners inside Microsoft
Fast Forest | State-of-the-art tree ensembles (Random Forest) | Binary classification, regression | Churn prediction
Fast Linear (SDCA) | Speed, scalability; supports L1, L2 regularization | Binary classification, regression | Used by Outlook for email spam filtering
Transform | Strength | Description | Applications
Text | Battle tested, large language support, performant (Bing, Office) | Performs natural language processing of free text into numerical representation | Support ticket classification, sentiment analysis
Categorical / Categorical hash | Ease of use; 1 line of code to set | Converts categories into numerical data | Ad click prediction
Feature Selection | Ease of use | Selects a subset of features to speed up training time | Sentiment analysis, ad click prediction
Pretrained Image Featurization
• featurizeImage()
– Used to identify parts of images
– People, things, animals, etc.
Text Featurization
• featurizeText()
– Returns n-gram digest & counts from many partitions of text data
[Diagram: text data sets → featurizers → n-grams (phrases) and counts; image data sets → featurizers → image contents found]
Pretrained Sentiment Analysis
• getSentiment()
– Pretrained to return sentiment score (0-1)
– English only for now
[Diagram: text data sets → featurizers → getSentiment() → sentiment score]
Distributed Modeling
• “Many Models” on Partitioned Data
• Parallelized ensemble modeling
• New rxExecBy framework:
• Superior to Spark gapply:
• rxEnsemble:
– Returns ensembled model combining multiple types
– Ensembling settings balance speed & accuracy
[Diagram: Ensemble Learning. Single or distributed data sets → Model 1, Model 2 → rxEnsemble → ensemble model]
• ManyModels:
– Used to run a model on each of many partitions.
– Returns one model trained per cohort (partition) of data.
[Diagram: Many Small Models. Data partitioned by cohort (P1, P2, P3) → Model P1, Model P2, Model P3 → returns a set of models]
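The "many models" pattern (one model per cohort of data) can be sketched in plain base R. This is an analogy only and does not call rxExecBy; the choice of the built-in mtcars data, the cyl column as the cohort key, and the helper name fit_partition are assumptions made for illustration.

```r
# Base R analogy of "many models": one small model per partition.
# mtcars, the cyl key, and fit_partition are illustrative assumptions.

# Partition the data by cohort (number of cylinders).
partitions <- split(mtcars, mtcars$cyl)

# The function applied to every partition; it sees only that cohort's rows.
fit_partition <- function(part) {
  lm(mpg ~ wt, data = part)
}

# Returns a named list with one fitted model per cohort.
models <- lapply(partitions, fit_partition)

# Inspect one coefficient per cohort.
slopes <- vapply(models, function(m) unname(coef(m)["wt"]), numeric(1))
print(slopes)
```

In a distributed setting the per-partition function stays the same and only the mechanism that hands out partitions changes, which is what makes the pattern a natural fit for a compute-context approach.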
Scale • RevoScaleR
• Microsoft Machine Learning Library
• Create custom parallel algorithms and functions:
SQL Server • Fast modeling platform with operationalization in one.
• Minimal data movement improves speed & security
• Direct T-SQL Integration
• Real-Time Scoring Support
• SQL Server security & administration
• Rapid deployment of R models
as stored procedures.
• Python support.
(SQL Server 2017)
• Python support
• Microsoft Machine Learning package included
• Process multiple related models in parallel with the rxExecBy function
• Create a shared script library with R script package management
• Native scoring with T-SQL PREDICT
• In-place upgrade of R components
Spark • Distribute RevoScaleR algorithms using Spark
• Spark 1.6 or 2.0 Compatible
• Load Spark Dataframes from:
• Interop with SparkETL, SparkSQL
• Support for HDInsight, Hortonworks, Cloudera
& MapR Hadoop
Configuration:
• HDI cluster size: 100 nodes
- Edge node: D14 V2 (16 cores, 112GB)
- Worker Nodes: D12 (4 cores, 28GB)
• Dataset: NYC Taxi dataset, filtered,
transformed, and duplicated
• Number of columns: 19
• Format: CSV
• fs.azure.selfthrottling.read.factor=1
[Chart: rxLogit on a 100 node HDInsight cluster. x-axis: billions of rows (0 to 13); y-axis: elapsed time (seconds); annotation: 2.2 TB]
Agility • Visual Studio
• Support for the open source RStudio IDE
• Support for Jupyter notebooks with R
• Microsoft contributing to Rattle
recognized leader
broadest R capabilities
performance and scale
architectural flexibility
Rapid deployment
Harnesses the hybrid cloud
Scales R for big data workloads
Blends best of open source and Microsoft technologies
platform choice
Machine
Learning
Services
In-Database analytics with SQL Server
In SQL Server 2016, Microsoft launched two server platforms for integrating
the popular open source R language with business applications:
• SQL Server R Services (In-Database), for integration with SQL Server
• Microsoft R Server, for enterprise-level R deployments on Windows and Linux servers
In SQL Server 2017, the names were changed to reflect support for the
popular Python language:
• SQL Server Machine Learning Services (In-Database) supports both R and Python for
in-database analytics
• Microsoft Machine Learning Server supports R and Python deployments on Windows
servers; expansion to other supported platforms is planned for late 2017
Capability
• Extensible in-database analytics, integrated with R,
exposed through T-SQL
• Centralized enterprise library for analytic models
Benefits
SQL Server
Analytical engines
Integrate with R/Python
Data management layer
Relational data
Use T-SQL interface
Stream data in-memory
Analytics library
Share and collaborate
Manage and deploy
R
Data scientists
Business
analysts
Publish algorithms, interact
directly with data
Analyze through T-SQL,
tools, and vetted algorithms
DBAs
Manage storage and
analytics together
Machine Learning Services
SQL Server
2017 setup
Install Machine
Learning Services
(In-Database)
Consent to install
Microsoft R
Open/Python
Optional: Install R
packages
on SQL Server
2017 machine
Database
configuration
Enable external scripts
on the instance
("external scripts enabled")
Grant EXECUTE ANY
EXTERNAL SCRIPT
permission to users
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;
GRANT EXECUTE ANY EXTERNAL SCRIPT TO DataScientistsRole;
/* User-defined role / users */
ML runtime
usage
Resource governance
via resource pool
Monitoring via DMVs
Troubleshooting via
XEvents/ DMVs
CREATE EXTERNAL RESOURCE POOL ML_runtimes
WITH (MAX_CPU_PERCENT = 20,
MAX_MEMORY_PERCENT = 10);
SELECT * FROM
sys.dm_resource_governor_external_resource_pools
WHERE name = 'ML_runtimes';
• Original R script:
IrisPredict <- function(data, model){
  library(e1071)
  predicted_species <- predict(model, data)
  return(predicted_species)
}

library(RODBC)
conn <- odbcConnect("MySqlAzure", uid = myUser, pwd = myPassword)
Iris_data <- sqlFetch(conn, "Iris_Data")
Iris_model <- sqlQuery(conn, "select model from my_iris_model")
IrisPredict(Iris_data, Iris_model)
• Calling R script from SQL Server:
/* Input table schema */
create table Iris_Data (name varchar(100), length int, width int);
/* Model table schema */
create table my_iris_model (model varbinary(max));

declare @iris_model varbinary(max) = (select model from my_iris_model);
exec sp_execute_external_script
  @language = N'R'
, @script = N'
IrisPredict <- function(data, model){
  library(e1071)
  predicted_species <- predict(model, data)
  return(predicted_species)
}
# Assumes the model was stored with serialize(); results are returned in OutputDataSet.
model <- unserialize(model)
OutputDataSet <- cbind(input_data_1, species = as.character(IrisPredict(input_data_1, model)))
'
, @parallel = default
, @input_data_1 = N'select * from Iris_Data'
, @input_data_1_name = N'input_data_1'
, @params = N'@model varbinary(max)'
, @model = @iris_model
with result sets ((name varchar(100), length int, width int
, species varchar(30)));
• The SQL queries embedded in the original R script move into T-SQL (@input_data_1 and the @iris_model query)
• The R variables input_data_1 and model bind to the SQL parameters of the same name
[Diagram: sp_execute_external_script is handled by sqlservr.exe (MSSQLSERVER service; SQLOS, XEvent), which communicates over a named pipe with launchpad.exe (MSSQLLAUNCHPAD service, one per SQL Server instance). The "launcher" determines what and how to launch. Windows satellite processes (BxlServer.exe with sqlsatellite.dll) run the query.]
• More efficient than standalone clients
• Data does not all have to fit in memory
• Reduced data transmission over the network
• Most R Open (and Python) functions are single-threaded
• We can stream data in parallel and batches from SQL Server to/from script
• Use the power of SQL Server and ML to develop, train, and operationalize
• SQL Server compute context (remote compute context)
• T-SQL queries
• Memory-optimized tables
• Columnstore indexes
• Data compression
• Parallel query execution
• Stored procedures
Reduced surface area
and isolation
“external scripts enabled”
is required
Script execution outside of
SQL Server process space
Script execution
requires explicit
permission
sp_execute_external_script requires
EXECUTE ANY EXTERNAL SCRIPT
for non-admins
SQL Server login/user required
and db/table access
Satellite processes have
limited privileges
Satellite processes run under low
privileged, local user accounts
in the SQLRUserGroup
Each execution is isolated
— different users with
different accounts
Windows firewall rules
block outbound traffic
MicrosoftML is a package for Machine Learning Server, Microsoft R Client, and
SQL Server Machine Learning Services that adds state-of-the-art data
transforms, machine learning algorithms, and pretrained models to Microsoft R
functionality.
• Data transforms help you to compose, in a pipeline, a custom set of transforms that are
applied to your data before training or testing. The primary purpose of these transforms is
to allow you to featurize your data.
• Machine learning algorithms enable you to tackle common machine learning tasks such as
classification, regression and anomaly detection. You run these high-performance functions
locally on Windows or Linux machines or on Azure HDInsight (Hadoop/Spark) clusters.
• Pretrained models for sentiment analysis and image featurization can also be installed and
deployed with the MicrosoftML package.
Identify Server
Set Source
Set Context
Use
• Run R or Python inside SQL Server 2017 with ML Services
• Existing SQL Server 2016 clients can run R using SQL Server R Services
SQL
Server
2017 Run R engine
from within
the Query
Processor
SQL
Server
2016/17 Run R From
within the
Query
Processor
Move
BIG
Work to
the Data
T-SQL
Apps
T-SQL
Script
T-SQL
Stored
Procedure
SQL
Server
2016/17
Enable smart
non-R apps
BI & Reporting;
Web apps
T-SQL
Script
R
Engine
R
script
Large Data Sets in Chunks
Remote
Execution
Context
Results
Parallel
Worker
Tasks
Parallel
Algorithm
Iterate/
Sequence
SQL
Applications
T-SQL + SQL
Server
2016/17
Events
T-SQL
Production
Apps
Models
SQL
Server
2017
Real time
scoring
engine
Stored Proc’s
and Triggers
Events
Use familiar T-SQL
stored procedures to
invoke R scripts from
your application
Instant Deployment
• Turn R analytics into web services in one line of code
• Swagger-based REST APIs, easy to consume with any programming language, including R
Deploy to Anywhere
• Deploy the web service server to any platform: Windows, SQL, Linux/Hadoop
• On-prem or in cloud
Fast and Scalable
• Fast scoring, real time and batch
• Scaling to a grid for powerful computing with load balancing
• Diagnostic and capacity evaluation tools
Secure and Reliable
• Enterprise authentication: AD/LDAP or AAD
• Secure connection: HTTPS with SSL/TLS 1.2
• Enterprise grade high availability
Data Scientist
Developer
Easy Integration
Easy Deployment
Easy Setup
• In-cloud or on-prem
• Adding nodes to scale
• High availability & load balancing
• Remote execution server
Machine Learning
Server
configured for
operationalizing R analytics
Microsoft R Client
(mrsdeploy package)
Easy Consumption
publishService
Data Scientist
Function | Description
publishService | Publish a predictive function as a web service
deleteService | Delete a web service
getService | Get a web service
listServices | List the published web services
serviceOption | Retrieve, set, and list the service options
updateService | Update a web service
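A minimal sketch of "build the model first, deploy as a web service" might look like the following. The model, the scoring function manualTransmission, the service name "mtService", and the environment variables MLS_URL, MLS_USER and MLS_PASSWORD are all assumptions for illustration. The mrsdeploy calls only run when the package is installed and a server URL is configured; otherwise the script just scores locally.

```r
# Build the model first (built-in mtcars data; illustrative choice).
carsModel <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# Scoring function to publish: probability of a manual transmission.
manualTransmission <- function(hp, wt) {
  newdata <- data.frame(hp = hp, wt = wt)
  unname(predict(carsModel, newdata, type = "response"))
}

# Always test the function locally before publishing it.
local_score <- manualTransmission(120, 2.8)
print(local_score)

# Publish only if mrsdeploy is available and a server is configured.
# MLS_URL, MLS_USER and MLS_PASSWORD are hypothetical variable names.
server_url <- Sys.getenv("MLS_URL")
if (nzchar(server_url) && requireNamespace("mrsdeploy", quietly = TRUE)) {
  library(mrsdeploy)
  remoteLogin(server_url,
              username = Sys.getenv("MLS_USER"),
              password = Sys.getenv("MLS_PASSWORD"),
              session = FALSE)
  api <- publishService(
    "mtService",
    code = manualTransmission,
    model = carsModel,
    inputs = list(hp = "numeric", wt = "numeric"),
    outputs = list(answer = "numeric"),
    v = "v1.0.0"
  )
  # Consume the freshly published service from R.
  result <- api$manualTransmission(120, 2.8)
  print(result$output("answer"))
  remoteLogout()
}
```

Declaring the inputs and outputs with explicit types is what allows the server to generate the Swagger description that other languages consume.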
• Easily scale up a single
server to a grid to
handle more
concurrent requests
• Load balancing across
compute nodes
• Scale overall
performance with
shared pools of
warmed up R shells.
R
Client
Snapshot Functions
createSnapshot | Create a snapshot of the remote session (workspace and working directory)
loadSnapshot | Load a snapshot from the server into the remote session (workspace and working directory)
listSnapshots | Get a list of snapshots for the current user
downloadSnapshot | Download a snapshot from the server
deleteSnapshot | Delete a snapshot from the server
Remote Objects Management
listRemoteFiles | Get a list of files in the working directory of the remote session
deleteRemoteFile | Delete a file from the working directory of the remote R session
getRemoteFile | Copy a file from the working directory of the remote R session
putLocalFile | Copy a file from the local machine to the working directory of the remote R session
getRemoteObject | Get an object from the remote R session
putLocalObject | Put an object from the local R session and load it into the remote R session
getRemoteWorkspace | Take all objects from the remote R session and load them into the local R session
putLocalWorkspace | Take all objects from the local R session and load them into the remote R session
Remote Connection
remoteLogin | Remote login to the R Server with AD or admin credentials
remoteLoginAAD | Remote login to the R Server using Azure AD
remoteLogout | Log out of the remote session on the DeployR Server
Remote Execution
remoteExecute | Remote execution of either R code or an R script
remoteScript | Wrapper function for remote script execution
diffLocalRemote | Generate a 'diff' report between local and remote
pause | Pause the remote connection and return to local
resume | Return the user to the 'REMOTE >' command prompt
Ml2

Weitere ähnliche Inhalte

Was ist angesagt?

An introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuckAn introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuckData Con LA
 
Introduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkIntroduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkTaras Matyashovsky
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Evan Chan
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streamingdatamantra
 
Overview of Apache Spark 2.3: What’s New? with Sameer Agarwal
 Overview of Apache Spark 2.3: What’s New? with Sameer Agarwal Overview of Apache Spark 2.3: What’s New? with Sameer Agarwal
Overview of Apache Spark 2.3: What’s New? with Sameer AgarwalDatabricks
 
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)Spark Summit
 
Spark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika TechnologiesSpark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika TechnologiesAnand Narayanan
 
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016 A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016 Databricks
 
Improvements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba SearchImprovements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba SearchDataWorks Summit/Hadoop Summit
 
Apache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceApache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceSachin Aggarwal
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaAbhinav Singh
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17spark-project
 
Building production spark streaming applications
Building production spark streaming applicationsBuilding production spark streaming applications
Building production spark streaming applicationsJoey Echeverria
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka Dori Waldman
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkEvan Chan
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Chris Fregly
 
Apache Spark: The Next Gen toolset for Big Data Processing
Apache Spark: The Next Gen toolset for Big Data ProcessingApache Spark: The Next Gen toolset for Big Data Processing
Apache Spark: The Next Gen toolset for Big Data Processingprajods
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapePaco Nathan
 
Spark r under the hood with Hossein Falaki
Spark r under the hood with Hossein FalakiSpark r under the hood with Hossein Falaki
Spark r under the hood with Hossein FalakiDatabricks
 

Ml2

  • 1.
  • 2. Disclaimer: This is NOT an R, Python, Hadoop, or Spark overview
  • 3. • From its inception, R was designed to use only a single thread (processor) at a time. Even today, R works that way unless linked with multi-threaded BLAS/LAPACK libraries. • Microsoft R Open automatically uses all available cores and processors to significantly reduce computation times, without the need to change a line of your R code.
  • 4. broad range of scalable and distributed R & Python functions
  • 5. [Platform diagram] Layers: Platforms & Data, Tools, Languages, Algorithms, Data Sources, Operationalization. Tools shown: Rattle. Data sources: .csv, Microsoft .XDF, ODBC/JDBC. Operationalization: mrsdeploy, RESTful API deployment, real-time scoring, visualization, tool integration, in-database deployment. Distributed parallelized algorithms: RevoScaleR library, MicrosoftML library, custom parallelization frameworks. Open source R algorithms & visualizations: CRAN, Bioconductor. Plus: deep learning, pretrained models, prebuilt featurizers.
  • 6. • Small data many models • Hybrid • Large scale machine learning • Batch training and scoring • Model deployment for LOB applications
  • 7.
  • 8. [Diagram: Big Data as a cornerstone for Azure Data Services. Transform data into intelligent actions and predictions.] Stages shown: Data Sources (apps, sensors and devices, data), Information Management, Big Data Stores, Machine Learning and Analytics, Intelligence, Dashboards & Visualizations, Action (people, automated systems; apps: web, mobile, bots). Services shown: Event Hubs, Data Catalog, Data Factory, HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning, SQL Data Warehouse, Data Lake Store, Cosmos DB, Cortana, Bot Service, Cognitive Services, Power BI, Azure Analysis Services, SQL Database, Batch AI, Blob Store, Databricks.
  • 9. • Support: inadequacy of community support • Application governance: commonly prohibited in production systems without commercial vendor
  • 11. • Support for Spark 1.6, 2.0, 2.1 and 2.2 • Support for all three distributions of Hadoop on all three flavors of Linux • Inherits high-performance big-data processing characteristics from Spark • Easily integrates with different data sources, e.g. Hive, Parquet, ORC, XDF, CSV, etc. • E2E framework for running parallel workloads • Interoperability with sparklyr and H2O • Best-in-class operationalization • Guaranteed backward compatibility
  • 12. [Architecture diagram: MLS Server for Hadoop] Components shown: R user workstation (RStudio Desktop/Server); edge node running Microsoft R Server with the Spark driver (ver. 1.6 or 2.0) and the ScaleR master task (initiator, finalizer); YARN resource management; three Spark executors, each running a worker task; data frames; data sources (.csv, XDF, Hive, Parquet, ORC).
  • 13. [Architecture diagram: client access to the edge node] Components shown: remote execution over ssh; web services via mrsdeploy (https://); R tools (VS Code, Visual Studio, RStudio, …); BI tools & applications; Jupyter Notebooks; thin-client IDEs (https://); edge node with RStudio and RStudio Server; ScaleR master task (initiator, finalizer); three worker tasks; data frames.
  • 14. Remote execution of R scripts
      • Built-in remote execute functions in R Client/R Server
      • Tools to reconcile local and remote
      • Execute .R script or interactive R commands
      • Return results to local
      • Generate working snapshots for resume and reuse
      • IDE agnostic
      [Diagram: MLS Server configured for remote execution of R scripts (supports Windows Server, Linux Server, Hadoop). Steps shown: execute R scripts; snapshot remote env.; logout remote server; login remote server; generate diff report; reconcile environment.]
  • 15. Demo: interacting with MLS Server. RStudio Desktop (remote context), RStudio Server
  • 16. Two kinds of R script: the compute context script sets where the model will run; the functional model script does not need to change to run in Hadoop. ScaleR models can be deployed from a server or edge node to run in Hadoop without any functional R model re-coding for MapReduce.
      Compute context script, local (local parallel processing on Linux or Windows):
          ### LOCAL COMPUTE CONTEXT ###
          rxSetComputeContext("local")
          ### CREATE, REFERENCE FILE OBJECTS ###
          AirlineDataSet <- RxTextData(file.path(dataRead, "AirlineDemoSmall.csv"))
      Compute context script, in Hadoop (Spark / MR) or SQL Server:
          ### SETUP HADOOP ENVIRONMENT VARIABLES ###
          myHadoopCluster <- RxSpark()
          ### HADOOP COMPUTE CONTEXT ###
          rxSetComputeContext(myHadoopCluster)
          ### CREATE HDFS, DIRECTORY AND FILE OBJECTS ###
          hdfsFS <- RxHdfsFileSystem()
      Functional model script (unchanged between contexts):
          ### ANALYTICAL PROCESSING ###
          ### Statistical summary of the data
          rxSummary(~ ArrDelay + DayOfWeek, data = AirlineDataSet, reportProgress = 1)
          ### Cross-tabulate the data
          rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)
          ### Linear model and plot
          hdfsXdfArrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
          plot(hdfsXdfArrLateLinMod$coefficients)
  • 17. Compute context • Defines where the processing happens • The currently set compute context determines the processing location • Write Once Deploy Anywhere (WODA) by changing the compute context
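The idea on this slide can be illustrated without RevoScaleR. The following is a minimal base-R sketch, not the library's implementation: `contexts`, `set_compute_context()` and `mean_by_chunks()` are hypothetical names invented for this illustration. The analysis function is written once, and only the active context decides how the chunks are processed.

```r
library(parallel)

# Hypothetical "compute contexts": each one only decides HOW chunks are processed.
contexts <- list(
  local_sequential = function(chunks, fn) lapply(chunks, fn),
  local_parallel = function(chunks, fn) {
    # mclapply forks on Unix-alikes; on Windows only mc.cores = 1 is supported.
    cores <- if (.Platform$OS.type == "windows") 1L else 2L
    mclapply(chunks, fn, mc.cores = cores)
  }
)

active_context <- "local_sequential"

# Illustrative stand-in for switching context; not the RevoScaleR function.
set_compute_context <- function(name) {
  stopifnot(name %in% names(contexts))
  active_context <<- name
  invisible(name)
}

# Analysis code: written once, never mentions where it runs.
mean_by_chunks <- function(x, n_chunks = 4) {
  chunks <- split(x, rep(seq_len(n_chunks), length.out = length(x)))
  partial <- contexts[[active_context]](
    chunks,
    function(v) c(sum = sum(v), n = length(v))
  )
  totals <- Reduce(`+`, partial)  # combine the per-chunk partial results
  unname(totals["sum"] / totals["n"])
}

x <- as.numeric(1:1000)

set_compute_context("local_sequential")
res_seq <- mean_by_chunks(x)

set_compute_context("local_parallel")
res_par <- mean_by_chunks(x)

print(c(sequential = res_seq, parallel = res_par))
```

Only the call to `set_compute_context()` differs between the two runs; `mean_by_chunks()` is untouched, which is the property the slide calls Write Once Deploy Anywhere.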
  • 21. Using a Spark compute context: change the compute context and keep the other code unchanged.
          rxSetComputeContext(RxSpark(…))
          mydata <- RxTextData("/data/admin.csv", fileSystem = RxHdfsFileSystem())
          mylogIt <- rxLogit(admin ~ gre + gpa + rank, data = mydata)
  • 24. • Scales via parallel & distributed computation • Scales via in-cluster & in-database execution • Escapes R's memory limitations • Speeds analysis by reducing data movement • Stops security risks with in-database & in-Hadoop execution • Deploys R to multiple platforms [Diagram: f(x) fanned out to three tasks]
  • 25. RevoScaleR Algorithms & Functions
      Calling script:
          # … load a large dataset
          Model_obj <- rxLinMod(…)   # … run a RevoScaleR algorithm
      Internal flow:
          1. Algorithm begins initiator process
          2. Initiator distributes work to nodes
          3. Finalizer collects results
          4. Finalizer iterates or continues
          5. Finalizer evaluates final model
          6. Returns single model to calling script
      Typical characteristics:
          • One call, one answer
          • Arbitrarily large data sets
          • Arbitrarily large worker task set
          • Mathematically the same as single-threaded
          • Platform independent
          • Most are written in C++ for speed
      [Diagram: data larger than RAM; initiator; three tasks; finalizer]
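The initiator / worker / finalizer flow above can be sketched in base R for the simplest case, linear regression. This is an illustration under stated assumptions, not RevoScaleR's actual code: `worker_task()` and `chunked_lm()` are hypothetical names, the data is simulated, and the chunks are processed with `lapply` rather than on separate nodes. Each worker reduces its chunk to the sufficient statistics X'X and X'y, and the finalizer sums them and solves the normal equations, which is why the result can match the single-threaded fit.

```r
# Hypothetical base-R sketch of the initiator / worker / finalizer pattern.
set.seed(42)
n <- 10000
dat <- data.frame(x1 = rnorm(n), x2 = runif(n))
dat$y <- 1.5 + 2 * dat$x1 - 3 * dat$x2 + rnorm(n, sd = 0.1)

# Worker task: reduce one chunk to its sufficient statistics X'X and X'y.
worker_task <- function(chunk) {
  X <- cbind(Intercept = 1, x1 = chunk$x1, x2 = chunk$x2)
  list(xtx = crossprod(X), xty = crossprod(X, chunk$y))
}

chunked_lm <- function(data, n_chunks = 8) {
  # Initiator: split the rows into chunks and hand each one to a worker.
  chunk_id <- rep(seq_len(n_chunks), length.out = nrow(data))
  chunks <- split(data, chunk_id)
  partials <- lapply(chunks, worker_task)  # sequential here for simplicity

  # Finalizer: add up the partial results and solve the normal equations.
  xtx <- Reduce(`+`, lapply(partials, `[[`, "xtx"))
  xty <- Reduce(`+`, lapply(partials, `[[`, "xty"))
  drop(solve(xtx, xty))
}

coef_chunked <- chunked_lm(dat)
coef_single <- coef(lm(y ~ x1 + x2, data = dat))

# The two sets of coefficients should agree up to numerical tolerance.
print(rbind(chunked = coef_chunked, single = unname(coef_single)))
```

Linear regression needs only one pass, so the finalizer here never iterates. An iterative algorithm such as logistic regression would repeat the distribute-and-collect step until convergence, which corresponds to step 4 of the flow.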
  • 26. RevoScaleR Remote Execution
      Calling script:
          # … set compute context to Spark
          rxSetComputeContext(RxSpark(…))
          Model_obj <- rxLinMod(…)   # … run a RevoScaleR algorithm
      Internal flow of a remote RevoScaleR algorithm:
          1. Algorithm on local checks compute context
          2. If set remote, packages and ships request
          3. Local script blocks (by default) awaiting response
          4. Remote unpacks and executes in parallel
          5. Remote returns results to local interface
          6. Local interface returns results to script
      Remote execution:
          • Executes RevoScaleR algorithms on remote data & CPUs
          • "rxSetComputeContext" redirects to remote
          • Algorithms in the RevoScaleR library redirect as set
          • Results are returned to the script as though local
      [Diagram: local RevoScaleR algorithm and algorithm interface on the local side; initiator, three tasks and finalizer over big data on the remote side. Remote targets shown: SQL Server, Hadoop, Hadoop MapReduce, HDInsight]
  • 27.
  • 28.
  • 29. RevoScaleR Spark + sparklyr + H2O • Interoperability with sparklyr and H2O • Best-in-class operationalization • Guaranteed backward compatibility. Demo: data manipulation using sparklyr; interoperability between sparklyr and RevoScaleR
  • 30. Intelligent Apps • Directly integrate intelligence into apps • Streamlined real-time scoring • Web services operationalization [Platform diagram labels: Platforms and Data, Tools, Language, Algorithms, Data Sources, Operationalization (mrsdeploy, RESTful API deployment, real-time scoring, visualization, tool integration, in-database deployment)]
  • 32. • Seamless integration with authentication solutions: LDAP/AD/AAD • Secure connection: HTTPS encrypted by TLS 1.2/SSL • Compliance with Microsoft Security Development Lifecycle [Diagram label: R Client]
  • 33. • Server-level HA: introduce multiple web nodes for active-active backup/recovery, via a load balancer • Data store HA: leverage the HA capabilities of an enterprise-grade DB (SQL Server or Postgres) [Diagram label: Load Balancer]
  • 34. • Easily create web services from R scripts & models: build the model first, then deploy it as a web service instantly
  • 35. Generate Swagger docs for web services
      1. Data scientist: generate the Swagger docs. Run the following code in R:
          swagger <- api$swagger()
          cat(swagger, file = "swagger.json", append = FALSE)
      2. Developer: run Swagger tools to generate code. Popular Swagger tools: AutoRest or Code Generator.
          AutoRest.exe -CodeGenerator CSharp -Modeler Swagger -Input swagger.json -Namespace Mynamespace
      3. Developer: write a few lines of code to consume the service.
  • 37.
  • 39. Choice • Open source: Microsoft R Open • Microsoft innovations: Microsoft R Extensions [Platform diagram] Layers: Platforms and Data, Tools, Language, Algorithms, Data Sources, Operationalization. Distributed parallelized algorithms: RevoScaleR library, MicrosoftML library, custom parallelization frameworks. Open source R algorithms & visualizations: CRAN, Bioconductor. Plus: deep learning, pretrained models, prebuilt featurizers.
  • 40. Variable Selection: Stepwise Regression. Simulation: Simulation (e.g. Monte Carlo); Parallel Random Number Generation. Cluster Analysis: K-Means. Classification: Decision Trees; Decision Forests; Gradient Boosted Decision Trees; Naïve Bayes. Custom Parallel: rxDataStep; rxExec; rxExecBy (Many-Models); PEMA-R API; Custom Algorithms. Data Step: Data import (Delimited, Fixed, SAS, SPSS, ODBC); Variable creation & transformation; Recode variables; Factor variables; Missing value handling; Sort, Merge, Split; Aggregate by category (means, sums). Descriptive Statistics: Min / Max, Mean, Median (approx.); Quantiles (approx.); Standard Deviation; Variance; Correlation; Covariance; Sum of Squares (cross product matrix for set variables); Pairwise Cross tabs; Risk Ratio & Odds Ratio; Cross-Tabulation of Data (standard tables & long form); Marginal Summaries of Cross Tabulations. Statistical Tests: Chi Square Test; Kendall Rank Correlation; Fisher’s Exact Test; Student’s t-Test. Sampling: Subsample (observations & variables); Random Sampling. Predictive Models: Sum of Squares (cross product matrix for set variables); Quantiles (approx.); Generalized Linear Models (GLM) with exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie), standard link functions (cauchit, identity, log, logit, probit), and user defined distributions & link functions; Covariance & Correlation Matrices; Logistic Regression; Linear regression; Predictions/scoring for models; Residuals for all models
  • 41. Learners (Learner: strength; learning tasks supported; sample applications). Deep Neural Network: supports custom, multi-layer network topology with filtered, convolutional, and pooling bundles; binary classification, multi-class classification, regression; Bing Ads click prediction ($50M per year revenue gain), image classification. Logistic regression: L1, L2 regularization; binary classification, multi-class classification; classifying user feedback. One-class SVM: easy-to-train learner for anomaly detection; anomaly detection; fraud detection. Fast Tree: boosted decision tree, similar to XGBoost, supports up to 1B features; binary classification, regression; one of the most popular and best performing learners inside Microsoft. Fast Forest: state-of-the-art tree ensembles (Random Forest); binary classification, regression; churn prediction. Fast Linear (SDCA): speed, scalability, and support for L1, L2 regularization; binary classification, regression; used by Outlook for email spam filtering. Transforms (Transform: strength; description; applications). Text: battle tested, large language support, performant (Bing, Office); performs natural language processing of free text into numerical representation; support ticket classification, sentiment analysis. Categorical / Categorical hash: ease of use, 1 line of code to set; converts categories into numerical data; ad click prediction. Feature Selection: ease of use; selects a subset of features to speed up training time; sentiment analysis, ad click prediction.
  • 42. Pretrained Image Featurization • featurizeImage() – Used to identify parts of images – People, things, animals, etc. [Diagram: image data sets → featurizers → image contents found] Text Featurization • featurizeText() – Returns n-gram digest & counts from many partitions of text data [Diagram: text data sets → featurizers → n-gram (phrase) counts] Pretrained Sentiment Analysis • getSentiment() – Pretrained to return sentiment score (0-1) – English only for now [Diagram: text data sets → featurizers → sentiment score]
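To make the "n-gram counts from many partitions of text" idea concrete, here is a minimal standard-library sketch. It is not the MicrosoftML text featurizer: `ngram_counts` is a hypothetical helper, and whitespace tokenization is a simplifying assumption.

```python
from collections import Counter


def ngram_counts(text, n=2):
    """Count the n-grams (runs of n consecutive words) in one document."""
    tokens = text.lower().split()
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Each string plays the role of one partition of the text data.
partitions = ["the service was fast", "the service was slow"]

# Per-partition counts are simply added together at the end.
counts = Counter()
for document in partitions:
    counts.update(ngram_counts(document, n=2))

print(counts.most_common())
```

Because counts from separate partitions add up, each partition can be featurized independently and merged afterwards.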
  • 43. Distributed Modeling • “Many Models” on Partitioned Data • Parallelized ensemble modeling • New rxExecBy framework: • Superior to Spark gapply:
  • 44. Ensemble Learning • rxEnsemble: – Returns ensembled model combining multiple types – Ensembling settings balance speed & accuracy [Diagram: single or distributed data sets → Model 1, Model 2 → rxEnsemble → ensemble model] Many Small Models • ManyModels: – Used to run a model on each of many partitions. – Returns one model trained per cohort (partition) of data. [Diagram: data partitioned by cohort (P1, P2, P3) → returns a set of models (Model P1, Model P2, Model P3)]
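Both patterns on this slide can be illustrated with a toy example. The sketch below is not rxEnsemble or rxExecBy: `many_models`, `fit_line`, `fit_mean`, and `ensemble_predict` are hypothetical helpers, and simple averaging is assumed as the way to combine learners.

```python
from collections import defaultdict
from statistics import mean


def fit_line(points):
    """Least-squares fit of y = a + b*x; returns a prediction function."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_mean(points):
    """Constant model: always predicts the mean of y."""
    y_bar = mean(y for _, y in points)
    return lambda x: y_bar


def many_models(rows, fit):
    """Train one model per cohort; rows are (cohort, x, y) tuples."""
    partitions = defaultdict(list)
    for cohort, x, y in rows:
        partitions[cohort].append((x, y))
    return {cohort: fit(points) for cohort, points in partitions.items()}


def ensemble_predict(models, x):
    """Combine different model types by averaging their predictions."""
    return mean([model(x) for model in models])


rows = [("P1", 0, 1), ("P1", 1, 3), ("P1", 2, 5),   # P1 follows y = 1 + 2x
        ("P2", 0, 0), ("P2", 1, 4), ("P2", 2, 8)]   # P2 follows y = 4x

per_cohort = many_models(rows, fit_line)            # one model per partition
all_points = [(x, y) for _, x, y in rows]
ensemble = [fit_line(all_points), fit_mean(all_points)]

print(per_cohort["P1"](3), per_cohort["P2"](3), ensemble_predict(ensemble, 3))
```

The many-models call returns a dictionary keyed by cohort, while the ensemble returns a single combined prediction from two different learners trained on the same data.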
  • 45. Scale • RevoScaleR • Microsoft Machine Learning Library • Create custom parallel algorithms and functions: Distributed Parallelized Algorithms: •RevoScaleR library •MicrosoftML library •Custom parallelization frameworks Open source R algorithms & visualizations: •CRAN •bioconductor Plus: •Deep Learning •Pretrained Models •Prebuilt Featurizers Platforms and Data Tools Language Algorithms Data Sources Operationalization
  • 46.
  • 47. SQL Server • Fast modeling platform with operationalization in one. • Minimal data movement for speed and security • Direct T-SQL Integration • Real-Time Scoring Support • SQL Server security & administration • Rapid deployment of R models as stored procedures. • Python support.
  • 48.
  • 49. (SQL Server 2017) • Python support • Microsoft Machine Learning package included • Process multiple related models in parallel with the rxExecBy function • Create a shared script library with R script package management • Native scoring with T-SQL PREDICT • In-place upgrade of R components
  • 50. Spark • Distribute RevoScaleR algorithms using Spark • Spark 1.6 or 2.0 Compatible • Load Spark Dataframes from: • Interop with SparkETL, SparkSQL • Support for HDInsight, Hortonworks, Cloudera & MapR Hadoop
  • 51. rxLogit on a 100 node HDInsight cluster. Configuration: • HDI cluster size: 100 nodes - Edge node: D14 V2 (16 cores, 112GB) - Worker Nodes: D12 (4 cores, 28GB) • Dataset: NYC Taxi dataset, filtered, transformed, and duplicated • Number of columns: 19 • Format: CSV • fs.azure.selfthrottling.read.factor=1 [Chart: elapsed time (seconds) versus billions of rows (0 to 13); annotated data size: 2.2 TB]
  • 52.
  • 53. Intelligent Apps • Directly integrate intelligence into apps. • Streamlined real-time scoring • Web services operationalization. Operationalization: mrsdeploy RESTful API deployment; Real-Time Scoring; Visualization Tool Integration; In-database deployment
  • 54. Remote Execute R Scripts: built-in remote execute functions in R Client/R Server; tools to reconcile local and remote; execute .R script or interactive R commands; return results to local; generate working snapshots for resume and reuse; IDE agnostic. R Server configured to support Windows Server, Linux Server, and Hadoop. Operations: • Execute R scripts • Snapshot remote env. • Logout remote server • Login remote server • Generate diff report • Reconcile environment
  • 55. Agility • Visual Studio • Support for the open source RStudio GUI • Support for Jupyter (R) • Microsoft contributing to Rattle
  • 56.
  • 57.
  • 58. recognized leader broadest R capabilities performance and scale architectural flexibility Rapid deployment Harnesses the hybrid cloud Scales R for big data workloads Blends best of open source and Microsoft technologies platform choice
  • 59.
  • 61. In-Database analytics with SQL Server In SQL Server 2016, Microsoft launched two server platforms for integrating the popular open source R language with business applications: • SQL Server R Services (In-Database), for integration with SQL Server • Machine Learning Server, for enterprise-level R deployments on Windows and Linux servers In SQL Server 2017, the name has been changed to reflect support for the popular Python language: • SQL Server Machine Learning Services (In-Database) supports both R and Python for in-database analytics • Microsoft Machine Learning Server supports R and Python deployments on Windows servers; expansion to other supported platforms is planned for late 2017
  • 62. Capability • Extensible in-database analytics, integrated with R, exposed through T-SQL • Centralized enterprise library for analytic models Benefits SQL Server Analytical engines Integrate with R/Python Data management layer Relational data Use T-SQL interface Stream data in-memory Analytics library Share and collaborate Manage and deploy R Data scientists Business analysts Publish algorithms, interact directly with data Analyze through T-SQL, tools, and vetted algorithms DBAs Manage storage and analytics together Machine Learning Services
  • 63. SQL Server 2017 setup: Install Machine Learning Services (In-Database); Consent to install Microsoft R Open/Python; Optional: Install R packages on SQL Server 2017 machine. Database configuration: Enable R language extension in database; Configure path for RRO runtime in database; Grant EXECUTE EXTERNAL SCRIPT permission to users. CREATE EXTERNAL EXTENSION [R] USING SYSTEM LAUNCHER WITH (RUNTIME_PATH = 'c:\revolution\bin') GRANT EXECUTE SCRIPT ON EXTERNAL EXTENSION::R TO DataScientistsRole; /* User-defined role / users */
  • 64. ML runtime usage: Resource governance via resource pool; Monitoring via DMVs; Troubleshooting via XEvents/DMVs. CREATE RESOURCE POOL ML_runtimes FOR EXTERNAL EXTENSION WITH MAX_CPU_PERCENT = 20, MAX_MEMORY_PERCENT = 10; select * from sys.dm_resource_governor_resource_pools where name = 'ML_runtimes';
  • 65. • Original R script: • IrisPredict <- function(data, model){ • library(e1071) • predicted_species <- predict(model, data) • return(predicted_species) • } • library(RODBC) • conn <- odbcConnect("MySqlAzure", uid = myUser, pwd = myPassword); • Iris_data <- sqlFetch(conn, "Iris_Data"); • Iris_model <- sqlQuery(conn, "select model from my_iris_model"); • IrisPredict(Iris_data, Iris_model); • Calling R script from SQL Server: • /* Input table schema */ • create table Iris_Data (name varchar(100), length int, width int); • /* Model table schema */ • create table my_iris_model (model varbinary(max)); • declare @iris_model varbinary(max) = (select model from my_iris_model); • exec sp_execute_external_script • @language = 'R' • , @script = ' • IrisPredict <- function(data, model){ • library(e1071) • predicted_species <- predict(model, data) • return(predicted_species) • } • IrisPredict(input_data_1, model); • ' • , @parallel = default • , @input_data_1 = N'select * from Iris_Data' • , @params = N'@model varbinary(max)' • , @model = @iris_model • with result sets ((name varchar(100), length int, width int • , species varchar(30))); • Values highlighted in yellow are SQL queries embedded in the original R script • Values highlighted in aqua are R variables that bind to SQL variables by name
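The pattern on this slide (a serialized model kept in a binary column, then bound to the scoring script together with the input query) can be tried end to end with the Python standard library. This is a sketch under loud assumptions: SQLite stands in for SQL Server, `pickle` stands in for R serialization, and the threshold "model" and `predict_species` are hypothetical. Only the table and column names come from the slide.

```python
import pickle
import sqlite3


def predict_species(model, rows):
    """Toy scorer: append a species label to each (name, length, width) row."""
    return [
        (name, length, width,
         "setosa" if width <= model["max_setosa_width"] else "other")
        for name, length, width in rows
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Iris_Data (name TEXT, length INTEGER, width INTEGER)")
conn.execute("CREATE TABLE my_iris_model (model BLOB)")
conn.executemany("INSERT INTO Iris_Data VALUES (?, ?, ?)",
                 [("a", 14, 2), ("b", 47, 14)])

# Store the trained model as a binary blob, like the varbinary(max) column.
trained_model = {"max_setosa_width": 5}
conn.execute("INSERT INTO my_iris_model VALUES (?)",
             (pickle.dumps(trained_model),))

# Scoring time: fetch the blob, restore the model, score the input query.
# Only unpickle blobs from a trusted source.
blob = conn.execute("SELECT model FROM my_iris_model").fetchone()[0]
restored_model = pickle.loads(blob)
input_rows = conn.execute("SELECT * FROM Iris_Data ORDER BY name").fetchall()
scored = predict_species(restored_model, input_rows)
print(scored)
```

The scored rows have the same shape as the slide's result set: the three input columns plus a species column.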
  • 66. launchpad.exe sp_execute_external_script sqlservr.exe Named pipe SQLOS XEvent MSSQLSERVER Service MSSQLLAUNCHPAD Service (one per SQL Server instance) What and how to launch “launcher” Bxlserver.exe sqlsatellite.dll Bxlserver.exe sqlsatellite.dll Windows Satellite Process sqlsatellite.dll Run query
  • 67. • More efficient than standalone clients • Data does not all have to fit in memory • Reduced data transmission over the network • Most R Open (and Python R) functions are single threaded • We can stream data in parallel and batches from SQL Server to/from script • Use the power of SQL Server and ML to develop, train, and operationalize • SQL Server compute context (remote compute context) • T-SQL queries • Memory-optimized tables • Columnstore indexes • Data compression • Parallel query execution • Stored procedures
  • 68. Reduced surface area and isolation: “external scripts enabled” is required; script execution happens outside of the SQL Server process space. Script execution requires explicit permission: sp_execute_external_script requires EXECUTE ANY EXTERNAL SCRIPT for non-admins; a SQL Server login/user and db/table access are required. Satellite processes have limited privileges: satellite processes run under low privileged, local user accounts in the SQLRUserGroup; each execution is isolated (different users with different accounts); Windows firewall rules block outbound traffic.
  • 69. MicrosoftML is a package for Machine Learning Server, Microsoft R Client, and SQL Server Machine Learning Services that adds state-of-the-art data transforms, machine learning algorithms, and pretrained models to Microsoft R functionality. • Data transforms help you to compose, in a pipeline, a custom set of transforms that are applied to your data before training or testing. The primary purpose of these transforms is to allow you to featurize your data. • Machine learning algorithms enable you to tackle common machine learning tasks such as classification, regression and anomaly detection. You run these high-performance functions locally on Windows or Linux machines or on Azure HDInsight (Hadoop/Spark) clusters. • Pretrained models for sentiment analysis and image featurization can also be installed and deployed with the MicrosoftML package.
  • 71. • Run R or Python inside SQL Server 2017 with ML Services • Existing SQL Server 2016 clients can run R using SQL Server R Services. SQL Server 2017: Run R engine from within the Query Processor
  • 72. SQL Server 2016/17 Run R From within the Query Processor Move BIG Work to the Data T-SQL Apps T-SQL Script
  • 73. T-SQL Stored ProcedureSQL Server 2016/17 Enable smart non-R apps BI & Reporting; Web apps T-SQL Script R Engine R script
  • 74. Large Data Sets in Chunks Remote Execution Context Results Parallel Worker Tasks Parallel Algorithm Iterate/ Sequence SQL ApplicationsT-SQL + SQL Server 2016/17
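The chunked, parallel pattern in this diagram can be sketched as follows. It is an illustration, not the RevoScaleR implementation: worker tasks compute partial sums for a simple linear regression on each chunk, and a combine step merges them, so the full data set never has to be in memory at once. `read_chunks`, `chunk_stats`, and `combine` are hypothetical names, and threads stand in for parallel worker tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def read_chunks(rows, size):
    """Yield the data in fixed-size chunks, as a chunked reader would."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def chunk_stats(chunk):
    """Worker task: partial sums for fitting y = a + b*x on one chunk."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return n, sx, sy, sxx, sxy


def combine(partials):
    """Add up the partial sums and solve for intercept and slope."""
    n, sx, sy, sxx, sxy = (sum(column) for column in zip(*partials))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


rows = [(x, 3 + 2 * x) for x in range(10)]   # data generated from y = 3 + 2x

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(chunk_stats, read_chunks(rows, 4)))

intercept, slope = combine(partials)
print(len(partials), intercept, slope)
```

Because the partial sums are additive, the result does not depend on how the rows are split into chunks or on the order in which workers finish.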
  • 76.
  • 77. Use familiar T-SQL stored procedures to invoke R scripts from your application
  • 78. Instant Deployment: • Turn R analytics into web services in one line of code • Swagger-based REST APIs, easy to consume from any programming language, including R. Deploy to Anywhere: • Deploy the web service server to any platform: Windows, SQL, Linux/Hadoop • On-prem or in cloud. Fast and Scalable: • Fast scoring, real time and batch • Scaling to a grid for powerful computing with load balancing • Diagnostic and capacity evaluation tools. Secure and Reliable: • Enterprise authentication: AD/LDAP or AAD • Secure connection: HTTPS with SSL/TLS 1.2 • Enterprise grade high availability
  • 79. Easy Setup: • In-cloud or on-prem • Adding nodes to scale • High availability & load balancing • Remote execution server. Easy Deployment; Easy Integration; Easy Consumption. [Diagram: Data Scientist with Microsoft R Client (mrsdeploy package) → publishService → Machine Learning Server configured for operationalizing R analytics; Developer]
  • 80. • Seamless integration with authentication solution: LDAP/AD/AAD • Secure connection: HTTPS encrypted by TLS 1.2/SSL • Compliance with Microsoft Security Development Lifecycle R Client
  • 81. Load Balancer • Server level HA: Introduce multiple Web Nodes for Active-Active backup / recovery, via load balancer • Data Store HA: leverage Enterprise grade DB, SQL Server and Postgres’ HA capabilities
  • 82. • Easily create web services from R scripts & models Build the model first Deploy as a web service instantly
  • 83. mrsdeploy web service functions (Function: Description). publishService: Publish a predictive function as a web service. deleteService: Delete a web service. getService: Get a web service. listServices: List the different published web services. serviceOption: Retrieve, set, and list the different service options. updateService: Update a web service.
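What "publish a predictive function as a web service" means can be shown with a small standard-library sketch. It does not use mrsdeploy or its API: `predict_area`, `ScoreHandler`, `call_service`, and the `/score` path are hypothetical, and the server runs on the local machine only.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict_area(length, width):
    """Toy 'model': the function that gets exposed as a service."""
    return {"area": length * width}


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON inputs, score them, and reply with JSON outputs.
        size = int(self.headers.get("Content-Length", 0))
        inputs = json.loads(self.rfile.read(size))
        body = json.dumps(predict_area(**inputs)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def call_service(url, inputs):
    """Client side: POST JSON inputs and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(inputs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # An empty ProxyHandler bypasses any system proxy for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


server = HTTPServer(("127.0.0.1", 0), ScoreHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    result = call_service(
        "http://127.0.0.1:%d/score" % server.server_port,
        {"length": 4, "width": 3},
    )
finally:
    server.shutdown()
    server.server_close()
print(result)
```

The client only needs the URL and the JSON shape of inputs and outputs, which is the role the Swagger document plays for generated client code.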
  • 84. Generate Swagger Docs for Web Services (Data Scientist): # Run the following code in R swagger <- api$swagger() cat(swagger, file = "swagger.json", append = FALSE) Run Swagger tools to generate code (Developer): Popular Swagger Tools: AutoRest or Code Generator AutoRest.exe -CodeGenerator CSharp -Modeler Swagger -Input swagger.json -Namespace Mynamespace Write a few lines of code to consume the service (Developer)
  • 85.
  • 86. • Easily scale up a single server to a grid to handle more concurrent requests • Load balancing across compute nodes • Scale overall performance with shared pools of warmed-up R shells. R Client
  • 87. Snapshot Functions: createSnapshot: Create a snapshot of the remote session (workspace and working directory); loadSnapshot: Load a snapshot from the server into the remote session (workspace and working directory); listSnapshots: Get a list of snapshots for the current user; downloadSnapshot: Download a snapshot from the server; deleteSnapshot: Delete a snapshot from the server. Remote Objects Management: listRemoteFiles: Get a list of files in the working directory of the remote session; deleteRemoteFile: Delete a file from the working directory of the remote R session; getRemoteFile: Copy a file from the working directory of the remote R session; putLocalFile: Copy a file from the local machine to the working directory of the remote R session; getRemoteObject: Get an object from the remote R session; putLocalObject: Put an object from the local R session and load it into the remote R session; getRemoteWorkspace: Take all objects from the remote R session and load them into the local R session; putLocalWorkspace: Take all objects from the local R session and load them into the remote R session. Remote Connection: remoteLogin: Remote login to the R Server with AD or admin credentials; remoteLoginAAD: Remote login to R Server using Azure AD; remoteLogout: Logout of the remote session on the DeployR Server. Remote Execution: remoteExecute: Remote execution of either R code or an R script; remoteScript: Wrapper function for remote script execution; diffLocalRemote: Generate a 'diff' report between local and remote; pause: Pause the remote connection and return to local; resume: Return the user to the 'REMOTE >' command prompt.