SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Machine Learning
for Big Data in SQL Server
Łukasz Grala
lukasz@tidk.pl | lukasz.grala@cs.put.poznan.pl
Łukasz Grala
• Senior architekt rozwiązań Platformy Danych & Business Intelligence & Zaawansowanej Analityki w TIDK
• Twórca „Data Scientist as as Service”
• Certyfikowany trener Microsoft i wykładowca na wyższych uczelniach
• Autor zaawansowanych szkoleń i warsztatów, oraz licznych publikacji i webcastów
• Od 2010 roku wyróżniany nagrodą Microsoft Data Platform MVP
• Doktorant Politechnika Poznańska – Wydział Informatyki (obszar bazy danych, eksploracja danych, uczenie maszynowe)
• Prelegent na licznych konferencjach w kraju i na świecie
• Posiada liczne certyfikaty (MCT, MCSE, MCSA, MCITP,…)
• Członek zarządu Polskiego Towarzystwa Informatycznego Oddział Wielkopolski
• Członek i lider Polish SQL Server User Group (PLSSUG)
• Pasjonat analizy, przechowywania i przetwarzania danych, miłośnik Jazzu
email lukasz@tidk.pl - lukasz.grala@cs.put.poznan.pl blog: grala.it
Gartner – Magic Quadrant
Business Intelligence and Analytics PlatformsData Management Solutions for Analytics
Gartner
OLTP Databases
[
{
"Number":"SO43659",
"Date":"2011-05-31T00:00:00"
"AccountNumber":"AW29825",
"Price":59.99,
"Quantity":1
},
{
"Number":"SO43661",
"Date":"2011-06-01T00:00:00“
"AccountNumber":"AW73565“,
"Price":24.99,
"Quantity":3
}
]
Number Date Customer Price Quantity
SO43659 2011-05-31T00:00:00 AW29825 59.99 1
SO43661 2011-06-01T00:00:00 AW73565 24.99 3
SELECT * FROM myTable
FOR JSON AUTO
SELECT * FROM
OPENJSON(@json)
Data exchange with JSON
CREATE TABLE SalesOrderRecord (
Id int PRIMARY KEY IDENTITY,
OrderNumber NVARCHAR(25) NOT NULL,
OrderDate DATETIME NOT NULL,
JSalesOrderDetails NVARCHAR(4000)
CONSTRAINT SalesOrderDetails_IS_JSON
CHECK ( ISJSON(JSalesOrderDetails)>0 ),
Quantity AS
CAST(JSON_VALUE(JSalesOrderDetails, '$.Order.Qty') AS int)
)
GO
CREATE INDEX idxJson
ON SalesOrderRecord(Quantity)
INCLUDE (Price);
JSON and relational
JSON is plain text
ISJSON guarantees consistency
Optimize further with
computed column and INDEX
SQL Server and Azure DocumentDB
Schema-free NoSQL
document store
Scalable transactional processing
for rapidly changing apps
Premium relational
DB capable to
exchange data with
modern apps & services
Derives unified insights from
structured/unstructured data
JSON
JS
JS
JSON
Query relational
and non-relational
data, on-premises
and in Azure
Apps
T-SQL query
SQL Server Hadoop
PolyBase
Query relational and non-relational data with T-SQL
Benchmark CNTK
https://github.com/Alexey-Kamenev/Benchmarks
https://github.com/Microsoft/CNTK
Use Microsoft R Open with…
Microsoft R Server Big-data analytics and distributed computing on Linux, Hadoop and Teradata
SQL Server 2016 | 2017 Big-data analytics integrated with SQL Server database
Azure SQL Database Database as a Service - cloud
Azure Data Lake Big-data analytics Solutions - cloud
PowerBI Computations and charts from R scripts in dashboards
Azure ML Studio R Scripts in cloud-based Experiment workflows
Visual Studio R Tools for Visual Studio: integrated development environment for R
HDInsights R integrated with cloud-based Hadoop clusters
Example Solutions
Fraud detection
Salesforecasting
Warehouse efficiency
Predictivemaintenance
Extensibility
Microsoft Azure
Machine Learning Marketplace
New R scripts
010010
100100
010101
010010
100100
010101
010010
100100
010101
010010
100100
010101
Built-in advanced analytics
In-database analytics
Built-in to SQL Server
Analytic Library
T-SQL Interface
Relational Data
010010
100100
010101
010010
100100
010101
Data Scientist
Interact directly
with data
Data Developer/DBA
Manage data and
analytics together
?
R
RIntegration
Scalable in-database analytics
Data Scientist
Interacts directly with data
Creates models
and experiments
Data Analyst/DBA
Manages data and
analytics together
Example Solutions
• Fraud detection
• Sales forecasting
• Warehouse efficiency
• Predictive maintenance
010010
100100
010101
Relational Data
Extensibility
?
R
R Integration
Analytic Library
Open Source R
Revolution PEMA
T-SQL Interface
How is it Integrated?
• T-SQL calls a Stored Procedure
• Script is run in SQL through
extensibility model
• Result sets sent through Web API
to database or applications
Benefits
• Faster deployment of ML models
• Less data movement, faster
insights
• Work with large datasets: mitigate
R memory and scalability
limitations
Microsoft R Server
• Microsoft R Server for Redhat Linux
• Microsoft R Server for SUSE Linux
• Microsoft R Server for Teradata DB
• Microsoft R Server for Hadoop on Redhat
Microsoft R Server
CRAN Time Machine
Easy to use: add 2 lines to the top of each script
library(checkpoint)
checkpoint("2014-09-17")
For the package author:
Use package versions available on the chosen date
Installs packages local to this project
Allows different package versions to be used simultaneously
For a script collaborator:
Automatically installs required packages
Detects required packages (no need to manually install!)
Uses same package versions as script author to ensure reproducibility
Using checkpoint
16
ScaleR – Parallel + “Big Data”
Stream data in to RAM in blocks. “Big Data” can be any data size. We
handle Megabytes to Gigabytes to Terabytes…
Our ScaleR algorithms work
inside multiple cores / nodes in
parallel at high speed
Interim results are collected and
combined analytically to produce
the output on the entire data setXDF file format is optimised to work with the ScaleR library and
significantly speeds up iterative algorithm processing.
DistributedR - In-Hadoop
• Uses Hadoop nodes for R
computations
• Eliminate data movement
latency on very large data
• Remove data duplication
• Faster model development
• No MapReduce R coding
• Develop better models using
all the data
= Microsoft R Server
MRS and Hadoop Architecture options
R R R R R
R R R R R
ScaleR Production
RStudio Server Pro
Microsoft R Server
1. Copy
2. Stream
3. Send
Parallelized Algorithms
• Data import – Delimited, Fixed, SAS, SPSS, OBDC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort, Merge, Split
• Aggregate by category (means, sums)
• Min / Max, Mean, Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix for set variables)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data (standard tables & long form)
• Marginal Summaries of Cross Tabulations
• Chi Square Test
• Kendall Rank Correlation
• Fisher’s Exact Test
• Student’s t-Test
• Subsample (observations & variables)
• Random Sampling
Data Step Statistical Tests
Sampling
Descriptive Statistics
• Sum of Squares (cross product matrix for set variables)
• Multiple Linear Regression
• Generalized Linear Models (GLM) exponential family
distributions: binomial, Gaussian, inverse Gaussian,
Poisson, Tweedie. Standard link functions: cauchit,
identity, log, logit, probit. User defined distributions & link
functions.
• Covariance & Correlation Matrices
• Logistic Regression
• Classification & Regression Trees
• Predictions/scoring for models
• Residuals for all models
Predictive Models
• K-Means
• Decision Trees
• Decision Forests
• Stochastic Gradient Boosted Decision Trees
Cluster Analysis
Classification
Simulation
Variable Selection
• Stepwise Regression Linear, Logistic
and GLM
• Monte Carlo
• Parallel Random Number Generation
Combination
• Using Revolution rxDataStep and rxExec functions
to combine open source R with Revolution R
• PEMA API
Learners
Algorithms Strengths
rxFastLinear Fast, accurate linear learner with auto L1 & L2
rxLogisticRegression Logistic Regression with L1 & L2
rxFastTree
Boosted Decision tree from Bing. Competitive wth
XGBoost. Most accurate learner for most cases
rxFastForest Random Forest
rxNeuralNet GPU accelereted Net# DNNs with Convolutions
rxOneClassSvm Anomaly or unbalanced binary classification
Learners - Scalability
• Streaming (not RAM bound)
• Billions of features
• Multi-proc
• GPU acceleration for DNNs
• Distributed on Hadoop/Spark via Ensambling
Text Analytics Scenarios
Text Classification
• Email or support ticket routing
• Customer call triage
• Detect illegal trading activity
Sentiment Analysis
• Social media monitoring
• Call center support
Sentiment Analysis
• Pre-trained model
• Cognitive Service Parity
• Uses DNN Embedding
• Domain Adaptation
Image Featurization
• Image similarity search:
• Product Catalog search
• Classification
• Plankton Monitoring
• Galaxy Classification
• Retina Pathology detection
• Anomaly Detection
• Defects detection in manufacturing
Image Featurization
Convolutional DNNs with GPU
Pre-trained Models
• ResNet18
• ResNet 50
• ResNet 101
• AlexNet
Machine Learning Server in SQL Server 2017
•R Server 9.2
•Python (Anaconda)
Question?
lukasz@tidk.pl lukasz.grala@cs.put.poznan.pl
tidk.pl DSaaS.co facebook.com/TIDKpl grala.it

Weitere ähnliche Inhalte

Was ist angesagt?

Sql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPrague
Sql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPragueSql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPrague
Sql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPragueLuis Beltran
 
Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsMachine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsStavros Kontopoulos
 
Superworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueSuperworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueDatabricks
 
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth LoganMulti Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth LoganSpark Summit
 
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on HadoopPractical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on HadoopDataWorks Summit
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsDatabricks
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodDatabricks
 
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligenceSpark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligenceWei Di
 
Importance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLowImportance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLowDatabricks
 
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit AgarwalSuccinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit AgarwalSpark Summit
 
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowDatabricks
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...DataWorks Summit
 
A High Performance Mutable Engagement Activity Delta Lake
A High Performance Mutable Engagement Activity Delta LakeA High Performance Mutable Engagement Activity Delta Lake
A High Performance Mutable Engagement Activity Delta LakeDatabricks
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment Databricks
 
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Databricks
 
Big Data at Speed
Big Data at SpeedBig Data at Speed
Big Data at Speedmarkgrover
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPDatabricks
 
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Spark Summit
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Databricks
 

Was ist angesagt? (20)

Sql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPrague
Sql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPragueSql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPrague
Sql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPrague
 
Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsMachine learning at scale challenges and solutions
Machine learning at scale challenges and solutions
 
Superworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueSuperworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and Fugue
 
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth LoganMulti Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
 
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on HadoopPractical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on Hadoop
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new Directions
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
 
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligenceSpark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligence
 
Importance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLowImportance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLow
 
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit AgarwalSuccinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
 
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflow
 
Lambda-less Stream Processing @Scale in LinkedIn
Lambda-less Stream Processing @Scale in LinkedIn Lambda-less Stream Processing @Scale in LinkedIn
Lambda-less Stream Processing @Scale in LinkedIn
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
 
A High Performance Mutable Engagement Activity Delta Lake
A High Performance Mutable Engagement Activity Delta LakeA High Performance Mutable Engagement Activity Delta Lake
A High Performance Mutable Engagement Activity Delta Lake
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
 
Big Data at Speed
Big Data at SpeedBig Data at Speed
Big Data at Speed
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
 
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
 

Ähnlich wie DataMass Summit - Machine Learning for Big Data in SQL Server

20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & RŁukasz Grala
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computingBAINIDA
 
eRum2016 -RevoScaleR - Performance and Scalability R
eRum2016 -RevoScaleR - Performance and Scalability ReRum2016 -RevoScaleR - Performance and Scalability R
eRum2016 -RevoScaleR - Performance and Scalability RŁukasz Grala
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...Jürgen Ambrosi
 
Advanced analytics with R and SQL
Advanced analytics with R and SQLAdvanced analytics with R and SQL
Advanced analytics with R and SQLMSDEVMTL
 
Intro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with sparkIntro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with sparkAlex Zeltov
 
The Polyglot Data Scientist - Exploring R, Python, and SQL Server
The Polyglot Data Scientist - Exploring R, Python, and SQL ServerThe Polyglot Data Scientist - Exploring R, Python, and SQL Server
The Polyglot Data Scientist - Exploring R, Python, and SQL ServerSarah Dutkiewicz
 
In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionRevolution Analytics
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...Debraj GuhaThakurta
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...Debraj GuhaThakurta
 
Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution Analytics
 
Bluegranite AA Webinar FINAL 28JUN16
Bluegranite AA Webinar FINAL 28JUN16Bluegranite AA Webinar FINAL 28JUN16
Bluegranite AA Webinar FINAL 28JUN16Andy Lathrop
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017James Serra
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in ProductionDataWorks Summit
 
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open ShiftRed Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open ShiftTravis Wright
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopRevolution Analytics
 

Ähnlich wie DataMass Summit - Machine Learning for Big Data in SQL Server (20)

20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & R
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computing
 
Michal Marušan: Scalable R
Michal Marušan: Scalable RMichal Marušan: Scalable R
Michal Marušan: Scalable R
 
eRum2016 -RevoScaleR - Performance and Scalability R
eRum2016 -RevoScaleR - Performance and Scalability ReRum2016 -RevoScaleR - Performance and Scalability R
eRum2016 -RevoScaleR - Performance and Scalability R
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
 
Advanced analytics with R and SQL
Advanced analytics with R and SQLAdvanced analytics with R and SQL
Advanced analytics with R and SQL
 
Ml2
Ml2Ml2
Ml2
 
Intro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with sparkIntro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with spark
 
The Polyglot Data Scientist - Exploring R, Python, and SQL Server
The Polyglot Data Scientist - Exploring R, Python, and SQL ServerThe Polyglot Data Scientist - Exploring R, Python, and SQL Server
The Polyglot Data Scientist - Exploring R, Python, and SQL Server
 
In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and Revolution
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013
 
Bluegranite AA Webinar FINAL 28JUN16
Bluegranite AA Webinar FINAL 28JUN16Bluegranite AA Webinar FINAL 28JUN16
Bluegranite AA Webinar FINAL 28JUN16
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open ShiftRed Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
Red Hat Summit 2017 - Intro to SQL Server on RHEL and Open Shift
 
Building a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with RBuilding a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with R
 
Decision trees in hadoop
Decision trees in hadoopDecision trees in hadoop
Decision trees in hadoop
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 

Mehr von Łukasz Grala

Microsoft ML - State of The Art Microsoft Machine Learning - Package R
Microsoft ML - State of The Art Microsoft Machine Learning - Package RMicrosoft ML - State of The Art Microsoft Machine Learning - Package R
Microsoft ML - State of The Art Microsoft Machine Learning - Package RŁukasz Grala
 
AnalyticsConf2016 - Innowacyjność poprzez inteligentną analizę informacji - C...
AnalyticsConf2016 - Innowacyjność poprzez inteligentną analizę informacji - C...AnalyticsConf2016 - Innowacyjność poprzez inteligentną analizę informacji - C...
AnalyticsConf2016 - Innowacyjność poprzez inteligentną analizę informacji - C...Łukasz Grala
 
AzureDay - What is Machine Learnin?
AzureDay - What is Machine Learnin?AzureDay - What is Machine Learnin?
AzureDay - What is Machine Learnin?Łukasz Grala
 
AzureDay - Introduction Big Data Analytics.
AzureDay  - Introduction Big Data Analytics.AzureDay  - Introduction Big Data Analytics.
AzureDay - Introduction Big Data Analytics.Łukasz Grala
 
WyspaIT 2016 - Azure Stream Analytics i Azure Machine Learning w analizie str...
WyspaIT 2016 - Azure Stream Analytics i Azure Machine Learning w analizie str...WyspaIT 2016 - Azure Stream Analytics i Azure Machine Learning w analizie str...
WyspaIT 2016 - Azure Stream Analytics i Azure Machine Learning w analizie str...Łukasz Grala
 
3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sqlŁukasz Grala
 
20060416 Azure Boot Camp 2016- Azure Data Lake Storage and Analytics
20060416   Azure Boot Camp 2016- Azure Data Lake Storage and Analytics20060416   Azure Boot Camp 2016- Azure Data Lake Storage and Analytics
20060416 Azure Boot Camp 2016- Azure Data Lake Storage and AnalyticsŁukasz Grala
 
20160405 Cloud Community Poznań - Cloud Analytics on Azure
20160405  Cloud Community Poznań - Cloud Analytics on Azure20160405  Cloud Community Poznań - Cloud Analytics on Azure
20160405 Cloud Community Poznań - Cloud Analytics on AzureŁukasz Grala
 
20160309 AzureDay 2016 - Azure Stream Analytics & Azure Machine Learning
20160309   AzureDay 2016 - Azure Stream Analytics & Azure Machine Learning20160309   AzureDay 2016 - Azure Stream Analytics & Azure Machine Learning
20160309 AzureDay 2016 - Azure Stream Analytics & Azure Machine LearningŁukasz Grala
 
20160316 techstolica - cloudstorage -tidk
20160316  techstolica - cloudstorage -tidk20160316  techstolica - cloudstorage -tidk
20160316 techstolica - cloudstorage -tidkŁukasz Grala
 
20160316 techstolica - cloudanalytics -tidk
20160316  techstolica - cloudanalytics -tidk20160316  techstolica - cloudanalytics -tidk
20160316 techstolica - cloudanalytics -tidkŁukasz Grala
 
Prescriptive Analytics
Prescriptive AnalyticsPrescriptive Analytics
Prescriptive AnalyticsŁukasz Grala
 
DAC4B 2015 - Polybase
DAC4B 2015 - PolybaseDAC4B 2015 - Polybase
DAC4B 2015 - PolybaseŁukasz Grala
 
Expert summit SQL Server 2016
Expert summit   SQL Server 2016Expert summit   SQL Server 2016
Expert summit SQL Server 2016Łukasz Grala
 
Nowy SQL Server 2012 – DENALI rewolucją w silnikach baz danych - Microsoft te...
Nowy SQL Server 2012 – DENALI rewolucją w silnikach baz danych - Microsoft te...Nowy SQL Server 2012 – DENALI rewolucją w silnikach baz danych - Microsoft te...
Nowy SQL Server 2012 – DENALI rewolucją w silnikach baz danych - Microsoft te...Łukasz Grala
 
Pre mts Sharepoint 2010 i SQL Server 2012
Pre mts   Sharepoint 2010 i SQL Server 2012Pre mts   Sharepoint 2010 i SQL Server 2012
Pre mts Sharepoint 2010 i SQL Server 2012Łukasz Grala
 
SQL Day 2011 Modelowanie i zasilanie wymiarów hurtowni danych - łukasz grala
SQL Day 2011 Modelowanie i zasilanie wymiarów hurtowni danych  - łukasz gralaSQL Day 2011 Modelowanie i zasilanie wymiarów hurtowni danych  - łukasz grala
SQL Day 2011 Modelowanie i zasilanie wymiarów hurtowni danych - łukasz gralaŁukasz Grala
 
SQL Day 2011 - Modelowanie i zasilanie wymiarów hurtowni danych - łukasz grala
SQL Day 2011 - Modelowanie i zasilanie wymiarów hurtowni danych  - łukasz gralaSQL Day 2011 - Modelowanie i zasilanie wymiarów hurtowni danych  - łukasz grala
SQL Day 2011 - Modelowanie i zasilanie wymiarów hurtowni danych - łukasz gralaŁukasz Grala
 
"SharePoint 2010 a SQL Server" - Konferencja Time For SharePoint 2011- Łukas...
"SharePoint 2010 a SQL Server" - Konferencja Time For SharePoint 2011-  Łukas..."SharePoint 2010 a SQL Server" - Konferencja Time For SharePoint 2011-  Łukas...
"SharePoint 2010 a SQL Server" - Konferencja Time For SharePoint 2011- Łukas...Łukasz Grala
 
Reprezentacja hierarchii w SQL Server 2008/2008R2 - 2nd Silesian CodeCamp
Reprezentacja hierarchii w SQL Server 2008/2008R2 - 2nd Silesian CodeCampReprezentacja hierarchii w SQL Server 2008/2008R2 - 2nd Silesian CodeCamp
Reprezentacja hierarchii w SQL Server 2008/2008R2 - 2nd Silesian CodeCampŁukasz Grala
 

Mehr von Łukasz Grala (20)

Microsoft ML - State of The Art Microsoft Machine Learning - Package R
Microsoft ML - State of The Art Microsoft Machine Learning - Package RMicrosoft ML - State of The Art Microsoft Machine Learning - Package R
Microsoft ML - State of The Art Microsoft Machine Learning - Package R
 
AnalyticsConf2016 - Innowacyjność poprzez inteligentną analizę informacji - C...
AnalyticsConf2016 - Innowacyjność poprzez inteligentną analizę informacji - C...AnalyticsConf2016 - Innowacyjność poprzez inteligentną analizę informacji - C...
AnalyticsConf2016 - Innowacyjność poprzez inteligentną analizę informacji - C...
 
AzureDay - What is Machine Learnin?
AzureDay - What is Machine Learnin?AzureDay - What is Machine Learnin?
AzureDay - What is Machine Learnin?
 
AzureDay - Introduction Big Data Analytics.
AzureDay  - Introduction Big Data Analytics.AzureDay  - Introduction Big Data Analytics.
AzureDay - Introduction Big Data Analytics.
 
WyspaIT 2016 - Azure Stream Analytics i Azure Machine Learning w analizie str...
WyspaIT 2016 - Azure Stream Analytics i Azure Machine Learning w analizie str...WyspaIT 2016 - Azure Stream Analytics i Azure Machine Learning w analizie str...
WyspaIT 2016 - Azure Stream Analytics i Azure Machine Learning w analizie str...
 
3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql
 
20060416 Azure Boot Camp 2016- Azure Data Lake Storage and Analytics
20060416   Azure Boot Camp 2016- Azure Data Lake Storage and Analytics20060416   Azure Boot Camp 2016- Azure Data Lake Storage and Analytics
20060416 Azure Boot Camp 2016- Azure Data Lake Storage and Analytics
 
20160405 Cloud Community Poznań - Cloud Analytics on Azure
20160405  Cloud Community Poznań - Cloud Analytics on Azure20160405  Cloud Community Poznań - Cloud Analytics on Azure
20160405 Cloud Community Poznań - Cloud Analytics on Azure
 
20160309 AzureDay 2016 - Azure Stream Analytics & Azure Machine Learning
20160309   AzureDay 2016 - Azure Stream Analytics & Azure Machine Learning20160309   AzureDay 2016 - Azure Stream Analytics & Azure Machine Learning
20160309 AzureDay 2016 - Azure Stream Analytics & Azure Machine Learning
 
20160316 techstolica - cloudstorage -tidk
20160316  techstolica - cloudstorage -tidk20160316  techstolica - cloudstorage -tidk
20160316 techstolica - cloudstorage -tidk
 
20160316 techstolica - cloudanalytics -tidk
20160316  techstolica - cloudanalytics -tidk20160316  techstolica - cloudanalytics -tidk
20160316 techstolica - cloudanalytics -tidk
 
Prescriptive Analytics
Prescriptive AnalyticsPrescriptive Analytics
Prescriptive Analytics
 
DAC4B 2015 - Polybase
DAC4B 2015 - PolybaseDAC4B 2015 - Polybase
DAC4B 2015 - Polybase
 
Expert summit SQL Server 2016
Expert summit   SQL Server 2016Expert summit   SQL Server 2016
Expert summit SQL Server 2016
 
Nowy SQL Server 2012 – DENALI rewolucją w silnikach baz danych - Microsoft te...
Nowy SQL Server 2012 – DENALI rewolucją w silnikach baz danych - Microsoft te...Nowy SQL Server 2012 – DENALI rewolucją w silnikach baz danych - Microsoft te...
Nowy SQL Server 2012 – DENALI rewolucją w silnikach baz danych - Microsoft te...
 
Pre mts Sharepoint 2010 i SQL Server 2012
Pre mts   Sharepoint 2010 i SQL Server 2012Pre mts   Sharepoint 2010 i SQL Server 2012
Pre mts Sharepoint 2010 i SQL Server 2012
 
SQL Day 2011 Modelowanie i zasilanie wymiarów hurtowni danych - łukasz grala
SQL Day 2011 Modelowanie i zasilanie wymiarów hurtowni danych  - łukasz gralaSQL Day 2011 Modelowanie i zasilanie wymiarów hurtowni danych  - łukasz grala
SQL Day 2011 Modelowanie i zasilanie wymiarów hurtowni danych - łukasz grala
 
SQL Day 2011 - Modelowanie i zasilanie wymiarów hurtowni danych - łukasz grala
SQL Day 2011 - Modelowanie i zasilanie wymiarów hurtowni danych  - łukasz gralaSQL Day 2011 - Modelowanie i zasilanie wymiarów hurtowni danych  - łukasz grala
SQL Day 2011 - Modelowanie i zasilanie wymiarów hurtowni danych - łukasz grala
 
"SharePoint 2010 a SQL Server" - Konferencja Time For SharePoint 2011- Łukas...
"SharePoint 2010 a SQL Server" - Konferencja Time For SharePoint 2011-  Łukas..."SharePoint 2010 a SQL Server" - Konferencja Time For SharePoint 2011-  Łukas...
"SharePoint 2010 a SQL Server" - Konferencja Time For SharePoint 2011- Łukas...
 
Reprezentacja hierarchii w SQL Server 2008/2008R2 - 2nd Silesian CodeCamp
Reprezentacja hierarchii w SQL Server 2008/2008R2 - 2nd Silesian CodeCampReprezentacja hierarchii w SQL Server 2008/2008R2 - 2nd Silesian CodeCamp
Reprezentacja hierarchii w SQL Server 2008/2008R2 - 2nd Silesian CodeCamp
 

Kürzlich hochgeladen

Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfrahulyadav957181
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 

Kürzlich hochgeladen (20)

Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 

DataMass Summit - Machine Learning for Big Data in SQL Server

  • 1. Machine Learning for Big Data in SQL Server Łukasz Grala lukasz@tidk.pl | lukasz.grala@cs.put.poznan.pl
  • 2. Łukasz Grala • Senior architekt rozwiązań Platformy Danych & Business Intelligence & Zaawansowanej Analityki w TIDK • Twórca „Data Scientist as as Service” • Certyfikowany trener Microsoft i wykładowca na wyższych uczelniach • Autor zaawansowanych szkoleń i warsztatów, oraz licznych publikacji i webcastów • Od 2010 roku wyróżniany nagrodą Microsoft Data Platform MVP • Doktorant Politechnika Poznańska – Wydział Informatyki (obszar bazy danych, eksploracja danych, uczenie maszynowe) • Prelegent na licznych konferencjach w kraju i na świecie • Posiada liczne certyfikaty (MCT, MCSE, MCSA, MCITP,…) • Członek zarządu Polskiego Towarzystwa Informatycznego Oddział Wielkopolski • Członek i lider Polish SQL Server User Group (PLSSUG) • Pasjonat analizy, przechowywania i przetwarzania danych, miłośnik Jazzu email lukasz@tidk.pl - lukasz.grala@cs.put.poznan.pl blog: grala.it
  • 3. Gartner – Magic Quadrant Business Intelligence and Analytics PlatformsData Management Solutions for Analytics
  • 5. [ { "Number":"SO43659", "Date":"2011-05-31T00:00:00" "AccountNumber":"AW29825", "Price":59.99, "Quantity":1 }, { "Number":"SO43661", "Date":"2011-06-01T00:00:00“ "AccountNumber":"AW73565“, "Price":24.99, "Quantity":3 } ] Number Date Customer Price Quantity SO43659 2011-05-31T00:00:00 AW29825 59.99 1 SO43661 2011-06-01T00:00:00 AW73565 24.99 3 SELECT * FROM myTable FOR JSON AUTO SELECT * FROM OPENJSON(@json) Data exchange with JSON
  • 6. CREATE TABLE SalesOrderRecord ( Id int PRIMARY KEY IDENTITY, OrderNumber NVARCHAR(25) NOT NULL, OrderDate DATETIME NOT NULL, JSalesOrderDetails NVARCHAR(4000) CONSTRAINT SalesOrderDetails_IS_JSON CHECK ( ISJSON(JSalesOrderDetails)>0 ), Quantity AS CAST(JSON_VALUE(JSalesOrderDetails, '$.Order.Qty') AS int) ) GO CREATE INDEX idxJson ON SalesOrderRecord(Quantity) INCLUDE (Price); JSON and relational JSON is plain text ISJSON guarantees consistency Optimize further with computed column and INDEX
  • 7. SQL Server and Azure DocumentDB Schema-free NoSQL document store Scalable transactional processing for rapidly changing apps Premium relational DB capable to exchange data with modern apps & services Derives unified insights from structured/unstructured data JSON JS JS JSON
  • 8. Query relational and non-relational data, on-premises and in Azure Apps T-SQL query SQL Server Hadoop PolyBase Query relational and non-relational data with T-SQL
  • 9.
  • 11. Use Microsoft R Open with… Microsoft R Server Big-data analytics and distributed computing on Linux, Hadoop and Teradata SQL Server 2016 | 2017 Big-data analytics integrated with SQL Server database Azure SQL Database Database as a Service - cloud Azure Data Lake Big-data analytics Solutions - cloud PowerBI Computations and charts from R scripts in dashboards Azure ML Studio R Scripts in cloud-based Experiment workflows Visual Studio R Tools for Visual Studio: integrated development environment for R HDInsights R integrated with cloud-based Hadoop clusters
  • 12. Example Solutions Fraud detection Salesforecasting Warehouse efficiency Predictivemaintenance Extensibility Microsoft Azure Machine Learning Marketplace New R scripts 010010 100100 010101 010010 100100 010101 010010 100100 010101 010010 100100 010101 Built-in advanced analytics In-database analytics Built-in to SQL Server Analytic Library T-SQL Interface Relational Data 010010 100100 010101 010010 100100 010101 Data Scientist Interact directly with data Data Developer/DBA Manage data and analytics together ? R RIntegration
  • 13. Scalable in-database analytics Data Scientist Interacts directly with data Creates models and experiments Data Analyst/DBA Manages data and analytics together Example Solutions • Fraud detection • Sales forecasting • Warehouse efficiency • Predictive maintenance 010010 100100 010101 Relational Data Extensibility ? R R Integration Analytic Library Open Source R Revolution PEMA T-SQL Interface How is it Integrated? • T-SQL calls a Stored Procedure • Script is run in SQL through extensibility model • Result sets sent through Web API to database or applications Benefits • Faster deployment of ML models • Less data movement, faster insights • Work with large datasets: mitigate R memory and scalability limitations
  • 14. Microsoft R Server • Microsoft R Server for Redhat Linux • Microsoft R Server for SUSE Linux • Microsoft R Server for Teradata DB • Microsoft R Server for Hadoop on Redhat Microsoft R Server
  • 16. Easy to use: add 2 lines to the top of each script library(checkpoint) checkpoint("2014-09-17") For the package author: Use package versions available on the chosen date Installs packages local to this project Allows different package versions to be used simultaneously For a script collaborator: Automatically installs required packages Detects required packages (no need to manually install!) Uses same package versions as script author to ensure reproducibility Using checkpoint 16
  • 17. ScaleR – Parallel + “Big Data” Stream data in to RAM in blocks. “Big Data” can be any data size. We handle Megabytes to Gigabytes to Terabytes… Our ScaleR algorithms work inside multiple cores / nodes in parallel at high speed Interim results are collected and combined analytically to produce the output on the entire data setXDF file format is optimised to work with the ScaleR library and significantly speeds up iterative algorithm processing.
  • 18. DistributedR - In-Hadoop • Uses Hadoop nodes for R computations • Eliminate data movement latency on very large data • Remove data duplication • Faster model development • No MapReduce R coding • Develop better models using all the data = Microsoft R Server
  • 19. MRS and Hadoop Architecture options R R R R R R R R R R ScaleR Production RStudio Server Pro Microsoft R Server 1. Copy 2. Stream 3. Send
  • 20. Parallelized Algorithms • Data import – Delimited, Fixed, SAS, SPSS, OBDC • Variable creation & transformation • Recode variables • Factor variables • Missing value handling • Sort, Merge, Split • Aggregate by category (means, sums) • Min / Max, Mean, Median (approx.) • Quantiles (approx.) • Standard Deviation • Variance • Correlation • Covariance • Sum of Squares (cross product matrix for set variables) • Pairwise Cross tabs • Risk Ratio & Odds Ratio • Cross-Tabulation of Data (standard tables & long form) • Marginal Summaries of Cross Tabulations • Chi Square Test • Kendall Rank Correlation • Fisher’s Exact Test • Student’s t-Test • Subsample (observations & variables) • Random Sampling Data Step Statistical Tests Sampling Descriptive Statistics • Sum of Squares (cross product matrix for set variables) • Multiple Linear Regression • Generalized Linear Models (GLM) exponential family distributions: binomial, Gaussian, inverse Gaussian, Poisson, Tweedie. Standard link functions: cauchit, identity, log, logit, probit. User defined distributions & link functions. • Covariance & Correlation Matrices • Logistic Regression • Classification & Regression Trees • Predictions/scoring for models • Residuals for all models Predictive Models • K-Means • Decision Trees • Decision Forests • Stochastic Gradient Boosted Decision Trees Cluster Analysis Classification Simulation Variable Selection • Stepwise Regression Linear, Logistic and GLM • Monte Carlo • Parallel Random Number Generation Combination • Using Revolution rxDataStep and rxExec functions to combine open source R with Revolution R • PEMA API
  • 21. Learners Algorithms Strengths rxFastLinear Fast, accurate linear learner with auto L1 & L2 rxLogisticRegression Logistic Regression with L1 & L2 rxFastTree Boosted Decision tree from Bing. Competitive wth XGBoost. Most accurate learner for most cases rxFastForest Random Forest rxNeuralNet GPU accelereted Net# DNNs with Convolutions rxOneClassSvm Anomaly or unbalanced binary classification
  • 22. Learners - Scalability • Streaming (not RAM bound) • Billions of features • Multi-proc • GPU acceleration for DNNs • Distributed on Hadoop/Spark via Ensambling
  • 23. Text Analytics Scenarios Text Classification • Email or support ticket routing • Customer call triage • Detect illegal trading activity Sentiment Analysis • Social media monitoring • Call center support
  • 24. Sentiment Analysis • Pre-trained model • Cognitive Service Parity • Uses DNN Embedding • Domain Adaptation
  • 25.
  • 26.
  • 27.
  • 28.
  • 29. Image Featurization • Image similarity search: • Product Catalog search • Classification • Plankton Monitoring • Galaxy Classification • Retina Pathology detection • Anomaly Detection • Defects detection in manufacturing
  • 30. Image Featurization Convolutional DNNs with GPU Pre-trained Models • ResNet18 • ResNet 50 • ResNet 101 • AlexNet
  • 31. Machine Learning Server in SQL Server 2017 •R Server 9.2 •Python (Anaconda)

Hinweis der Redaktion

  1. Source: https://msdn.microsoft.com/en-us/library/dn921882(v=sql.130).aspx Format query results as JSON by adding the FOR JSON clause to a SELECT statement. Use the FOR JSON clause to delegate the formatting of JSON output from your client applications to SQL Server. For more info, see Use JSON output in SQL Server and in client apps (SQL Server).
  2. Source: https://msdn.microsoft.com/en-us/library/dn921882(v=sql.130).aspx
  3. See TechNet Virtual Lab “Exploring SQL Server 2016 support for JSON data” - http://go.microsoft.com/?linkid=9898458
  4. When it comes to key BI investments we are making it much easier to manage relational and non-relational data with Polybase technology that allows you to query Hadoop data and SQL Server relational data through single T-SQL query. One of the challenges we see with Hadoop is there are not enough people out there with Hadoop and Map Reduce skillset and this technology simplifies the skillset needed to manage Hadoop data. This can also work across your on-premises environment or SQL Server running in Azure.
  5. AI: ML (Uczenie Bayesowskie, Drzew decyzyjnych, Zbioru reguł, Sieci ekspertowe Sieci neuronowe Dowodzenie twierdzeń Podejmowanie decyzji przy braku pełnych danych Rozumowanie logiczne/racjonalne Logika rozmyta Algorytmy ewolucyjne