SlideShare ist ein Scribd-Unternehmen logo
1 von 50
Downloaden Sie, um offline zu lesen
IBM SparkTechnology Center
Apache SystemML
Declarative Machine Learning
Luciano Resende
IBM | Spark Technology Center
BigDataDevelopersMeetup–Spain/Madrid–Nov2017
Spark Technology Center
lresende@apache.org@
http://lresende.blogspot.com/
https://www.linkedin.com/in/lresende
@lresende1975
https://github.com/lresende
Luciano Resende
Data Science Platform Architect – IBM – Spark Technology Center
Apache Member and also a SystemML committer and PMC member
Open Source Community Leadership
Spark Technology Center
Founding Partner 188+ Project Committers 77+ Projects
Key Open source steering committee memberships OSS Advisory Board
Open Source
4
IBM
Spark Technology Center
Founded in 2015.
Location:
Physical: 505 Howard St., San Francisco CA
Web: http://spark.tc Twitter: @apachespark_tc
Mission:
Contribute intellectual and technical capital to the Apache Spark
community.
Make the core technology enterprise- and cloud-ready.
Build data science skills to drive intelligence into business applications
— http://bigdatauniversity.com
Key statistics:
About 50 developers, co-located with 25 IBM designers.
Major contributions to Apache Spark http://jiras.spark.tc
Apache SystemML is now an Apache Incubator project.
Founding member of UC Berkeley AMPLab and RISE Lab
Member of R Consortium and Scala Center
Spark Technology Center
Contributions 46,385 Spark LOC
863 Spark JIRAs
457 SystemML JIRAs
67 Speakers at Events
Spark Technology Center
Focus on meaningful code contributions across
all major Spark projects
863 code contributions (JIRAs) and counting –
Check out http://jiras.spark.tc
Over 422 commits in Spark 2.0 , and
continuing major contributions in 2.x
Contributions by the Spark Technology Center
across almost all components of Spark
— Spark Core, SparkR, SQL, MLlib,
Streaming, PySpark, build and infrastructure,
etc
STC impact on community
Spark Technology Center
Machine Learning
Spark MLLib
R4ML
Online Retraining
Apache Arrow
SystemML
Deep Learning
Consumability
Reference architectures
Spark Notebook stack
Spark Resource optimization
Spark Web UI
Apache Bahir
RedRock
Immersive Insights
SQL
TPC-DS and Performance
Query Pushdown/Federation
Project Focus Areas
6
Spark Technology Center
7
Origins of the SystemML Project
2007-2008: Multiple projects at IBM Research – Almaden involving machine
learning on Hadoop.
2009: We create a dedicated team for scalable ML.
2009-2010: Through engagements with customers, we observe how data scientists
create machine learning algorithms.
State-of-the-Art: Small Data
R or
Python
Data
Scientist
Personal
Computer
Data
Results
State-of-the-Art: Big Data
R or
Python
Data
Scientist
Results
Systems
Programmer
Scala
State-of-the-Art: Big Data
R or
Python
Data
Scientist
Results
Systems
Programmer
Scala
😞 Days or weeks per
iteration
😞 Errors while translating
algorithms
State-of-the-Art: Big Data
R or
Python
Data
Scientist
Results
SystemML
State-of-the-Art: Big Data
R or
Python
Data
Scientist
Results
SystemML
😃 Fast iteration
😃 Same answer
14
Linear Algebra
is the Language of Machine Learning.
Linear algebra is
powerful,
precise,
and high-level.
Express complex transformations over
large arrays of data…
…using a small number of instructions.
…in a clear and unambiguous way
SystemML Provides
Highly Optimized
Distributed Linear
Algebra
Running Example:
Alternating Least Squares
Problem:
Movie Recommendations
Movies
Users
i
j
User i liked movie
j.
Movies Factor
UsersFactor
Multiply these two
factors to produce a
less-sparse matrix.
×
New nonzero values
become movies
suggestions.
Alternating Least Squares (in R)
U = rand(nrow(X), r, min = -1.0, max = 1.0);
V = rand(r, ncol(X), min = -1.0, max = 1.0);
while(i < mi) {
i = i + 1; ii = 1;
if (is_U)
G = (W * (U %*% V - X)) %*% t(V) + lambda * U;
else
G = t(U) %*% (W * (U %*% V - X)) + lambda * V;
norm_G2 = sum(G ^ 2); norm_R2 = norm_G2;
R = -G; S = R;
while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) {
if (is_U) {
HS = (W * (S %*% V)) %*% t(V) + lambda * S;
alpha = norm_R2 / sum (S * HS);
U = U + alpha * S;
} else {
HS = t(U) %*% (W * (U %*% S)) + lambda * S;
alpha = norm_R2 / sum (S * HS);
V = V + alpha * S;
}
R = R - alpha * HS;
old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2);
S = R + (norm_R2 / old_norm_R2) * S;
ii = ii + 1;
}
is_U = ! is_U;
}
Alternating Least Squares (in R)
1. Start with random factors.
2. Hold the Movies factor constant and
find the best value for the Users factor.
(Value that most closely approximates the original matrix)
3. Hold the Users factor constant and find
the best value for the Movies factor.
4. Repeat steps 2-3 until convergence.
U = rand(nrow(X), r, min = -1.0, max = 1.0);
V = rand(r, ncol(X), min = -1.0, max = 1.0);
while(i < mi) {
i = i + 1; ii = 1;
if (is_U)
G = (W * (U %*% V - X)) %*% t(V) + lambda * U;
else
G = t(U) %*% (W * (U %*% V - X)) + lambda * V;
norm_G2 = sum(G ^ 2); norm_R2 = norm_G2;
R = -G; S = R;
while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) {
if (is_U) {
HS = (W * (S %*% V)) %*% t(V) + lambda * S;
alpha = norm_R2 / sum (S * HS);
U = U + alpha * S;
} else {
HS = t(U) %*% (W * (U %*% S)) + lambda * S;
alpha = norm_R2 / sum (S * HS);
V = V + alpha * S;
}
R = R - alpha * HS;
old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2);
S = R + (norm_R2 / old_norm_R2) * S;
ii = ii + 1;
}
is_U = ! is_U;
}
1
2
2
3
3
4
4
4
Every line has a clear purpose!
Alternating Least Squares (spark.ml)
19
Alternating Least Squares (spark.ml)
20
Alternating Least Squares (spark.ml)
21
Alternating Least Squares (spark.ml)
22
Alternating Least Squares (spark.ml)
25 lines’ worth of algorithm…
…mixed with 800 lines of performance code
Alternating Least Squares (in R)
U = rand(nrow(X), r, min = -1.0, max = 1.0);
V = rand(r, ncol(X), min = -1.0, max = 1.0);
while(i < mi) {
i = i + 1; ii = 1;
if (is_U)
G = (W * (U %*% V - X)) %*% t(V) + lambda * U;
else
G = t(U) %*% (W * (U %*% V - X)) + lambda * V;
norm_G2 = sum(G ^ 2); norm_R2 = norm_G2;
R = -G; S = R;
while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) {
if (is_U) {
HS = (W * (S %*% V)) %*% t(V) + lambda * S;
alpha = norm_R2 / sum (S * HS);
U = U + alpha * S;
} else {
HS = t(U) %*% (W * (U %*% S)) + lambda * S;
alpha = norm_R2 / sum (S * HS);
V = V + alpha * S;
}
R = R - alpha * HS;
old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2);
S = R + (norm_R2 / old_norm_R2) * S;
ii = ii + 1;
}
is_U = ! is_U;
}
Alternating Least Squares (in R)
SystemML can compile and run this algorithm at scale
No additional performance code needed!
U = rand(nrow(X), r, min = -1.0, max = 1.0);
V = rand(r, ncol(X), min = -1.0, max = 1.0);
while(i < mi) {
i = i + 1; ii = 1;
if (is_U)
G = (W * (U %*% V - X)) %*% t(V) + lambda * U;
else
G = t(U) %*% (W * (U %*% V - X)) + lambda * V;
norm_G2 = sum(G ^ 2); norm_R2 = norm_G2;
R = -G; S = R;
while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) {
if (is_U) {
HS = (W * (S %*% V)) %*% t(V) + lambda * S;
alpha = norm_R2 / sum (S * HS);
U = U + alpha * S;
} else {
HS = t(U) %*% (W * (U %*% S)) + lambda * S;
alpha = norm_R2 / sum (S * HS);
V = V + alpha * S;
}
R = R - alpha * HS;
old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2);
S = R + (norm_R2 / old_norm_R2) * S;
ii = ii + 1;
}
is_U = ! is_U;
}
(in SystemML’s
subset of R)
How fast does it run?
Running time comparisons between machine learning algorithms are problematic
Different, equally-valid answers
Different convergence rates on different data
But we’ll do one anyway
Spark Technology CenterPerformance Comparison: ALS
0
5000
10000
15000
20000
1.2GB (sparse
binary)
12GB 120GB
RunningTime(sec)
R
MLLib
SystemML
>24h>24h
OOM
OOM
Synthetic data, 0.01 sparsity, 10^5 products × {10^5,10^6,10^7} users. Data generated by multiplying two rank-50 matrices of normally-distributed data,
sampling from the resulting product, then adding Gaussian noise. Cluster of 6 servers with 12 cores and 96GB of memory per server. Number of iterations
tuned so that all algorithms produce comparable result quality.Details:
SystemML runs the R script in parallel
Same answer as original R script
Performance is comparable to a low-level RDD-
based implementation
Also, for python lovers, equivalent python DML
exists!
How does SystemML achieve this result?
Takeaway Points
The SystemML Optimizer and Runtime for Spark
Automates critical performance
decisions
Distributed or local computation?
How to partition the data?
To persist or not to persist?
Distributed vs local: Hybrid runtime
Multithreaded computation in Spark
Driver
Distributed computation in Spark
Executors
Optimizer makes a cost-based choice
28
High-Level Operations (HOPs)
General representation of statements in the data
analysis language
Low-Level Operations (LOPs)
General representation of operations in the
runtime framework
High-level language
front-ends
Multiple execution
environments
Cost
Based
Optimizer
Many other rewrites
Cost-based selection of operators
Dynamic recompilation for accurate stats
Parallel FOR (ParFor) optimizer
Direct operations on RDD partitions
YARN and MapReduce support
New in Next Release: Compressed Linear
Algebra
29
But wait, there’s
more!
Summary
Cost-based compilation of machine learning algorithms generates execution plans
for single-node in-memory, cluster, and hybrid execution
for varying data characteristics:
varying number of observations (1,000s to 10s of billions), number of variables (10s to 10s of millions), dense and sparse data
for varying cluster characteristics (memory configurations, degree of parallelism)
Out-of-the-box, scalable machine learning algorithms
e.g. descriptive statistics, regression, clustering, and classification
"Roll-your-own" algorithms
Enable programmer productivity (no worry about scalability, numeric stability, and optimizations)
Fast turn-around for new algorithms
Higher-level language shields algorithm development investment from platform
progression
Yarn for resource negotiation and elasticity
Spark for in-memory, iterative processing
Benefits of the
SystemML
Approach
Simplifies algorithm development.
Makes experimentation easier.
Your code gets faster as the system
improves.
31
32
Algorithms
Category Description
Descriptive Statistics
Univariate
Bivariate
Stratified Bivariate
Classification
Logistic Regression (multinomial)
Multi-Class SVM
Naïve Bayes (multinomial)
Decision Trees
Random Forest
Clustering k-Means
Regression
Linear Regression system of equations
CG (conjugate gradient)
Generalized Linear
Models (GLM)
Distributions: Gaussian, Poisson, Gamma, Inverse Gaussian, Binomial, Bernoulli
Links for all distributions: identity, log, sq. root, inverse, 1/μ2
Links for Binomial / Bernoulli: logit, probit, cloglog, cauchit
Stepwise
Linear
GLM
Dimension Reduction PCA
Matrix Factorization ALS
direct solve
CG (conjugate gradient descent)
Survival Models
Kaplan Meier Estimate
Cox Proportional Hazard Regression
Predict Algorithm-specific scoring
Transformation (native) Recoding, dummy coding, binning, scaling, missing value imputation
PMML models lm, kmeans, svm, glm, mlogit
Spark Technology Center
33
What’s new in
Apache SystemML
Expressing Algorithms with SystemML
Gaussian Nonnegative Matrix Factorization
in DML (SystemML’s R-like syntax)
while (i < max_iteration) {
H <- H * ((t(W) %*% V) /
(((t(W) %*% W) %*% H)+Eps))
W <- W * ((V %*% t(H)) /
((W %*% (H %*% t(H)))+Eps))
i <- i + 1
}
Gaussian Nonnegative Matrix Factorization
in PyDML (SystemML’s Python-like syntax)
while (i < max_iteration):
H = H * (dot(W.transpose(), V) /
(dot(dot(W.transpose(), W, H)
+ Eps))
W = W * (dot(V, H.transpose()) /
(dot(W, dot(H,H.transpose()))
+ Eps))
i = i + 1
34
SystemML users write machine learning algorithms in a domain specific language.
SystemML has APIs for embedding these algorithms in Python, Scala, or Java Spark applications
The R4ML project provides similar functionality for SparkR.
Scikit-Learn
Compatibility: The
MLLearn API
Python API designed to be compatible with scikit-
learn and Spark MLPipelines
Algorithms that are currently part of mllearn API:
•LogisticRegression, LinearRegression, SVM, NaiveBayes
and Caffe2DML (discussed later)
Hyperparameter naming/initialization similar to
scikit-learn (penalty, fit_intercept,
normalize, …) to reduce learning curve
Supports loading and saving the model
Linear Regression Example
From http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
Python script using sklearn
Changes required to run on SystemML
Integration with Apache Spark’s ML Pipelines
Changes required to run on SystemML
From https://spark.apache.org/docs/latest/ml-pipeline.html
38
caffe2dml
(experimental)
caffe2dml is a tool that converts the
specification for a Caffe deep learning model
into a SystemML script to perform training or
scoring at scale.
The generated scripts produce TensorBoard-
compatible log output.
Caffe2DML
Caffe
Network
File
Caffe
Solver
File
Log
Generated DML
Script
Apache
SystemML
Example: Training Lenet with Caffe2DML
SystemML Deep
Learning `nn`
Library
• Deep learning library written in DML.
• Multiple layers:
• Core: Affine, 2D Conv, 2D Transpose Conv, 2D
Max Pooling, 1D/2D Batch Norm, RNN, LSTM
• Nonlinearity/Transfer: ReLU, Sigmoid, Tanh,
Softmax
• Regularization: Dropout, L1, L2
• Loss: Log-loss, Cross-entropy, L1, L2
• Multiple optimizers:
• SGD, SGD w/ momentum, SGD w/ Nesterov
momentum, Adagrad, RMSprop, Adam
• Layers have a simple `forward` & `backward` API.
• Optimizers have a simple `update` API.
https://github.com/apache/systemml/tree/master/scripts/nn
(LeNet-like convnet)
41
GPU Support in
SystemML Spark Technology Center
Benefits of the
SystemML
Approach
Simplifies algorithm development.
Makes experimentation easier.
Your code gets faster as the
system improves.
9
42
GPU Support in
SystemML
SystemML’s optimizer can target multiple runtime back
ends:
Single-node SMP
Multi-node Spark
Hybrid: Large SMP plus a pool of Spark workers
We are adding new GPU-accelerated runtimes to SystemML
Single-node single GPU
Single-node multi-GPU
Distributed multi-GPU on Spark
GPU-accelerate an algorithm without changing its code
43
GPU Support in
SystemML:
Current Status
(In Progress) Single Node, Single GPU Support
• Deep Neural Network Operators
conv2d, conv2d_backward_data, conv2d_backward_filter, bias_add, bias_multiply,
max_pooling, max_pooling_backward, relu_max_pooling,
relu_max_pooling_backward
• Unary Aggregates
{All/Row/Col}-Sum, Mean, Variance, Min, Max & All-Product
• Matrix Multiplication
Various shapes & sparsities
• Transpose
• Matrix-Matrix and Matrix-Scalar Element-Wise
+, -, *, /, ^
• Trigonometric & Mathematical Operations (on entire Matrices)
sin, cos, tan, asin, acos, atan, log, sqrt, abs, floor, round, ceil, solve
• Some Fused/Special Case Operators
Ax+y, X*t(X), Max(X, 0.0)
• (In Progress) Automatically determine whether to use the GPU or not
(In Progress) - Single Node, Multiple GPU Support
(Planned) - Multiple Node, Multiple GPU Support
44
Summary:
Cool New Stuff in
Apache
SystemML
Top-level Apache project
API improvements
Deep learning
Code generation
Compressed linear algebra
45
SystemML 1.0 Apache SystemML 1.0
RC1 scheduled for December 2017
Spark Technology Center
46
Apache SystemML
Tutorial
Tutorial hosted at IBM developerWorks Code
Patterns
https://developer.ibm.com/code/patterns/perform-a-machine-learning-
exercise/
Tutorial source code available on GitHub
https://github.com/IBM/SystemML_Usage?cm_sp=Developer-_-
perform-a-machine-learning-exercise-_-Get-the-Code
Try this on DSX/IBM Cloud
https://ibm.biz/BdjJJG
47
SystemML
Tutorial
Spark Technology Center
48
Apache SystemML
References
For
More
Information…
Try Apache SystemML!
http://systemml.apache.org
Read our VLDB 2016 paper on compressed linear algebra:
Best Paper award!
Ahmed Elgohary et al, “Compressed Linear Algebra for Large-
Scale Machine Learning.” VLDB 2016
Read our CIDR 2017 paper on codegen:
Tarek Elgamal et al, “SPOOF: Sum-Product Optimization and
Operator Fusion for Large-Scale Machine Learning,” CIDR
2017
Get the slides for our Strata 2016 talk on deep learning with
SystemML:
Leveraging deep learning to predict breast cancer proliferation
scores with Apache Spark and Apache SystemML49
SystemML
http://systemml.apache.org
SystemML source code (Github)
https://github.com/apache/systemml
DML (R) Language Reference
https://apache.github.io/systemml/dml-language-reference.html
Algorithms Reference
http://systemml.apache.org/algorithms
Runtime Reference
https://apache.github.io/systemml/#running-systemml
50
References
Image source: http://az616578.vo.msecnd.net/files/2016/03/21/6359412499310138501557867529_thank-you-1400x800-c-default.gif

Weitere ähnliche Inhalte

Was ist angesagt?

Algorithem complexity in data sructure
Algorithem complexity in data sructureAlgorithem complexity in data sructure
Algorithem complexity in data sructureKumar
 
Algorithm chapter 6
Algorithm chapter 6Algorithm chapter 6
Algorithm chapter 6chidabdu
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsAlbert Bifet
 
A calculus of mobile Real-Time processes
A calculus of mobile Real-Time processesA calculus of mobile Real-Time processes
A calculus of mobile Real-Time processesPolytechnique Montréal
 
Data Structure: Algorithm and analysis
Data Structure: Algorithm and analysisData Structure: Algorithm and analysis
Data Structure: Algorithm and analysisDr. Rajdeep Chatterjee
 
Algebraic Approach to Implementing an ATL Model Checker
Algebraic Approach to Implementing an ATL Model CheckerAlgebraic Approach to Implementing an ATL Model Checker
Algebraic Approach to Implementing an ATL Model Checkerinfopapers
 
Dsp manual completed2
Dsp manual completed2Dsp manual completed2
Dsp manual completed2bilawalali74
 
Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)Shiang-Yun Yang
 
"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu
"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu
"Java 8, Lambda e la programmazione funzionale" by Theodor DumitrescuThinkOpen
 
DESIGN AND ANALYSIS OF ALGORITHMS
DESIGN AND ANALYSIS OF ALGORITHMSDESIGN AND ANALYSIS OF ALGORITHMS
DESIGN AND ANALYSIS OF ALGORITHMSGayathri Gaayu
 
Big oh Representation Used in Time complexities
Big oh Representation Used in Time complexitiesBig oh Representation Used in Time complexities
Big oh Representation Used in Time complexitiesLAKSHMITHARUN PONNAM
 
Symbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo TheoriesSymbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo TheoriesQuoc-Sang Phan
 

Was ist angesagt? (16)

Algorithem complexity in data sructure
Algorithem complexity in data sructureAlgorithem complexity in data sructure
Algorithem complexity in data sructure
 
Algorithm chapter 6
Algorithm chapter 6Algorithm chapter 6
Algorithm chapter 6
 
Lec2
Lec2Lec2
Lec2
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream Analytics
 
Sequential Pattern Mining and GSP
Sequential Pattern Mining and GSPSequential Pattern Mining and GSP
Sequential Pattern Mining and GSP
 
A calculus of mobile Real-Time processes
A calculus of mobile Real-Time processesA calculus of mobile Real-Time processes
A calculus of mobile Real-Time processes
 
Data Structure: Algorithm and analysis
Data Structure: Algorithm and analysisData Structure: Algorithm and analysis
Data Structure: Algorithm and analysis
 
Lec3
Lec3Lec3
Lec3
 
Algebraic Approach to Implementing an ATL Model Checker
Algebraic Approach to Implementing an ATL Model CheckerAlgebraic Approach to Implementing an ATL Model Checker
Algebraic Approach to Implementing an ATL Model Checker
 
Dsp manual completed2
Dsp manual completed2Dsp manual completed2
Dsp manual completed2
 
Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)
 
Dsp manual
Dsp manualDsp manual
Dsp manual
 
"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu
"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu
"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu
 
DESIGN AND ANALYSIS OF ALGORITHMS
DESIGN AND ANALYSIS OF ALGORITHMSDESIGN AND ANALYSIS OF ALGORITHMS
DESIGN AND ANALYSIS OF ALGORITHMS
 
Big oh Representation Used in Time complexities
Big oh Representation Used in Time complexitiesBig oh Representation Used in Time complexities
Big oh Representation Used in Time complexities
 
Symbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo TheoriesSymbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo Theories
 

Ähnlich wie What's new in Apache SystemML - Declarative Machine Learning

SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningLuciano Resende
 
R Programming Language
R Programming LanguageR Programming Language
R Programming LanguageNareshKarela1
 
Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Romain Francois
 
1_Introduction.pptx
1_Introduction.pptx1_Introduction.pptx
1_Introduction.pptxranapoonam1
 
R basics for MBA Students[1].pptx
R basics for MBA Students[1].pptxR basics for MBA Students[1].pptx
R basics for MBA Students[1].pptxrajalakshmi5921
 
20180420 hk-the powerofmysql8
20180420 hk-the powerofmysql820180420 hk-the powerofmysql8
20180420 hk-the powerofmysql8Ivan Ma
 
Python Programming - IX. On Randomness
Python Programming - IX. On RandomnessPython Programming - IX. On Randomness
Python Programming - IX. On RandomnessRanel Padon
 
Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Romain Francois
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...InfluxData
 
Scala as a Declarative Language
Scala as a Declarative LanguageScala as a Declarative Language
Scala as a Declarative Languagevsssuresh
 
r,rstats,r language,r packages
r,rstats,r language,r packagesr,rstats,r language,r packages
r,rstats,r language,r packagesAjay Ohri
 
PHP applications/environments monitoring: APM & Pinba
PHP applications/environments monitoring: APM & PinbaPHP applications/environments monitoring: APM & Pinba
PHP applications/environments monitoring: APM & PinbaPatrick Allaert
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query LanguageJulian Hyde
 
Introduction to Compiler Development
Introduction to Compiler DevelopmentIntroduction to Compiler Development
Introduction to Compiler DevelopmentLogan Chien
 
Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues listsJames Wong
 
Stacksqueueslists
StacksqueueslistsStacksqueueslists
StacksqueueslistsFraboni Ec
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues listsYoung Alista
 

Ähnlich wie What's new in Apache SystemML - Declarative Machine Learning (20)

SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine Learning
 
Inside Apache SystemML
Inside Apache SystemMLInside Apache SystemML
Inside Apache SystemML
 
R Programming Language
R Programming LanguageR Programming Language
R Programming Language
 
Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Rcpp: Seemless R and C++
Rcpp: Seemless R and C++
 
1_Introduction.pptx
1_Introduction.pptx1_Introduction.pptx
1_Introduction.pptx
 
R basics for MBA Students[1].pptx
R basics for MBA Students[1].pptxR basics for MBA Students[1].pptx
R basics for MBA Students[1].pptx
 
20180420 hk-the powerofmysql8
20180420 hk-the powerofmysql820180420 hk-the powerofmysql8
20180420 hk-the powerofmysql8
 
Python Programming - IX. On Randomness
Python Programming - IX. On RandomnessPython Programming - IX. On Randomness
Python Programming - IX. On Randomness
 
Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Rcpp: Seemless R and C++
Rcpp: Seemless R and C++
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
An Intoduction to R
An Intoduction to RAn Intoduction to R
An Intoduction to R
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
 
Scala as a Declarative Language
Scala as a Declarative LanguageScala as a Declarative Language
Scala as a Declarative Language
 
r,rstats,r language,r packages
r,rstats,r language,r packagesr,rstats,r language,r packages
r,rstats,r language,r packages
 
PHP applications/environments monitoring: APM & Pinba
PHP applications/environments monitoring: APM & PinbaPHP applications/environments monitoring: APM & Pinba
PHP applications/environments monitoring: APM & Pinba
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
Introduction to Compiler Development
Introduction to Compiler DevelopmentIntroduction to Compiler Development
Introduction to Compiler Development
 
Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues lists
 
Stacksqueueslists
StacksqueueslistsStacksqueueslists
Stacksqueueslists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 

Mehr von Luciano Resende

A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfLuciano Resende
 
Using Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsUsing Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsLuciano Resende
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Luciano Resende
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...Luciano Resende
 
Ai pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksAi pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksLuciano Resende
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayLuciano Resende
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsLuciano Resende
 
Jupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewJupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewLuciano Resende
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeLuciano Resende
 
IoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirIoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirLuciano Resende
 
Getting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirGetting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirLuciano Resende
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examplesLuciano Resende
 
Building analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernelsBuilding analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernelsLuciano Resende
 
Building iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirBuilding iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirLuciano Resende
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkLuciano Resende
 
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017Luciano Resende
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayLuciano Resende
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gatewayLuciano Resende
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirLuciano Resende
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceLuciano Resende
 

Mehr von Luciano Resende (20)

A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdf
 
Using Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsUsing Elyra for COVID-19 Analytics
Using Elyra for COVID-19 Analytics
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
 
Ai pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksAi pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooks
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloads
 
Jupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewJupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway Overview
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for Code
 
IoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirIoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache Bahir
 
Getting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirGetting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache Bahir
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examples
 
Building analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernelsBuilding analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernels
 
Building iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirBuilding iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache Bahir
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
 
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel Gateway
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gateway
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open source
 

Kürzlich hochgeladen

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

Kürzlich hochgeladen (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 

What's new in Apache SystemML - Declarative Machine Learning

  • 1. IBM SparkTechnology Center Apache SystemML Declarative Machine Learning Luciano Resende IBM | Spark Technology Center BigDataDevelopersMeetup–Spain/Madrid–Nov2017
  • 2. Spark Technology Center lresende@apache.org@ http://lresende.blogspot.com/ https://www.linkedin.com/in/lresende @lresende1975 https://github.com/lresende Luciano Resende Data Science Platform Architect – IBM – Spark Technology Center Apache Member and also a SystemML committer and PMC member
  • 3. Open Source Community Leadership Spark Technology Center Founding Partner 188+ Project Committers 77+ Projects Key Open source steering committee memberships OSS Advisory Board Open Source
  • 4. 4 IBM Spark Technology Center Founded in 2015. Location: Physical: 505 Howard St., San Francisco CA Web: http://spark.tc Twitter: @apachespark_tc Mission: Contribute intellectual and technical capital to the Apache Spark community. Make the core technology enterprise- and cloud-ready. Build data science skills to drive intelligence into business applications — http://bigdatauniversity.com Key statistics: About 50 developers, co-located with 25 IBM designers. Major contributions to Apache Spark http://jiras.spark.tc Apache SystemML is now an Apache Incubator project. Founding member of UC Berkeley AMPLab and RISE Lab Member of R Consortium and Scala Center Spark Technology Center
  • 5. Contributions 46,385 Spark LOC 863 Spark JIRAs 457 SystemML JIRAs 67 Speakers at Events Spark Technology Center Focus on meaningful code contributions across all major Spark projects 863 code contributions (JIRAs) and counting – Check out http://jiras.spark.tc Over 422 commits in Spark 2.0 , and continuing major contributions in 2.x Contributions by the Spark Technology Center across almost all components of Spark — Spark Core, SparkR, SQL, MLlib, Streaming, PySpark, build and infrastructure, etc STC impact on community
  • 6. Spark Technology Center Machine Learning Spark MLLib R4ML Online Retraining Apache Arrow SystemML Deep Learning Consumability Reference architectures Spark Notebook stack Spark Resource optimization Spark Web UI Apache Bahir RedRock Immersive Insights SQL TPC-DS and Performance Query Pushdown/Federation Project Focus Areas 6
  • 8. Origins of the SystemML Project 2007-2008: Multiple projects at IBM Research – Almaden involving machine learning on Hadoop. 2009: We create a dedicated team for scalable ML. 2009-2010: Through engagements with customers, we observe how data scientists create machine learning algorithms.
  • 9. State-of-the-Art: Small Data R or Python Data Scientist Personal Computer Data Results
  • 10. State-of-the-Art: Big Data R or Python Data Scientist Results Systems Programmer Scala
  • 11. State-of-the-Art: Big Data R or Python Data Scientist Results Systems Programmer Scala 😞 Days or weeks per iteration 😞 Errors while translating algorithms
  • 12. State-of-the-Art: Big Data R or Python Data Scientist Results SystemML
  • 13. State-of-the-Art: Big Data R or Python Data Scientist Results SystemML 😃 Fast iteration 😃 Same answer
  • 14. 14 Linear Algebra is the Language of Machine Learning. Linear algebra is powerful, precise, and high-level. Express complex transformations over large arrays of data… …using a small number of instructions. …in a clear and unambiguous way SystemML Provides Highly Optimized Distributed Linear Algebra
  • 15. Running Example: Alternating Least Squares Problem: Movie Recommendations Movies Users i j User i liked movie j. Movies Factor UsersFactor Multiply these two factors to produce a less-sparse matrix. × New nonzero values become movies suggestions.
  • 16. Alternating Least Squares (in R) U = rand(nrow(X), r, min = -1.0, max = 1.0); V = rand(r, ncol(X), min = -1.0, max = 1.0); while(i < mi) { i = i + 1; ii = 1; if (is_U) G = (W * (U %*% V - X)) %*% t(V) + lambda * U; else G = t(U) %*% (W * (U %*% V - X)) + lambda * V; norm_G2 = sum(G ^ 2); norm_R2 = norm_G2; R = -G; S = R; while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) { if (is_U) { HS = (W * (S %*% V)) %*% t(V) + lambda * S; alpha = norm_R2 / sum (S * HS); U = U + alpha * S; } else { HS = t(U) %*% (W * (U %*% S)) + lambda * S; alpha = norm_R2 / sum (S * HS); V = V + alpha * S; } R = R - alpha * HS; old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2); S = R + (norm_R2 / old_norm_R2) * S; ii = ii + 1; } is_U = ! is_U; }
  • 17. Alternating Least Squares (in R) 1. Start with random factors. 2. Hold the Movies factor constant and find the best value for the Users factor. (Value that most closely approximates the original matrix) 3. Hold the Users factor constant and find the best value for the Movies factor. 4. Repeat steps 2-3 until convergence. U = rand(nrow(X), r, min = -1.0, max = 1.0); V = rand(r, ncol(X), min = -1.0, max = 1.0); while(i < mi) { i = i + 1; ii = 1; if (is_U) G = (W * (U %*% V - X)) %*% t(V) + lambda * U; else G = t(U) %*% (W * (U %*% V - X)) + lambda * V; norm_G2 = sum(G ^ 2); norm_R2 = norm_G2; R = -G; S = R; while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) { if (is_U) { HS = (W * (S %*% V)) %*% t(V) + lambda * S; alpha = norm_R2 / sum (S * HS); U = U + alpha * S; } else { HS = t(U) %*% (W * (U %*% S)) + lambda * S; alpha = norm_R2 / sum (S * HS); V = V + alpha * S; } R = R - alpha * HS; old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2); S = R + (norm_R2 / old_norm_R2) * S; ii = ii + 1; } is_U = ! is_U; } 1 2 2 3 3 4 4 4 Every line has a clear purpose!
  • 22. 22 Alternating Least Squares (spark.ml) 25 lines’ worth of algorithm… …mixed with 800 lines of performance code
  • 23. Alternating Least Squares (in R) U = rand(nrow(X), r, min = -1.0, max = 1.0); V = rand(r, ncol(X), min = -1.0, max = 1.0); while(i < mi) { i = i + 1; ii = 1; if (is_U) G = (W * (U %*% V - X)) %*% t(V) + lambda * U; else G = t(U) %*% (W * (U %*% V - X)) + lambda * V; norm_G2 = sum(G ^ 2); norm_R2 = norm_G2; R = -G; S = R; while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) { if (is_U) { HS = (W * (S %*% V)) %*% t(V) + lambda * S; alpha = norm_R2 / sum (S * HS); U = U + alpha * S; } else { HS = t(U) %*% (W * (U %*% S)) + lambda * S; alpha = norm_R2 / sum (S * HS); V = V + alpha * S; } R = R - alpha * HS; old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2); S = R + (norm_R2 / old_norm_R2) * S; ii = ii + 1; } is_U = ! is_U; }
  • 24. Alternating Least Squares (in R) SystemML can compile and run this algorithm at scale No additional performance code needed! U = rand(nrow(X), r, min = -1.0, max = 1.0); V = rand(r, ncol(X), min = -1.0, max = 1.0); while(i < mi) { i = i + 1; ii = 1; if (is_U) G = (W * (U %*% V - X)) %*% t(V) + lambda * U; else G = t(U) %*% (W * (U %*% V - X)) + lambda * V; norm_G2 = sum(G ^ 2); norm_R2 = norm_G2; R = -G; S = R; while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) { if (is_U) { HS = (W * (S %*% V)) %*% t(V) + lambda * S; alpha = norm_R2 / sum (S * HS); U = U + alpha * S; } else { HS = t(U) %*% (W * (U %*% S)) + lambda * S; alpha = norm_R2 / sum (S * HS); V = V + alpha * S; } R = R - alpha * HS; old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2); S = R + (norm_R2 / old_norm_R2) * S; ii = ii + 1; } is_U = ! is_U; } (in SystemML’s subset of R)
  • 25. How fast does it run? Running time comparisons between machine learning algorithms are problematic Different, equally-valid answers Different convergence rates on different data But we’ll do one anyway
  • 26. Spark Technology CenterPerformance Comparison: ALS 0 5000 10000 15000 20000 1.2GB (sparse binary) 12GB 120GB RunningTime(sec) R MLLib SystemML >24h>24h OOM OOM Synthetic data, 0.01 sparsity, 10^5 products × {10^5,10^6,10^7} users. Data generated by multiplying two rank-50 matrices of normally-distributed data, sampling from the resulting product, then adding Gaussian noise. Cluster of 6 servers with 12 cores and 96GB of memory per server. Number of iterations tuned so that all algorithms produce comparable result quality.Details:
  • 27. SystemML runs the R script in parallel Same answer as original R script Performance is comparable to a low-level RDD- based implementation Also, for python lovers, equivalent python DML exists! How does SystemML achieve this result? Takeaway Points
  • 28. The SystemML Optimizer and Runtime for Spark Automates critical performance decisions Distributed or local computation? How to partition the data? To persist or not to persist? Distributed vs local: Hybrid runtime Multithreaded computation in Spark Driver Distributed computation in Spark Executors Optimizer makes a cost-based choice 28 High-Level Operations (HOPs) General representation of statements in the data analysis language Low-Level Operations (LOPs) General representation of operations in the runtime framework High-level language front-ends Multiple execution environments Cost Based Optimizer
  • 29. Many other rewrites Cost-based selection of operators Dynamic recompilation for accurate stats Parallel FOR (ParFor) optimizer Direct operations on RDD partitions YARN and MapReduce support New in Next Release: Compressed Linear Algebra 29 But wait, there’s more!
  • 30. Summary Cost-based compilation of machine learning algorithms generates execution plans for single-node in-memory, cluster, and hybrid execution for varying data characteristics: varying number of observations (1,000s to 10s of billions), number of variables (10s to 10s of millions), dense and sparse data for varying cluster characteristics (memory configurations, degree of parallelism) Out-of-the-box, scalable machine learning algorithms e.g. descriptive statistics, regression, clustering, and classification "Roll-your-own" algorithms Enable programmer productivity (no worry about scalability, numeric stability, and optimizations) Fast turn-around for new algorithms Higher-level language shields algorithm development investment from platform progression Yarn for resource negotiation and elasticity Spark for in-memory, iterative processing
  • 31. Benefits of the SystemML Approach Simplifies algorithm development. Makes experimentation easier. Your code gets faster as the system improves. 31
  • 32. 32 Algorithms Category Description Descriptive Statistics Univariate Bivariate Stratified Bivariate Classification Logistic Regression (multinomial) Multi-Class SVM Naïve Bayes (multinomial) Decision Trees Random Forest Clustering k-Means Regression Linear Regression system of equations CG (conjugate gradient) Generalized Linear Models (GLM) Distributions: Gaussian, Poisson, Gamma, Inverse Gaussian, Binomial, Bernoulli Links for all distributions: identity, log, sq. root, inverse, 1/μ2 Links for Binomial / Bernoulli: logit, probit, cloglog, cauchit Stepwise Linear GLM Dimension Reduction PCA Matrix Factorization ALS direct solve CG (conjugate gradient descent) Survival Models Kaplan Meier Estimate Cox Proportional Hazard Regression Predict Algorithm-specific scoring Transformation (native) Recoding, dummy coding, binning, scaling, missing value imputation PMML models lm, kmeans, svm, glm, mlogit
  • 33. Spark Technology Center 33 What’s new in Apache SystemML
  • 34. Expressing Algorithms with SystemML Gaussian Nonnegative Matrix Factorization in DML (SystemML’s R-like syntax) while (i < max_iteration) { H <- H * ((t(W) %*% V) / (((t(W) %*% W) %*% H)+Eps)) W <- W * ((V %*% t(H)) / ((W %*% (H %*% t(H)))+Eps)) i <- i + 1 } Gaussian Nonnegative Matrix Factorization in PyDML (SystemML’s Python-like syntax) while (i < max_iteration): H = H * (dot(W.transpose(), V) / (dot(dot(W.transpose(), W, H) + Eps)) W = W * (dot(V, H.transpose()) / (dot(W, dot(H,H.transpose())) + Eps)) i = i + 1 34 SystemML users write machine learning algorithms in a domain specific language. SystemML has APIs for embedding these algorithms in Python, Scala, or Java Spark applications The R4ML project provides similar functionality for SparkR.
  • 35. Scikit-Learn Compatibility: The MLLearn API Python API designed to be compatible with scikit- learn and Spark MLPipelines Algorithms that are currently part of mllearn API: •LogisticRegression, LinearRegression, SVM, NaiveBayes and Caffe2DML (discussed later) Hyperparameter naming/initialization similar to scikit-learn (penalty, fit_intercept, normalize, …) to reduce learning curve Supports loading and saving the model
  • 36. Linear Regression Example From http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html Python script using sklearn Changes required to run on SystemML
  • 37. Integration with Apache Spark’s ML Pipelines Changes required to run on SystemML From https://spark.apache.org/docs/latest/ml-pipeline.html
  • 38. 38 caffe2dml (experimental) caffe2dml is a tool that converts the specification for a Caffe deep learning model into a SystemML script to perform training or scoring at scale. The generated scripts produce TensorBoard- compatible log output. Caffe2DML Caffe Network File Caffe Solver File Log Generated DML Script Apache SystemML
  • 39. Example: Training Lenet with Caffe2DML
  • 40. SystemML Deep Learning `nn` Library • Deep learning library written in DML. • Multiple layers: • Core: Affine, 2D Conv, 2D Transpose Conv, 2D Max Pooling, 1D/2D Batch Norm, RNN, LSTM • Nonlinearity/Transfer: ReLU, Sigmoid, Tanh, Softmax • Regularization: Dropout, L1, L2 • Loss: Log-loss, Cross-entropy, L1, L2 • Multiple optimizers: • SGD, SGD w/ momentum, SGD w/ Nesterov momentum, Adagrad, RMSprop, Adam • Layers have a simple `forward` & `backward` API. • Optimizers have a simple `update` API. https://github.com/apache/systemml/tree/master/scripts/nn (LeNet-like convnet)
  • 41. 41 GPU Support in SystemML Spark Technology Center Benefits of the SystemML Approach Simplifies algorithm development. Makes experimentation easier. Your code gets faster as the system improves. 9
  • 42. 42 GPU Support in SystemML SystemML’s optimizer can target multiple runtime back ends: Single-node SMP Multi-node Spark Hybrid: Large SMP plus a pool of Spark workers We are adding new GPU-accelerated runtimes to SystemML Single-node single GPU Single-node multi-GPU Distributed multi-GPU on Spark GPU-accelerate an algorithm without changing its code
  • 43. 43 GPU Support in SystemML: Current Status (In Progress) Single Node, Single GPU Support • Deep Neural Network Operators conv2d, conv2d_backward_data, conv2d_backward_filter, bias_add, bias_multiply, max_pooling, max_pooling_backward, relu_max_pooling, relu_max_pooling_backward • Unary Aggregates {All/Row/Col}-Sum, Mean, Variance, Min, Max & All-Product • Matrix Multiplication Various shapes & sparsities • Transpose • Matrix-Matrix and Matrix-Scalar Element-Wise +, -, *, /, ^ • Trigonometric & Mathematical Operations (on entire Matrices) sin, cos, tan, asin, acos, atan, log, sqrt, abs, floor, round, ceil, solve • Some Fused/Special Case Operators Ax+y, X*t(X), Max(X, 0.0) • (In Progress) Automatically determine whether to use the GPU or not (In Progress) - Single Node, Multiple GPU Support (Planned) - Multiple Node, Multiple GPU Support
  • 44. 44 Summary: Cool New Stuff in Apache SystemML Top-level Apache project API improvements Deep learning Code generation Compressed linear algebra
  • 45. 45 SystemML 1.0 Apache SystemML 1.0 RC1 scheduled for December 2017
  • 47. Tutorial hosted at IBM developerWorks Code Patterns https://developer.ibm.com/code/patterns/perform-a-machine-learning- exercise/ Tutorial source code available on GitHub https://github.com/IBM/SystemML_Usage?cm_sp=Developer-_- perform-a-machine-learning-exercise-_-Get-the-Code Try this on DSX/IBM Cloud https://ibm.biz/BdjJJG 47 SystemML Tutorial
  • 48. Spark Technology Center 48 Apache SystemML References
  • 49. For More Information… Try Apache SystemML! http://systemml.apache.org Read our VLDB 2016 paper on compressed linear algebra: Best Paper award! Ahmed Elgohary et al, “Compressed Linear Algebra for Large- Scale Machine Learning.” VLDB 2016 Read our CIDR 2017 paper on codegen: Tarek Elgamal et al, “SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning,” CIDR 2017 Get the slides for our Strata 2016 talk on deep learning with SystemML: Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML49
  • 50. SystemML http://systemml.apache.org SystemML source code (Github) https://github.com/apache/systemml DML (R) Language Reference https://apache.github.io/systemml/dml-language-reference.html Algorithms Reference http://systemml.apache.org/algorithms Runtime Reference https://apache.github.io/systemml/#running-systemml 50 References Image source: http://az616578.vo.msecnd.net/files/2016/03/21/6359412499310138501557867529_thank-you-1400x800-c-default.gif