SystemML was designed with the main goal of lowering the complexity required to maintain and scale machine learning algorithms. It provides a declarative machine learning language (DML) that simplifies the specification of machine learning algorithms using an R-like and Python-like syntax. This significantly increases the productivity of data scientists: it gives them flexibility in how custom analytics are expressed, and it provides data independence from the underlying input formats and physical data representations.
This presentation gives a quick introduction to Apache SystemML, provides an update on the areas recently being developed by the project community, and goes over a tutorial that enables one to quickly get up to speed with SystemML.
What's new in Apache SystemML - Declarative Machine Learning
1. IBM Spark Technology Center
Apache SystemML
Declarative Machine Learning
Luciano Resende
IBM | Spark Technology Center
Big Data Developers Meetup – Spain/Madrid – Nov 2017
3. Open Source Community Leadership
Spark Technology Center
Founding Partner 188+ Project Committers 77+ Projects
Key Open source steering committee memberships OSS Advisory Board
Open Source
4. IBM Spark Technology Center
Founded in 2015.
Location:
Physical: 505 Howard St., San Francisco CA
Web: http://spark.tc Twitter: @apachespark_tc
Mission:
Contribute intellectual and technical capital to the Apache Spark
community.
Make the core technology enterprise- and cloud-ready.
Build data science skills to drive intelligence into business applications
— http://bigdatauniversity.com
Key statistics:
About 50 developers, co-located with 25 IBM designers.
Major contributions to Apache Spark http://jiras.spark.tc
Apache SystemML is now a top-level Apache project (formerly an Apache Incubator project).
Founding member of UC Berkeley AMPLab and RISE Lab
Member of R Consortium and Scala Center
5. Contributions
46,385 Spark LOC
863 Spark JIRAs
457 SystemML JIRAs
67 Speakers at Events
Focus on meaningful code contributions across all major Spark projects
863 code contributions (JIRAs) and counting. Check out http://jiras.spark.tc
Over 422 commits in Spark 2.0, and continuing major contributions in 2.x
Contributions by the Spark Technology Center across almost all components of Spark: Spark Core, SparkR, SQL, MLlib, Streaming, PySpark, build and infrastructure, etc.
STC impact on community
6. Spark Technology Center
Machine Learning
Spark MLlib
R4ML
Online Retraining
Apache Arrow
SystemML
Deep Learning
Consumability
Reference architectures
Spark Notebook stack
Spark Resource optimization
Spark Web UI
Apache Bahir
RedRock
Immersive Insights
SQL
TPC-DS and Performance
Query Pushdown/Federation
Project Focus Areas
8. Origins of the SystemML Project
2007-2008: Multiple projects at IBM Research – Almaden involving machine
learning on Hadoop.
2009: We create a dedicated team for scalable ML.
2009-2010: Through engagements with customers, we observe how data scientists
create machine learning algorithms.
11. State-of-the-Art: Big Data
[Diagram: a data scientist writes the algorithm in R or Python; a systems programmer re-implements it in Scala to produce results.]
😞 Days or weeks per iteration
😞 Errors while translating algorithms
14. Linear Algebra is the Language of Machine Learning.
Linear algebra is powerful, precise, and high-level.
Express complex transformations over large arrays of data…
…using a small number of instructions.
…in a clear and unambiguous way.
SystemML Provides Highly Optimized Distributed Linear Algebra
15. Running Example: Alternating Least Squares
Problem: Movie Recommendations
[Diagram: a sparse Users × Movies matrix in which entry (i, j) is nonzero when user i liked movie j, approximated by the product of a Users factor and a Movies factor.]
Multiply these two factors to produce a less-sparse matrix.
New nonzero values become movie suggestions.
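As a rough illustration of the idea on this slide, the following Python sketch multiplies a Users factor by a Movies factor and reports entries that were zero in the original matrix but score highly in the product. The function names (`matmul`, `suggest`), the threshold, and the tiny matrices are hypothetical choices for illustration only; they are not part of SystemML.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def suggest(X, U, V, threshold=0.5):
    """Return (user, movie) pairs that are zero in X but score above
    `threshold` in the less-sparse product U x V.

    The threshold is an assumption made for this sketch.
    """
    P = matmul(U, V)
    return [(i, j)
            for i, row in enumerate(X)
            for j, x in enumerate(row)
            if x == 0 and P[i][j] > threshold]


# Two users, three movies; 1 means "user i liked movie j".
X = [[1, 1, 0],
     [0, 1, 1]]
U = [[1.0], [1.0]]        # Users factor (2 x 1)
V = [[0.6, 1.0, 0.6]]     # Movies factor (1 x 3)
suggestions = suggest(X, U, V)
```

Here the product fills in the two missing entries, so each user is offered the one movie they have not yet rated.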
16. Alternating Least Squares (in R)
U = rand(nrow(X), r, min = -1.0, max = 1.0);
V = rand(r, ncol(X), min = -1.0, max = 1.0);
while(i < mi) {
  i = i + 1; ii = 1;
  if (is_U)
    G = (W * (U %*% V - X)) %*% t(V) + lambda * U;
  else
    G = t(U) %*% (W * (U %*% V - X)) + lambda * V;
  norm_G2 = sum(G ^ 2); norm_R2 = norm_G2;
  R = -G; S = R;
  while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) {
    if (is_U) {
      HS = (W * (S %*% V)) %*% t(V) + lambda * S;
      alpha = norm_R2 / sum (S * HS);
      U = U + alpha * S;
    } else {
      HS = t(U) %*% (W * (U %*% S)) + lambda * S;
      alpha = norm_R2 / sum (S * HS);
      V = V + alpha * S;
    }
    R = R - alpha * HS;
    old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2);
    S = R + (norm_R2 / old_norm_R2) * S;
    ii = ii + 1;
  }
  is_U = ! is_U;
}
17. Alternating Least Squares (in R)
1. Start with random factors.
2. Hold the Movies factor constant and
find the best value for the Users factor.
(Value that most closely approximates the original matrix)
3. Hold the Users factor constant and find
the best value for the Movies factor.
4. Repeat steps 2-3 until convergence.
Every line has a clear purpose!
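The four steps can be sketched in plain Python for the simplest possible case. This is not the script from the slide: it assumes rank-1 factors and replaces the conjugate-gradient inner loop with the closed-form per-entry solution that exists in the rank-1 case. The names `als_rank1` and `loss` and the sample data are hypothetical.

```python
import random


def als_rank1(X, W, lam=0.1, iters=20, seed=0):
    """Rank-1 alternating least squares over observed entries (W[i][j] == 1)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Step 1: start with random factors.
    u = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    # Step 4: repeat steps 2-3 (a fixed iteration count stands in for
    # a convergence test in this sketch).
    for _ in range(iters):
        # Step 2: hold the Movies factor v constant, solve for each u[i].
        for i in range(n):
            num = sum(W[i][j] * X[i][j] * v[j] for j in range(m))
            den = lam + sum(W[i][j] * v[j] ** 2 for j in range(m))
            u[i] = num / den
        # Step 3: hold the Users factor u constant, solve for each v[j].
        for j in range(m):
            num = sum(W[i][j] * X[i][j] * u[i] for i in range(n))
            den = lam + sum(W[i][j] * u[i] ** 2 for i in range(n))
            v[j] = num / den
    return u, v


def loss(X, W, u, v, lam=0.1):
    """Regularized squared error over the observed entries."""
    sq = sum(W[i][j] * (X[i][j] - u[i] * v[j]) ** 2
             for i in range(len(X)) for j in range(len(X[0])))
    return sq + lam * (sum(a * a for a in u) + sum(b * b for b in v))


X = [[5, 3, 0], [4, 0, 1], [0, 2, 5]]
W = [[1 if x != 0 else 0 for x in row] for row in X]  # observed mask
u1, v1 = als_rank1(X, W, iters=1)
u20, v20 = als_rank1(X, W, iters=20)
```

Because each half-step minimizes the objective exactly over one factor, the loss cannot increase from one iteration to the next, which is the property that makes the alternation worthwhile.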
24. Alternating Least Squares (in R)
SystemML can compile and run this algorithm at scale
No additional performance code needed!
(in SystemML’s subset of R)
25. How fast does it run?
Running time comparisons between machine learning algorithms are problematic
Different, equally-valid answers
Different convergence rates on different data
But we’ll do one anyway
26. Performance Comparison: ALS
[Bar chart: running time in seconds (axis from 0 to 20,000) for R, MLlib, and SystemML on 1.2GB (sparse binary), 12GB, and 120GB inputs. Some bars are marked ">24h" or "OOM".]
Details: Synthetic data, 0.01 sparsity, 10^5 products × {10^5, 10^6, 10^7} users. Data generated by multiplying two rank-50 matrices of normally-distributed data, sampling from the resulting product, then adding Gaussian noise. Cluster of 6 servers with 12 cores and 96GB of memory per server. Number of iterations tuned so that all algorithms produce comparable result quality.
27. Takeaway Points
SystemML runs the R script in parallel
Same answer as original R script
Performance is comparable to a low-level RDD-based implementation
Also, for Python lovers, an equivalent Python-like DML syntax exists!
How does SystemML achieve this result?
28. The SystemML Optimizer and Runtime for Spark
Automates critical performance decisions
Distributed or local computation?
How to partition the data?
To persist or not to persist?
Distributed vs local: Hybrid runtime
Multithreaded computation in Spark Driver
Distributed computation in Spark Executors
Optimizer makes a cost-based choice
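A heavily simplified Python sketch of such a cost-based choice follows. It assumes dense matrices of 8-byte cells and a single decision rule (run in the driver if the estimated memory fits a budget, otherwise distribute); the real optimizer considers far more than this. The names `dense_bytes` and `choose_backend` and the budget value are hypothetical.

```python
def dense_bytes(rows, cols, bytes_per_cell=8):
    """Worst-case memory estimate for a dense matrix of doubles."""
    return rows * cols * bytes_per_cell


def choose_backend(input_shapes, output_shape, driver_budget_bytes):
    """Pick where one operation runs, based only on a memory estimate.

    Returns "driver" when all inputs plus the output fit in the budget,
    and "executors" otherwise. This single rule is an assumption made
    for illustration.
    """
    needed = sum(dense_bytes(r, c) for r, c in input_shapes)
    needed += dense_bytes(*output_shape)
    return "driver" if needed <= driver_budget_bytes else "executors"


budget = 1 * 1024 ** 3  # assume 1 GiB is available to the driver

# Small matrix multiply: three 1000 x 1000 matrices, about 24 MB in total.
small = choose_backend([(1000, 1000), (1000, 1000)], (1000, 1000), budget)

# Large matrix multiply: a 10^7 x 1000 input alone is about 80 GB.
large = choose_backend([(10 ** 7, 1000), (1000, 1000)], (10 ** 7, 1000), budget)
```

The same script can therefore run entirely in the driver on small data and move to the executors as the data grows, without any change to the algorithm's code.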
High-Level Operations (HOPs): general representation of statements in the data analysis language
Low-Level Operations (LOPs): general representation of operations in the runtime framework
[Diagram: high-level language front-ends feed a cost-based optimizer, which targets multiple execution environments.]
29. But wait, there’s more!
Many other rewrites
Cost-based selection of operators
Dynamic recompilation for accurate stats
Parallel FOR (ParFor) optimizer
Direct operations on RDD partitions
YARN and MapReduce support
New in Next Release: Compressed Linear Algebra
30. Summary
Cost-based compilation of machine learning algorithms generates execution plans
for single-node in-memory, cluster, and hybrid execution
for varying data characteristics:
varying number of observations (1,000s to 10s of billions), number of variables (10s to 10s of millions), dense and sparse data
for varying cluster characteristics (memory configurations, degree of parallelism)
Out-of-the-box, scalable machine learning algorithms
e.g. descriptive statistics, regression, clustering, and classification
"Roll-your-own" algorithms
Enable programmer productivity (no worry about scalability, numeric stability, and optimizations)
Fast turn-around for new algorithms
Higher-level language shields algorithm development investment from platform progression
YARN for resource negotiation and elasticity
Spark for in-memory, iterative processing
34. Expressing Algorithms with SystemML
Gaussian Nonnegative Matrix Factorization
in DML (SystemML’s R-like syntax)
while (i < max_iteration) {
  H <- H * ((t(W) %*% V) /
            (((t(W) %*% W) %*% H) + Eps))
  W <- W * ((V %*% t(H)) /
            ((W %*% (H %*% t(H))) + Eps))
  i <- i + 1
}
Gaussian Nonnegative Matrix Factorization
in PyDML (SystemML’s Python-like syntax)
while (i < max_iteration):
    H = H * (dot(W.transpose(), V) /
             (dot(dot(W.transpose(), W), H)
              + Eps))
    W = W * (dot(V, H.transpose()) /
             (dot(W, dot(H, H.transpose()))
              + Eps))
    i = i + 1
SystemML users write machine learning algorithms in a domain-specific language.
SystemML has APIs for embedding these algorithms in Python, Scala, or Java Spark applications.
The R4ML project provides similar functionality for SparkR.
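To try the multiplicative update rules from this slide on a small example without a SystemML installation, they can be transcribed into plain Python. This is only a sketch for checking the arithmetic: it uses nested lists instead of distributed matrices, and the helper names (`matmul`, `transpose`, `gnmf`, `squared_error`), the random initialization, and the sample matrix are all assumptions made for illustration.

```python
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def gnmf(V, rank, max_iteration=50, eps=1e-9, seed=0):
    """Gaussian NMF: factor a nonnegative V (n x m) into W (n x rank), H (rank x m)."""
    rng = random.Random(seed)
    # Strictly positive random start (an assumption of this sketch).
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in V]
    H = [[rng.random() + 0.1 for _ in V[0]] for _ in range(rank)]
    i = 0
    while i < max_iteration:
        # H = H * (t(W) V) / (t(W) W H + eps), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W = W * (V t(H)) / (W H t(H) + eps), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
        i += 1
    return W, H


def squared_error(V, W, H):
    P = matmul(W, H)
    return sum((v - p) ** 2 for vr, pr in zip(V, P) for v, p in zip(vr, pr))


V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]  # exactly rank 1
W0, H0 = gnmf(V, rank=1, max_iteration=0)   # random start only
W, H = gnmf(V, rank=1)                      # after the updates
```

Comparing `squared_error(V, W, H)` with `squared_error(V, W0, H0)` shows how far the updates move the factors toward reconstructing `V`.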
35. Scikit-Learn Compatibility: The MLLearn API
Python API designed to be compatible with scikit-learn and Spark MLPipelines
Algorithms that are currently part of mllearn API:
• LogisticRegression, LinearRegression, SVM, NaiveBayes and Caffe2DML (discussed later)
Hyperparameter naming/initialization similar to scikit-learn (penalty, fit_intercept, normalize, …) to reduce learning curve
Supports loading and saving the model
36. Linear Regression Example
From http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
Python script using sklearn
Changes required to run on SystemML
37. Integration with Apache Spark’s ML Pipelines
Changes required to run on SystemML
From https://spark.apache.org/docs/latest/ml-pipeline.html
38. caffe2dml (experimental)
caffe2dml is a tool that converts the specification for a Caffe deep learning model into a SystemML script to perform training or scoring at scale.
The generated scripts produce TensorBoard-compatible log output.
[Diagram: a Caffe network file and a Caffe solver file are passed to Caffe2DML, which produces a generated DML script; Apache SystemML runs the script and writes a log.]
40. SystemML Deep Learning `nn` Library
• Deep learning library written in DML.
• Multiple layers:
• Core: Affine, 2D Conv, 2D Transpose Conv, 2D Max Pooling, 1D/2D Batch Norm, RNN, LSTM
• Nonlinearity/Transfer: ReLU, Sigmoid, Tanh, Softmax
• Regularization: Dropout, L1, L2
• Loss: Log-loss, Cross-entropy, L1, L2
• Multiple optimizers:
• SGD, SGD w/ momentum, SGD w/ Nesterov momentum, Adagrad, RMSprop, Adam
• Layers have a simple `forward` & `backward` API.
• Optimizers have a simple `update` API.
https://github.com/apache/systemml/tree/master/scripts/nn
(LeNet-like convnet)
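The shape of a `forward` / `backward` / `update` API can be sketched in plain Python for an affine layer trained with SGD on an L2 loss. This mirrors the idea only; it is not the DML code of the `nn` library, and the function names, the list-of-rows matrix representation, and the sample data are assumptions made for illustration.

```python
def affine_forward(X, W, b):
    """out = X W + b, with X (N x D), W (D x M), b (1 x M)."""
    return [[sum(x[d] * W[d][m] for d in range(len(W))) + b[0][m]
             for m in range(len(b[0]))] for x in X]


def affine_backward(dout, X, W, b):
    """Gradients of the loss w.r.t. X, W and b, given dout (N x M)."""
    N, D, M = len(X), len(W), len(b[0])
    dX = [[sum(dout[n][m] * W[d][m] for m in range(M)) for d in range(D)]
          for n in range(N)]
    dW = [[sum(X[n][d] * dout[n][m] for n in range(N)) for m in range(M)]
          for d in range(D)]
    db = [[sum(dout[n][m] for n in range(N)) for m in range(M)]]
    return dX, dW, db


def l2_loss_forward(pred, y):
    """Mean over examples of 0.5 * squared error."""
    return sum(0.5 * (p - t) ** 2
               for pr, yr in zip(pred, y) for p, t in zip(pr, yr)) / len(y)


def l2_loss_backward(pred, y):
    return [[(p - t) / len(y) for p, t in zip(pr, yr)]
            for pr, yr in zip(pred, y)]


def sgd_update(P, dP, lr):
    """Plain SGD step: P - lr * dP, element-wise."""
    return [[p - lr * g for p, g in zip(pr, gr)] for pr, gr in zip(P, dP)]


X = [[1.0, 2.0], [3.0, 4.0]]
y = [[1.0], [2.0]]
W, b = [[0.0], [0.0]], [[0.0]]

loss_before = l2_loss_forward(affine_forward(X, W, b), y)
dout = l2_loss_backward(affine_forward(X, W, b), y)
dX, dW, db = affine_backward(dout, X, W, b)
W, b = sgd_update(W, dW, lr=0.01), sgd_update(b, db, lr=0.01)
loss_after = l2_loss_forward(affine_forward(X, W, b), y)
```

Keeping every layer to the same pair of functions and every optimizer to one `update` call means a training loop is just a chain of forwards, a chain of backwards in reverse order, and one update per parameter.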
41. GPU Support in SystemML
Benefits of the SystemML Approach
Simplifies algorithm development.
Makes experimentation easier.
Your code gets faster as the system improves.
42. GPU Support in SystemML
SystemML’s optimizer can target multiple runtime back ends:
Single-node SMP
Multi-node Spark
Hybrid: Large SMP plus a pool of Spark workers
We are adding new GPU-accelerated runtimes to SystemML
Single-node single GPU
Single-node multi-GPU
Distributed multi-GPU on Spark
GPU-accelerate an algorithm without changing its code
43. GPU Support in SystemML: Current Status
(In Progress) Single Node, Single GPU Support
• Deep Neural Network Operators
conv2d, conv2d_backward_data, conv2d_backward_filter, bias_add, bias_multiply,
max_pooling, max_pooling_backward, relu_max_pooling,
relu_max_pooling_backward
• Unary Aggregates
{All/Row/Col}-Sum, Mean, Variance, Min, Max & All-Product
• Matrix Multiplication
Various shapes & sparsities
• Transpose
• Matrix-Matrix and Matrix-Scalar Element-Wise
+, -, *, /, ^
• Trigonometric & Mathematical Operations (on entire Matrices)
sin, cos, tan, asin, acos, atan, log, sqrt, abs, floor, round, ceil, solve
• Some Fused/Special Case Operators
Ax+y, X*t(X), Max(X, 0.0)
• (In Progress) Automatically determine whether to use the GPU or not
(In Progress) - Single Node, Multiple GPU Support
(Planned) - Multiple Node, Multiple GPU Support
44. Summary: Cool New Stuff in Apache SystemML
Top-level Apache project
API improvements
Deep learning
Code generation
Compressed linear algebra
47. SystemML Tutorial
Tutorial hosted at IBM developerWorks Code Patterns
https://developer.ibm.com/code/patterns/perform-a-machine-learning-exercise/
Tutorial source code available on GitHub
https://github.com/IBM/SystemML_Usage?cm_sp=Developer-_-perform-a-machine-learning-exercise-_-Get-the-Code
Try this on DSX/IBM Cloud
https://ibm.biz/BdjJJG
49. For More Information…
Try Apache SystemML!
http://systemml.apache.org
Read our VLDB 2016 paper on compressed linear algebra (Best Paper award!):
Ahmed Elgohary et al., “Compressed Linear Algebra for Large-Scale Machine Learning,” VLDB 2016
Read our CIDR 2017 paper on codegen:
Tarek Elgamal et al., “SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning,” CIDR 2017
Get the slides for our Strata 2016 talk on deep learning with SystemML:
Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML