2. Agenda (Apache SystemML)
1. Background
   a. Machine Learning
   b. Declarative ML
2. SystemML
   a. Overview
   b. Language
   c. Compiler/Optimizer
   d. Runtime
3. Demo
4. Current Work
   a. Deep Learning: SystemML-NN
5. Questions
5. Machine Learning
● Data
○ Multiple “examples”
○ Multiple “features” per “example”
○ “Label(s)” for each “example” (supervised)
● Model
○ Construct/select a model that fits the problem.
○ Examples:
■ Linear/Logistic Regression
■ SVM
■ Neural Networks
● Loss
○ An “evaluation” of how well the model fits the data.
● Optimizer
○ Minimizes the “loss” by adjusting the model to better fit the data.
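The four components above can be sketched in a toy example. This is hypothetical NumPy code for illustration, not SystemML's DML: the data is a random regression problem, the model is linear, the loss is mean squared error, and the optimizer is plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data: examples (rows), features (columns), and a label per example.
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.01, size=100)

# Model: a linear model parameterized by a weight vector w.
w = np.zeros(3)

def loss(w):
    # Loss: mean squared error, an "evaluation" of how well the model fits.
    return np.mean((X @ w - y) ** 2)

# Optimizer: gradient descent, adjusting w to minimize the loss.
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= 0.1 * grad
```

After training, `w` is close to `true_w` and the loss is near the noise floor.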
9. Current Best Practice for Big Data Analysis
[Diagram: data scientists prototype algorithms in R, Python, and other languages, then hand them off to Hadoop, Spark, and MPI engineers for reimplementation at scale.]
11. Declarative Machine Learning
Common patterns:
• Changes in feature set
• Changes in data size
• Algorithm customization
• Quick iteration
12. Landscape of Existing Work
Classification by level of abstraction (different target user):
• Distributed Systems w/ DSLs: Spark, Flink, REEF, GraphLab, (R, Matlab, SAS)
• Large-Scale ML Libraries (fixed plan): MLlib, Mahout MR, MADlib, ORE, Rev R, HP Dist R, custom algorithms
• Declarative ML (fixed algorithm): SystemML, (Mahout Samsara, Tupleware, Cumulon, Dmac, SimSQL)
• Declarative ML++ (fixed task): MLbase*, task-specific systems
13. Requirements to Support Declarative ML
• Goal: Write ML algorithms independent of input data and cluster characteristics.
• R1: Full flexibility
▪ Specify new / customize existing ML algorithms.
▪ ➔ ML DSL
• R2: Data independence
▪ Hide physical data representation (sparse/dense, row/column-major, blocking
configs, partitioning, caching, compression).
▪ ➔ Abstract data types and coarse-grained logical operations.
• R3: Efficiency and scalability
▪ Scale from very small to very large use cases.
▪ ➔ Automatic optimization and hybrid runtime plans.
• R4: Specified algorithm semantics
▪ Understand, debug, and control algorithm behavior.
▪ ➔ Optimization for performance only, not accuracy.
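Requirement R2 can be illustrated by analogy. The hypothetical sketch below is not SystemML itself; it shows one coarse-grained logical operation (a matrix-vector product) running unchanged over two physical representations, with the library dispatching the appropriate dense or sparse kernel behind the abstraction.

```python
import numpy as np
import scipy.sparse as sp

def predict(X, w):
    # Coarse-grained logical operation; no dense/sparse branching here.
    return X @ w

w = np.array([1.0, 2.0, 3.0])
X_dense = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, 4.0]])
X_sparse = sp.csr_matrix(X_dense)   # same data, different physical layout

dense_out = predict(X_dense, w)     # dense kernel
sparse_out = predict(X_sparse, w)   # sparse kernel, same logical result
```

The caller never mentions the storage format; the same idea, applied pervasively, is what lets SystemML hide blocking, partitioning, caching, and compression.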
15. Sidenote: Fun Stuff - Neural Art
• “A Neural Algorithm of Artistic Style,” L.A. Gatys, A.S. Ecker, M. Bethge
• https://github.com/jcjohnson/neural-style
39. Current Work
● Usability / Applications:
○ Deep Learning (SYSTEMML-540)
○ Embedded Scala/Python/R DSL with sufficient optimization scope (SYSTEMML-451)
● Optimizer:
○ Cost-model enhancement (SYSTEMML-416)
○ Global program optimization (SYSTEMML-421)
○ Source code generation for automatic operator fusion (SYSTEMML-448)
● Runtime:
○ Add GPU backend (SYSTEMML-445) => CUDA / OpenCL
○ Frame support / Sparse block representation
○ Integrate Apache Flink as additional backend for SystemML (SYSTEMML-636 / PR-119)
○ NUMA-aware single node backend (SYSTEMML-406)
40. Deep Learning - Plans
● Deep Learning library for SystemML written in DML (SYSTEMML-618).
○ SystemML-NN [https://github.com/dusenberrymw/systemml-nn]
● Built-in DML functions for computationally-intensive layers.
○ Convolution (2D), Max Pooling
● GPU acceleration for these built-in functions (SYSTEMML-445).
● Integration with existing deep learning libraries (Keras, TensorFlow, Torch, etc.)?
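As an illustration of one of the computationally intensive layers listed above, here is a hypothetical NumPy sketch of non-overlapping 2D max pooling for a single channel (not the built-in DML function):

```python
import numpy as np

def max_pool2d(img, p):
    # Max pooling with a p-by-p window and stride p (non-overlapping).
    h, w = img.shape
    assert h % p == 0 and w % p == 0, "sketch assumes divisible dimensions"
    # Reshape into p-by-p tiles, then take the max over each tile.
    return img.reshape(h // p, p, w // p, p).max(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool2d(img, 2)  # 4x4 input -> 2x2 output
```

Built-in engine functions (and GPU kernels) exist because expressing such tiled access patterns efficiently at scale is exactly where a naive script-level loop would be slow.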
42. Deep Learning - SystemML-NN Library (cont.)
https://github.com/dusenberrymw/systemml-nn
● Each layer type has a simple `forward(...)` and `backward(...)` API.
   ○ `forward(...)` computes the output of the function based on the inputs.
   ○ `backward(...)` computes the partial derivatives (gradient) of some function deeper in the network (usually the loss function at the end) w.r.t. the inputs to the function.
● Each optimizer has a simple `update(...)` API.
   ○ `update(...)` adjusts the given parameters based on their partial derivatives.
● Includes test code in DML.
   ○ Gradient checks, unit tests
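The forward/backward/update pattern and the gradient check can be sketched in hypothetical NumPy code (SystemML-NN implements these in DML; the function names below are illustrative, not the library's):

```python
import numpy as np

def affine_forward(X, W):
    return X @ W                      # layer output

def affine_backward(dout, X, W):
    # Gradients of the deeper function (e.g. the loss) w.r.t. the
    # layer inputs, given the upstream gradient dout.
    return dout @ W.T, X.T @ dout     # dX, dW

def sgd_update(W, dW, lr=0.1):
    # Adjust parameters based on their partial derivatives.
    return W - lr * dW

# Gradient check: compare backward(...) against finite differences.
rng = np.random.default_rng(0)
X, W = rng.normal(size=(4, 3)), rng.normal(size=(3, 2))
loss = lambda W: np.sum(affine_forward(X, W) ** 2)
dout = 2 * affine_forward(X, W)       # d(loss)/d(output)
_, dW = affine_backward(dout, X, W)

eps = 1e-6
num_dW = np.zeros_like(W)
for i in np.ndindex(W.shape):
    Wp, Wm = W.copy(), W.copy()
    Wp[i] += eps
    Wm[i] -= eps
    num_dW[i] = (loss(Wp) - loss(Wm)) / (2 * eps)
```

The analytic gradient from `backward(...)` should agree with the numerical one to within finite-difference error, which is exactly what the library's DML gradient checks verify.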
43. Deep Learning - SystemML-NN Library (cont.)
[Diagram: the SystemML-NN library layered on top of the SystemML engine.]
44. Agenda Revisited (Apache SystemML)
1. Background
   a. Machine Learning
   b. Declarative ML
2. SystemML
   a. Overview
   b. Language
   c. Compiler/Optimizer
   d. Runtime
3. Demo
4. Current Work
   a. Deep Learning: SystemML-NN
5. Questions