2. Agenda (Apache SystemML)
1. Background
   a. Machine Learning
   b. Declarative ML
2. SystemML
   a. Overview
   b. Language
   c. Compiler/Optimizer
   d. Runtime
3. Demo
4. Current Work
   a. Deep Learning: SystemML-NN
5. Questions
5. Machine Learning
● Data
○ Multiple “examples”
○ Multiple “features” per “example”
○ “Label(s)” for each “example” (supervised)
● Model
○ Construct/select a model that fits the problem.
○ Examples:
■ Linear/Logistic Regression
■ SVM
■ Neural Networks
● Loss
○ An “evaluation” of how well the model fits the data.
● Optimizer
○ Minimizes the “loss” by adjusting the model to better fit the data.
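The four components above can be sketched in a toy example. This is hypothetical NumPy code for illustration, not SystemML's DML: the data is a random regression problem, the model is linear, the loss is mean squared error, and the optimizer is plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data: examples (rows), features (columns), and a label per example.
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.01, size=100)

# Model: a linear model parameterized by a weight vector w.
w = np.zeros(3)

def loss(w):
    # Loss: mean squared error, an "evaluation" of how well the model fits.
    return np.mean((X @ w - y) ** 2)

# Optimizer: gradient descent, adjusting w to minimize the loss.
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= 0.1 * grad
```

After training, `w` is close to `true_w` and the loss is near the noise floor.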
9. Current Best Practice for Big Data Analysis
[Diagram: data scientists prototype algorithms in R, Python, and other languages, then hand them off to Hadoop, Spark, and MPI engineers for reimplementation at scale.]
11. Declarative Machine Learning
Common patterns:
• Changes in feature set
• Changes in data size
• Algorithm customization
• Quick iteration
12. Landscape of Existing Work
Classification by level of abstraction (different target user):
• Distributed Systems w/ DSLs: Spark, Flink, REEF, GraphLab, (R, Matlab, SAS)
• Large-Scale ML Libraries (fixed plan): MLlib, Mahout MR, MADlib, ORE, Rev R, HP Dist R, custom algorithms
• Declarative ML (fixed algorithm): SystemML, (Mahout Samsara, Tupleware, Cumulon, Dmac, SimSQL)
• Declarative ML++ (fixed task): MLbase*, task-specific systems
13. Requirements to Support Declarative ML
• Goal: Write ML algorithms independent of input data and cluster characteristics.
• R1: Full flexibility
▪ Specify new / customize existing ML algorithms.
▪ ➔ ML DSL
• R2: Data independence
▪ Hide physical data representation (sparse/dense, row/column-major, blocking
configs, partitioning, caching, compression).
▪ ➔ Abstract data types and coarse-grained logical operations.
• R3: Efficiency and scalability
▪ Scale from very small to very large use cases.
▪ ➔ Automatic optimization and hybrid runtime plans.
• R4: Specified algorithm semantics
▪ Understand, debug, and control algorithm behavior.
▪ ➔ Optimization for performance only, not accuracy.
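Requirement R2 can be illustrated by analogy. The hypothetical sketch below is not SystemML itself; it shows one coarse-grained logical operation (a matrix-vector product) running unchanged over two physical representations, with the library dispatching the appropriate dense or sparse kernel behind the abstraction.

```python
import numpy as np
import scipy.sparse as sp

def predict(X, w):
    # Coarse-grained logical operation; no dense/sparse branching here.
    return X @ w

w = np.array([1.0, 2.0, 3.0])
X_dense = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, 4.0]])
X_sparse = sp.csr_matrix(X_dense)   # same data, different physical layout

dense_out = predict(X_dense, w)     # dense kernel
sparse_out = predict(X_sparse, w)   # sparse kernel, same logical result
```

The caller never mentions the storage format; the same idea, applied pervasively, is what lets SystemML hide blocking, partitioning, caching, and compression.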
15. Sidenote: Fun Stuff - Neural Art
• “A Neural Algorithm of Artistic Style,” L.A. Gatys, A.S. Ecker, M. Bethge
• https://github.com/jcjohnson/neural-style
39. Current Work
● Usability / Applications:
○ Deep Learning (SYSTEMML-540)
○ Embedded Scala/Python/R DSL with sufficient optimization scope (SYSTEMML-451)
● Optimizer:
○ Cost-model enhancement (SYSTEMML-416)
○ Global program optimization (SYSTEMML-421)
○ Source code generation for automatic operator fusion (SYSTEMML-448)
● Runtime:
○ Add GPU backend (SYSTEMML-445) => CUDA / OpenCL
○ Frame support / Sparse block representation
○ Integrate Apache Flink as additional backend for SystemML (SYSTEMML-636 / PR-119)
○ NUMA-aware single node backend (SYSTEMML-406)
40. Deep Learning - Plans
● Deep Learning library for SystemML written in DML (SYSTEMML-618).
○ SystemML-NN [https://github.com/dusenberrymw/systemml-nn]
● Built-in DML functions for computationally-intensive layers.
○ Convolution (2D), Max Pooling
● GPU acceleration for these built-in functions (SYSTEMML-445).
● Integration with existing deep learning libraries (Keras, TensorFlow, Torch, etc.)?
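As an illustration of one of the computationally intensive layers listed above, here is a hypothetical NumPy sketch of non-overlapping 2D max pooling for a single channel (not the built-in DML function):

```python
import numpy as np

def max_pool2d(img, p):
    # Max pooling with a p-by-p window and stride p (non-overlapping).
    h, w = img.shape
    assert h % p == 0 and w % p == 0, "sketch assumes divisible dimensions"
    # Reshape into p-by-p tiles, then take the max over each tile.
    return img.reshape(h // p, p, w // p, p).max(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool2d(img, 2)  # 4x4 input -> 2x2 output
```

Built-in engine functions (and GPU kernels) exist because expressing such tiled access patterns efficiently at scale is exactly where a naive script-level loop would be slow.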
42. Deep Learning - SystemML-NN Library (cont.)
https://github.com/dusenberrymw/systemml-nn
● Each layer type has a simple `forward(...)` and `backward(...)` API.
   ○ `forward(...)` computes the output of the function based on the inputs.
   ○ `backward(...)` computes the partial derivatives (gradient) of some function deeper in the network (usually the loss function at the end) w.r.t. the inputs to the function.
● Each optimizer has a simple `update(...)` API.
   ○ `update(...)` adjusts the given parameters based on their partial derivatives.
● Includes test code in DML.
   ○ Gradient checks, unit tests
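The forward/backward/update pattern and the gradient check can be sketched in hypothetical NumPy code (SystemML-NN implements these in DML; the function names below are illustrative, not the library's):

```python
import numpy as np

def affine_forward(X, W):
    return X @ W                      # layer output

def affine_backward(dout, X, W):
    # Gradients of the deeper function (e.g. the loss) w.r.t. the
    # layer inputs, given the upstream gradient dout.
    return dout @ W.T, X.T @ dout     # dX, dW

def sgd_update(W, dW, lr=0.1):
    # Adjust parameters based on their partial derivatives.
    return W - lr * dW

# Gradient check: compare backward(...) against finite differences.
rng = np.random.default_rng(0)
X, W = rng.normal(size=(4, 3)), rng.normal(size=(3, 2))
loss = lambda W: np.sum(affine_forward(X, W) ** 2)
dout = 2 * affine_forward(X, W)       # d(loss)/d(output)
_, dW = affine_backward(dout, X, W)

eps = 1e-6
num_dW = np.zeros_like(W)
for i in np.ndindex(W.shape):
    Wp, Wm = W.copy(), W.copy()
    Wp[i] += eps
    Wm[i] -= eps
    num_dW[i] = (loss(Wp) - loss(Wm)) / (2 * eps)
```

The analytic gradient from `backward(...)` should agree with the numerical one to within finite-difference error, which is exactly what the library's DML gradient checks verify.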
43. Deep Learning - SystemML-NN Library (cont.)
[Diagram: the SystemML-NN library layered on top of the SystemML engine.]
44. Agenda Revisited (Apache SystemML)
1. Background
   a. Machine Learning
   b. Declarative ML
2. SystemML
   a. Overview
   b. Language
   c. Compiler/Optimizer
   d. Runtime
3. Demo
4. Current Work
   a. Deep Learning: SystemML-NN
5. Questions