Apache SystemML
Mike Dusenberry
Engineer, Machine Learning & SystemML
Spark Technology Center
@dusenberrymw
Datapalooza, Denver - 05.19.16
Agenda
1. Background
a. Machine Learning
b. Declarative ML
2. SystemML
a. Overview
b. Language
c. Compiler/Optimizer
d. Runtime
3. Demo
4. Current Work
a. Deep Learning: SystemML-NN
5. Questions
Links
● Main Website: systemml.apache.org
● Code: github.com/apache/incubator-systemml
● Documentation: apache.github.io/incubator-systemml
● JIRA: issues.apache.org/jira/browse/SYSTEMML
Machine Learning
Machine Learning
● Data
○ Multiple “examples”
○ Multiple “features” per “example”
○ “Label(s)” for each “example” (supervised)
● Model
○ Construct/select a model that fits the problem.
○ Examples:
■ Linear/Logistic Regression
■ SVM
■ Neural Networks
● Loss
○ An “evaluation” of how well the model fits the data.
● Optimizer
○ Minimize “loss” by adjusting model to better fit the data.
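The four ingredients above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk: the toy data set, the function names, and the choice of linear regression with a squared-error loss and plain gradient descent are all assumptions made for the example.

```python
# Toy supervised-learning loop: data, model, loss, optimizer.
# All names and values here are illustrative, not taken from the talk.

# Data: four examples, one feature each, with labels following y = 2x + 1.
X = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]


def predict(w, b, x):
    """Model: a one-feature linear regression."""
    return w * x + b


def loss(w, b):
    """Loss: mean squared error of the model over the data."""
    return sum((predict(w, b, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(X)


def train(steps=2000, lr=0.05):
    """Optimizer: plain gradient descent on the loss."""
    w, b = 0.0, 0.0
    n = len(X)
    for _ in range(steps):
        errs = [predict(w, b, xi) - yi for xi, yi in zip(X, y)]
        grad_w = 2.0 * sum(e * xi for e, xi in zip(errs, X)) / n
        grad_b = 2.0 * sum(errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train()
```

After training, `w` and `b` should sit close to the slope and intercept used to generate the labels, and `loss(w, b)` should be near zero.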
Declarative Machine Learning
Exploratory Data Analysis Today
[Diagram labels: Data Scientist, Laptop, R / Python / Others, Data]
Current Best Practice for Big Data Analysis
[Diagram labels: Data Scientist (x3), Hadoop Engineer, Spark Engineer, MPI Engineer, R / Python / Others]
Vision: Declarative Machine Learning
[Diagram labels: Data Scientist, R / Python / Others, Query Optimization, Laptop, Scale-up, Cluster]
Declarative Machine Learning
Common patterns:
• Changes in feature set
• Changes in data size
• Algorithm customization
• Quick iteration
Landscape of Existing Work
Classification by level of abstraction (different target user)
● Distributed Systems w/ DSLs: Spark, Flink, REEF, GraphLab, (R, Matlab, SAS)
● Large-Scale ML Libraries (fixed plan): MLlib, Mahout MR, MADlib, ORE, Rev R, HP Dist R, Custom alg.
● Declarative ML (fixed algorithm): SystemML, (Mahout Samsara, Tupleware, Cumulon, Dmac, SimSQL)
● Declarative ML++ (fixed task): Mlbase*, Specific sys.
Requirements to Support Declarative ML
• Goal: Write ML algorithms independent of input data and cluster characteristics.
• R1: Full flexibility
▪ Specify new / customize existing ML algorithms.
▪ ➔ ML DSL
• R2: Data independence
▪ Hide physical data representation (sparse/dense, row/column-major, blocking configs, partitioning, caching, compression).
▪ ➔ Abstract data types and coarse-grained logical operations.
• R3: Efficiency and scalability
▪ Very small to very large use-cases.
▪ ➔ Automatic optimization and hybrid runtime plans.
• R4: Specified algorithm semantics
▪ Understand, debug, and control algorithm behavior.
▪ ➔ Optimization for performance only, not accuracy.
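Requirement R2 (data independence) can be illustrated with a small Python sketch. It is a hypothetical example, not SystemML code: the `Matrix` class, the `sparsity_threshold` parameter, and the dictionary-based sparse format are all invented here to show one logical operation running over two physical representations.

```python
# Hypothetical sketch of "data independence": one logical operation,
# two physical representations chosen by a simple sparsity heuristic.


def to_sparse(dense):
    """Store only non-zero cells as {(row, col): value}."""
    return {(i, j): v
            for i, row in enumerate(dense)
            for j, v in enumerate(row) if v != 0}


class Matrix:
    def __init__(self, dense, sparsity_threshold=0.4):
        self.rows, self.cols = len(dense), len(dense[0])
        nnz = sum(1 for row in dense for v in row if v != 0)
        # The physical format is an internal decision, invisible to the caller.
        self.is_sparse = nnz / (self.rows * self.cols) < sparsity_threshold
        self.data = to_sparse(dense) if self.is_sparse else dense

    def matvec(self, v):
        """Logical operation: matrix-vector product, same result either way."""
        out = [0.0] * self.rows
        if self.is_sparse:
            for (i, j), x in self.data.items():
                out[i] += x * v[j]
        else:
            for i, row in enumerate(self.data):
                out[i] = sum(x * vj for x, vj in zip(row, v))
        return out


dense_m = Matrix([[1, 2], [3, 4]])
sparse_m = Matrix([[0, 0, 5], [0, 0, 0], [7, 0, 0]])
```

The caller only ever writes `m.matvec(v)`; whether the cells live in a dense list of rows or a sparse dictionary is decided inside the abstraction, which is the property R2 asks for.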
Apache SystemML
Sidenote: Fun Stuff - Neural Art
- A Neural Algorithm of Artistic Style, L.A. Gatys, A.S. Ecker, M. Bethge
- https://github.com/jcjohnson/neural-style
Apache SystemML
● High-level language
○ DML -> R-like
○ PyDML -> Python-like
○ Focus is on matrices and linear algebra.
● Engine
○ Compiler/Optimizer
○ Lots of optimizations, such as rewrites.
● Runtime
○ Laptop
○ Spark
○ (also Hadoop)
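To give a feel for what a compiler "rewrite" means, here is a toy Python sketch. It is not SystemML's optimizer: the tuple-based expression tree, the operator names `t`, `mul`, and `matmul`, and the two simplification rules are assumptions chosen for illustration.

```python
# Toy algebraic rewrites over an expression tree.
# Expressions are nested tuples: (operator, operand, ...); leaves are
# variable names (strings) or numeric literals.


def rewrite(expr):
    """Apply two illustrative simplifications bottom-up."""
    if not isinstance(expr, tuple):
        return expr  # leaf: variable name or literal
    op, *args = expr
    args = [rewrite(a) for a in args]
    # t(t(X)) -> X: a double transpose cancels out.
    if op == "t" and isinstance(args[0], tuple) and args[0][0] == "t":
        return args[0][1]
    # X * 1 -> X: multiplying by the scalar one is a no-op.
    if op == "mul" and args[1] == 1:
        return args[0]
    return (op, *args)


plan = ("matmul", ("t", ("t", "X")), ("mul", "w", 1))
optimized = rewrite(plan)
```

The rewritten plan computes the same result with fewer operations, which is the general idea behind rewrites: the script author states what to compute, and the engine is free to pick a cheaper equivalent form.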
SystemML - Example: Logistic Regression (DML)
SystemML - Example: Sigmoid Function (DML)
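The DML listings shown on these two slides did not survive in the text. As a stand-in, the following Python sketch shows the same two ideas, a sigmoid function and a logistic regression trained with it. It is not a translation of the slide code; the function names, toy data, step count, and learning rate are assumptions.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, steps=1000, lr=0.5):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        # Prediction error per example: sigmoid(w . x + b) - label.
        errs = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
                for xi, yi in zip(X, y)]
        for j in range(d):
            w[j] -= lr * sum(e * xi[j] for e, xi in zip(errs, X)) / n
        b -= lr * sum(errs) / n
    return w, b


# Toy data: one feature, class 1 for the larger values.
X = [[0.0], [1.0], [3.0], [4.0]]
y = [0, 0, 1, 1]
w, b = train_logreg(X, y)
preds = [round(sigmoid(w[0] * xi[0] + b)) for xi in X]
```

In a matrix language such as DML the per-example loops would collapse into a handful of matrix expressions, which is the point of the language's focus on linear algebra.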
SystemML - Compilation Chain
More Fun...
https://github.com/google/deepdream
SystemML - Compilation Chain
Spark
CP + b sb _mVar1
SPARK mapmm X.MATRIX.DOUBLE _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT false NONE
CP * y _mVar2 _mVar3
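The instruction listing above mixes `CP` instructions with a `SPARK` instruction, which matches the "hybrid runtime plans" named in the requirements. One plausible way to make such a choice is to compare a memory estimate against a local budget; the Python sketch below illustrates that idea only. The function names, the dense worst-case estimate, and the 1 GiB budget are assumptions, not SystemML's actual cost model.

```python
# Hypothetical operator placement: run an operation in the single-node
# control program (CP) when its estimated memory fits a budget, else on Spark.

BYTES_PER_DOUBLE = 8


def dense_size_bytes(rows, cols):
    """Worst-case (dense) memory estimate for a matrix of doubles."""
    return rows * cols * BYTES_PER_DOUBLE


def choose_exec_type(input_shapes, output_shape, budget_bytes):
    """Return "CP" if all inputs plus the output fit in the budget, else "SPARK"."""
    total = sum(dense_size_bytes(r, c) for r, c in input_shapes)
    total += dense_size_bytes(*output_shape)
    return "CP" if total <= budget_bytes else "SPARK"


budget = 1 * 1024 ** 3  # assumed 1 GiB local memory budget

# Small vector addition: two 1000 x 1 vectors.
small = choose_exec_type([(1000, 1), (1000, 1)], (1000, 1), budget)

# Large matrix-vector product: X is 10^7 x 1000, the vector is 1000 x 1.
large = choose_exec_type([(10_000_000, 1000), (1000, 1)], (10_000_000, 1), budget)
```

Under these assumed sizes the small addition stays local while the product over the large matrix is sent to the cluster, mirroring the shape of the listing: cheap scalar or vector work in `CP`, the big multiply in `SPARK`.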
SystemML Architecture (APIs and runtime)
● APIs: Command Line, JMLC, Spark MLContext, Spark ML
● Compiler: Parser/Language, High-Level Operators (HOPs), Low-Level Operators (LOPs), Cost-based optimizations
● Runtime: Control Program, Runtime Prog, Buffer Pool, ParFor Optimizer/Runtime, Recompiler, MR Inst, Spark Inst, CP Inst, Generic MR, MatrixBlock Library (single/multi-threaded), DFS IO, Mem/FS IO
Demo
Current Work
Current Work
● Usability / Applications:
○ Deep Learning (SYSTEMML-540)
○ Embedded Scala/Python/R DSL with sufficient optimization scope (SYSTEMML-451)
● Optimizer:
○ Cost-model enhancement (SYSTEMML-416)
○ Global program optimization (SYSTEMML-421)
○ Source code generation for automatic operator fusion (SYSTEMML-448)
● Runtime:
○ Add GPU backend (SYSTEMML-445) => CUDA / OpenCL
○ Frame support / Sparse block representation
○ Integrate Apache Flink as additional backend for SystemML (SYSTEMML-636 / PR-119)
○ NUMA-aware single node backend (SYSTEMML-406)
Deep Learning - Plans
● Deep Learning library for SystemML written in DML (SYSTEMML-618).
○ SystemML-NN [https://github.com/dusenberrymw/systemml-nn]
● Built-in DML functions for computationally-intensive layers.
○ Convolution (2D), Max Pooling
● GPU acceleration for these built-in functions (SYSTEMML-445).
● Integration with existing deep learning libraries (Keras, TensorFlow, Torch, etc.)?
Deep Learning - SystemML-NN Library
● Deep learning library written in DML (and PyDML soon…).
● Multiple layers:
○ Core:
■ Affine, 2D Convolution, Max Pooling
○ Nonlinearity/Transfer:
■ Sigmoid, Tanh, Softmax, ReLU
○ Regularization:
■ Dropout, L1, L2
○ Loss:
■ Log-loss, Cross-entropy, L1, L2
● Multiple optimizers:
○ SGD, SGD w/ momentum, SGD w/ Nesterov momentum, Adagrad, RMSprop, Adam
https://github.com/dusenberrymw/systemml-nn
Deep Learning - SystemML-NN Library (cont.)
● Each layer type has a simple `forward(...)` and `backward(...)` API.
○ `forward(...)` computes the output of the function based on the inputs.
○ `backward(...)` computes the partial derivatives (gradients) of some function deeper in the network (usually the loss function at the end) with respect to the inputs of the function.
● Each optimizer has a simple `update(...)` API.
○ `update(...)` adjusts the given parameters based on their partial derivatives.
● Includes test code in DML.
○ Gradient checks, unit tests
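The `forward`/`backward`/`update` pattern and the idea of a gradient check can be sketched in plain Python. This is an illustration under assumptions, not the library's DML code: the function names, argument orders, list-of-rows matrix format, and the sum-of-outputs stand-in loss are all hypothetical.

```python
# Hypothetical affine layer and SGD optimizer following the
# forward/backward/update pattern; matrices are lists of rows.


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def affine_forward(X, W, b):
    """out = X W + b, with b broadcast over the rows of X."""
    return [[v + bj for v, bj in zip(row, b)] for row in matmul(X, W)]


def affine_backward(dout, X, W):
    """Given dLoss/dout, return dLoss/dX, dLoss/dW, dLoss/db."""
    dX = matmul(dout, transpose(W))
    dW = matmul(transpose(X), dout)
    db = [sum(col) for col in zip(*dout)]
    return dX, dW, db


def sgd_update(param, grad, lr):
    """Plain SGD step on a matrix parameter."""
    return [[p - lr * g for p, g in zip(prow, grow)]
            for prow, grow in zip(param, grad)]


def total(out):
    """Stand-in loss: the sum of all outputs, so dLoss/dout is all ones."""
    return sum(sum(row) for row in out)


X = [[1.0, 2.0], [3.0, 4.0]]
W = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
b = [0.1, 0.2, 0.3]

out = affine_forward(X, W, b)
dout = [[1.0] * len(row) for row in out]
dX, dW, db = affine_backward(dout, X, W)

# Gradient check: compare dW[0][0] with a centered finite difference.
eps = 1e-5
W_plus = [row[:] for row in W]
W_plus[0][0] += eps
W_minus = [row[:] for row in W]
W_minus[0][0] -= eps
numeric = (total(affine_forward(X, W_plus, b))
           - total(affine_forward(X, W_minus, b))) / (2 * eps)

W_new = sgd_update(W, dW, lr=0.1)
```

The gradient check is the useful habit here: if the analytic value `dW[0][0]` and the finite-difference value `numeric` disagree by more than a small tolerance, the `backward` function is likely wrong.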
Deep Learning - SystemML-NN Library (cont.)
SystemML-NN
SystemML
Engine
Questions?
Thanks!

SystemML - Datapalooza Denver - 05.17.16 MWD
