SlideShare ist ein Scribd-Unternehmen logo
1 von 31
Downloaden Sie, um offline zu lesen
Multinomial Logistic Regression
with Apache Spark
DB Tsai
Machine Learning Engineer
May 1st
, 2014
Machine Learning with Big Data
● Hadoop MapReduce solutions
● MapReduce is scaling well for batch processing
● However, lots of machine learning algorithms
are iterative by nature.
● There are lots of tricks people do, like training
with subsamples of data, and then average the
models. Why big data with approximation?
+ =
Lightning-fast cluster computing
● Empower users to iterate through the data by
utilizing the in-memory cache.
● Logistic regression runs
up to 100x faster than
Hadoop M/R in memory.
● We're able to train
exact model without doing any approximation.
Binary Logistic Regression
d =
ax1+bx2+cx0
√a2
+b2
where x0=1
P( y=1∣⃗x , ⃗w)=
exp(d )
1+exp(d )
=
exp(⃗x ⃗w)
1+exp(⃗x ⃗w)
P ( y=0∣⃗x , ⃗w)=
1
1+exp(⃗x ⃗w)
log
P ( y=1∣⃗x , ⃗w)
P( y=0∣⃗x , ⃗w)
=⃗x ⃗w
w0=
c
√a2
+b2
w1=
a
√a2
+b2
w2=
b
√a2
+b2
wherew0 iscalled as intercept
Training Binary Logistic Regression
● Maximum Likelihood estimation
From a training data
and labels
● We want to find that maximizes the
likelihood of data defined by
● We can take log of the equation, and minimize
it instead. The Log-Likelihood becomes the
loss function.
X =( ⃗x1 , ⃗x2 , ⃗x3 ,...)
Y =( y1, y2, y3, ...)
⃗w
L( ⃗w , ⃗x1, ... , ⃗xN )=P ( y1∣⃗x1 , ⃗w)P ( y2∣⃗x2 , ⃗w)... P( yN∣ ⃗xN , ⃗w)
l( ⃗w ,⃗x)=log P(y1∣⃗x1 , ⃗w)+log P( y2∣⃗x2 , ⃗w)...+log P( yN∣ ⃗xN , ⃗w)
Optimization
● First Order Minimizer
Require loss, gradient of loss function
– Gradient Decent is step size
– Limited-memory BFGS (L-BFGS)
– Orthant-Wise Limited-memory Quasi-Newton
(OWLQN)
– Coordinate Descent (CD)
– Trust Region Newton Method (TRON)
● Second Order Minimizer
Require loss, gradient and hessian of loss function
– Newton-Raphson, quadratic convergence. Fast!
● Ref: Journal of Machine Learning Research 11 (2010) 3183-
3234, Chih-Jen Lin et al.
⃗wn+1=⃗wn−γ ⃗G , γ
⃗wn+1=⃗wn−H−1 ⃗G
Problem of Second Order Minimizer
● Scale horizontally (the numbers of training
data) by leveraging Spark to parallelize this
iterative optimization process.
● Don't scale vertically (the numbers of training
features). Dimension of Hessian is
● Recent applications from document
classification and computational linguistics are
of this type.
dim(H )=[(k−1)(n+1)]
2
where k is numof class ,nis num of features
L-BFGS
● It's a quasi-Newton method.
● Hessian matrix of second derivatives doesn't
need to be evaluated directly.
● Hessian matrix is approximated using gradient
evaluations.
● It converges a way faster than the default
optimizer in Spark, Gradient Decent.
● We love open source! Alpine Data Labs
contributed our L-BFGS to Spark, and it's
already merged in Spark-1157.
Training Binary Logistic Regression
l( ⃗w ,⃗x) = ∑
k=1
N
log P( yk∣⃗xk , ⃗w)
= ∑k=1
N
yk log P ( yk=1∣⃗xk , ⃗w)+(1−yk )log P ( yk =0∣⃗xk , ⃗w)
= ∑k=1
N
yk log
exp( ⃗xk ⃗w)
1+exp( ⃗xk ⃗w)
+(1−yk )log
1
1+exp( ⃗xk ⃗w)
= ∑k=1
N
yk ⃗xk ⃗w−log(1+exp( ⃗xk ⃗w))
Gradient : Gi (⃗w ,⃗x)=
∂l( ⃗w ,⃗x)
∂ wi
=∑k=1
N
yk xki−
exp( ⃗xk ⃗w)
1+exp( ⃗xk ⃗w)
xki
Hessian: H ij (⃗w ,⃗x)=
∂∂l( ⃗w ,⃗x)
∂wi ∂w j
=−∑k=1
N
exp( ⃗xk ⃗w)
(1+exp( ⃗xk ⃗w))2
xki xkj
Overfitting
P ( y=1∣⃗x , ⃗w)=
exp(zd )
1+exp(zd )
=
exp(⃗x ⃗w)
1+exp(⃗x ⃗w)
Regularization
● The loss function becomes
● The loss function of regularizer doesn't
depend on data. Common regularizers are
– L2 Regularization:
– L1 Regularization:
● L1 norm is not differentiable at zero!
ltotal (⃗w ,⃗x)=lmodel (⃗w ,⃗x)+lreg( ⃗w)
lreg (⃗w)=λ∑i=1
N
wi
2
lreg (⃗w)=λ∑i=1
N
∣wi∣
⃗G (⃗w ,⃗x)total=⃗G(⃗w ,⃗x)model+⃗G( ⃗w)reg
̄H ( ⃗w ,⃗x)total= ̄H (⃗w ,⃗x)model+ ̄H (⃗w)reg
Mini School of Spark APIs
● map(func) : Return a new distributed dataset
formed by passing each element of the source
through a function func.
● reduce(func) : Aggregate the elements of the
dataset using a function func (which takes two
arguments and returns one). The function
should be commutative and associative so
that it can be computed correctly in parallel.
No “key” thing here compared with Hadoop
M/R.
Example – No More “Word Count”!
Let's compute the mean of numbers.
3.0
2.0
1.0
5.0
Executor 1
Executor 2
map
map
(1.0, 1)
(5.0, 1)
(3.0, 1)
(2.0, 1)
reduce
(5.0, 2)
reduce (11.0, 4)
Shuffle
● The previous example using map(func) will
create new tuple object, which may cause
Garbage Collection issue.
● Like the combiner in Hadoop Map Reduce,
for each tuple emitted from map(func), there is
no guarantee that they will be combined
locally in the same executor by reduce(func).
It may increase the traffic in shuffle phase.
● In Hadoop, we address this by in-mapper
combiner or aggregating the result in global
variables which have scope to entire partition.
The same approach can be used in Spark.
Mini School of Spark APIs
● mapPartitions(func) : Similar to map, but runs
separately on each partition (block) of the
RDD, so func must be of type Iterator[T] =>
Iterator[U] when running on an RDD of type T.
● This API allows us to have global variables on
entire partition to aggregate the result locally
and efficiently.
Better Mean
of Numbers
Implementation
3.0
2.0
Executor 1
reduce (11.0, 4)
Shuffle
var counts = 0
var sum = 0.0
loop
var counts = 2
var sum = 5.0
3.0
2.0
(5.0, 2)
1.0
5.0
Executor 2
var counts = 0
var sum = 0.0
loop
var counts = 2
var sum = 6.0
1.0
5.0
(6.0, 2)
mapPartitions
More Idiomatic Scala Implementation
● aggregate(zeroValue: U)(seqOp: (U, T) => U,
combOp: (U, U) => U): U zeroValue is a neutral "zero value"
with type U for initialization. This is
analogous to
in previous example.
var counts = 0
var sum = 0.0
seqOp is function taking
(U, T) => U, where U is aggregator
initialized as zeroValue. T is each
line of values in RDD. This is
analogous to mapPartition in
previous example.
combOp is function taking
(U, U) => U. This is essentially
combing the results between
different executors. The
functionality is the same as
reduce(func) in previous example.
Approach of Parallelization in Spark
Spark Driver
JVM
Time
Spark Executor 1
JVM
Spark Executor 2
JVM
1) Find available resources in
cluster, and launch executor JVMs.
Also initialize the weights.
2) Ask executors to load the data
into executors' JVMs
2) Trying to load the data into memory. If the data is bigger
than memory, it will be partial cached.
(The data locality from source will be taken care by Spark)
3) Ask executors to compute loss, and
gradient of each training sample
(each row) given the current weights.
Get the aggregated results after
the Reduce Phase in executors.
5) If the regularization is enabled,
compute the loss and gradient of
regularizer in driver since it doesn't
depend on training data but only
depends on weights. Add them into
the results from executors.
3) Map Phase:
Compute the loss and gradient of each row of training data
locally given the weights obtained from the driver. Can either
emit each result or sum them up in local aggregators.
4) Reduce Phase:
Sum up the losses and gradients emitted from the Map Phase
Taking a rest!
6) Plug the loss and gradient from
model and regularizer into optimizer
to get the new weights. If the
differences of weights and losses
are larger than criteria,
GO BACK TO 3)
Taking a rest!
7) Finish the model training! Taking a rest!
Step 3) and 4)
● This is the implementation of step 3) and 4) in
MLlib before Spark 1.0
● gradient can have implementations of
Logistic Regression, Linear Regression, SVM,
or any customized cost function.
● Each training data will create new “grad”
object after gradient.compute.
Step 3) and 4) with mapPartitions
Step 3) and 4) with aggregate
● This is the implementation of step 3) and 4) in
MLlib in coming Spark 1.0
● No unnecessary object creation! It's helpful
when we're dealing with large features training
data. GC will not kick in the executor JVMs.
Extension to Multinomial Logistic Regression
● In Binary Logistic Regression
● For K classes multinomial problem where labels
ranged from [0, K-1], we can generalize it via
● The model, weights becomes
(K-1)(N+1) matrix, where N is number of features.
log
P( y=1∣⃗x , ̄w)
P ( y=0∣⃗x , ̄w)
=⃗x ⃗w1
log
P ( y=2∣⃗x , ̄w)
P ( y=0∣⃗x , ̄w)
=⃗x ⃗w2
...
log
P ( y=K −1∣⃗x , ̄w)
P ( y=0∣⃗x , ̄w)
=⃗x ⃗wK −1
log
P( y=1∣⃗x , ⃗w)
P ( y=0∣⃗x , ⃗w)
=⃗x ⃗w
̄w=(⃗w1, ⃗w2, ... , ⃗wK −1)T
P( y=0∣⃗x , ̄w)=
1
1+∑
i=1
K −1
exp(⃗x ⃗wi )
P( y=1∣⃗x , ̄w)=
exp(⃗x ⃗w2)
1+∑
i=1
K −1
exp(⃗x ⃗wi)
...
P( y=K −1∣⃗x , ̄w)=
exp(⃗x ⃗wK −1)
1+∑i=1
K −1
exp(⃗x ⃗wi )
Training Multinomial Logistic Regression
l( ̄w ,⃗x) = ∑
k=1
N
log P( yk∣⃗xk , ̄w)
= ∑k=1
N
α( yk)log P( y=0∣⃗xk , ̄w)+(1−α( yk ))log P ( yk∣⃗xk , ̄w)
= ∑k=1
N
α( yk)log
1
1+∑
i=1
K −1
exp(⃗x ⃗wi)
+(1−α( yk ))log
exp(⃗x ⃗w yk
)
1+∑
i=1
K−1
exp(⃗x ⃗wi)
= ∑
k=1
N
(1−α( yk ))⃗x ⃗wyk
−log(1+∑
i=1
K−1
exp(⃗x ⃗wi))
Gradient : Gij ( ̄w ,⃗x)=
∂l( ̄w ,⃗x)
∂wij
=∑k=1
N
(1−α( yk )) xkj δi , yk
−
exp( ⃗xk ⃗w)
1+exp( ⃗xk ⃗w)
xkj
α( yk )=1 if yk=0
α( yk )=0 if yk≠0
Define:
Note that the first index “i” is for classes, and the second index “j” is for features.
Hessian: H ij ,lm( ̄w ,⃗x)=
∂∂l ( ̄w ,⃗x)
∂wij ∂wlm
0 5 10 15 20 25 30 35
0.3
0.35
0.4
0.45
0.5
0.55
0.6
0.65
0.7
Logistic Regression with a9a Dataset (11M rows, 123 features, 11% non-zero elements)
16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster
L-BFGS Dense Features
L-BFGS Sparse Features
GD Sparse Features
GD Dense Features
Seconds
Log-Likelihood/NumberofSamples
a9a Dataset Benchmark
a9a Dataset Benchmark
-1 1 3 5 7 9 11 13 15
0.3
0.35
0.4
0.45
0.5
0.55
0.6
0.65
0.7
Logistic Regression with a9a Dataset (11M rows, 123 features, 11% non-zero elements)
16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster
L-BFGS
GD
Iterations
Log-Likelihood/NumberofSamples
0 5 10 15 20 25 30
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Logistic Regression with rcv1 Dataset (6.8M rows, 677,399 features, 0.15% non-zero elements)
16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster
LBFGS Sparse Vector
GD Sparse Vector
Second
Log-Likelihood/NumberofSamples
rcv1 Dataset Benchmark
news20 Dataset Benchmark
0 10 20 30 40 50 60 70 80
0
0.2
0.4
0.6
0.8
1
1.2
Logistic Regression with news20 Dataset (0.14M rows, 1,355,191 features, 0.034% non-zero elements)
16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster
LBFGS Sparse Vector
GD Sparse Vector
Second
Log-Likelihood/NumberofSamples
Alpine Demo
Alpine Demo
Alpine Demo
Conclusion
● Alpine supports state of the art Cloudera CDH5
● Spark runs programs 100x faster than Hadoop
● Spark turns iterative big data machine learning problems
into single machine problems
● MLlib provides lots of state of art machine learning
implementations, e.g. K-means, linear regression,
logistic regression, ALS, collaborative filtering, and
Naive Bayes, etc.
● Spark provides ease of use APIs in Java, Scala, or
Python, and interactive Scala and Python shell
● Spark is Apache project, and it's the most active big data
open source project nowadays.

Weitere ähnliche Inhalte

Was ist angesagt?

Overview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep LearningOverview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep LearningKhang Pham
 
A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...
A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...
A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...Tomoki Koriyama
 
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...Chris Fregly
 
Chap 8. Optimization for training deep models
Chap 8. Optimization for training deep modelsChap 8. Optimization for training deep models
Chap 8. Optimization for training deep modelsYoung-Geun Choi
 
Boosted Tree-based Multinomial Logit Model for Aggregated Market Data
Boosted Tree-based Multinomial Logit Model for Aggregated Market DataBoosted Tree-based Multinomial Logit Model for Aggregated Market Data
Boosted Tree-based Multinomial Logit Model for Aggregated Market DataJay (Jianqiang) Wang
 
How to design a linear control system
How to design a linear control systemHow to design a linear control system
How to design a linear control systemAlireza Mirzaei
 
Data Structure: Algorithm and analysis
Data Structure: Algorithm and analysisData Structure: Algorithm and analysis
Data Structure: Algorithm and analysisDr. Rajdeep Chatterjee
 
Optimization for Deep Learning
Optimization for Deep LearningOptimization for Deep Learning
Optimization for Deep LearningSebastian Ruder
 
Phase Retrieval: Motivation and Techniques
Phase Retrieval: Motivation and TechniquesPhase Retrieval: Motivation and Techniques
Phase Retrieval: Motivation and TechniquesVaibhav Dixit
 
Aaex5 group2(中英夾雜)
Aaex5 group2(中英夾雜)Aaex5 group2(中英夾雜)
Aaex5 group2(中英夾雜)Shiang-Yun Yang
 
Algorithm Complexity and Main Concepts
Algorithm Complexity and Main ConceptsAlgorithm Complexity and Main Concepts
Algorithm Complexity and Main ConceptsAdelina Ahadova
 
A note on word embedding
A note on word embeddingA note on word embedding
A note on word embeddingKhang Pham
 
Aaex7 group2(中英夾雜)
Aaex7 group2(中英夾雜)Aaex7 group2(中英夾雜)
Aaex7 group2(中英夾雜)Shiang-Yun Yang
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-iKrish_ver2
 
VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative ModelsKenta Oono
 
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...sleepy_yoshi
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEHONGJOO LEE
 
Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Gael Varoquaux
 
Exploring Optimization in Vowpal Wabbit
Exploring Optimization in Vowpal WabbitExploring Optimization in Vowpal Wabbit
Exploring Optimization in Vowpal WabbitShiladitya Sen
 

Was ist angesagt? (20)

Overview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep LearningOverview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep Learning
 
A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...
A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...
A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...
 
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
 
Chap 8. Optimization for training deep models
Chap 8. Optimization for training deep modelsChap 8. Optimization for training deep models
Chap 8. Optimization for training deep models
 
Boosted Tree-based Multinomial Logit Model for Aggregated Market Data
Boosted Tree-based Multinomial Logit Model for Aggregated Market DataBoosted Tree-based Multinomial Logit Model for Aggregated Market Data
Boosted Tree-based Multinomial Logit Model for Aggregated Market Data
 
How to design a linear control system
How to design a linear control systemHow to design a linear control system
How to design a linear control system
 
Data Structure: Algorithm and analysis
Data Structure: Algorithm and analysisData Structure: Algorithm and analysis
Data Structure: Algorithm and analysis
 
Optimization for Deep Learning
Optimization for Deep LearningOptimization for Deep Learning
Optimization for Deep Learning
 
Pytorch meetup
Pytorch meetupPytorch meetup
Pytorch meetup
 
Phase Retrieval: Motivation and Techniques
Phase Retrieval: Motivation and TechniquesPhase Retrieval: Motivation and Techniques
Phase Retrieval: Motivation and Techniques
 
Aaex5 group2(中英夾雜)
Aaex5 group2(中英夾雜)Aaex5 group2(中英夾雜)
Aaex5 group2(中英夾雜)
 
Algorithm Complexity and Main Concepts
Algorithm Complexity and Main ConceptsAlgorithm Complexity and Main Concepts
Algorithm Complexity and Main Concepts
 
A note on word embedding
A note on word embeddingA note on word embedding
A note on word embedding
 
Aaex7 group2(中英夾雜)
Aaex7 group2(中英夾雜)Aaex7 group2(中英夾雜)
Aaex7 group2(中英夾雜)
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-i
 
VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative Models
 
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
 
Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities
 
Exploring Optimization in Vowpal Wabbit
Exploring Optimization in Vowpal WabbitExploring Optimization in Vowpal Wabbit
Exploring Optimization in Vowpal Wabbit
 

Ähnlich wie Alpine Spark Implementation - Technical

Introduction to PyTorch
Introduction to PyTorchIntroduction to PyTorch
Introduction to PyTorchJun Young Park
 
CSCI 2033 Elementary Computational Linear Algebra(Spring 20.docx
CSCI 2033 Elementary Computational Linear Algebra(Spring 20.docxCSCI 2033 Elementary Computational Linear Algebra(Spring 20.docx
CSCI 2033 Elementary Computational Linear Algebra(Spring 20.docxmydrynan
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksStratio
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - EnglishKohei KaiGai
 
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/Theta
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/ThetaAlgorithm analysis basics - Seven Functions/Big-Oh/Omega/Theta
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/ThetaPriyanka Rana
 
Basic concept of MATLAB.ppt
Basic concept of MATLAB.pptBasic concept of MATLAB.ppt
Basic concept of MATLAB.pptaliraza2732
 
SPU Optimizations - Part 2
SPU Optimizations - Part 2SPU Optimizations - Part 2
SPU Optimizations - Part 2Naughty Dog
 
Dsp manual completed2
Dsp manual completed2Dsp manual completed2
Dsp manual completed2bilawalali74
 
On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)Yu Liu
 
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUsEarly Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUsJeff Larkin
 
Control system Lab record
Control system Lab record Control system Lab record
Control system Lab record Yuvraj Singh
 
MapReduce: teoria e prática
MapReduce: teoria e práticaMapReduce: teoria e prática
MapReduce: teoria e práticaPET Computação
 

Ähnlich wie Alpine Spark Implementation - Technical (20)

Introduction to PyTorch
Introduction to PyTorchIntroduction to PyTorch
Introduction to PyTorch
 
CSCI 2033 Elementary Computational Linear Algebra(Spring 20.docx
CSCI 2033 Elementary Computational Linear Algebra(Spring 20.docxCSCI 2033 Elementary Computational Linear Algebra(Spring 20.docx
CSCI 2033 Elementary Computational Linear Algebra(Spring 20.docx
 
Functional go
Functional goFunctional go
Functional go
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural Networks
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English
 
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/Theta
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/ThetaAlgorithm analysis basics - Seven Functions/Big-Oh/Omega/Theta
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/Theta
 
Java 8
Java 8Java 8
Java 8
 
Basic concept of MATLAB.ppt
Basic concept of MATLAB.pptBasic concept of MATLAB.ppt
Basic concept of MATLAB.ppt
 
SPU Optimizations - Part 2
SPU Optimizations - Part 2SPU Optimizations - Part 2
SPU Optimizations - Part 2
 
MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...
MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...
MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...
 
Backpropagation - Elisa Sayrol - UPC Barcelona 2018
Backpropagation - Elisa Sayrol - UPC Barcelona 2018Backpropagation - Elisa Sayrol - UPC Barcelona 2018
Backpropagation - Elisa Sayrol - UPC Barcelona 2018
 
Matlab1
Matlab1Matlab1
Matlab1
 
Dafunctor
DafunctorDafunctor
Dafunctor
 
MapReduce
MapReduceMapReduce
MapReduce
 
Unit3 MapReduce
Unit3 MapReduceUnit3 MapReduce
Unit3 MapReduce
 
Dsp manual completed2
Dsp manual completed2Dsp manual completed2
Dsp manual completed2
 
On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)
 
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUsEarly Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
 
Control system Lab record
Control system Lab record Control system Lab record
Control system Lab record
 
MapReduce: teoria e prática
MapReduce: teoria e práticaMapReduce: teoria e prática
MapReduce: teoria e prática
 

Mehr von alpinedatalabs

Integrating R and the JVM Platform - Alpine Data Labs' R Execute Operator
Integrating R and the JVM Platform - Alpine Data Labs' R Execute OperatorIntegrating R and the JVM Platform - Alpine Data Labs' R Execute Operator
Integrating R and the JVM Platform - Alpine Data Labs' R Execute Operatoralpinedatalabs
 
Alpine innovation final v1.0
Alpine innovation final v1.0Alpine innovation final v1.0
Alpine innovation final v1.0alpinedatalabs
 
Predictive analytics from a to z
Predictive analytics from a to zPredictive analytics from a to z
Predictive analytics from a to zalpinedatalabs
 
Don't Gamble With Your Data
Don't Gamble With Your DataDon't Gamble With Your Data
Don't Gamble With Your Dataalpinedatalabs
 
Steven Hillion Presents, "Why Women are Better Data Scientists."
Steven Hillion Presents, "Why Women are Better Data Scientists."Steven Hillion Presents, "Why Women are Better Data Scientists."
Steven Hillion Presents, "Why Women are Better Data Scientists."alpinedatalabs
 
Strata Big Data Camp 2013
Strata Big Data Camp 2013Strata Big Data Camp 2013
Strata Big Data Camp 2013alpinedatalabs
 

Mehr von alpinedatalabs (6)

Integrating R and the JVM Platform - Alpine Data Labs' R Execute Operator
Integrating R and the JVM Platform - Alpine Data Labs' R Execute OperatorIntegrating R and the JVM Platform - Alpine Data Labs' R Execute Operator
Integrating R and the JVM Platform - Alpine Data Labs' R Execute Operator
 
Alpine innovation final v1.0
Alpine innovation final v1.0Alpine innovation final v1.0
Alpine innovation final v1.0
 
Predictive analytics from a to z
Predictive analytics from a to zPredictive analytics from a to z
Predictive analytics from a to z
 
Don't Gamble With Your Data
Don't Gamble With Your DataDon't Gamble With Your Data
Don't Gamble With Your Data
 
Steven Hillion Presents, "Why Women are Better Data Scientists."
Steven Hillion Presents, "Why Women are Better Data Scientists."Steven Hillion Presents, "Why Women are Better Data Scientists."
Steven Hillion Presents, "Why Women are Better Data Scientists."
 
Strata Big Data Camp 2013
Strata Big Data Camp 2013Strata Big Data Camp 2013
Strata Big Data Camp 2013
 

Kürzlich hochgeladen

Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 

Kürzlich hochgeladen (20)

Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 

Alpine Spark Implementation - Technical

  • 1. Multinomial Logistic Regression with Apache Spark DB Tsai Machine Learning Engineer May 1st , 2014
  • 2. Machine Learning with Big Data ● Hadoop MapReduce solutions ● MapReduce is scaling well for batch processing ● However, lots of machine learning algorithms are iterative by nature. ● There are lots of tricks people do, like training with subsamples of data, and then average the models. Why big data with approximation? + =
  • 3. Lightning-fast cluster computing ● Empower users to iterate through the data by utilizing the in-memory cache. ● Logistic regression runs up to 100x faster than Hadoop M/R in memory. ● We're able to train exact model without doing any approximation.
  • 4. Binary Logistic Regression d = ax1+bx2+cx0 √a2 +b2 where x0=1 P( y=1∣⃗x , ⃗w)= exp(d ) 1+exp(d ) = exp(⃗x ⃗w) 1+exp(⃗x ⃗w) P ( y=0∣⃗x , ⃗w)= 1 1+exp(⃗x ⃗w) log P ( y=1∣⃗x , ⃗w) P( y=0∣⃗x , ⃗w) =⃗x ⃗w w0= c √a2 +b2 w1= a √a2 +b2 w2= b √a2 +b2 wherew0 iscalled as intercept
  • 5. Training Binary Logistic Regression ● Maximum Likelihood estimation From a training data and labels ● We want to find that maximizes the likelihood of data defined by ● We can take log of the equation, and minimize it instead. The Log-Likelihood becomes the loss function. X =( ⃗x1 , ⃗x2 , ⃗x3 ,...) Y =( y1, y2, y3, ...) ⃗w L( ⃗w , ⃗x1, ... , ⃗xN )=P ( y1∣⃗x1 , ⃗w)P ( y2∣⃗x2 , ⃗w)... P( yN∣ ⃗xN , ⃗w) l( ⃗w ,⃗x)=log P(y1∣⃗x1 , ⃗w)+log P( y2∣⃗x2 , ⃗w)...+log P( yN∣ ⃗xN , ⃗w)
  • 6. Optimization ● First Order Minimizer Require loss, gradient of loss function – Gradient Decent is step size – Limited-memory BFGS (L-BFGS) – Orthant-Wise Limited-memory Quasi-Newton (OWLQN) – Coordinate Descent (CD) – Trust Region Newton Method (TRON) ● Second Order Minimizer Require loss, gradient and hessian of loss function – Newton-Raphson, quadratic convergence. Fast! ● Ref: Journal of Machine Learning Research 11 (2010) 3183- 3234, Chih-Jen Lin et al. ⃗wn+1=⃗wn−γ ⃗G , γ ⃗wn+1=⃗wn−H−1 ⃗G
  • 7. Problem of Second Order Minimizer ● Scale horizontally (the numbers of training data) by leveraging Spark to parallelize this iterative optimization process. ● Don't scale vertically (the numbers of training features). Dimension of Hessian is ● Recent applications from document classification and computational linguistics are of this type. dim(H )=[(k−1)(n+1)] 2 where k is numof class ,nis num of features
  • 8. L-BFGS ● It's a quasi-Newton method. ● Hessian matrix of second derivatives doesn't need to be evaluated directly. ● Hessian matrix is approximated using gradient evaluations. ● It converges a way faster than the default optimizer in Spark, Gradient Decent. ● We love open source! Alpine Data Labs contributed our L-BFGS to Spark, and it's already merged in Spark-1157.
  • 9. Training Binary Logistic Regression l( ⃗w ,⃗x) = ∑ k=1 N log P( yk∣⃗xk , ⃗w) = ∑k=1 N yk log P ( yk=1∣⃗xk , ⃗w)+(1−yk )log P ( yk =0∣⃗xk , ⃗w) = ∑k=1 N yk log exp( ⃗xk ⃗w) 1+exp( ⃗xk ⃗w) +(1−yk )log 1 1+exp( ⃗xk ⃗w) = ∑k=1 N yk ⃗xk ⃗w−log(1+exp( ⃗xk ⃗w)) Gradient : Gi (⃗w ,⃗x)= ∂l( ⃗w ,⃗x) ∂ wi =∑k=1 N yk xki− exp( ⃗xk ⃗w) 1+exp( ⃗xk ⃗w) xki Hessian: H ij (⃗w ,⃗x)= ∂∂l( ⃗w ,⃗x) ∂wi ∂w j =−∑k=1 N exp( ⃗xk ⃗w) (1+exp( ⃗xk ⃗w))2 xki xkj
  • 10. Overfitting P ( y=1∣⃗x , ⃗w)= exp(zd ) 1+exp(zd ) = exp(⃗x ⃗w) 1+exp(⃗x ⃗w)
  • 11. Regularization ● The loss function becomes ● The loss function of regularizer doesn't depend on data. Common regularizers are – L2 Regularization: – L1 Regularization: ● L1 norm is not differentiable at zero! ltotal (⃗w ,⃗x)=lmodel (⃗w ,⃗x)+lreg( ⃗w) lreg (⃗w)=λ∑i=1 N wi 2 lreg (⃗w)=λ∑i=1 N ∣wi∣ ⃗G (⃗w ,⃗x)total=⃗G(⃗w ,⃗x)model+⃗G( ⃗w)reg ̄H ( ⃗w ,⃗x)total= ̄H (⃗w ,⃗x)model+ ̄H (⃗w)reg
  • 12. Mini School of Spark APIs ● map(func) : Return a new distributed dataset formed by passing each element of the source through a function func. ● reduce(func) : Aggregate the elements of the dataset using a function func (which takes two arguments and returns one). The function should be commutative and associative so that it can be computed correctly in parallel. No “key” thing here compared with Hadoop M/R.
  • 13. Example – No More “Word Count”! Let's compute the mean of numbers. 3.0 2.0 1.0 5.0 Executor 1 Executor 2 map map (1.0, 1) (5.0, 1) (3.0, 1) (2.0, 1) reduce (5.0, 2) reduce (11.0, 4) Shuffle
  • 14. ● The previous example using map(func) will create new tuple object, which may cause Garbage Collection issue. ● Like the combiner in Hadoop Map Reduce, for each tuple emitted from map(func), there is no guarantee that they will be combined locally in the same executor by reduce(func). It may increase the traffic in shuffle phase. ● In Hadoop, we address this by in-mapper combiner or aggregating the result in global variables which have scope to entire partition. The same approach can be used in Spark.
  • 15. Mini School of Spark APIs ● mapPartitions(func) : Similar to map, but runs separately on each partition (block) of the RDD, so func must be of type Iterator[T] => Iterator[U] when running on an RDD of type T. ● This API allows us to have global variables on entire partition to aggregate the result locally and efficiently.
  • 16. Better Mean of Numbers Implementation 3.0 2.0 Executor 1 reduce (11.0, 4) Shuffle var counts = 0 var sum = 0.0 loop var counts = 2 var sum = 5.0 3.0 2.0 (5.0, 2) 1.0 5.0 Executor 2 var counts = 0 var sum = 0.0 loop var counts = 2 var sum = 6.0 1.0 5.0 (6.0, 2) mapPartitions
  • 17. More Idiomatic Scala Implementation ● aggregate(zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U): U zeroValue is a neutral "zero value" with type U for initialization. This is analogous to in previous example. var counts = 0 var sum = 0.0 seqOp is function taking (U, T) => U, where U is aggregator initialized as zeroValue. T is each line of values in RDD. This is analogous to mapPartition in previous example. combOp is function taking (U, U) => U. This is essentially combing the results between different executors. The functionality is the same as reduce(func) in previous example.
  • 18. Approach of Parallelization in Spark Spark Driver JVM Time Spark Executor 1 JVM Spark Executor 2 JVM 1) Find available resources in cluster, and launch executor JVMs. Also initialize the weights. 2) Ask executors to load the data into executors' JVMs 2) Trying to load the data into memory. If the data is bigger than memory, it will be partial cached. (The data locality from source will be taken care by Spark) 3) Ask executors to compute loss, and gradient of each training sample (each row) given the current weights. Get the aggregated results after the Reduce Phase in executors. 5) If the regularization is enabled, compute the loss and gradient of regularizer in driver since it doesn't depend on training data but only depends on weights. Add them into the results from executors. 3) Map Phase: Compute the loss and gradient of each row of training data locally given the weights obtained from the driver. Can either emit each result or sum them up in local aggregators. 4) Reduce Phase: Sum up the losses and gradients emitted from the Map Phase Taking a rest! 6) Plug the loss and gradient from model and regularizer into optimizer to get the new weights. If the differences of weights and losses are larger than criteria, GO BACK TO 3) Taking a rest! 7) Finish the model training! Taking a rest!
  • 19. Step 3) and 4) ● This is the implementation of step 3) and 4) in MLlib before Spark 1.0 ● gradient can have implementations of Logistic Regression, Linear Regression, SVM, or any customized cost function. ● Each training data will create new “grad” object after gradient.compute.
  • 20. Step 3) and 4) with mapPartitions
  • 21. Step 3) and 4) with aggregate ● This is the implementation of step 3) and 4) in MLlib in coming Spark 1.0 ● No unnecessary object creation! It's helpful when we're dealing with large features training data. GC will not kick in the executor JVMs.
  • 22. Extension to Multinomial Logistic Regression ● In Binary Logistic Regression ● For K classes multinomial problem where labels ranged from [0, K-1], we can generalize it via ● The model, weights becomes (K-1)(N+1) matrix, where N is number of features. log P( y=1∣⃗x , ̄w) P ( y=0∣⃗x , ̄w) =⃗x ⃗w1 log P ( y=2∣⃗x , ̄w) P ( y=0∣⃗x , ̄w) =⃗x ⃗w2 ... log P ( y=K −1∣⃗x , ̄w) P ( y=0∣⃗x , ̄w) =⃗x ⃗wK −1 log P( y=1∣⃗x , ⃗w) P ( y=0∣⃗x , ⃗w) =⃗x ⃗w ̄w=(⃗w1, ⃗w2, ... , ⃗wK −1)T P( y=0∣⃗x , ̄w)= 1 1+∑ i=1 K −1 exp(⃗x ⃗wi ) P( y=1∣⃗x , ̄w)= exp(⃗x ⃗w2) 1+∑ i=1 K −1 exp(⃗x ⃗wi) ... P( y=K −1∣⃗x , ̄w)= exp(⃗x ⃗wK −1) 1+∑i=1 K −1 exp(⃗x ⃗wi )
  • 23. Training Multinomial Logistic Regression l( ̄w ,⃗x) = ∑ k=1 N log P( yk∣⃗xk , ̄w) = ∑k=1 N α( yk)log P( y=0∣⃗xk , ̄w)+(1−α( yk ))log P ( yk∣⃗xk , ̄w) = ∑k=1 N α( yk)log 1 1+∑ i=1 K −1 exp(⃗x ⃗wi) +(1−α( yk ))log exp(⃗x ⃗w yk ) 1+∑ i=1 K−1 exp(⃗x ⃗wi) = ∑ k=1 N (1−α( yk ))⃗x ⃗wyk −log(1+∑ i=1 K−1 exp(⃗x ⃗wi)) Gradient : Gij ( ̄w ,⃗x)= ∂l( ̄w ,⃗x) ∂wij =∑k=1 N (1−α( yk )) xkj δi , yk − exp( ⃗xk ⃗w) 1+exp( ⃗xk ⃗w) xkj α( yk )=1 if yk=0 α( yk )=0 if yk≠0 Define: Note that the first index “i” is for classes, and the second index “j” is for features. Hessian: H ij ,lm( ̄w ,⃗x)= ∂∂l ( ̄w ,⃗x) ∂wij ∂wlm
  • 24. 0 5 10 15 20 25 30 35 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 Logistic Regression with a9a Dataset (11M rows, 123 features, 11% non-zero elements) 16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster L-BFGS Dense Features L-BFGS Sparse Features GD Sparse Features GD Dense Features Seconds Log-Likelihood/NumberofSamples a9a Dataset Benchmark
  • 25. a9a Dataset Benchmark -1 1 3 5 7 9 11 13 15 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 Logistic Regression with a9a Dataset (11M rows, 123 features, 11% non-zero elements) 16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster L-BFGS GD Iterations Log-Likelihood/NumberofSamples
  • 26. 0 5 10 15 20 25 30 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Logistic Regression with rcv1 Dataset (6.8M rows, 677,399 features, 0.15% non-zero elements) 16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster LBFGS Sparse Vector GD Sparse Vector Second Log-Likelihood/NumberofSamples rcv1 Dataset Benchmark
  • 27. news20 Dataset Benchmark 0 10 20 30 40 50 60 70 80 0 0.2 0.4 0.6 0.8 1 1.2 Logistic Regression with news20 Dataset (0.14M rows, 1,355,191 features, 0.034% non-zero elements) 16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster LBFGS Sparse Vector GD Sparse Vector Second Log-Likelihood/NumberofSamples
  • 31. Conclusion ● Alpine supports state of the art Cloudera CDH5 ● Spark runs programs 100x faster than Hadoop ● Spark turns iterative big data machine learning problems into single machine problems ● MLlib provides lots of state of art machine learning implementations, e.g. K-means, linear regression, logistic regression, ALS, collaborative filtering, and Naive Bayes, etc. ● Spark provides ease of use APIs in Java, Scala, or Python, and interactive Scala and Python shell ● Spark is Apache project, and it's the most active big data open source project nowadays.