SlideShare ist ein Scribd-Unternehmen logo
1 von 31
Downloaden Sie, um offline zu lesen
Multinomial Logistic Regression
with Apache Spark
DB Tsai
Machine Learning Engineer
May 1st
, 2014
Machine Learning with Big Data
● Hadoop MapReduce solutions
● MapReduce is scaling well for batch processing
● However, lots of machine learning algorithms
are iterative by nature.
● There are lots of tricks people do, like training
with subsamples of data, and then average the
models. Why big data with approximation?
+ =
Lightning-fast cluster computing
● Empower users to iterate through the data by
utilizing the in-memory cache.
● Logistic regression runs
up to 100x faster than
Hadoop M/R in memory.
● We're able to train
exact model without doing any approximation.
Binary Logistic Regression
d =
ax1+bx2+cx0
√a2
+b2
where x0=1
P( y=1∣⃗x , ⃗w)=
exp(d )
1+exp(d )
=
exp(⃗x ⃗w)
1+exp(⃗x ⃗w)
P ( y=0∣⃗x , ⃗w)=
1
1+exp(⃗x ⃗w)
log
P ( y=1∣⃗x , ⃗w)
P( y=0∣⃗x , ⃗w)
=⃗x ⃗w
w0=
c
√a2
+b2
w1=
a
√a2
+b2
w2=
b
√a2
+b2
wherew0 iscalled as intercept
Training Binary Logistic Regression
● Maximum Likelihood estimation
From a training data
and labels
● We want to find that maximizes the
likelihood of data defined by
● We can take log of the equation, and minimize
it instead. The Log-Likelihood becomes the
loss function.
X =( ⃗x1 , ⃗x2 , ⃗x3 ,...)
Y =( y1, y2, y3, ...)
⃗w
L( ⃗w , ⃗x1, ... , ⃗xN )=P ( y1∣⃗x1 , ⃗w)P ( y2∣⃗x2 , ⃗w)... P( yN∣ ⃗xN , ⃗w)
l( ⃗w ,⃗x)=log P(y1∣⃗x1 , ⃗w)+log P( y2∣⃗x2 , ⃗w)...+log P( yN∣ ⃗xN , ⃗w)
Optimization
● First Order Minimizer
Require loss, gradient of loss function
– Gradient Decent is step size
– Limited-memory BFGS (L-BFGS)
– Orthant-Wise Limited-memory Quasi-Newton
(OWLQN)
– Coordinate Descent (CD)
– Trust Region Newton Method (TRON)
● Second Order Minimizer
Require loss, gradient and hessian of loss function
– Newton-Raphson, quadratic convergence. Fast!
● Ref: Journal of Machine Learning Research 11 (2010) 3183-
3234, Chih-Jen Lin et al.
⃗wn+1=⃗wn−γ ⃗G , γ
⃗wn+1=⃗wn−H−1 ⃗G
Problem of Second Order Minimizer
● Scale horizontally (the numbers of training
data) by leveraging Spark to parallelize this
iterative optimization process.
● Don't scale vertically (the numbers of training
features). Dimension of Hessian is
● Recent applications from document
classification and computational linguistics are
of this type.
dim(H )=[(k−1)(n+1)]
2
where k is numof class ,nis num of features
L-BFGS
● It's a quasi-Newton method.
● Hessian matrix of second derivatives doesn't
need to be evaluated directly.
● Hessian matrix is approximated using gradient
evaluations.
● It converges a way faster than the default
optimizer in Spark, Gradient Decent.
● We love open source! Alpine Data Labs
contributed our L-BFGS to Spark, and it's
already merged in Spark-1157.
Training Binary Logistic Regression
l( ⃗w ,⃗x) = ∑
k=1
N
log P( yk∣⃗xk , ⃗w)
= ∑k=1
N
yk log P ( yk=1∣⃗xk , ⃗w)+(1−yk )log P ( yk =0∣⃗xk , ⃗w)
= ∑k=1
N
yk log
exp( ⃗xk ⃗w)
1+exp( ⃗xk ⃗w)
+(1−yk )log
1
1+exp( ⃗xk ⃗w)
= ∑k=1
N
yk ⃗xk ⃗w−log(1+exp( ⃗xk ⃗w))
Gradient : Gi (⃗w ,⃗x)=
∂l( ⃗w ,⃗x)
∂ wi
=∑k=1
N
yk xki−
exp( ⃗xk ⃗w)
1+exp( ⃗xk ⃗w)
xki
Hessian: H ij (⃗w ,⃗x)=
∂∂l( ⃗w ,⃗x)
∂wi ∂w j
=−∑k=1
N
exp( ⃗xk ⃗w)
(1+exp( ⃗xk ⃗w))2
xki xkj
Overfitting
P ( y=1∣⃗x , ⃗w)=
exp(zd )
1+exp(zd )
=
exp(⃗x ⃗w)
1+exp(⃗x ⃗w)
Regularization
● The loss function becomes
● The loss function of regularizer doesn't
depend on data. Common regularizers are
– L2 Regularization:
– L1 Regularization:
● L1 norm is not differentiable at zero!
ltotal (⃗w ,⃗x)=lmodel (⃗w ,⃗x)+lreg( ⃗w)
lreg (⃗w)=λ∑i=1
N
wi
2
lreg (⃗w)=λ∑i=1
N
∣wi∣
⃗G (⃗w ,⃗x)total=⃗G(⃗w ,⃗x)model+⃗G( ⃗w)reg
̄H ( ⃗w ,⃗x)total= ̄H (⃗w ,⃗x)model+ ̄H (⃗w)reg
Mini School of Spark APIs
● map(func) : Return a new distributed dataset
formed by passing each element of the source
through a function func.
● reduce(func) : Aggregate the elements of the
dataset using a function func (which takes two
arguments and returns one). The function
should be commutative and associative so
that it can be computed correctly in parallel.
No “key” thing here compared with Hadoop
M/R.
Example – No More “Word Count”!
Let's compute the mean of numbers.
3.0
2.0
1.0
5.0
Executor 1
Executor 2
map
map
(1.0, 1)
(5.0, 1)
(3.0, 1)
(2.0, 1)
reduce
(5.0, 2)
reduce (11.0, 4)
Shuffle
● The previous example using map(func) will
create new tuple object, which may cause
Garbage Collection issue.
● Like the combiner in Hadoop Map Reduce,
for each tuple emitted from map(func), there is
no guarantee that they will be combined
locally in the same executor by reduce(func).
It may increase the traffic in shuffle phase.
● In Hadoop, we address this by in-mapper
combiner or aggregating the result in global
variables which have scope to entire partition.
The same approach can be used in Spark.
Mini School of Spark APIs
● mapPartitions(func) : Similar to map, but runs
separately on each partition (block) of the
RDD, so func must be of type Iterator[T] =>
Iterator[U] when running on an RDD of type T.
● This API allows us to have global variables on
entire partition to aggregate the result locally
and efficiently.
Better Mean
of Numbers
Implementation
3.0
2.0
Executor 1
reduce (11.0, 4)
Shuffle
var counts = 0
var sum = 0.0
loop
var counts = 2
var sum = 5.0
3.0
2.0
(5.0, 2)
1.0
5.0
Executor 2
var counts = 0
var sum = 0.0
loop
var counts = 2
var sum = 6.0
1.0
5.0
(6.0, 2)
mapPartitions
More Idiomatic Scala Implementation
● aggregate(zeroValue: U)(seqOp: (U, T) => U,
combOp: (U, U) => U): U zeroValue is a neutral "zero value"
with type U for initialization. This is
analogous to
in previous example.
var counts = 0
var sum = 0.0
seqOp is function taking
(U, T) => U, where U is aggregator
initialized as zeroValue. T is each
line of values in RDD. This is
analogous to mapPartition in
previous example.
combOp is function taking
(U, U) => U. This is essentially
combing the results between
different executors. The
functionality is the same as
reduce(func) in previous example.
Approach of Parallelization in Spark
Spark Driver
JVM
Time
Spark Executor 1
JVM
Spark Executor 2
JVM
1) Find available resources in
cluster, and launch executor JVMs.
Also initialize the weights.
2) Ask executors to load the data
into executors' JVMs
2) Trying to load the data into memory. If the data is bigger
than memory, it will be partial cached.
(The data locality from source will be taken care by Spark)
3) Ask executors to compute loss, and
gradient of each training sample
(each row) given the current weights.
Get the aggregated results after
the Reduce Phase in executors.
5) If the regularization is enabled,
compute the loss and gradient of
regularizer in driver since it doesn't
depend on training data but only
depends on weights. Add them into
the results from executors.
3) Map Phase:
Compute the loss and gradient of each row of training data
locally given the weights obtained from the driver. Can either
emit each result or sum them up in local aggregators.
4) Reduce Phase:
Sum up the losses and gradients emitted from the Map Phase
Taking a rest!
6) Plug the loss and gradient from
model and regularizer into optimizer
to get the new weights. If the
differences of weights and losses
are larger than criteria,
GO BACK TO 3)
Taking a rest!
7) Finish the model training! Taking a rest!
Step 3) and 4)
● This is the implementation of step 3) and 4) in
MLlib before Spark 1.0
● gradient can have implementations of
Logistic Regression, Linear Regression, SVM,
or any customized cost function.
● Each training data will create new “grad”
object after gradient.compute.
Step 3) and 4) with mapPartitions
Step 3) and 4) with aggregate
● This is the implementation of step 3) and 4) in
MLlib in coming Spark 1.0
● No unnecessary object creation! It's helpful
when we're dealing with large features training
data. GC will not kick in the executor JVMs.
Extension to Multinomial Logistic Regression
● In Binary Logistic Regression
● For K classes multinomial problem where labels
ranged from [0, K-1], we can generalize it via
● The model, weights becomes
(K-1)(N+1) matrix, where N is number of features.
log
P( y=1∣⃗x , ̄w)
P ( y=0∣⃗x , ̄w)
=⃗x ⃗w1
log
P ( y=2∣⃗x , ̄w)
P ( y=0∣⃗x , ̄w)
=⃗x ⃗w2
...
log
P ( y=K −1∣⃗x , ̄w)
P ( y=0∣⃗x , ̄w)
=⃗x ⃗wK −1
log
P( y=1∣⃗x , ⃗w)
P ( y=0∣⃗x , ⃗w)
=⃗x ⃗w
̄w=(⃗w1, ⃗w2, ... , ⃗wK −1)T
P( y=0∣⃗x , ̄w)=
1
1+∑
i=1
K −1
exp(⃗x ⃗wi )
P( y=1∣⃗x , ̄w)=
exp(⃗x ⃗w2)
1+∑
i=1
K −1
exp(⃗x ⃗wi)
...
P( y=K −1∣⃗x , ̄w)=
exp(⃗x ⃗wK −1)
1+∑i=1
K −1
exp(⃗x ⃗wi )
Training Multinomial Logistic Regression
l( ̄w ,⃗x) = ∑
k=1
N
log P( yk∣⃗xk , ̄w)
= ∑k=1
N
α( yk)log P( y=0∣⃗xk , ̄w)+(1−α( yk ))log P ( yk∣⃗xk , ̄w)
= ∑k=1
N
α( yk)log
1
1+∑
i=1
K −1
exp(⃗x ⃗wi)
+(1−α( yk ))log
exp(⃗x ⃗w yk
)
1+∑
i=1
K−1
exp(⃗x ⃗wi)
= ∑
k=1
N
(1−α( yk ))⃗x ⃗wyk
−log(1+∑
i=1
K−1
exp(⃗x ⃗wi))
Gradient : Gij ( ̄w ,⃗x)=
∂l( ̄w ,⃗x)
∂wij
=∑k=1
N
(1−α( yk )) xkj δi , yk
−
exp( ⃗xk ⃗w)
1+exp( ⃗xk ⃗w)
xkj
α( yk )=1 if yk=0
α( yk )=0 if yk≠0
Define:
Note that the first index “i” is for classes, and the second index “j” is for features.
Hessian: H ij ,lm( ̄w ,⃗x)=
∂∂l ( ̄w ,⃗x)
∂wij ∂wlm
0 5 10 15 20 25 30 35
0.3
0.35
0.4
0.45
0.5
0.55
0.6
0.65
0.7
Logistic Regression with a9a Dataset (11M rows, 123 features, 11% non-zero elements)
16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster
L-BFGS Dense Features
L-BFGS Sparse Features
GD Sparse Features
GD Dense Features
Seconds
Log-Likelihood/NumberofSamples
a9a Dataset Benchmark
a9a Dataset Benchmark
-1 1 3 5 7 9 11 13 15
0.3
0.35
0.4
0.45
0.5
0.55
0.6
0.65
0.7
Logistic Regression with a9a Dataset (11M rows, 123 features, 11% non-zero elements)
16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster
L-BFGS
GD
Iterations
Log-Likelihood/NumberofSamples
0 5 10 15 20 25 30
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Logistic Regression with rcv1 Dataset (6.8M rows, 677,399 features, 0.15% non-zero elements)
16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster
LBFGS Sparse Vector
GD Sparse Vector
Second
Log-Likelihood/NumberofSamples
rcv1 Dataset Benchmark
news20 Dataset Benchmark
0 10 20 30 40 50 60 70 80
0
0.2
0.4
0.6
0.8
1
1.2
Logistic Regression with news20 Dataset (0.14M rows, 1,355,191 features, 0.034% non-zero elements)
16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster
LBFGS Sparse Vector
GD Sparse Vector
Second
Log-Likelihood/NumberofSamples
Alpine Demo
Alpine Demo
Alpine Demo
Conclusion
● Alpine supports state of the art Cloudera CDH5
● Spark runs programs 100x faster than Hadoop
● Spark turns iterative big data machine learning problems
into single machine problems
● MLlib provides lots of state of art machine learning
implementations, e.g. K-means, linear regression,
logistic regression, ALS, collaborative filtering, and
Naive Bayes, etc.
● Spark provides ease of use APIs in Java, Scala, or
Python, and interactive Scala and Python shell
● Spark is Apache project, and it's the most active big data
open source project nowadays.

Weitere ähnliche Inhalte

Was ist angesagt?

Big Data : Risks and Opportunities
Big Data : Risks and OpportunitiesBig Data : Risks and Opportunities
Big Data : Risks and OpportunitiesKenny Huang Ph.D.
 
Algorithms for Query Processing and Optimization of Spatial Operations
Algorithms for Query Processing and Optimization of Spatial OperationsAlgorithms for Query Processing and Optimization of Spatial Operations
Algorithms for Query Processing and Optimization of Spatial OperationsNatasha Mandal
 
Hadoop & MapReduce
Hadoop & MapReduceHadoop & MapReduce
Hadoop & MapReduceNewvewm
 
NIST Cloud Computing Reference Architecture
NIST Cloud Computing Reference ArchitectureNIST Cloud Computing Reference Architecture
NIST Cloud Computing Reference ArchitectureThanakrit Lersmethasakul
 
Cloud Computing and Big Data
Cloud Computing and Big DataCloud Computing and Big Data
Cloud Computing and Big DataRobert Keahey
 
Classification Algorithm.
Classification Algorithm.Classification Algorithm.
Classification Algorithm.Megha Sharma
 
DPLYR package in R
DPLYR package in RDPLYR package in R
DPLYR package in RBimba Pawar
 
data generalization and summarization
data generalization and summarization data generalization and summarization
data generalization and summarization janani thirupathi
 
Business intelligence concepts & application
Business intelligence concepts & applicationBusiness intelligence concepts & application
Business intelligence concepts & applicationnandini patil
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioMarina Santini
 
Bi Architecture And Conceptual Framework
Bi Architecture And Conceptual FrameworkBi Architecture And Conceptual Framework
Bi Architecture And Conceptual FrameworkSlava Kokaev
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataSalah Amean
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecturepcherukumalla
 

Was ist angesagt? (20)

Big Data : Risks and Opportunities
Big Data : Risks and OpportunitiesBig Data : Risks and Opportunities
Big Data : Risks and Opportunities
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Algorithms for Query Processing and Optimization of Spatial Operations
Algorithms for Query Processing and Optimization of Spatial OperationsAlgorithms for Query Processing and Optimization of Spatial Operations
Algorithms for Query Processing and Optimization of Spatial Operations
 
Hadoop & MapReduce
Hadoop & MapReduceHadoop & MapReduce
Hadoop & MapReduce
 
NIST Cloud Computing Reference Architecture
NIST Cloud Computing Reference ArchitectureNIST Cloud Computing Reference Architecture
NIST Cloud Computing Reference Architecture
 
Cloud Computing and Big Data
Cloud Computing and Big DataCloud Computing and Big Data
Cloud Computing and Big Data
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Classification Algorithm.
Classification Algorithm.Classification Algorithm.
Classification Algorithm.
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
DPLYR package in R
DPLYR package in RDPLYR package in R
DPLYR package in R
 
data generalization and summarization
data generalization and summarization data generalization and summarization
data generalization and summarization
 
Business intelligence concepts & application
Business intelligence concepts & applicationBusiness intelligence concepts & application
Business intelligence concepts & application
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
Information system architecture
Information system architectureInformation system architecture
Information system architecture
 
Overview of Big data(ppt)
Overview of Big data(ppt)Overview of Big data(ppt)
Overview of Big data(ppt)
 
Kdd process
Kdd processKdd process
Kdd process
 
Bi Architecture And Conceptual Framework
Bi Architecture And Conceptual FrameworkBi Architecture And Conceptual Framework
Bi Architecture And Conceptual Framework
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, data
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
Data discretization
Data discretizationData discretization
Data discretization
 

Andere mochten auch

Linear Regression Using SPSS
Linear Regression Using SPSSLinear Regression Using SPSS
Linear Regression Using SPSSDr Athar Khan
 
Logistic regression
Logistic regressionLogistic regression
Logistic regressionDrZahid Khan
 
Logistic Regression: Predicting The Chances Of Coronary Heart Disease
Logistic Regression: Predicting The Chances Of Coronary Heart DiseaseLogistic Regression: Predicting The Chances Of Coronary Heart Disease
Logistic Regression: Predicting The Chances Of Coronary Heart DiseaseMichael Lieberman
 
Multinomial logisticregression basicrelationships
Multinomial logisticregression basicrelationshipsMultinomial logisticregression basicrelationships
Multinomial logisticregression basicrelationshipsAnirudha si
 
Logistic regression with SPSS examples
Logistic regression with SPSS examplesLogistic regression with SPSS examples
Logistic regression with SPSS examplesGaurav Kamboj
 
Logistic regression
Logistic regressionLogistic regression
Logistic regressionDrZahid Khan
 
Third party logistics
Third party logisticsThird party logistics
Third party logisticsKuldeep Uttam
 
Logistic regression
Logistic regressionLogistic regression
Logistic regressionsaba khan
 
Reverse Logistics
Reverse  LogisticsReverse  Logistics
Reverse Logisticssanket_123
 

Andere mochten auch (14)

Linear Regression Using SPSS
Linear Regression Using SPSSLinear Regression Using SPSS
Linear Regression Using SPSS
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Binary Logistic Regression
Binary Logistic RegressionBinary Logistic Regression
Binary Logistic Regression
 
Logistic Regression: Predicting The Chances Of Coronary Heart Disease
Logistic Regression: Predicting The Chances Of Coronary Heart DiseaseLogistic Regression: Predicting The Chances Of Coronary Heart Disease
Logistic Regression: Predicting The Chances Of Coronary Heart Disease
 
Multinomial logisticregression basicrelationships
Multinomial logisticregression basicrelationshipsMultinomial logisticregression basicrelationships
Multinomial logisticregression basicrelationships
 
Logistic regression with SPSS examples
Logistic regression with SPSS examplesLogistic regression with SPSS examples
Logistic regression with SPSS examples
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Logistic Regression Analysis
Logistic Regression AnalysisLogistic Regression Analysis
Logistic Regression Analysis
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Logistics management
Logistics managementLogistics management
Logistics management
 
Third party logistics
Third party logisticsThird party logistics
Third party logistics
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Reverse Logistics
Reverse  LogisticsReverse  Logistics
Reverse Logistics
 
Logistic management
Logistic managementLogistic management
Logistic management
 

Ähnlich wie Multinomial Logistic Regression with Apache Spark

2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache SparkDB Tsai
 
Introduction to PyTorch
Introduction to PyTorchIntroduction to PyTorch
Introduction to PyTorchJun Young Park
 
CSCI 2033 Elementary Computational Linear Algebra(Spring 20.docx
CSCI 2033 Elementary Computational Linear Algebra(Spring 20.docxCSCI 2033 Elementary Computational Linear Algebra(Spring 20.docx
CSCI 2033 Elementary Computational Linear Algebra(Spring 20.docxmydrynan
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksStratio
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - EnglishKohei KaiGai
 
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/Theta
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/ThetaAlgorithm analysis basics - Seven Functions/Big-Oh/Omega/Theta
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/ThetaPriyanka Rana
 
Basic concept of MATLAB.ppt
Basic concept of MATLAB.pptBasic concept of MATLAB.ppt
Basic concept of MATLAB.pptaliraza2732
 
SPU Optimizations - Part 2
SPU Optimizations - Part 2SPU Optimizations - Part 2
SPU Optimizations - Part 2Naughty Dog
 
Dsp manual completed2
Dsp manual completed2Dsp manual completed2
Dsp manual completed2bilawalali74
 
On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)Yu Liu
 

Ähnlich wie Multinomial Logistic Regression with Apache Spark (20)

2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark
 
Introduction to PyTorch
Introduction to PyTorchIntroduction to PyTorch
Introduction to PyTorch
 
CSCI 2033 Elementary Computational Linear Algebra(Spring 20.docx
CSCI 2033 Elementary Computational Linear Algebra(Spring 20.docxCSCI 2033 Elementary Computational Linear Algebra(Spring 20.docx
CSCI 2033 Elementary Computational Linear Algebra(Spring 20.docx
 
Functional go
Functional goFunctional go
Functional go
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural Networks
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English
 
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/Theta
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/ThetaAlgorithm analysis basics - Seven Functions/Big-Oh/Omega/Theta
Algorithm analysis basics - Seven Functions/Big-Oh/Omega/Theta
 
Java 8
Java 8Java 8
Java 8
 
Basic concept of MATLAB.ppt
Basic concept of MATLAB.pptBasic concept of MATLAB.ppt
Basic concept of MATLAB.ppt
 
SPU Optimizations - Part 2
SPU Optimizations - Part 2SPU Optimizations - Part 2
SPU Optimizations - Part 2
 
parallel
parallelparallel
parallel
 
MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...
MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...
MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...
 
Backpropagation - Elisa Sayrol - UPC Barcelona 2018
Backpropagation - Elisa Sayrol - UPC Barcelona 2018Backpropagation - Elisa Sayrol - UPC Barcelona 2018
Backpropagation - Elisa Sayrol - UPC Barcelona 2018
 
Matlab1
Matlab1Matlab1
Matlab1
 
Dafunctor
DafunctorDafunctor
Dafunctor
 
MapReduce
MapReduceMapReduce
MapReduce
 
Pytorch meetup
Pytorch meetupPytorch meetup
Pytorch meetup
 
Unit3 MapReduce
Unit3 MapReduceUnit3 MapReduce
Unit3 MapReduce
 
Dsp manual completed2
Dsp manual completed2Dsp manual completed2
Dsp manual completed2
 
On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)
 

Mehr von DB Tsai

2017 Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit Ea...
2017 Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit Ea...2017 Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit Ea...
2017 Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit Ea...DB Tsai
 
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...DB Tsai
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML ConferenceDB Tsai
 
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...DB Tsai
 
2014-08-14 Alpine Innovation to Spark
2014-08-14 Alpine Innovation to Spark2014-08-14 Alpine Innovation to Spark
2014-08-14 Alpine Innovation to SparkDB Tsai
 
Large-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkLarge-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkDB Tsai
 
Unsupervised Learning with Apache Spark
Unsupervised Learning with Apache SparkUnsupervised Learning with Apache Spark
Unsupervised Learning with Apache SparkDB Tsai
 

Mehr von DB Tsai (7)

2017 Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit Ea...
2017 Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit Ea...2017 Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit Ea...
2017 Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit Ea...
 
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
 
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
 
2014-08-14 Alpine Innovation to Spark
2014-08-14 Alpine Innovation to Spark2014-08-14 Alpine Innovation to Spark
2014-08-14 Alpine Innovation to Spark
 
Large-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkLarge-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache Spark
 
Unsupervised Learning with Apache Spark
Unsupervised Learning with Apache SparkUnsupervised Learning with Apache Spark
Unsupervised Learning with Apache Spark
 

Kürzlich hochgeladen

notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptMsecMca
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLManishPatel169454
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spaintimesproduction05
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 

Kürzlich hochgeladen (20)

Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spain
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 

Multinomial Logistic Regression with Apache Spark

  • 1. Multinomial Logistic Regression with Apache Spark DB Tsai Machine Learning Engineer May 1st , 2014
  • 2. Machine Learning with Big Data ● Hadoop MapReduce solutions ● MapReduce is scaling well for batch processing ● However, lots of machine learning algorithms are iterative by nature. ● There are lots of tricks people do, like training with subsamples of data, and then average the models. Why big data with approximation? + =
  • 3. Lightning-fast cluster computing ● Empower users to iterate through the data by utilizing the in-memory cache. ● Logistic regression runs up to 100x faster than Hadoop M/R in memory. ● We're able to train exact model without doing any approximation.
  • 4. Binary Logistic Regression d = ax1+bx2+cx0 √a2 +b2 where x0=1 P( y=1∣⃗x , ⃗w)= exp(d ) 1+exp(d ) = exp(⃗x ⃗w) 1+exp(⃗x ⃗w) P ( y=0∣⃗x , ⃗w)= 1 1+exp(⃗x ⃗w) log P ( y=1∣⃗x , ⃗w) P( y=0∣⃗x , ⃗w) =⃗x ⃗w w0= c √a2 +b2 w1= a √a2 +b2 w2= b √a2 +b2 wherew0 iscalled as intercept
  • 5. Training Binary Logistic Regression ● Maximum Likelihood estimation From a training data and labels ● We want to find that maximizes the likelihood of data defined by ● We can take log of the equation, and minimize it instead. The Log-Likelihood becomes the loss function. X =( ⃗x1 , ⃗x2 , ⃗x3 ,...) Y =( y1, y2, y3, ...) ⃗w L( ⃗w , ⃗x1, ... , ⃗xN )=P ( y1∣⃗x1 , ⃗w)P ( y2∣⃗x2 , ⃗w)... P( yN∣ ⃗xN , ⃗w) l( ⃗w ,⃗x)=log P(y1∣⃗x1 , ⃗w)+log P( y2∣⃗x2 , ⃗w)...+log P( yN∣ ⃗xN , ⃗w)
  • 6. Optimization ● First Order Minimizer Require loss, gradient of loss function – Gradient Decent is step size – Limited-memory BFGS (L-BFGS) – Orthant-Wise Limited-memory Quasi-Newton (OWLQN) – Coordinate Descent (CD) – Trust Region Newton Method (TRON) ● Second Order Minimizer Require loss, gradient and hessian of loss function – Newton-Raphson, quadratic convergence. Fast! ● Ref: Journal of Machine Learning Research 11 (2010) 3183- 3234, Chih-Jen Lin et al. ⃗wn+1=⃗wn−γ ⃗G , γ ⃗wn+1=⃗wn−H−1 ⃗G
  • 7. Problem of Second Order Minimizer ● Scale horizontally (the numbers of training data) by leveraging Spark to parallelize this iterative optimization process. ● Don't scale vertically (the numbers of training features). Dimension of Hessian is ● Recent applications from document classification and computational linguistics are of this type. dim(H )=[(k−1)(n+1)] 2 where k is numof class ,nis num of features
  • 8. L-BFGS ● It's a quasi-Newton method. ● Hessian matrix of second derivatives doesn't need to be evaluated directly. ● Hessian matrix is approximated using gradient evaluations. ● It converges a way faster than the default optimizer in Spark, Gradient Decent. ● We love open source! Alpine Data Labs contributed our L-BFGS to Spark, and it's already merged in Spark-1157.
  • 9. Training Binary Logistic Regression l( ⃗w ,⃗x) = ∑ k=1 N log P( yk∣⃗xk , ⃗w) = ∑k=1 N yk log P ( yk=1∣⃗xk , ⃗w)+(1−yk )log P ( yk =0∣⃗xk , ⃗w) = ∑k=1 N yk log exp( ⃗xk ⃗w) 1+exp( ⃗xk ⃗w) +(1−yk )log 1 1+exp( ⃗xk ⃗w) = ∑k=1 N yk ⃗xk ⃗w−log(1+exp( ⃗xk ⃗w)) Gradient : Gi (⃗w ,⃗x)= ∂l( ⃗w ,⃗x) ∂ wi =∑k=1 N yk xki− exp( ⃗xk ⃗w) 1+exp( ⃗xk ⃗w) xki Hessian: H ij (⃗w ,⃗x)= ∂∂l( ⃗w ,⃗x) ∂wi ∂w j =−∑k=1 N exp( ⃗xk ⃗w) (1+exp( ⃗xk ⃗w))2 xki xkj
  • 10. Overfitting P ( y=1∣⃗x , ⃗w)= exp(zd ) 1+exp(zd ) = exp(⃗x ⃗w) 1+exp(⃗x ⃗w)
  • 11. Regularization ● The loss function becomes ● The loss function of regularizer doesn't depend on data. Common regularizers are – L2 Regularization: – L1 Regularization: ● L1 norm is not differentiable at zero! ltotal (⃗w ,⃗x)=lmodel (⃗w ,⃗x)+lreg( ⃗w) lreg (⃗w)=λ∑i=1 N wi 2 lreg (⃗w)=λ∑i=1 N ∣wi∣ ⃗G (⃗w ,⃗x)total=⃗G(⃗w ,⃗x)model+⃗G( ⃗w)reg ̄H ( ⃗w ,⃗x)total= ̄H (⃗w ,⃗x)model+ ̄H (⃗w)reg
  • 12. Mini School of Spark APIs ● map(func) : Return a new distributed dataset formed by passing each element of the source through a function func. ● reduce(func) : Aggregate the elements of the dataset using a function func (which takes two arguments and returns one). The function should be commutative and associative so that it can be computed correctly in parallel. No “key” thing here compared with Hadoop M/R.
  • 13. Example – No More “Word Count”! Let's compute the mean of numbers. 3.0 2.0 1.0 5.0 Executor 1 Executor 2 map map (1.0, 1) (5.0, 1) (3.0, 1) (2.0, 1) reduce (5.0, 2) reduce (11.0, 4) Shuffle
  • 14. ● The previous example using map(func) will create new tuple object, which may cause Garbage Collection issue. ● Like the combiner in Hadoop Map Reduce, for each tuple emitted from map(func), there is no guarantee that they will be combined locally in the same executor by reduce(func). It may increase the traffic in shuffle phase. ● In Hadoop, we address this by in-mapper combiner or aggregating the result in global variables which have scope to entire partition. The same approach can be used in Spark.
  • 15. Mini School of Spark APIs ● mapPartitions(func) : Similar to map, but runs separately on each partition (block) of the RDD, so func must be of type Iterator[T] => Iterator[U] when running on an RDD of type T. ● This API allows us to have global variables on entire partition to aggregate the result locally and efficiently.
  • 16. Better Mean of Numbers Implementation 3.0 2.0 Executor 1 reduce (11.0, 4) Shuffle var counts = 0 var sum = 0.0 loop var counts = 2 var sum = 5.0 3.0 2.0 (5.0, 2) 1.0 5.0 Executor 2 var counts = 0 var sum = 0.0 loop var counts = 2 var sum = 6.0 1.0 5.0 (6.0, 2) mapPartitions
  • 17. More Idiomatic Scala Implementation ● aggregate(zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U): U zeroValue is a neutral "zero value" with type U for initialization. This is analogous to in previous example. var counts = 0 var sum = 0.0 seqOp is function taking (U, T) => U, where U is aggregator initialized as zeroValue. T is each line of values in RDD. This is analogous to mapPartition in previous example. combOp is function taking (U, U) => U. This is essentially combing the results between different executors. The functionality is the same as reduce(func) in previous example.
  • 18. Approach of Parallelization in Spark Spark Driver JVM Time Spark Executor 1 JVM Spark Executor 2 JVM 1) Find available resources in cluster, and launch executor JVMs. Also initialize the weights. 2) Ask executors to load the data into executors' JVMs 2) Trying to load the data into memory. If the data is bigger than memory, it will be partial cached. (The data locality from source will be taken care by Spark) 3) Ask executors to compute loss, and gradient of each training sample (each row) given the current weights. Get the aggregated results after the Reduce Phase in executors. 5) If the regularization is enabled, compute the loss and gradient of regularizer in driver since it doesn't depend on training data but only depends on weights. Add them into the results from executors. 3) Map Phase: Compute the loss and gradient of each row of training data locally given the weights obtained from the driver. Can either emit each result or sum them up in local aggregators. 4) Reduce Phase: Sum up the losses and gradients emitted from the Map Phase Taking a rest! 6) Plug the loss and gradient from model and regularizer into optimizer to get the new weights. If the differences of weights and losses are larger than criteria, GO BACK TO 3) Taking a rest! 7) Finish the model training! Taking a rest!
  • 19. Step 3) and 4) ● This is the implementation of step 3) and 4) in MLlib before Spark 1.0 ● gradient can have implementations of Logistic Regression, Linear Regression, SVM, or any customized cost function. ● Each training data will create new “grad” object after gradient.compute.
  • 20. Step 3) and 4) with mapPartitions
  • 21. Step 3) and 4) with aggregate ● This is the implementation of step 3) and 4) in MLlib in coming Spark 1.0 ● No unnecessary object creation! It's helpful when we're dealing with large features training data. GC will not kick in the executor JVMs.
  • 22. Extension to Multinomial Logistic Regression ● In Binary Logistic Regression ● For K classes multinomial problem where labels ranged from [0, K-1], we can generalize it via ● The model, weights becomes (K-1)(N+1) matrix, where N is number of features. log P( y=1∣⃗x , ̄w) P ( y=0∣⃗x , ̄w) =⃗x ⃗w1 log P ( y=2∣⃗x , ̄w) P ( y=0∣⃗x , ̄w) =⃗x ⃗w2 ... log P ( y=K −1∣⃗x , ̄w) P ( y=0∣⃗x , ̄w) =⃗x ⃗wK −1 log P( y=1∣⃗x , ⃗w) P ( y=0∣⃗x , ⃗w) =⃗x ⃗w ̄w=(⃗w1, ⃗w2, ... , ⃗wK −1)T P( y=0∣⃗x , ̄w)= 1 1+∑ i=1 K −1 exp(⃗x ⃗wi ) P( y=1∣⃗x , ̄w)= exp(⃗x ⃗w2) 1+∑ i=1 K −1 exp(⃗x ⃗wi) ... P( y=K −1∣⃗x , ̄w)= exp(⃗x ⃗wK −1) 1+∑i=1 K −1 exp(⃗x ⃗wi )
  • 23. Training Multinomial Logistic Regression l( ̄w ,⃗x) = ∑ k=1 N log P( yk∣⃗xk , ̄w) = ∑k=1 N α( yk)log P( y=0∣⃗xk , ̄w)+(1−α( yk ))log P ( yk∣⃗xk , ̄w) = ∑k=1 N α( yk)log 1 1+∑ i=1 K −1 exp(⃗x ⃗wi) +(1−α( yk ))log exp(⃗x ⃗w yk ) 1+∑ i=1 K−1 exp(⃗x ⃗wi) = ∑ k=1 N (1−α( yk ))⃗x ⃗wyk −log(1+∑ i=1 K−1 exp(⃗x ⃗wi)) Gradient : Gij ( ̄w ,⃗x)= ∂l( ̄w ,⃗x) ∂wij =∑k=1 N (1−α( yk )) xkj δi , yk − exp( ⃗xk ⃗w) 1+exp( ⃗xk ⃗w) xkj α( yk )=1 if yk=0 α( yk )=0 if yk≠0 Define: Note that the first index “i” is for classes, and the second index “j” is for features. Hessian: H ij ,lm( ̄w ,⃗x)= ∂∂l ( ̄w ,⃗x) ∂wij ∂wlm
  • 24. 0 5 10 15 20 25 30 35 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 Logistic Regression with a9a Dataset (11M rows, 123 features, 11% non-zero elements) 16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster L-BFGS Dense Features L-BFGS Sparse Features GD Sparse Features GD Dense Features Seconds Log-Likelihood/NumberofSamples a9a Dataset Benchmark
  • 25. a9a Dataset Benchmark -1 1 3 5 7 9 11 13 15 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 Logistic Regression with a9a Dataset (11M rows, 123 features, 11% non-zero elements) 16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster L-BFGS GD Iterations Log-Likelihood/NumberofSamples
  • 26. 0 5 10 15 20 25 30 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Logistic Regression with rcv1 Dataset (6.8M rows, 677,399 features, 0.15% non-zero elements) 16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster LBFGS Sparse Vector GD Sparse Vector Second Log-Likelihood/NumberofSamples rcv1 Dataset Benchmark
  • 27. news20 Dataset Benchmark 0 10 20 30 40 50 60 70 80 0 0.2 0.4 0.6 0.8 1 1.2 Logistic Regression with news20 Dataset (0.14M rows, 1,355,191 features, 0.034% non-zero elements) 16 executors in INTEL Xeon E3-1230v3 32GB Memory * 5 nodes Hadoop 2.0.5 alpha cluster LBFGS Sparse Vector GD Sparse Vector Second Log-Likelihood/NumberofSamples
  • 31. Conclusion ● Alpine supports state of the art Cloudera CDH5 ● Spark runs programs 100x faster than Hadoop ● Spark turns iterative big data machine learning problems into single machine problems ● MLlib provides lots of state of art machine learning implementations, e.g. K-means, linear regression, logistic regression, ALS, collaborative filtering, and Naive Bayes, etc. ● Spark provides ease of use APIs in Java, Scala, or Python, and interactive Scala and Python shell ● Spark is Apache project, and it's the most active big data open source project nowadays.