Parallel Coordinate Descent Algorithms
A Quick Review
Chaitanya Prasad, S. Shaleen Kumar Gupta
Nanyang Technological Institute, Singapore
June 7, 2016
Outline
1 Introduction
2 Sequential Coordinate Descent
3 Naive Parallelization
4 Intuition behind Parallelization
5 Expected Separable Over-approximation (ESO)
6 Algorithm for Parallel Coordinate Descent
7 Limitations
Definition
Coordinate-wise minimization of the objective function.
The objective function has the form
F(x) = f(x) + Ω(x) (1)
where
f(x) = partially separable function
Ω(x) = simple block separable function
Sequential Coordinate Descent (SCD)
Set x = 0 ∈ R^{2d}_+;
while not converged do
    Choose j ∈ {1, ..., 2d} uniformly at random;
    Set δx_j ← max{−x_j, −(∇F(x))_j / β};
    Update x_j ← x_j + δx_j;
end
Algorithm 1: Shooting: Sequential Coordinate Descent
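A minimal Python sketch of Algorithm 1, under assumptions the slides do not state: the objective is the Lasso F(x) = 0.5 * ||Âx − y||² + λ * Σ_j x_j with duplicated features Â = [A, −A], the columns of A have unit norm, and β = 1. The names `shooting_lasso`, `lam`, and `iters` are hypothetical, and a fixed iteration count stands in for the convergence test.

```python
import random

def shooting_lasso(A, y, lam, iters=2000, beta=1.0, seed=0):
    """Sequential coordinate descent on the nonnegative 2d-variable Lasso form."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])

    def col(j):
        # Column j of A_hat = [A, -A]: the second half repeats A with flipped sign.
        sign = 1.0 if j < d else -1.0
        return [sign * A[i][j % d] for i in range(n)]

    x = [0.0] * (2 * d)
    residual = [-yi for yi in y]                  # A_hat @ x - y at x = 0
    for _ in range(iters):                        # fixed budget instead of a convergence test
        j = rng.randrange(2 * d)                  # choose j uniformly at random
        a_j = col(j)
        grad_j = sum(a * r for a, r in zip(a_j, residual)) + lam
        delta = max(-x[j], -grad_j / beta)        # step that keeps x_j >= 0
        x[j] += delta
        residual = [r + delta * a for r, a in zip(residual, a_j)]
    return [x[j] - x[j + d] for j in range(d)]    # recover the signed weights

weights = shooting_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, -0.5], lam=1.0)
```

With an identity design matrix the result should match soft-thresholding of y by λ, which gives a quick sanity check of the update rule.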
Approach to Naive Parallelization
Each iteration of SCD minimizes the objective along a single coordinate.
A naive parallelization updates multiple coordinates in each
iteration, one per processor.
Why Naive Parallelization won't work
1 Theory guarantees that "one at a time" updates converge, while
"all at once" updates may not.
2 Whether simultaneous updates converge depends on the correlation among coordinates.
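A small illustration of the two points above, using a toy function chosen for this sketch (it is not from the slides): f(x) = 0.5 * (x_1 + x_2 − 1)², whose two coordinates are perfectly correlated. Updating one coordinate at a time reaches a minimizer, while applying both exact coordinate steps at once from the same stale x overshoots and oscillates.

```python
def f(x):
    return 0.5 * (x[0] + x[1] - 1.0) ** 2

def coordinate_steps(x):
    """Exact single-coordinate minimizing steps for f, both computed from the same x."""
    r = x[0] + x[1] - 1.0      # both partial derivatives equal this residual
    return [-r, -r]            # either step alone would drive the residual to zero

# One at a time: each step is computed after the previous one was applied.
x_seq = [0.0, 0.0]
for j in (0, 1):
    x_seq[j] += coordinate_steps(x_seq)[j]

# All at once: both steps come from the same stale x and are applied together.
x_par = [0.0, 0.0]
history = []
for _ in range(4):
    step = coordinate_steps(x_par)
    x_par = [x_par[0] + step[0], x_par[1] + step[1]]
    history.append(f(x_par))
```

Here `f(x_seq)` ends at zero, while `history` never drops below the starting value of f, because the iterate jumps back and forth between (0, 0) and (1, 1).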
Intuition behind Parallelization
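If ∆x is the collective update to x in a single iteration of the naive
parallel approach, then
$$F(x + \Delta x) - F(x) \le -\frac{1}{2} \sum_{i_j \in P_t} (\delta x_{i_j})^2 + \frac{1}{2} \sum_{i_j, i_k \in P_t,\; j \ne k} (A^T A)_{i_j, i_k}\, \delta x_{i_j}\, \delta x_{i_k} \tag{2}$$
where A is the design matrix of the L1-regularised loss function and P_t is the set of coordinates updated in iteration t.
The first sum is the progress the updates would make one at a time; the second sum is the interference between correlated coordinates.
Therefore the step sizes of parallel updates must be chosen
according to the amount of interference.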
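The split into a progress term and an interference term can be checked numerically. The sketch below makes simplifying assumptions that are not in the slides: unregularised least squares F(x) = 0.5 * ||Ax − y||², unit-norm columns, β = 1, and no nonnegativity constraint, so that δx_j = −(∇F(x))_j. In this special case the bound holds with equality. The matrix, the vector y, and all function names are made up for the illustration.

```python
import math

def objective(A, y, x):
    return 0.5 * sum((sum(a * xj for a, xj in zip(row, x)) - yi) ** 2
                     for row, yi in zip(A, y))

def gradient(A, y, x):
    residual = [sum(a * xj for a, xj in zip(row, x)) - yi for row, yi in zip(A, y)]
    return [sum(A[i][j] * residual[i] for i in range(len(A))) for j in range(len(x))]

def gram(A, j, k):
    """Entry (j, k) of A^T A, i.e. the correlation of columns j and k."""
    return sum(row[j] * row[k] for row in A)

s = 1.0 / math.sqrt(2.0)
A = [[1.0, s], [0.0, s]]       # unit-norm columns with correlation s
y = [1.0, 1.0]
x = [0.0, 0.0]
P = [0, 1]                     # coordinates updated together

grad = gradient(A, y, x)
delta = {j: -grad[j] for j in P}
x_new = [xj + delta.get(j, 0.0) for j, xj in enumerate(x)]

actual = objective(A, y, x_new) - objective(A, y, x)
progress = -0.5 * sum(delta[j] ** 2 for j in P)
interference = 0.5 * sum(gram(A, j, k) * delta[j] * delta[k]
                         for j in P for k in P if j != k)
```

For this data the interference term is positive, so the joint update makes less progress than the sum of the individual updates would suggest.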
Expected Separable Over-approximation (ESO)
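1 Let the update rule be generally defined as
$$x \leftarrow x + \frac{1}{\beta} \sum_{i \in \hat{S}} h_i e_i \tag{3}$$
where h defines the update rule. Then
$$\mathbf{E}\big[f(x + h_{[\hat{S}]})\big] \le f(x) + \frac{\mathbf{E}[|\hat{S}|]}{n} \left( (\nabla f(x))^T h + \frac{\beta}{2} \|h\|_w^2 \right) \tag{4}$$
where $h_{[\hat{S}]} = \sum_{i \in \hat{S}} h_i e_i$ and $\|h\|_w^2 = \sum_{i=1}^{n} w_i (h_i)^2$.
2 We over-approximate the function by a separable quadratic and minimize
that over-approximation in PCDM1 and PCDM2.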
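As a worked step, assume Ω ≡ 0, β > 0, and w_i > 0 (assumptions made only for this illustration). The terms f(x) and E[|Ŝ|]/n do not depend on h, so minimizing the right-hand side of (4) over h separates across coordinates:

```latex
\min_{h \in \mathbb{R}^n} \; \sum_{i=1}^{n} \left( \nabla_i f(x)\, h_i + \frac{\beta}{2}\, w_i\, h_i^2 \right)
\quad\Longrightarrow\quad
h_i = -\frac{\nabla_i f(x)}{\beta\, w_i}, \qquad i = 1, \dots, n.
```

With w_i = 1 this is a gradient step scaled by 1/β, the same scaling that appears in (3). Because each h_i depends only on its own partial derivative, the coordinates in Ŝ can be computed independently.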
Shotgun: Parallel Coordinate Descent
Choose number of parallel updates P ≥ 1;
Set x = 0 ∈ R^{2d}_+;
while not converged do
    Choose a random subset of P weights in {1, ..., 2d};
    In parallel on P processors
        Get assigned weight j;
        Set δx_j ← max{−x_j, −(∇F(x))_j / β};
        Update x_j ← x_j + δx_j;
end
Algorithm 2: Shotgun: Parallel Coordinate Descent
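A minimal Python sketch of Algorithm 2 that simulates the P processors in a single thread: all P steps are computed from the same x before any of them is applied. As assumptions not stated in the slides, it uses the Lasso objective with duplicated features Â = [A, −A], unit-norm columns, and β = 1; `shotgun_lasso`, `lam`, and `iters` are hypothetical names, and a fixed iteration count replaces the convergence test.

```python
import random

def shotgun_lasso(A, y, lam, P=2, iters=500, beta=1.0, seed=0):
    """Simulated Shotgun: P coordinate steps per iteration, all read the same x."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])

    def col(j):
        # Column j of A_hat = [A, -A].
        sign = 1.0 if j < d else -1.0
        return [sign * A[i][j % d] for i in range(n)]

    x = [0.0] * (2 * d)
    for _ in range(iters):
        # Residual A_hat @ x - y, shared by every simulated processor.
        residual = [sum(A[i][k] * (x[k] - x[k + d]) for k in range(d)) - y[i]
                    for i in range(n)]
        chosen = rng.sample(range(2 * d), P)      # random subset of P weights
        deltas = {}
        for j in chosen:                          # conceptually runs in parallel
            grad_j = sum(a * r for a, r in zip(col(j), residual)) + lam
            deltas[j] = max(-x[j], -grad_j / beta)
        for j, dj in deltas.items():              # apply all steps together
            x[j] += dj
    return [x[j] - x[j + d] for j in range(d)]

weights = shotgun_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, -0.5], lam=1.0, P=2)
```

The identity design used here has uncorrelated columns, so the simultaneous steps do not interfere; with correlated columns, a large P can slow or prevent convergence.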
Parallel Coordinate Descent Method 1 (PCDM 1)
Choose initial point x_0 ∈ R^N
for k = 0, 1, 2, ... do
    Randomly generate a set of blocks S_k ⊂ {1, 2, ..., n}
    x_{k+1} ← x_k + (h(x_k))_{[S_k]}
end
Algorithm 3: Parallel Coordinate Descent Method 1 (PCDM 1)
Parallel Coordinate Descent Method 2 (PCDM 2)
Choose initial point x_0 ∈ R^N
for k = 0, 1, 2, ... do
    Randomly generate a set of blocks S_k ⊂ {1, 2, ..., n}
    x_{k+1} ← x_k + (h(x_k))_{[S_k]}
    If F(x_{k+1}) > F(x_k), then x_{k+1} ← x_k
end
Algorithm 4: Parallel Coordinate Descent Method 2 (PCDM 2)
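A minimal Python sketch covering both PCDM variants, under assumptions chosen for illustration only: blocks are single coordinates, Ω = 0, f(x) = 0.5 * ||Ax − y||², each S_k is a uniformly random subset of `tau` coordinates, w_i is the squared norm of column i, and β defaults to the conservative value `tau`. The step is h_i(x) = −∇_i f(x) / (β w_i). The function name `pcdm` and its parameters are hypothetical; `monotone=True` adds the PCDM 2 safeguard.

```python
import random

def pcdm(A, y, tau=2, iters=200, beta=None, monotone=False, seed=0):
    """PCDM sketch for f(x) = 0.5 * ||A x - y||^2 with Omega = 0."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    beta = float(tau) if beta is None else beta     # conservative choice, assumed here
    w = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]  # squared column norms

    def residual(x):
        return [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]

    def F(x):
        return 0.5 * sum(r * r for r in residual(x))

    x = [0.0] * n
    for _ in range(iters):
        S = rng.sample(range(n), tau)               # random set of tau coordinates
        r = residual(x)
        x_next = list(x)
        for i in S:                                 # conceptually runs in parallel
            grad_i = sum(A[k][i] * r[k] for k in range(m))
            x_next[i] = x[i] - grad_i / (beta * w[i])
        if monotone and F(x_next) > F(x):           # PCDM 2: reject an increase in F
            x_next = x
        x = x_next
    return x, F(x)

A_demo = [[2.0, 1.0], [0.0, 1.0]]
y_demo = [3.0, 1.0]
x1, f1 = pcdm(A_demo, y_demo, tau=2)                  # PCDM 1
x2, f2 = pcdm(A_demo, y_demo, tau=2, monotone=True)   # PCDM 2
```

The safeguard costs one extra evaluation of F per iteration, which is why it is kept optional in this sketch.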
Limitations
1 Each iteration performs very little computation, so the communication
overhead is comparatively large
(synchronous vs. asynchronous updates; optimal strong convexity is
required for convergence).
2 Convergence cannot be proved if the nature of F(x), i.e., its
separability and smoothness, is not known.
