MTH702 Optimization
Nonlinear Optimization
Optimization
General optimization problem
$\min_{x \in \mathbb{R}^d} f(x)$
with $x \in X \subseteq \mathbb{R}^d$
candidate solutions, variables, parameters: $x \in \mathbb{R}^d$
objective function: $f : \mathbb{R}^d \to \mathbb{R}$
typical technical assumption: $f$ is continuous and differentiable
Q: Is this problem easy? When is it easy?
Q: How do we find the best (optimal) solution?
1/18
Questions
$\min_{x \in \mathbb{R}^d} f(x)$
with $x \in X \subseteq \mathbb{R}^d$
Q: Why do we study optimization?
Q: What are you hoping to learn?
2/18
Why? And How?
Optimization is everywhere
machine learning, big data, statistics, data analysis of all kinds, finance, logistics, planning, control
theory, mathematics, search engines, simulations, and many other applications ...
Mathematical Modeling:
defining & modeling the optimization problem
Computational Optimization:
running an (appropriate) optimization algorithm
3/18
Optimization for Machine Learning
Mathematical Modeling:
defining and measuring the machine learning model
Computational Optimization:
learning the model parameters
Theory vs. practice:
libraries are available, algorithms treated as “black box” by most practitioners
“... just use Adam ...”
Not here: we look inside the algorithms and try to understand why and how fast they work!
4/18
Optimization Algorithms
Optimization at large scale: simplicity rules!
In special cases ($f$ is smooth, $X = \mathbb{R}^d$) we have “basic”/“simple” algorithms:
Gradient Descent
Stochastic Gradient Descent (SGD)
Coordinate Descent
History:
1847: Cauchy proposes gradient descent
1950s: linear programming, soon followed by nonlinear programming and SGD
1980s: General optimization, convergence theory
2005-today: Large scale optimization, convergence of SGD
5/18
Example: Coordinate Descent
Goal: Find $x^* \in \mathbb{R}^d$ minimizing $f(x)$. (Example: $d = 2$)
[Figure: axes $x_1$ and $x_2$, with the minimizer $x^*$ marked]
Idea: Update one coordinate at a time, while keeping others fixed.
6/18
Example: Coordinate Descent
Goal: Find $x^* \in \mathbb{R}^d$ minimizing $f(x)$.
[Figure: axes $x_1$ and $x_2$, with the minimizer $x^*$ marked]
Idea: Update one coordinate at a time, while keeping others fixed.
Q: How to pick the coordinate direction? How to find out how far to go? Does it always ...
7/18
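The idea above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation. It assumes a quadratic objective $f(x) = \frac{1}{2} x^\top A x - b^\top x$ with $A$ symmetric positive definite (so each one-dimensional subproblem has a closed-form minimizer), picks coordinates in cyclic order, and uses hypothetical names (`coordinate_descent`, `A`, `b`, `sweeps`) chosen only for this example.

```python
def coordinate_descent(A, b, x0, sweeps=50):
    """Cyclic coordinate descent for f(x) = 0.5 * x^T A x - b^T x.

    Assumption: A is symmetric positive definite (list of rows), so
    minimizing over one coordinate has a closed-form solution.
    """
    x = list(x0)
    d = len(x)
    for _ in range(sweeps):
        for i in range(d):  # pick the coordinate direction in cyclic order
            # "How far to go": set the i-th partial derivative to zero,
            # keeping all other coordinates fixed.
            off_diag = sum(A[i][j] * x[j] for j in range(d) if j != i)
            x[i] = (b[i] - off_diag) / A[i][i]
    return x


# Example with d = 2; the minimizer solves A x = b.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_hat = coordinate_descent(A, b, x0=[0.0, 0.0])
```

For this particular problem the exact minimizer is $(0.2, 0.6)$. Other selection rules (random, greedy) and inexact step sizes are possible; the cyclic, exact-minimization variant is used here only because it is the shortest to write down.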
Oracle
Definitions
$\min_{x \in \mathbb{R}^d} f(x)$
with $x \in X \subseteq \mathbb{R}^d$
$P$ is an optimization problem (from a class of problems, $P \in \mathcal{P}$)
Oracle $O$ answers questions for some optimization method $M$
Q: What kinds of questions would we need answered?
A: What is $f(x)$ for a given $x$? Is $x \in X$? Can we compute $\nabla f(x)$, and what is it?
The performance
The performance of $M$ on $P$ is the total amount of computational effort required by
method $M$ to solve the problem $P$
9/18
Questions
The performance
The performance of M on P is the total amount of computational effort required by
method M to solve the problem P
... to solve the problem.... Q: what does it mean?
Example: $\min_x \frac{1}{2} x^2$, and let $M$ be such that, given $x$, it returns $x - \frac{x}{2}$. Q: Will we even
solve the problem?
Approximate solution to P
in many areas of numerical analysis, it is impossible to find an exact solution
relaxed goal: find an approximate solution to $P$ with some accuracy $\epsilon > 0$!
let $T$ be some termination criterion
10/18
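The example on this slide can be checked numerically. The sketch below uses exact rational arithmetic so that rounding does not hide the point: starting from $x_0 \neq 0$, the iterate $x_k = x_0 / 2^k$ is never exactly the minimizer $0$, yet it reaches any accuracy $\epsilon > 0$ in finitely many steps. The termination criterion $f(x) - f^* \le \epsilon$ with $f^* = 0$ is an assumption made for this illustration, and `steps_to_accuracy` is a hypothetical helper name.

```python
from fractions import Fraction


def steps_to_accuracy(x0, eps):
    """Apply M: x -> x - x/2 until f(x) = x^2 / 2 is within eps of f* = 0."""
    x, steps = Fraction(x0), 0
    while x * x / 2 > eps:  # assumed termination criterion: f(x) - f* <= eps
        x = x - x / 2       # one step of the method M from the slide
        steps += 1
    return x, steps


x_final, n_steps = steps_to_accuracy(1, Fraction(1, 10**6))
# x_final is 1/1024 after 10 steps: an approximate solution, still not exactly 0.
```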
Complexity of General Iterative Scheme [N+18]
Analytical complexity
number of calls to the oracle necessary to solve problem $P$ to accuracy $\epsilon$
Arithmetical complexity
total number of arithmetic operations (including the work of the oracle and the work of the method) necessary to solve problem $P$ to accuracy $\epsilon$
11/18
Standard Oracles
Zero-order oracle
returns the function value $f(x)$
First-order oracle
returns the function value $f(x)$ and the gradient $\nabla f(x)$
Second-order oracle
returns the function value $f(x)$, the gradient $\nabla f(x)$, and the Hessian $\nabla^2 f(x)$
12/18
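One way to make the oracle model concrete is a small wrapper that answers queries and counts them, so that analytical complexity becomes a number one can read off. This is a sketch under assumptions: the class name `CountingOracle` and its methods are hypothetical, and only the zero-order and first-order oracles are shown.

```python
class CountingOracle:
    """Wraps a function (and optionally its gradient) and counts queries."""

    def __init__(self, f, grad=None):
        self._f = f
        self._grad = grad
        self.calls = 0  # number of oracle calls so far

    def zero_order(self, x):
        """Zero-order oracle: returns f(x)."""
        self.calls += 1
        return self._f(x)

    def first_order(self, x):
        """First-order oracle: returns f(x) and the gradient at x."""
        if self._grad is None:
            raise ValueError("no gradient available for this oracle")
        self.calls += 1
        return self._f(x), self._grad(x)


# Example: f(x) = x^2 / 2 with gradient x.
oracle = CountingOracle(lambda x: 0.5 * x * x, grad=lambda x: x)
value = oracle.zero_order(2.0)
value_and_grad = oracle.first_order(2.0)
```

After these two queries `oracle.calls` equals 2, regardless of how much arithmetic each query costs internally; that internal work is what arithmetical complexity adds on top.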
Complexity Bounds for Global Optimization
Assume a simple problem
$\min_{x \in B_d} f(x)$
where $B_d = \{x \in \mathbb{R}^d : \forall i : 0 \le x_i \le 1\}$
Q: Can we find an $\epsilon$-solution for some $\epsilon > 0$? How many times do we need to call the zero-order
oracle $O$?
We need some assumptions on $f$ to derive complexity bounds, and we need an algorithm!
Lipschitz Continuity of f
$f : \mathbb{R}^d \to \mathbb{R}$ is Lipschitz continuous on $B_d$: $|f(x) - f(y)| \le L \|x - y\|_\infty \ \forall x, y \in B_d$
Q: How can it help us?
A: Assume we cover $B_d$ with a grid of points. Let $\Delta$ be the spacing of the grid. If we
return the “best” grid point, how small must $\Delta$ be to guarantee an $\epsilon$-solution?
13/18
Uniform Grid Method [N+18]
note that Nesterov uses $n$ for the dimension of the problem
any two neighboring points $x, y$ in the grid satisfy $\|x - y\|_\infty \le \frac{1}{p}$
for $x^*$, there is a grid point $\bar{x}$ such that $\|x^* - \bar{x}\|_\infty \le \frac{1}{2p}$
$|f(\bar{x}) - f(x^*)| \le L \|\bar{x} - x^*\|_\infty \le \frac{L}{2p}$
Q: How many oracle calls does the method need?
Q: How to pick $p$ to guarantee an $\epsilon$-solution?
14/18
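A minimal sketch of the uniform grid method follows. It assumes the grid consists of the $p^d$ cell centers $\left(\frac{2 i_1 - 1}{2p}, \dots, \frac{2 i_d - 1}{2p}\right)$ with $i_k \in \{1, \dots, p\}$, which is one placement that gives the $\frac{L}{2p}$ bound on the slide. The function name `uniform_grid_search` and the test objective are hypothetical choices for this illustration.

```python
import itertools


def uniform_grid_search(f, d, p):
    """Query f at the p**d cell centers of a uniform grid on [0, 1]^d.

    Returns the best grid point, its value, and the number of
    zero-order oracle calls used.
    """
    centers = [(2 * i - 1) / (2 * p) for i in range(1, p + 1)]
    best_x, best_val, calls = None, float("inf"), 0
    for x in itertools.product(centers, repeat=d):
        val = f(x)  # one zero-order oracle call
        calls += 1
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val, calls


# Example objective: 1-Lipschitz in the infinity norm, minimum value 0
# at (0.37, 0.37), which is not a grid point.
def f(x):
    return max(abs(xi - 0.37) for xi in x)


best_x, best_val, calls = uniform_grid_search(f, d=2, p=5)
```

With $L = 1$ and $p = 5$ the guarantee is $f(\bar{x}) - f(x^*) \le \frac{L}{2p} = 0.1$, and the search spends $p^d = 25$ oracle calls to get it.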
Final Complexity
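to find an $\epsilon$-solution, we need
$\frac{L}{2p} \le \epsilon \Rightarrow p = \left\lfloor \frac{L}{2\epsilon} \right\rfloor + 1$
Analytical Complexity
Q: How many calls of the zero-order oracle do we need?
A: We need $\left( \left\lfloor \frac{L}{2\epsilon} \right\rfloor + 1 \right)^d$ zero-order oracle calls
Q: Is this also the worst-case behaviour (a lower bound), or are we just using a “very naïve” algorithm?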
15/18
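To get a feel for the size of this bound, the sketch below evaluates it exactly. It reads the formula as $p = \lfloor L / (2\epsilon) \rfloor + 1$ grid points per coordinate and $p^d$ oracle calls in total. Rational arithmetic is used so that a value such as $\epsilon = 0.01$ is not perturbed by floating-point rounding before the floor is taken. The name `grid_complexity` is a hypothetical helper.

```python
from fractions import Fraction
from math import floor


def grid_complexity(L, eps, d):
    """Return (p, number of zero-order oracle calls) for the uniform grid method."""
    p = floor(Fraction(L) / (2 * Fraction(eps))) + 1  # smallest p with L/(2p) < eps
    return p, p ** d


p, calls = grid_complexity(L=2, eps=Fraction(1, 100), d=10)
# p is 101, so the method needs 101**10 (about 1.1e20) oracle calls.
```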
Lower-Bound and Computational Need for Tiny Problem
Lower-Bound
We can build an $L$-Lipschitz function that requires any method to explore $\left( \frac{L}{2\epsilon} \right)^d$ points before it
can identify an $\epsilon$-solution.
Example
Assume $L = 2$, $d = 10$ and $\epsilon = 0.01$
If we change $d$ to $d + 1$, then the estimate is multiplied by one hundred
if we multiply $\epsilon$ by two, we reduce the complexity by a factor of a thousand
if $\epsilon = 8\%$, then we need only two weeks
16/18
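The first two claims in the example can be verified by direct arithmetic, reading the lower bound as $\left( \frac{L}{2\epsilon} \right)^d$ with $L = 2$, $d = 10$, $\epsilon = 0.01$. The "two weeks" figure additionally depends on an assumed machine speed and per-call cost that this slide does not state, so it is not reproduced here.

```latex
\begin{align*}
\left( \frac{L}{2\epsilon} \right)^{d}
  &= \left( \frac{2}{2 \cdot 0.01} \right)^{10} = 100^{10} = 10^{20}, \\
d \to d + 1: \quad
  & \frac{100^{11}}{100^{10}} = 100, \\
\epsilon \to 2\epsilon: \quad
  & \frac{50^{10}}{100^{10}} = 2^{-10} = \frac{1}{1024} \approx 10^{-3}.
\end{align*}
```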
Conclusion
the simple example above shows that optimization is hard!
Q: What can save us?
we can assume some special properties of the problems
use a different oracle (e.g., use gradients)
17/18
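As a rough illustration of both remedies at once, the sketch below runs gradient descent with a first-order oracle on a convex quadratic in $d = 10$. The comparison with the grid method is not like-for-like: the objective is an assumed toy problem with special structure, its optimal value $f^* = 0$ is taken as known for the stopping test, and the step size is chosen by hand. The names `gradient_descent` and `quadratic_oracle` are hypothetical.

```python
def gradient_descent(first_order_oracle, x0, step, eps, max_calls=1000):
    """Gradient descent that stops once f(x) <= eps (assumes f* = 0)."""
    x, calls = list(x0), 0
    while calls < max_calls:
        value, grad = first_order_oracle(x)  # one first-order oracle call
        calls += 1
        if value <= eps:
            break
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x, calls


# Assumed toy problem: f(x) = 0.5 * ||x - c||^2 on R^10, minimized at c.
c = [0.37] * 10


def quadratic_oracle(x):
    diff = [xi - ci for xi, ci in zip(x, c)]
    return 0.5 * sum(t * t for t in diff), diff


x_hat, n_calls = gradient_descent(quadratic_oracle, [0.0] * 10, step=0.5, eps=0.01)
```

Here a handful of oracle calls suffices, whereas the grid bound for a general Lipschitz function in the same dimension is astronomically large. The gap comes from the extra assumptions (convexity, smoothness, gradient access), not from a cleverer search over the same problem class.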
Bibliography
[N+18] Yurii Nesterov et al. Lectures on Convex Optimization, volume 137. Springer, 2018.
Thanks also to Prof. Martin Jaggi and Prof. Mark Schmidt for their slides and lectures and
[N+18].
18/18
mbzuai.ac.ae
Mohamed bin Zayed
University of Artificial Intelligence
Masdar City
Abu Dhabi
United Arab Emirates
