Taiji Suzuki
The University of Tokyo / AIP-RIKEN
NeurIPS2020
Generalization bound of globally optimal
non-convex neural network training:
Transportation map estimation by
infinite dimensional Langevin dynamics
Summary
Neural network optimization
• We formulate NN training as an infinite dimensional
gradient Langevin dynamics in RKHS.
➢“Lift” of noisy gradient descent trajectory.
• Global optimality is ensured.
➢Geometric ergodicity + time discretization error
• Generalization error bound + Excess risk bound.
➢(i) O(1/√n) generalization error. (ii) Fast learning rate for the excess risk.
• Finite/infinite width can be treated in a unifying manner.
• Good generalization error guarantee
→ Different from NTK and mean field analysis.
Difficulty of NN optimization
Optimization of neural networks is "difficult" because of
Nonconvexity + High-dimensionality
•Neural tangent kernel:
➢ Take infinite width asymptotics as 𝑛 → ∞.
➢ The benefit of NN over kernel methods is lost.
•Mean field analysis:
➢ Take infinite width asymptotics to guarantee convergence.
➢ Its generalization error is not well understood.
•(Usual) gradient Langevin dynamics:
➢ Suffers from the curse of dimensionality (the standard update is recalled below).
Our formulation:
Infinite dimensional gradient Langevin dynamics.
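For reference, the (usual) finite-dimensional gradient Langevin dynamics mentioned above has the standard discretized update (textbook form, not taken from the slides):

```latex
W_{k+1} = W_k - \eta \nabla \widehat{L}(W_k) + \sqrt{\tfrac{2\eta}{\beta}}\, \epsilon_k,
\qquad \epsilon_k \sim \mathcal{N}(0, I),
```

where η is the step size and β the inverse temperature; its stationary distribution is proportional to exp(−β L̂(W)), but its known convergence guarantees degrade with the parameter dimension, which is the curse of dimensionality referred to above.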
Infinite dim neural network
• 2-layer NN: direct expression
(training loss)
• 2-layer NN: transportation map expression
(infinite width)
(integral representation)
Also includes
• DNN
• ResNet
etc.
𝑎𝑚 = 0 (𝑚 > 𝑀)
→ finite width network
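To make the two expressions on this slide concrete, here is a minimal sketch (my own illustration, not the authors' code) of a finite-width two-layer network written both in the direct form and in the transportation-map form, where the weights are obtained by pushing base particles through a map T; the map T below is a hypothetical placeholder:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def two_layer_direct(x, a, W):
    """Direct expression: f(x) = (1/M) * sum_m a_m * sigma(w_m^T x)."""
    return np.mean(a * relu(W @ x))

def two_layer_transport(x, a, W0, T):
    """Transportation-map expression: w_m = T(w0_m) for base particles w0_m."""
    W = np.stack([T(w0) for w0 in W0])   # push-forward of the base particles
    return np.mean(a * relu(W @ x))

# Toy usage (all values are illustrative)
rng = np.random.default_rng(0)
d, M = 5, 100
x = rng.normal(size=d)
a = rng.normal(size=M)               # second-layer weights; a_m = 0 for m > M gives finite width
W0 = rng.normal(size=(M, d))         # base particles drawn from rho_0
T = lambda w: w + 0.1 * np.tanh(w)   # hypothetical transportation map
print(two_layer_direct(x, a, W0), two_layer_transport(x, a, W0, T))
```

In this picture, training updates the map T (an element of a function space) rather than the individual weights, which is the "lift" described two slides later.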
Mean field model
Expectation w.r.t. prob. density 𝜌 of (𝑎, 𝑤):
Optimization of 𝑓 ⇔ Optimization of 𝜌
Continuity equation
𝑣𝑡: gradient
Convergence is guaranteed when 𝜌𝑡 has a density (infinite width).
(movement of
each particle)
(distribution)
[Nitanda&Suzuki, 2017][Chizat&Bach, 2018][Mei, Montanari&Nguyen, 2018]
Each neuron corresponds
to one particle.
One particle
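The continuity equation referenced on this slide appears only as an image in the original deck; a plausible reconstruction in its standard Wasserstein gradient-flow form, with v_t the velocity field driving each particle, is:

```latex
\frac{\partial \rho_t}{\partial t} = -\nabla \cdot (\rho_t v_t),
\qquad
v_t(a, w) = -\nabla_{(a,w)} \frac{\delta F}{\delta \rho}(\rho_t)(a, w),
```

where F(ρ) denotes the training objective viewed as a functional of ρ. Each neuron (a, w) is a particle transported along v_t, while ρ_t tracks the distribution of all particles.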
“Lift” of neural network training
Transportation map formulation:
(finite width)
𝜌0 has a finite discrete support
→ finite width network
Finite/Infinite width can be treated
in a unifying manner.
(unlike existing frameworks such as NTK and mean field analysis)
Infinite-dim non-convex optimization
Ex.
• ℋ: 𝐿2(𝜌)
• ℋ𝐾: RKHS (e.g., Sobolev sp.)
Optimal solution
nonconvex
We utilize gradient Langevin dynamics
in a Hilbert space to optimize the objective.
Infinite-dim. Langevin dynamics
: RKHS with kernel 𝐾.
Cylindrical Brownian motion:
Time discretization
Analogous to Gaussian process estimator.
(Gaussian measure associated with RKHS)
Stationary distribution ∝ likelihood × prior
(more precisely, we consider a semi-implicit Euler scheme)
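As a rough illustration of what this time discretization can look like, here is a minimal sketch (my own, under the assumption that the RKHS is truncated to its first D eigendirections; not the authors' implementation), where the semi-implicit Euler step treats the linear term coming from the Gaussian prior implicitly:

```python
import numpy as np

rng = np.random.default_rng(0)

D = 200                              # truncation level of the eigenbasis (illustrative)
p = 2.0
mu = np.arange(1, D + 1) ** (-p)     # kernel eigenvalues mu_k ~ k^{-p}
A = 1.0 / mu                         # precision of the Gaussian prior N(0, diag(mu))

eta = 1e-3                           # step size
beta = 100.0                         # inverse temperature

def grad_loss(x):
    """Gradient of the empirical loss w.r.t. the coefficients x (toy quadratic stand-in)."""
    return x - 1.0

x = np.zeros(D)
for _ in range(5000):
    noise = np.sqrt(2.0 * eta / beta) * rng.normal(size=D)
    # Semi-implicit (preconditioned) Euler step:
    #   x_{k+1} = (I + eta * A)^(-1) * (x_k - eta * grad_loss(x_k) + noise)
    # The prior term A x is evaluated at x_{k+1}, which keeps the scheme stable
    # even though the eigenvalues of A grow without bound.
    x = (x - eta * grad_loss(x) + noise) / (1.0 + eta * A)
```

The stationary distribution of the continuous-time dynamics is then proportional to exp(−β·loss) times the Gaussian prior, matching the "likelihood × prior" decomposition above.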
Infinite dimensional setting
Hilbert space
RKHS structure
Assumption (eigenvalue decay)
(not essential; can be relaxed to μ_k ∼ k^{-p} for p > 1)
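The assumption itself is stated as a formula in the original slide; a hedged reconstruction consistent with the relaxation quoted above is a Mercer-type spectral decay of the kernel:

```latex
K(x, x') = \sum_{k=1}^{\infty} \mu_k \, e_k(x) \, e_k(x'),
\qquad \mu_1 \ge \mu_2 \ge \cdots > 0,
\qquad \mu_k \lesssim k^{-p} \ \ (p > 1),
```

which ensures that the Gaussian measure associated with the RKHS (the prior of the Langevin dynamics) is well defined on the Hilbert space.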
Risk bounds of NN training
Gen. error: Excess risk:
Time discretization
Optimization method (Infinite dimensional GLD):
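The two quantities named above are defined by formulas shown only as images in the original slide; the standard definitions, which I assume are the intended ones, are:

```latex
\text{Gen. error:}\quad \mathcal{L}(\widehat{f}) - \widehat{\mathcal{L}}(\widehat{f}),
\qquad\qquad
\text{Excess risk:}\quad \mathcal{L}(\widehat{f}) - \inf_{f} \mathcal{L}(f),
```

where L is the population risk, L̂ is the empirical risk over the n training examples, and f̂ is the network produced by the infinite-dimensional GLD.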
Error bound
Thm (Generalization error bound)
with probability 1 − 𝛿.
Opt. error:
[Muzellec, Sato, Massias, Suzuki, arXiv:2003.00306 (2020)]
O(1/√n)
PAC-Bayesian stability bound [Rivasplata, Kuzborskij, Szepesvári, and Shawe-Taylor, 2019]
• Loss function ℓ is “sufficiently smooth.”
• Loss and its gradients are bounded:
Assumption
(geometric ergodicity + time discretization)
Λ_η^*: spectral gap
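The theorem's inequality appears only as an image in the original slide. Based on the labels here (a PAC-Bayesian stability term of order 1/√n plus an optimization error controlled by geometric ergodicity and time discretization), it has roughly the following schematic shape (my paraphrase, not the exact statement):

```latex
\mathcal{L}(\widehat{f}_k) - \widehat{\mathcal{L}}(\widehat{f}_k)
\;\lesssim\;
\underbrace{O\!\left(1/\sqrt{n}\right)}_{\text{PAC-Bayesian stability}}
+
\underbrace{e^{-\Lambda_\eta^{*} \eta k} + (\text{time-discretization bias})}_{\text{optimization error}}
\quad \text{with probability } 1 - \delta.
```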
Fast rate: general result
Thm (Excess risk bound: fast rate)
Let … and … (the specific parameter choices are given by formulas shown only as images in the original slide).
Can be faster than O(1/√n).
Example: classification & regression
Strong low noise condition:
For sufficiently large 𝑛 and any 𝛽 ≤ 𝑛,
Classification
Regression
Model:
Excess classification error
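The strong low-noise condition named above is stated as a formula in the original slide; in its standard (hard-margin Tsybakov) form, which I assume is the one intended, it reads:

```latex
\exists\, \delta_0 > 0: \qquad
\left| \mathbb{P}(Y = 1 \mid X = x) - \tfrac{1}{2} \right| \;\ge\; \delta_0
\quad \text{for almost every } x,
```

under which the excess classification error can decay much faster than O(1/√n).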
Summary
Neural network optimization
• We formulate NN training as an infinite dimensional
gradient Langevin dynamics in RKHS.
➢“Lift” of noisy gradient descent trajectory.
• Global optimality is ensured.
➢Geometric ergodicity + time discretization error
• Generalization error bound + Excess risk bound.
➢(i) O(1/√n) generalization error. (ii) Fast learning rate for the excess risk.
• Finite/infinite width can be treated in a unifying manner.
• Good generalization error guarantee
→ Different from NTK and mean field analysis.