The statistical physics of learning revisited: Phase transitions in layered neural networks

University of Groningen
Leipzig, June 2021 1 / 24
The statistical physics of learning revisited: Phase transitions in layered neural networks
Elisa Oostwal, Michiel Straat, Michael Biehl
Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence
University of Groningen / NL
Hidden unit specialization in layered neural networks: ReLU vs. sigmoidal activation. Physica A Vol. 564, 2021, 125517 (open access)

the revival of neural networks
success of multi-layered neural networks (Deep Learning)
• availability of large amounts of training data
• increased computational power
• improved training procedures and set-ups
• task-specific network designs, e.g. activation functions
many open questions / lack of theoretical understanding

statistical physics of learning
statistical physics of neural networks:
• dynamics of attractor neural networks:
  John Hopfield. Neural networks and physical systems with emergent collective computational abilities. PNAS 79(8):2554-2558, 1982
• training of feed-forward neural networks:
  Elizabeth Gardner (1957-1988). The space of interactions in neural networks. J. Phys. A 21:257-270, 1988
a successful branch of learning theory: 1991, 2001, 2011

example: a shallow neural network (soft committee machine)
• N units: high-dim. input
• K hidden units with activation g(z)
• linear output
input/output function defined by
• architecture, connectivity, activation functions
• adaptive weights
• regression: learning from example data, e.g. inputs labelled by a target function
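The input/output function of such a network can be sketched in a few lines. This is only an illustration under stated assumptions: the slide text does not give the formula, so the scalings 1/sqrt(N) for the hidden unit inputs and 1/sqrt(K) for the output, the choice g(z) = erf(z/sqrt(2)) for the sigmoidal case, and all function names are assumptions following a common soft committee machine convention.

```python
import math
import numpy as np

def erf_activation(z):
    """Sigmoidal activation g(z) = erf(z / sqrt(2)) (assumed form)."""
    z = np.asarray(z, dtype=float)
    return np.vectorize(math.erf)(z / math.sqrt(2.0))

def relu_activation(z):
    """Rectified linear unit g(z) = max(0, z)."""
    return np.maximum(0.0, np.asarray(z, dtype=float))

def scm_output(weights, xi, g):
    """Soft committee machine: K hidden units, fixed linear output.

    weights: array of shape (K, N), one adaptive weight vector per hidden unit
    xi:      array of shape (N,), the high-dimensional input
    g:       activation function applied to the hidden unit inputs
    The 1/sqrt(N) and 1/sqrt(K) scalings are assumptions (see lead-in).
    """
    K, N = weights.shape
    x = weights @ xi / math.sqrt(N)      # hidden unit inputs
    return float(np.sum(g(x)) / math.sqrt(K))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((3, 100))    # K = 3 hidden units, N = 100 inputs
    xi = rng.standard_normal(100)
    print(scm_output(W, xi, relu_activation))
    print(scm_output(W, xi, erf_activation))
```

Only the weight vectors are adaptive here; the hidden-to-output weights are fixed, which is what distinguishes the soft committee machine from a general two-layer network.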

statistical physics of learning in a nutshell
• objective/cost/energy function of all adaptive weights
• training by stochastic optimization of all adaptive weights,
  e.g. Metropolis algorithm, noisy gradient descent (Langevin)
• equilibrium (Gibbs-Boltzmann) density Peq with control parameter: "inverse temperature" β = 1/T
• "thermal averages" over Peq
• equilibrium state: compromise/competition between minimal energy (ground state) vs. number (volume) of available states with higher energy
• described by minima of the free energy, which balances energy against (microcanonical) entropy
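The idea of training as stochastic optimization with a Gibbs-Boltzmann equilibrium can be illustrated with a generic Metropolis sampler. The sketch below is not the simulation code behind the slides: the quadratic `toy_energy`, the Gaussian proposal and the parameter names are hypothetical choices for illustration. It relies only on the standard result that Metropolis sampling with acceptance probability min(1, exp(-β ΔE)) has the stationary density Peq(w) ∝ exp(-β E(w)).

```python
import numpy as np

def toy_energy(w):
    """Hypothetical energy function E(w) = |w|^2 / 2, used only for illustration."""
    return 0.5 * float(np.dot(w, w))

def metropolis(energy, w0, beta, n_steps, step_size, rng):
    """Sample weights w from P_eq(w) ~ exp(-beta * E(w)) with Metropolis updates.

    Returns an array of shape (n_steps, len(w0)) with the state after each step.
    """
    w = np.array(w0, dtype=float)
    e = energy(w)
    samples = np.empty((n_steps, w.size))
    for t in range(n_steps):
        # symmetric Gaussian proposal around the current state
        proposal = w + step_size * rng.standard_normal(w.shape)
        e_new = energy(proposal)
        # accept with probability min(1, exp(-beta * (e_new - e)))
        if e_new <= e or rng.random() < np.exp(-beta * (e_new - e)):
            w, e = proposal, e_new
        samples[t] = w
    return samples

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    beta = 2.0
    samples = metropolis(toy_energy, np.zeros(2), beta, 20000, 1.0, rng)
    energies = 0.5 * np.sum(samples[2000:] ** 2, axis=1)   # discard burn-in
    # thermal average <E>; for this quadratic toy energy in d dimensions
    # the exact equilibrium value is d / (2 * beta)
    print(energies.mean())
```

The thermal average printed at the end is an example of an average over Peq; lowering β (raising T) lets the sampler visit the many states of higher energy, raising β concentrates it near the ground state.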

machine learning specifics
• energy function is defined w.r.t. a specific set of example data (frozen disorder)
• typical properties: additional average of the free energy over the example data;
  difficult, even for the simplest model density: an unstructured, independent identically distributed (i.i.d.) input density
• disorder-average of the free energy requires (e.g.) the replica trick

machine learning at high temperatures
• a simplifying limit: high (formal) temperature and a large number of examples, with a finite scaled number of examples
  “learn almost nothing... (high T) ...from very many examples”
• i.i.d. examples: the energy is replaced by its average, which is proportional to the generalization error
limitations:
- training error and generalization error cannot be distinguished
- number of examples and training temperature are coupled
- (at best) qualitative agreement with low temperature results
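The formulas of this slide are not part of the extracted text. As a hedged reconstruction of the standard high-temperature argument, and not a quotation of the slide, the limit can be written as follows. The symbols are assumptions: ε is the error on a single example, P the number of examples, ε_g the generalization error, s the entropy per input dimension, f the free energy per input dimension, and α = βP/(KN) the scaled number of examples that stays finite.

```latex
\begin{align}
E(\{w\}) &= \sum_{\mu=1}^{P} \epsilon\bigl(\{w\}, \xi^{\mu}\bigr)
  && \text{energy for a given data set} \\
\beta E &\;\to\; \beta P \, \epsilon_g(\{w\})
  && \beta \to 0,\; P \to \infty,\; \beta P/(KN) \text{ fixed} \\
\beta f &= \alpha K \, \epsilon_g - s ,
  \qquad \alpha = \frac{\beta P}{K N}
  && \text{free energy to be minimized}
\end{align}
```

In this form the listed limitations are directly visible: only the product βP enters through α, so temperature and number of examples cannot be varied independently, and the energy is expressed through ε_g alone, so training and generalization error coincide.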

modelling: student teacher scenario
• adaptive student: N inputs, K hidden units
• teacher: M hidden units, parameters unknown to the student ("?" in the figure)
• training: minimization of the energy function
• here: learnable rules, reliable data (outputs provided by teacher), perfectly matching complexity K=M
• two prototypical activation functions: sigmoidal / ReLU in student and teacher

thermodynamic limit, CLT
• large N: Central Limit Theorem for the hidden unit inputs {xi, xj*}: normally distributed with zero mean and a covariance matrix given by the order parameters and model parameters
• order parameters: macroscopic properties of the system
• entropy (+ constant) independent of details (e.g. activation)

generalization error
on average over P({xi, xj*})
• sigmoidal activation [D. Saad, S. Solla, 1995]
• rectified linear units [M. Straat, 2019]
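Because the hidden unit inputs are jointly Gaussian, the generalization error depends on the weights only through their overlaps. The sketch below evaluates it from overlap matrices for both activations. It is an illustration under assumptions not spelled out in the slide text: the convention ε_g = ½⟨(σ − τ)²⟩ with 1/sqrt(K) output scaling and K = M, the sigmoidal activation g(z) = erf(z/sqrt(2)), and the names Q (student-student), R (student-teacher) and T (teacher-teacher overlaps). The two Gaussian averages used are the standard closed-form results for erf and ReLU.

```python
import numpy as np

def gg_erf(cxy, cxx, cyy):
    """<g(x) g(y)> for g(z) = erf(z/sqrt(2)); x, y zero-mean Gaussian
    with variances cxx, cyy and covariance cxy."""
    return (2.0 / np.pi) * np.arcsin(cxy / np.sqrt((1.0 + cxx) * (1.0 + cyy)))

def gg_relu(cxy, cxx, cyy):
    """<g(x) g(y)> for g(z) = max(0, z); variances must be positive."""
    rho = np.clip(cxy / np.sqrt(cxx * cyy), -1.0, 1.0)
    return (cxy / 4.0
            + np.sqrt(np.maximum(cxx * cyy - cxy ** 2, 0.0)) / (2.0 * np.pi)
            + cxy * np.arcsin(rho) / (2.0 * np.pi))

def generalization_error(Q, R, T, gg):
    """eps_g = <(sigma - tau)^2> / 2 for K = M hidden units (assumed convention).

    Q[i, j]: student-student overlaps, R[i, j]: student i with teacher j,
    T[i, j]: teacher-teacher overlaps, gg: one of gg_erf, gg_relu.
    """
    Q, R, T = (np.asarray(a, dtype=float) for a in (Q, R, T))
    K = Q.shape[0]
    q, t = np.diag(Q), np.diag(T)
    student = gg(Q, q[:, None], q[None, :]).sum()
    teacher = gg(T, t[:, None], t[None, :]).sum()
    cross = gg(R, q[:, None], t[None, :]).sum()
    return (student + teacher - 2.0 * cross) / (2.0 * K)

if __name__ == "__main__":
    I = np.eye(3)
    # student identical to the teacher: both errors vanish
    print(generalization_error(I, I, I, gg_erf), generalization_error(I, I, I, gg_relu))
    # student without any knowledge of the teacher (R = 0)
    Z = np.zeros((3, 3))
    print(generalization_error(I, Z, I, gg_erf), generalization_error(I, Z, I, gg_relu))
```

A quick sanity check of the two averages: for a single unit of unit variance they give ⟨g²⟩ = 1/3 for erf and ⟨g²⟩ = 1/2 for ReLU, as expected from direct integration.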

site symmetry
simplification: orthonormal teacher vectors, isotropic input density
reflects permutation symmetry, allows for hidden unit specialization
• sigmoidal hidden units
• ReLU activations
• entropy (+ constant)
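Under site symmetry the order parameters reduce to three numbers, and the entropy term can be evaluated in closed form. The sketch below assumes the usual parametrization, which is not written out in the slide text: student-teacher overlaps R on the diagonal and S off the diagonal, normalized student vectors with mutual overlap C, orthonormal teacher vectors, and the entropy s = ½ ln det(Q − R Rᵀ) up to a constant. The function names are hypothetical. The closed form follows from the two distinct eigenvalues of Q − R Rᵀ.

```python
import numpy as np

def site_symmetric(K, R, S, C):
    """Build the (K, K) overlap matrices for the site-symmetric ansatz."""
    I, ones = np.eye(K), np.ones((K, K))
    R_mat = S * ones + (R - S) * I       # R on the diagonal, S elsewhere
    Q_mat = C * ones + (1.0 - C) * I     # 1 on the diagonal, C elsewhere
    return R_mat, Q_mat

def entropy_from_matrices(R_mat, Q_mat):
    """s = (1/2) ln det(Q - R R^T), assuming orthonormal teacher vectors."""
    sign, logdet = np.linalg.slogdet(Q_mat - R_mat @ R_mat.T)
    if sign <= 0:
        # a positive determinant is necessary for admissible order parameters
        raise ValueError("order parameters outside the admissible domain")
    return 0.5 * logdet

def entropy_site_symmetric(K, R, S, C):
    """Closed form: one eigenvalue a, and an eigenvalue b of multiplicity K-1."""
    a = 1.0 + (K - 1) * C - (R + (K - 1) * S) ** 2
    b = 1.0 - C - (R - S) ** 2
    if a <= 0 or b <= 0:
        raise ValueError("order parameters outside the admissible domain")
    return 0.5 * np.log(a) + 0.5 * (K - 1) * np.log(b)

if __name__ == "__main__":
    R_mat, Q_mat = site_symmetric(5, 0.6, 0.1, 0.2)
    print(entropy_from_matrices(R_mat, Q_mat), entropy_site_symmetric(5, 0.6, 0.1, 0.2))
```

The entropy diverges to minus infinity when either eigenvalue approaches zero, for instance for a perfectly specialized student (R = 1, S = C = 0); this is the volume argument that competes with the energy term.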

typical learning curves
given: K, g(z), and α (the scaled size of the training data set)
solve: determine the (global and local) minima of the free energy
obtain learning curves: order parameters and generalization error as a function of the (scaled) training set size
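The procedure can be sketched numerically for the simplest case. The code below is a rough illustration, not the method used for the results in the talk: it assumes the high-temperature free energy βf = αKε_g − s, the site-symmetric order parameters R, S, C, the sigmoidal activation g(z) = erf(z/sqrt(2)) with 1/sqrt(K) output scaling, and it replaces a proper minimization by a coarse grid search. A grid search only locates the global minimum on the grid; finding competing local minima would require a local optimizer started from several initial conditions.

```python
import numpy as np

def eps_g_erf(K, R, S, C):
    """Site-symmetric generalization error for g(z) = erf(z/sqrt(2)), K = M,
    normalized student vectors and orthonormal teacher (assumed conventions)."""
    c = 2.0 / np.pi
    student = K / 3.0 + K * (K - 1) * c * np.arcsin(C / 2.0)
    teacher = K / 3.0
    cross = K * c * np.arcsin(R / 2.0) + K * (K - 1) * c * np.arcsin(S / 2.0)
    return (student + teacher - 2.0 * cross) / (2.0 * K)

def entropy(K, R, S, C):
    """Entropy (up to a constant) on arrays; -inf outside the admissible domain."""
    a = 1.0 + (K - 1) * C - (R + (K - 1) * S) ** 2
    b = 1.0 - C - (R - S) ** 2
    valid = (a > 0) & (b > 0)
    s = np.full(a.shape, -np.inf)
    s[valid] = 0.5 * np.log(a[valid]) + 0.5 * (K - 1) * np.log(b[valid])
    return s

def minimize_free_energy(alpha, K=2, n_grid=41):
    """Grid search for the global minimum of beta*f = alpha*K*eps_g - s.

    Returns (R, S, C, eps_g) at the best grid point.
    """
    r = np.linspace(-1.0, 1.0, n_grid)
    R, S, C = np.meshgrid(r, r, r, indexing="ij")
    f = alpha * K * eps_g_erf(K, R, S, C) - entropy(K, R, S, C)   # +inf if invalid
    idx = np.unravel_index(np.argmin(f), f.shape)
    return R[idx], S[idx], C[idx], eps_g_erf(K, R[idx], S[idx], C[idx])

if __name__ == "__main__":
    # a crude learning curve: order parameters and eps_g versus alpha
    for alpha in (0.1, 1.0, 10.0, 50.0, 200.0):
        R, S, C, eg = minimize_free_energy(alpha)
        print(f"alpha={alpha:6.1f}  R={R:+.2f}  S={S:+.2f}  C={C:+.2f}  eps_g={eg:.4f}")
```

For K = 2 the free energy is symmetric under exchange of R and S, so the grid search returns one of two equivalent minima once the symmetry is broken. The grid resolution limits how closely the transition point can be located.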

sigmoidal (K = 2)
• invariance under exchange of the two hidden units
• R=S: both units ~ (w1* + w2*) + noise
• symmetry breaking phase transition (second order, continuous) results in a kink in the typical learning curve

ReLU (K = 2)
qualitatively identical behavior
Note: numerical values of α and/or ε_g are irrelevant; the scale depends, among other things, on the pre-factor of g(z)
Physica A Vol. 564, 2021, 125517

sigmoidal (K > 2), shown for K=5
• permutation symmetry of hidden units (h.u.): initial R=S phase
• discontinuous jump in ε_g, coexistence of poor and good generalization
• first order transition: local minimum R>S competes with R=S
• R>S becomes the global minimum, facilitates perfect learning
• “anti-specialization” S>R (overlooked in 1998!)
• weak/no effect of additional anti-specialization on generalization error

ReLU (K > 2), shown for K=10
• permutation symmetry of hidden units (h.u.): initial R=S phase
• continuous kink in ε_g
• competing minima of poor* vs. good generalization (* pretty good)
• continuous phase transition
• global minimum: R>S, local minimum: R<S
Physica A Vol. 564, 2021, 125517

ReLU (large K)
• permutation symmetry of hidden units (h.u.): initial R=S phase
• continuous phase transition
• degenerate minima: R>S, R<S
• specialized and anti-specialized branch achieve perfect generalization, asymptotically! (due to partial linearity of ReLU)

Monte Carlo simulations
continuous Metropolis, ReLU activation, K=4, N=50, β=1 (=T)
• gen. error vs. time, specialized and unspecialized initialization
• histogram of observed Rij: unspecialized (R=S), anti-specialized and specialized states

Monte Carlo simulations
stationary generalization error, K=4:
• sigmoidal activation: large gap / high barrier between specialized and unspecialized states delays success of learning
• ReLU: anti-specialized states display near optimal performance for large K

Summary
• formal equilibrium of training at high temperature in student/teacher model situations of supervised learning
• unspecialized and partially or anti-specialized configurations compete as local/global minima of the free energy
• phase transitions with scaled number of examples:
- K=2: continuous symmetry-breaking transitions with equivalent competing states
- K>2, sigmoidal activations: first order transition with competing states of distinct generalization ability
- K>2, ReLU networks: continuous transition with competing states of similar performance

Outlook
most important question: which is the decisive property of the activation?
• consider various activation functions (leaky ReLU, swish ... )
• piece-wise linear activations: from piece-wise linear "sigmoidal" activation to ReLU with increasing slope, discontinuous to continuous
• study more complex solutions beyond site-symmetry

outlook (selected topics)
• replica trick / annealed approximation
- low temperatures, vary # of examples and T independently
- mismatched student/teacher networks K ≠ M
- overfitting / underfitting effects
• complementary approach:
- dynamics of stochastic gradient descent
- description in terms of ODE for order parameters
• deep networks
- many hidden layers
- tree-like architectures with uncorrelated branches
• realistic input data
- clustered / correlated data
- recent developments: Zdeborová, Mézard, Goldt et al.

Questions?
www.cs.rug.nl/~biehl   m.biehl@rug.nl
see also for: algorithm development in machine learning, applications in medicine, life sciences, astronomy …
The statistical physics of learning revisted: Phase transitions in layered neural networks

  • 1. Leipzig, June 2021 1 / 24 The statistical physics of learning revisited: Phase transitions in layered neural networks Elisa Oostwal Michiel Straat Michael Biehl Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence University of Groningen / NL Physica A Vol. 564, 2021, 125517 (open access) Hidden unit specialization in layered neural networks: ReLU vs. sigmoidal activation
  • 2. Leipzig, June 2021 2 / 24 the revival of neural networks success of multi-layered neural networks (Deep Learning) • availability of large amounts of training data • increased computational power • improved training procedures and set-ups • task specific network designs, e.g. activation functions many open questions / lack of theoretical understanding
  • 3. Leipzig, June 2021 3 / 24 statistical physics of learning statistical physics of neural networks training of feed-forward neural networks: Elizabeth Gardner (1957-1988). The space of interactions in neural networks. J. Phys. A 21:257-270, 1988 dynamics of attractor neural networks: John Hopfield. Neural Networks and physical systems with emergent collective computational abilities. PNAS 79(8):2554-2558, 1982 1991 2001 2011 a successful branch of learning theory:
  • 4. Leipzig, June 2021 4 / 24 statistical physics of learning
  • 5. Leipzig, June 2021 5 / 24 N units: high-dim. input example: a shallow neural network K hidden units with activation linear output soft committee machine input/output function defined by • architecture, connectivity, activation functions • adaptive weights ↑ target function • regression: learning from example data e.g.
  • 6. Leipzig, June 2021 6 / 24 statistical physics of learning in a nutshell objective/cost/energy function with • equilibrium state: compromise/competition between minimal energy (ground state) vs. number (volume) of available states with higher energy • e.g. Metropolis algorithm, noisy gradient descent (Langevin) with equilibrium (Gibbs-Boltzmann) control parameter: „inverse temperature“ β =1 / T • training by stochastic optimization of all adaptive weights „thermal averages “ over Peq e.g. minima of free energy microcanonical entropy:
  • 7. Leipzig, June 2021 7 / 24 machine learning specifics • energy function is given for a specific set of example data: defined w.r.t. • typical properties: additional average of the free energy over difficult, even for the simplest model density: with independent identically distributed (i.i.d.) unstructured input density • disorder-average of the free energy requires (e.g.) replica trick frozen disorder
  • 8. Leipzig, June 2021 8 / 24 machine learning at high temperatures  • a simplifying limit: high (formal) temperature with finite “learn almost nothing... (high T ) ...from very many examples” • independent i.i.d. examples: generalization error limitations: - training error and generalization error cannot be distinguished - number of examples and training temperature are coupled - (at best) qualitative agreement with low temperature results • large number of examples: , in the limit
  • 9. modelling: student teacher scenario (figure: teacher network with M hidden units, marked "?", and adaptive student network with K hidden units, both with N inputs). Training: minimization of the cost function. Here: learnable rules, reliable data (outputs provided by teacher), perfectly matching complexity K=M; two prototypical activation functions, sigmoidal / ReLU, in student and teacher
  • 10. thermodynamic limit, CLT. Large N: by the Central Limit Theorem, the hidden unit fields are normally distributed with zero mean and a covariance matrix given by order parameters (student) and model parameters (teacher): macroscopic properties of the system. Entropy (+ constant): independent of details (e.g. activation)
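The covariance entries mentioned on slide 10 are plain scaled dot products of weight vectors. The sketch below assumes the usual naming (R for student-teacher overlaps, Q for student-student, T for teacher-teacher) and a 1/N scaling; the function names are hypothetical.

```python
def dot(u, v):
    """Euclidean dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def order_parameters(student, teacher):
    """Return (R, Q, T) as nested lists.

    student: K weight vectors of length N
    teacher: M weight vectors of length N
    R[i][j] = w_i . w*_j / N,  Q[i][j] = w_i . w_j / N,  T[i][j] = w*_i . w*_j / N
    (naming and 1/N scaling are assumptions, see the lead-in).
    """
    n = len(student[0])
    R = [[dot(w, b) / n for b in teacher] for w in student]
    Q = [[dot(w, v) / n for v in student] for w in student]
    T = [[dot(b, c) / n for c in teacher] for b in teacher]
    return R, Q, T
```

With these definitions only K(K+1)/2 + KM numbers describe the student, independent of N, which is what makes the macroscopic description tractable.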
  • 11. generalization error: obtained on average over the joint density P({x_i, x_j*}) of the hidden unit fields; closed-form results for sigmoidal activation [D. Saad, S. Solla, 1995] and rectified linear units [M. Straat, 2019]
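The Gaussian averages behind slide 11 have known closed forms, so the generalization error can be evaluated directly from the order parameters. The sketch below assumes g(z) = erf(z/sqrt(2)) for the sigmoidal case, outputs scaled by 1/sqrt(K), K = M, and the quadratic error (1/2)(student - teacher)^2; the function names are hypothetical.

```python
import math


def pair_erf(cxy, cxx, cyy):
    """<g(x) g(y)> for g(z) = erf(z / sqrt(2)), zero-mean Gaussian x, y."""
    return (2.0 / math.pi) * math.asin(cxy / math.sqrt((1.0 + cxx) * (1.0 + cyy)))


def pair_relu(cxy, cxx, cyy):
    """<g(x) g(y)> for g(z) = max(0, z), zero-mean Gaussian x, y."""
    rho = max(-1.0, min(1.0, cxy / math.sqrt(cxx * cyy)))  # guard against rounding
    root = math.sqrt(max(0.0, cxx * cyy - cxy * cxy))
    return root / (2.0 * math.pi) + cxy / 4.0 + cxy * math.asin(rho) / (2.0 * math.pi)


def generalization_error(R, Q, T, pair):
    """eps_g = <(sigma - tau)^2> / 2 for matching student and teacher (K = M)."""
    k, m = len(Q), len(T)
    ss = sum(pair(Q[i][j], Q[i][i], Q[j][j]) for i in range(k) for j in range(k))
    tt = sum(pair(T[i][j], T[i][i], T[j][j]) for i in range(m) for j in range(m))
    st = sum(pair(R[i][j], Q[i][i], T[j][j]) for i in range(k) for j in range(m))
    return (ss + tt - 2.0 * st) / (2.0 * k)
```

Passing `pair_erf` or `pair_relu` switches between the two activations; a student that reproduces the teacher exactly (R = Q = T) gives zero error in both cases.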
  • 12. site symmetry. Simplification: orthonormal teacher vectors, isotropic input density; the site-symmetric ansatz reflects permutation symmetry and allows for hidden unit specialization. Generalization error for sigmoidal hidden units and for ReLU activations; entropy (+ constant)
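Written out, a site-symmetric ansatz of the kind named on slide 12 reads as follows. This is a reconstruction under assumptions, since the slide's formulas are not preserved: normalized student vectors (Q_ii = 1), orthonormal teacher (T_ij = δ_ij), outputs scaled by 1/sqrt(K), and g(z) = erf(z/sqrt(2)) for the sigmoidal case. R, S appear in the slides; the symbol C for the student-student overlap is an assumed name.

```latex
\begin{align}
  R_{ij} &= R\,\delta_{ij} + S\,(1 - \delta_{ij}), \qquad
  Q_{ij} = \delta_{ij} + C\,(1 - \delta_{ij}), \qquad
  T_{ij} = \delta_{ij} \\
  s &= \tfrac{1}{2} \ln \det\!\big( Q - R R^{\top} \big) + \text{const} \\
    &= \tfrac{1}{2} \ln\!\Big[ 1 + (K-1)\,C - \big( R + (K-1)\,S \big)^2 \Big]
     + \tfrac{K-1}{2} \ln\!\Big[ 1 - C - (R - S)^2 \Big] + \text{const} \\
  \epsilon_g^{\text{erf}} &= \frac{1}{3}
     + \frac{K-1}{\pi} \Big[ \arcsin\frac{C}{2} - 2 \arcsin\frac{S}{2} \Big]
     - \frac{2}{\pi} \arcsin\frac{R}{2}
\end{align}
```

As a consistency check, the perfectly specialized state R = 1, S = C = 0 gives ε_g = 1/3 - (2/π)(π/6) = 0, while R = S describes hidden units that do not distinguish between teacher units.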
  • 13. typical learning curves: order parameters and generalization error as a function of the (scaled) training set size α. Given K and g(z), and given the (scaled) size α of the training data set, solve for the (global and local) minima of the free energy and obtain learning curves
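The procedure on slide 13 can be imitated numerically with a coarse grid search. The sketch below is a crude stand-in for solving the stationarity conditions, not the authors' method. It assumes the site-symmetric erf expressions with student-student overlap C, the free energy βf = αKε_g - s, and hypothetical function names; it finds only the global minimum on the grid, not the competing local minima.

```python
import math


def eps_g(R, S, C, K):
    """Site-symmetric generalization error, assuming g(z) = erf(z / sqrt(2))."""
    return (1.0 / 3.0
            + (K - 1) / math.pi * (math.asin(C / 2.0) - 2.0 * math.asin(S / 2.0))
            - 2.0 / math.pi * math.asin(R / 2.0))


def entropy(R, S, C, K):
    """Site-symmetric entropy (up to a constant); None outside the allowed region."""
    a = 1.0 + (K - 1) * C - (R + (K - 1) * S) ** 2
    b = 1.0 - C - (R - S) ** 2
    if a <= 0.0 or b <= 0.0:
        return None
    return 0.5 * math.log(a) + 0.5 * (K - 1) * math.log(b)


def minimize_free_energy(alpha, K, steps=20):
    """Coarse grid search for the global minimum of beta*f = alpha*K*eps_g - s.

    Returns (beta*f, R, S, C) at the best grid point; grid spacing is 1/steps.
    """
    grid = [i / steps for i in range(-steps, steps + 1)]
    best = None
    for R in grid:
        for S in grid:
            for C in grid:
                s = entropy(R, S, C, K)
                if s is None:
                    continue  # order parameters not realizable
                f = alpha * K * eps_g(R, S, C, K) - s
                if best is None or f < best[0]:
                    best = (f, R, S, C)
    return best
```

Repeating the search for a range of α values and recording R, S and ε_g at the minimum yields a rough learning curve; a proper treatment would refine each minimum with a local optimizer and track local minima separately.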
  • 14. sigmoidal (K = 2): invariance under exchange of the two hidden units; R=S: both units ~ (w1* + w2*) + noise; symmetry breaking phase transition (second order, continuous), which results in a kink in the typical learning curve
  • 15. ReLU (K = 2): qualitatively identical behavior. Note: the numerical values are irrelevant, since the scale depends, among others, on the pre-factor of g(z) (Physica A Vol. 564, 2021, 125517)
  • 16. sigmoidal (K > 2), shown for K=5: permutation symmetry of hidden units, initial R=S phase; first order transition: a local minimum with R>S competes with R=S, coexistence of poor and good generalization; R>S becomes the global minimum, with a discontinuous jump in ε_g, and facilitates perfect learning; "anti-specialization" S>R (overlooked in 1998!): weak/no effect of additional anti-specialization on generalization error
  • 17. ReLU (K > 2), shown for K=10: permutation symmetry of hidden units, initial R=S phase; continuous phase transition with a continuous kink in ε_g; competing minima of poor* vs. good generalization: global minimum R>S, local minimum R<S (* pretty good) (Physica A Vol. 564, 2021, 125517)
  • 18. ReLU (large K): permutation symmetry of hidden units, initial R=S phase; continuous phase transition; degenerate minima: R>S, R<S; specialized and anti-specialized branch achieve perfect generalization, asymptotically! (due to partial linearity of ReLU)
  • 19. Monte Carlo simulations: continuous Metropolis, ReLU activation, K=4, N=50, β=1 (=T). Figures: histogram of observed R_ij (unspecialized R=S, specialized, anti-specialized); generalization error vs. time for specialized and unspecialized initialization
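A Metropolis simulation of the kind reported on slide 19 can be sketched as below. Only the ingredients "continuous Metropolis, ReLU activation, K, N, β" come from the slide; the proposal scheme (Gaussian perturbation of one weight vector followed by rescaling to squared length N), the quadratic training energy, and all function names are assumptions made for illustration.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def output(weights, x):
    """Soft committee machine with ReLU hidden units (assumed scalings)."""
    n, k = len(x), len(weights)
    return sum(relu(sum(a * b for a, b in zip(w, x)) / math.sqrt(n))
               for w in weights) / math.sqrt(k)


def random_weights(k, n, rng):
    """K random vectors, each rescaled to squared length N."""
    ws = []
    for _ in range(k):
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(a * a for a in w))
        ws.append([a * math.sqrt(n) / norm for a in w])
    return ws


def energy(weights, data):
    """Sum of quadratic errors over the training set."""
    return sum(0.5 * (output(weights, x) - y) ** 2 for x, y in data)


def metropolis(student, data, beta, steps, step_size, rng):
    """Sample exp(-beta * energy); returns (final energy, acceptance rate)."""
    n = len(student[0])
    e = energy(student, data)
    accepted = 0
    for _ in range(steps):
        i = rng.randrange(len(student))
        old = student[i]
        new = [a + step_size * rng.gauss(0.0, 1.0) for a in old]
        norm = math.sqrt(sum(a * a for a in new))
        student[i] = [a * math.sqrt(n) / norm for a in new]  # keep |w|^2 = N
        e_new = energy(student, data)
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            e, accepted = e_new, accepted + 1  # accept the move
        else:
            student[i] = old  # reject and restore
    return e, accepted / steps
```

A run would draw a teacher with `random_weights`, label random Gaussian inputs with `output(teacher, x)`, and then monitor the overlaps between student and teacher vectors over time; note that this sketch does not enforce an orthonormal teacher.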
  • 20. Monte Carlo simulations, K=4: stationary generalization error for sigmoidal activation and ReLU. Sigmoidal activation: large gap / high barrier between specialized and unspecialized states delays success of learning. ReLU: anti-specialized states display near optimal performance for large K
  • 21. Summary • formal equilibrium of training at high temperature in student/teacher model situations of supervised learning • unspecialized and partially specialized or anti-specialized configurations compete as local/global minima of the free energy • phase transitions with scaled number of examples: - K=2: continuous symmetry-breaking transitions with equivalent competing states - K>2, sigmoidal activations: first order transition with competing states of distinct generalization ability - K>2, ReLU networks: continuous transition with competing states of similar performance
  • 22. Outlook. Most important question: which is the decisive property of the activation? • consider various activation functions (leaky ReLU, swish ...) • piece-wise linear activations: from a piece-wise linear "sigmoidal" activation to ReLU, increasing slope, discontinuous to continuous • study more complex solutions beyond site-symmetry
  • 23. outlook (selected topics) • replica trick / annealed approximation: - low temperatures, vary # of examples and T independently - mismatched student/teacher networks K ≠ M - overfitting / underfitting effects • complementary approach: - dynamics of stochastic gradient descent - description in terms of ODE for order parameters • deep networks: - many hidden layers - tree-like architectures with uncorrelated branches • realistic input data: - clustered / correlated data - recent developments: Zdeborova, Mezard, Goldt et al.
  • 24. Questions? Contact: m.biehl@rug.nl; see also www.cs.rug.nl/~biehl for: algorithm development in machine learning; applications in medicine, life sciences, astronomy …