Cc stat phys draft

A very early draft of some notes on statistical physics. Not for general consumption yet, but I am happy to have comments or even some help.

Published in: Science

Cc stat phys draft

  1. calculation | consulting
     This is an early draft of some notes on the relationship between statistical physics and deep learning. charles@calculationconsulting.com
  3. Who Are We?
     Dr. Charles H. Martin, PhD. University of Chicago, Chemical Physics. NSF Fellow in Theoretical Chemistry. Over 10 years experience in applied Machine Learning. Developed ML algos for Demand Media; the first $1B IPO since Google. Tech: Aardvark (now Google), eHow, GoDaddy, … Wall Street: BlackRock. Fortune 500: Big Pharma, Telecom, eBay. www.calculationconsulting.com charles@calculationconsulting.com
  4. Data Scientists are Different
     Not all techies are the same: theoretical physics / machine learning specialist; experimental physics / data scientist; engineer / software, browser tech, dev ops, …
  5. Statistical Physics of Information Theory
     Not my ideas, just a summary of notes from the web and the book: Merhav (2009), http://webee.technion.ac.il/people/merhav/papers/p138f.pdf
     "If I have seen further than others, it is by standing on the shoulders of giants" (Isaac Newton)
  7. Energies: unnormalized probabilities
     In stat phys and ML, energies give unnormalized probabilities $x_j$, with $E_j = -\ln x_j$. In ML, $\beta$ is an (optional) scale / smoothing parameter; in stat phys, $\beta$ is the inverse temperature.
  8. Energy normalization: Partition Function (Z)
     The normalization factor is $Z = \sum_j e^{-\beta E_j}$. To get probabilities, we do a soft-max transform, $p_j = e^{-\beta E_j} / Z$, but we also include the inverse temperature $\beta$.
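
A minimal sketch of the soft-max transform with an inverse temperature, assuming a small list of discrete energies. The function name `boltzmann_probs` and the parameter name `beta` are illustrative choices, not names taken from the slides.

```python
import math

def boltzmann_probs(energies, beta=1.0):
    """Soft-max of -beta * E: p_j = exp(-beta * E_j) / Z."""
    # Shift by the minimum energy for numerical stability; the shift cancels in the ratio.
    e_min = min(energies)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)  # partition function of the shifted energies
    return [w / z for w in weights]

energies = [0.0, 1.0, 2.0]
print(boltzmann_probs(energies, beta=1.0))
print(boltzmann_probs(energies, beta=0.1))  # high temperature: closer to uniform
```

Lower `beta` (higher temperature) flattens the distribution, while larger `beta` concentrates the probability on the lowest energy.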
  9. Old School Nets: from Z to sigmoid activations
     Modern nets are layers of nodes and activation functions. What happened to E and Z? They are easy to recover in simple cases…
  10. Old School Nets: from Z to sigmoid activations
     Consider 1 layer of an RBM.
  11. Old School Nets: from Z to sigmoid activations
     Let's compute p(h|x) directly from the energy function. We expect the conditional probabilities to factor and to have sigmoid activations.
  12. Old School Nets: from Z to sigmoid activations
     http://www.youtube.com/watch?v=lekCh_i32iE&t=18m31s
  13. Old School Nets: from Z to sigmoid activations
     We find that the conditional probabilities do factor, and we can recover the local sigmoid activations. But we don't include temperature…although old models did.
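
The factorization claim can be checked numerically on a tiny RBM. The sketch below assumes the standard binary RBM energy E(x, h) = -b.x - c.h - h^T W x (the slide's own formula did not survive extraction), and all weights are arbitrary illustrative values. It compares p(h|x) obtained by explicit normalization with the product of per-unit sigmoids.

```python
import itertools
import math

def energy(x, h, W, b, c):
    """E(x, h) = -b.x - c.h - h^T W x for binary units (standard RBM form, assumed here)."""
    visible = sum(bi * xi for bi, xi in zip(b, x))
    hidden = sum(cj * hj for cj, hj in zip(c, h))
    coupling = sum(h[j] * W[j][i] * x[i] for j in range(len(h)) for i in range(len(x)))
    return -(visible + hidden + coupling)

def conditional_brute_force(x, W, b, c):
    """p(h | x) by explicit normalization over all hidden configurations."""
    configs = list(itertools.product([0, 1], repeat=len(c)))
    weights = [math.exp(-energy(x, h, W, b, c)) for h in configs]
    z_x = sum(weights)  # partition function restricted to the given x
    return {h: w / z_x for h, w in zip(configs, weights)}

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def conditional_factorized(x, h, W, c):
    """Product of per-unit sigmoid activations."""
    p = 1.0
    for j, hj in enumerate(h):
        on = sigmoid(c[j] + sum(W[j][i] * x[i] for i in range(len(x))))
        p *= on if hj == 1 else 1.0 - on
    return p

W = [[0.5, -1.0, 0.3], [1.2, 0.1, -0.7]]  # 2 hidden x 3 visible, arbitrary values
b = [0.1, -0.2, 0.0]
c = [-0.3, 0.4]
x = [1, 0, 1]
exact = conditional_brute_force(x, W, b, c)
for h, p in exact.items():
    print(h, round(p, 6), round(conditional_factorized(x, h, W, c), 6))
```

The two columns agree because the visible bias term cancels in the ratio and the remaining exponent is a sum over hidden units.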
  14. Scaled Energies: w/ Temperature
     We do see T in some simple reinforcement learning methods.
  15. Scaled Energies: Temperature smoothing
     And T arises as a smoothing parameter in Dark Knowledge.
  16. Scaled Energies: Max Norm Regularization
     http://www.deeplearningbook.org/slides/dls_2016.pdf
     We frequently have to rescale the weights in the deep net. I simply observe that this is, effectively, energy rescaling.
  17. Scaled Energies: Batch Norm Regularization
     Most recent ideas out of Google Deep Mind: http://www.deeplearningbook.org/slides/dls_2016.pdf
     (Figure labels: ReLU; mean = 0, variance = 1; Z ~ E energy.)
     Local layer energies must be rescaled explicitly on each batch step.
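
A minimal sketch of the rescaling step, assuming a single feature observed across one batch. It only performs the mean = 0, variance = 1 normalization named on the slide and leaves out the learned scale and shift parameters of full batch normalization. The function name `batch_normalize` and the `eps` value are illustrative.

```python
import math

def batch_normalize(batch, eps=1e-5):
    """Rescale one feature across a batch to mean 0 and (approximately) variance 1."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n  # biased (population) variance of the batch
    return [(v - mean) / math.sqrt(var + eps) for v in batch]

activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_normalize(activations)
print(normalized)
```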
  19. Recap: energies and temperatures
     http://www.deeplearningbook.org/slides/dls_2016.pdf
     • Neural Networks define energies at each layer
     • Sigmoid activations result from normalization and factorization
     • Local energies / weights must be rescaled carefully
     • Lots of hacks to get good convergence
     Let's turn to some stat mech / stats to see how T arises.
  20. Boltzmann Distribution: classic argument (Hill)
     https://charlesmartin14.wordpress.com/2013/11/14/metric-learning-some-quantum-statistical-mechanics/
     Given many discrete states, the distribution is multinomial. Given the constraints (constant N, E), what is the most probable distribution?
  21. Boltzmann Distribution: the most likely distribution ?
     We expect the most likely distribution of states and the most likely energy distribution to both be highly peaked, i.e. to concentrate to the means very fast.
  22. Boltzmann Distribution: Lagrange multiplier problem
     The distribution is so peaked that we can minimize the log of the distribution subject to (s.t.) the constraints, which enter with Lagrange multipliers.
  23. Boltzmann Distribution: Stirling’s Approximation
     See Art of Computer Programming by Knuth. We apply an asymptotic expansion (Stirling's approximation) to the terms in the multinomial distribution when taking the limit; note that one term vanishes.
  24. Boltzmann Distribution: Lagrange multiplier problem
     After applying Stirling’s approximation and taking partials, we get the mean number of events. This leads to the final most likely distribution …
  25. Boltzmann Distribution: and Partition Function
     The optimal probability, the average energy, and the partition function: the central result of Gibbs statistical mechanics.
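
The slide equations did not survive extraction, so what follows is the standard textbook form of the argument rather than a transcription of the author's slides. The symbols $n_j$ (occupation numbers), $\alpha$ and $\beta$ (Lagrange multipliers) are the usual conventions and are assumed here. Maximizing $\ln W$ is the same as minimizing $-\ln W$.

```latex
\begin{align}
W(\{n_j\}) &= \frac{N!}{\prod_j n_j!},
\qquad \sum_j n_j = N, \quad \sum_j n_j E_j = E \\
\ln W &\approx N \ln N - \sum_j n_j \ln n_j
\qquad \text{(Stirling: } \ln n! \approx n \ln n - n \text{)} \\
0 &= \frac{\partial}{\partial n_j}\Big[\ln W - \alpha \sum_k n_k - \beta \sum_k n_k E_k\Big]
   = -\ln n_j - 1 - \alpha - \beta E_j \\
p_j &= \frac{n_j}{N} = \frac{e^{-\beta E_j}}{Z},
\qquad Z = \sum_j e^{-\beta E_j},
\qquad \langle E \rangle = \sum_j p_j E_j
\end{align}
```

The multiplier $\alpha$ is fixed by normalization and is absorbed into $Z$; $\beta$ is fixed by the energy constraint and plays the role of the inverse temperature.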
  26. Partition Function: a generating function
     We get all sorts of useful stuff out of it.
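
As one illustration of the generating-function property, derivatives of ln Z with respect to the inverse temperature give the moments of the energy: the mean is -d ln Z / d beta and the variance is d^2 ln Z / d beta^2. The sketch below checks both with finite differences on an arbitrary set of energies; the function names and the step size `d` are illustrative assumptions.

```python
import math

def log_z(energies, beta):
    """ln Z(beta) = ln sum_j exp(-beta * E_j), computed with a log-sum-exp shift."""
    shift = max(-beta * e for e in energies)
    return shift + math.log(sum(math.exp(-beta * e - shift) for e in energies))

def boltzmann_moments(energies, beta):
    """Mean and variance of the energy under p_j = exp(-beta * E_j) / Z."""
    lz = log_z(energies, beta)
    probs = [math.exp(-beta * e - lz) for e in energies]
    mean = sum(p * e for p, e in zip(probs, energies))
    var = sum(p * (e - mean) ** 2 for p, e in zip(probs, energies))
    return mean, var

energies = [0.0, 0.5, 1.3, 2.0]
beta, d = 0.7, 1e-4
# Central finite differences of ln Z with respect to beta.
d1 = (log_z(energies, beta + d) - log_z(energies, beta - d)) / (2 * d)
d2 = (log_z(energies, beta + d) - 2 * log_z(energies, beta) + log_z(energies, beta - d)) / d ** 2
mean, var = boltzmann_moments(energies, beta)
print(-d1, mean)  # <E> = -d ln Z / d beta
print(d2, var)    # Var(E) = d^2 ln Z / d beta^2
```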
  27. Ground State Energy: the low Temp limit
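
Only the title of this slide survived, so the following is a sketch of the standard statement it most plausibly refers to: the free energy F(beta) = -(1/beta) ln Z tends to the smallest energy as beta grows (temperature goes to zero). The energies are arbitrary and the function name is illustrative.

```python
import math

def free_energy(energies, beta):
    """F(beta) = -(1/beta) * ln sum_j exp(-beta * E_j), with a stable log-sum-exp."""
    e_min = min(energies)
    tail = sum(math.exp(-beta * (e - e_min)) for e in energies)
    return e_min - math.log(tail) / beta

energies = [1.5, 0.2, 3.0, 0.9]
for beta in (0.1, 1.0, 10.0, 100.0):
    print(beta, free_energy(energies, beta))
```

As `beta` increases the printed values rise toward `min(energies)`, since the excited states are exponentially suppressed.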
  28. Statistical Physics: an ML viewpoint
     We can derive and describe these results using language familiar to the ML community:
     • max entropy principle
     • KL divergence
     • Chernoff bounds
     • sums of random numbers
     • concentration to the mean
     • extreme value statistics
     Some results may be familiar; others surprising.
  29. Canonical Ensemble: from states to energies
     Microcanonical: maximum entropy. Canonical: minimum free energy at constant T. The Boltzmann-Gibbs distribution minimizes the free energy.
  30. Canonical Ensemble: from states to energies
     Sum over states becomes a sum over energy levels. Many states can have the same energy level E; we count them w/ the density of states. This gives the free energy, with the entropy S = ln of the density of states.
  31. Free Energy: back to probabilities
  32. Free Energy: KL Divergence
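
The body of this slide is missing, so the sketch below shows one standard identity that links the two ideas and is not necessarily the author's exact formula. For any trial distribution q, the variational free energy F[q] = <E>_q - (1/beta) S[q] exceeds the equilibrium free energy -(1/beta) ln Z by exactly KL(q || p) / beta, where p is the Boltzmann-Gibbs distribution. All names and numbers are illustrative.

```python
import math

def variational_free_energy(q, energies, beta):
    """F[q] = <E>_q - (1/beta) * S[q], with S[q] = -sum q ln q."""
    avg_energy = sum(qi * e for qi, e in zip(q, energies))
    entropy = -sum(qi * math.log(qi) for qi in q if qi > 0)
    return avg_energy - entropy / beta

def kl_divergence(q, p):
    """KL(q || p) for discrete distributions on the same support."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

energies = [0.0, 1.0, 2.5]
beta = 1.3
z = sum(math.exp(-beta * e) for e in energies)
p = [math.exp(-beta * e) / z for e in energies]  # Boltzmann-Gibbs distribution
f_eq = -math.log(z) / beta                       # equilibrium free energy

q = [0.5, 0.3, 0.2]  # an arbitrary trial distribution
gap = variational_free_energy(q, energies, beta) - f_eq
print(gap, kl_divergence(q, p) / beta)
```

Because the KL divergence is non-negative and vanishes only at q = p, this is one way to see that the Boltzmann-Gibbs distribution minimizes the free energy.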
  33. Temperature: a Chernoff parameter
     Given X1, X2, … i.i.d. vars and a function, how fast does the probability of the event (sum) decay? Apply the Chernoff bound w/ an exponential indicator, then minimize over the bound's free parameter.
  34. Temperature: a Chernoff parameter
  35. Temperature: a Chernoff parameter
     Principle of minimum free energy: the minimizing parameter is the equilibrium inverse temperature. See book for details & caveats. S is really a rate function, as in large deviations theory.
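
A small sketch of the Chernoff argument, assuming i.i.d. Bernoulli(p) variables as a concrete stand-in for the X_i (the slides do not specify a distribution). The bound P(sum >= n*a) <= exp(-n * (s*a - ln M(s))) holds for every s >= 0, and minimizing over s picks out the parameter that plays the role of the inverse temperature. The grid search and all names are illustrative.

```python
import math

def chernoff_bound(n, p, a, s_grid):
    """min over s of exp(-n * (s*a - ln M(s))) for Bernoulli(p), M(s) = 1 - p + p*e^s."""
    best_s, best_log = None, float("inf")
    for s in s_grid:
        log_bound = -n * (s * a - math.log(1 - p + p * math.exp(s)))
        if log_bound < best_log:
            best_s, best_log = s, log_bound
    return best_s, math.exp(best_log)

def exact_tail(n, p, a):
    """P(X_1 + ... + X_n >= n*a) for i.i.d. Bernoulli(p)."""
    k_min = math.ceil(n * a)
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

n, p, a = 100, 0.3, 0.5
s_grid = [i * 0.001 for i in range(0, 5001)]  # s in [0, 5]
s_star, bound = chernoff_bound(n, p, a, s_grid)
print(s_star, bound, exact_tail(n, p, a))
```

For this Bernoulli case the minimizing `s_star` lands near ln(a(1-p) / (p(1-a))), and the optimized exponent is the rate function of large deviations theory.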
  36. Free Energy: thermodynamic limit
     Free energy density; annealed (w/ moments). These may differ: the order of the limits matters.
  37. Free Energy: indicates Phase Transitions (PT)
     Thermodynamic functions should be analytic, but at a phase transition they change abruptly with external changes (first order PT, second order PT; discontinuous).
  38. Random Energies: sum of exponentials of random numbers
     Say we have i.i.d. events, each w/ a given probability. What is the probability that at least one event occurs?
  39. sums of exp(rand(x)): concentration result
     # successes = sum of i.i.d. binary random vars, w/ a known expectation. A < B: vanishes completely. A > B: concentrates to the mean very fast.
  40. sums of exp(rand(x)): proof of concentrations
     Either 1 event or 0 events are seen, depending on A/B. Uses ln(1 - x) = -x + …
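
A numerical sketch of the A versus B dichotomy. The slide formulas are missing, so this assumes the parameterization used in Merhav's notes: e^{nA} i.i.d. events, each with probability e^{-nB}. The probability of at least one event is then 1 - (1 - e^{-nB})^{e^{nA}}, which the code evaluates through ln(1 - x) to avoid round-off.

```python
import math

def prob_at_least_one(n, a, b):
    """1 - (1 - e^{-n*b})^(e^{n*a}), evaluated stably with log1p / expm1."""
    num_events = math.exp(n * a)
    p = math.exp(-n * b)
    return -math.expm1(num_events * math.log1p(-p))

for n in (10, 50, 200):
    print(n, prob_at_least_one(n, a=0.5, b=1.0), prob_at_least_one(n, a=1.0, b=0.5))
```

With A < B the probability decays toward 0 as n grows, and with A > B it approaches 1, both exponentially fast in n.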

  41. Random Energy Model (REM): setup
  42. Random Energy Model (REM): …
  43. Replica Method: an old trick to eval Z
     Express the expected value of ln Z in moments of Z w/ integer m, then use analytic continuation to real m as m -> 0. Bad branch cut? Deal w/ later.
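
The identity behind the trick is E[ln Z] = lim_{m -> 0} (E[Z^m] - 1) / m. The sketch below only checks this limit numerically on sampled partition functions; it does not perform the integer-m calculation or the analytic continuation. The toy disorder model (i.i.d. standard normal energies), the seed, and the names `quenched` and `annealed` are assumptions made for illustration.

```python
import math
import random

def sample_partition_functions(num_samples, num_states, beta, seed=0):
    """Z = sum_j exp(-beta * E_j) with i.i.d. standard normal energies (toy disorder model)."""
    rng = random.Random(seed)
    return [
        sum(math.exp(-beta * rng.gauss(0.0, 1.0)) for _ in range(num_states))
        for _ in range(num_samples)
    ]

def replica_estimate(zs, m):
    """(E[Z^m] - 1) / m, which tends to E[ln Z] as m -> 0."""
    return (sum(z ** m for z in zs) / len(zs) - 1.0) / m

zs = sample_partition_functions(num_samples=2000, num_states=8, beta=1.0)
quenched = sum(math.log(z) for z in zs) / len(zs)  # direct average of ln Z
annealed = math.log(sum(zs) / len(zs))             # ln of the average of Z
for m in (1.0, 0.1, 0.001):
    print(m, replica_estimate(zs, m))
print(quenched, annealed)
```

As `m` shrinks the estimate approaches the direct average of ln Z, which by Jensen's inequality never exceeds the ln of the average of Z.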
  44. Summary
  45. calculation | consulting
     charles@calculationconsulting.com
