12. Multi-layer neural network
An MLP uses multiple hidden layers between the input
and output layers to extract meaningful features.
A Neural Network = A Function
MLP (Multi-Layer Perceptron)
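A neural network is just a function from inputs to outputs. Below is a minimal sketch of an MLP forward pass in NumPy; the layer sizes and the ReLU choice are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def mlp_forward(x, params):
    """Forward pass of an MLP: each hidden layer applies an
    affine transform followed by a nonlinearity, extracting
    features layer by layer."""
    h = x
    for W, b in params[:-1]:           # hidden layers
        h = relu(h @ W + b)
    W_out, b_out = params[-1]          # output layer (linear scores here)
    return h @ W_out + b_out

# Example: 4 inputs -> two hidden layers of 8 units -> 3 outputs
rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]
params = [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
print(mlp_forward(rng.normal(size=4), params))  # a length-3 output vector
```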
15. Find the network weights that minimize the training
error between the true and estimated labels of the
training examples, e.g.:
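One common choice (an assumption here, since the slide's own formula did not survive extraction) is the squared error summed over the training set, where $f_{\mathbf{w}}$ is the network with weights $\mathbf{w}$ and $(x_j, y_j)$ are the training pairs:

$$E(\mathbf{w}) = \sum_j \big\| y_j - f_{\mathbf{w}}(x_j) \big\|^2$$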
Training of multi-layer networks
16. Back-propagation: gradients are computed in the
direction from the output layer to the input layer and
combined using the chain rule, as in the sketch below.
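A minimal sketch of back-propagation on a one-hidden-layer network; the sigmoid activation and squared-error loss are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One training example, one hidden layer: x -> h = sigmoid(W1 x) -> y_hat = W2 h
x = np.array([0.5, -1.0])
y = np.array([1.0])
W1 = np.array([[0.1, 0.2], [-0.3, 0.4]])
W2 = np.array([[0.5, -0.6]])

# Forward pass (intermediates are stored for reuse in the backward pass)
a1 = W1 @ x
h = sigmoid(a1)
y_hat = W2 @ h
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward pass: gradients flow from the output toward the input,
# each step multiplying by a local derivative (the chain rule)
d_yhat = y_hat - y                 # dL/dy_hat
dW2 = np.outer(d_yhat, h)          # dL/dW2
d_h = W2.T @ d_yhat                # dL/dh
d_a1 = d_h * h * (1 - h)           # dL/da1 (sigmoid derivative)
dW1 = np.outer(d_a1, x)            # dL/dW1
```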
SGD (Stochastic Gradient Descent): compute the
weight update w.r.t. one training example at a time,
cycling through the training examples in random order
over multiple epochs → slow convergence
(picking one random sample and updating one example
at a time is slow)
• mini-batch SGD (a batch of samples computed
simultaneously) is faster to complete one epoch; see
the sketch below
Optimizer
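A minimal sketch contrasting per-example SGD with mini-batch SGD on a linear least-squares model; the synthetic data, loss, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                       # 1000 training examples
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=1000)

def minibatch_sgd(X, y, batch_size, epochs=5, lr=0.05):
    """Mini-batch SGD on a linear least-squares model;
    batch_size=1 recovers plain per-example SGD."""
    w = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)                   # random order each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ w - yb) / len(idx)   # gradient averaged over batch
            w -= lr * grad                           # one update per mini-batch
    return w

w_sgd = minibatch_sgd(X, y, batch_size=1)    # 1000 updates per epoch, slow
w_mb = minibatch_sgd(X, y, batch_size=100)   # 10 updates per epoch
```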
18. The mini-batch partial_fit method is expected to be
called several times consecutively on different chunks of
a dataset, so as to implement out-of-core or online
learning. This is especially useful when the whole
dataset is too big to fit in memory at once.
Mini-batch vs. Epoch
* One epoch = one full pass over all of the training data
* The training data is split into multiple chunks according
  to the mini-batch size. Suppose there are 1000 samples
  in total:
  batch size = 100 → 10 chunks → 10 updates per epoch
  batch size = 10 → 100 chunks → 100 updates per epoch
* How should the batch size be set?
  Not too large; common values are 28, 32, 128, 256, …
mini-batch: the partial_fit method
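A minimal out-of-core sketch using scikit-learn's SGDClassifier and its partial_fit method; the synthetic data, chunk size, and epoch count are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = SGDClassifier(loss="log_loss")
classes = np.unique(y)            # all class labels must be declared up front

batch_size = 100                  # 1000 samples -> 10 updates per epoch
for epoch in range(5):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        # Each call fits on one chunk only, so the full dataset never
        # needs to be in memory at once (out-of-core / online learning).
        clf.partial_fit(X[idx], y[idx], classes=classes)

print(clf.score(X, y))
```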
20. To avoid getting stuck in a local minimum and to
further increase the training speed (a sketch of Adam
follows the list below):
Adaptive Learning Rate/Gradient algorithms
1. Adagrad
2. Momentum
3. RMSProp
4. Adam
5. …
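A minimal NumPy sketch of one of these, the Adam update rule, which combines a Momentum-style running mean of the gradient with RMSProp-style per-weight scaling. The β and ε values are Adam's common defaults; the learning rate and test function are assumptions for the demo.

```python
import numpy as np

def adam_step(w, grad, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: a Momentum-style running mean of the gradient (m)
    plus a per-weight adaptive step from the squared-gradient average (v)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad            # 1st moment (Momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2       # 2nd moment (RMSProp-like)
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)   # adaptive per-weight step
    return w, (m, v, t)

# Usage: minimize f(w) = ||w||^2, whose gradient is 2w
w = np.array([1.0, -2.0])
state = (np.zeros_like(w), np.zeros_like(w), 0)
for _ in range(1000):
    w, state = adam_step(w, 2.0 * w, state)
print(w)  # close to [0, 0]
```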