4. Neural network NEURAL NETWORK FOR BEGINNERS, JUST LINEAR REGRESSION
Y = f(X, w) = w1 + w2X2 + w3X3 + w4X4
[Figure: a single neural network compute node f with inputs 1, X2, X3, X4 and weights w1, w2, w3, w4]
f is the so-called activation function. This could be the logistic function (the inverse of the logit), but other choices are possible.
There are no hidden layers.
There are four weights w that have to be determined.
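A minimal sketch of the single compute node above, assuming NumPy. The names `node_output`, `w`, and `x` and all numeric values are illustrative and do not come from the slides. With the identity as activation the node reproduces the linear regression formula; passing the logistic function instead squashes the same weighted sum into the interval (0, 1).

```python
import numpy as np

def logistic(t):
    """Logistic (sigmoid) activation: 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def node_output(x, w, activation=lambda t: t):
    """Single compute node: activation(w1 + w2*X2 + w3*X3 + w4*X4).

    x holds (X2, X3, X4); w holds (w1, w2, w3, w4), with w1 the intercept.
    The default activation is the identity, i.e. plain linear regression.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return activation(w[0] + np.dot(w[1:], x))

# Hypothetical weights and inputs, chosen only for illustration.
w = [0.5, 1.0, -2.0, 0.25]
x = [2.0, 1.0, 4.0]

linear = node_output(x, w)              # 0.5 + 2.0 - 2.0 + 1.0 = 1.5
squashed = node_output(x, w, logistic)  # logistic(1.5), a value in (0, 1)
```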
5. Neural networks ONE HIDDEN LAYER, MATHEMATICAL FORMULATION
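[Figure: X inputs X1 to X4 (Age, Income, Region, Gender) feed a hidden layer Z1 to Z3, which feeds the output f; weights α on the input-to-hidden connections, weights β on the hidden-to-output connections]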
Neural net prediction: $f = g(T_Y)$
$T_Y = \beta_{0Y} + \beta_Y^T Z$
$Z_m = \sigma(\alpha_{0m} + \alpha_m^T X)$
The function σ is defined as:
$\sigma(x) = \frac{1}{1 + e^{-x}}$
σ is also called the activation function.
In case of regression, the function g is the identity function I.
In case of a binary classifier, g is the softmax: $g(T_Y) = \frac{e^{T_Y}}{e^{T_N} + e^{T_Y}}$
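For two classes the softmax can be rewritten in terms of the function σ defined above. Dividing numerator and denominator by $e^{T_Y}$ shows that only the difference between the two scores matters:

```latex
g(T_Y) = \frac{e^{T_Y}}{e^{T_N} + e^{T_Y}}
       = \frac{1}{1 + e^{-(T_Y - T_N)}}
       = \sigma(T_Y - T_N)
```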
The model weights w = (α, β) have to be estimated from the data.
m = 1, ..., M, where M is the number of nodes / neurons in the hidden layer.
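The formulation on this slide can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function name `forward`, the array shapes, and the random weights are made up here, and the output function g defaults to the identity (the regression case).

```python
import numpy as np

def sigma(x):
    """Activation function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(X, alpha0, alpha, beta0, beta, g=lambda t: t):
    """Forward pass of a network with one hidden layer.

    X:      (n, p) inputs, one row per observation
    alpha0: (M,)   hidden-layer intercepts
    alpha:  (M, p) hidden-layer weights, one row per hidden node m
    beta0:  scalar output intercept
    beta:   (M,)   output weights
    """
    Z = sigma(alpha0 + X @ alpha.T)  # Z_m = sigma(alpha_0m + alpha_m^T X), shape (n, M)
    T = beta0 + Z @ beta             # T_Y = beta_0Y + beta_Y^T Z, shape (n,)
    return g(T)                      # f = g(T_Y)

# Hypothetical sizes: 5 observations, 4 inputs, M = 3 hidden nodes.
rng = np.random.default_rng(0)
n, p, M = 5, 4, 3
X = rng.normal(size=(n, p))
alpha0 = rng.normal(scale=0.1, size=M)
alpha = rng.normal(scale=0.1, size=(M, p))
beta0 = 0.0
beta = rng.normal(scale=0.1, size=M)

f = forward(X, alpha0, alpha, beta0, beta)

# Number of weights to estimate: 3 + 12 + 1 + 3 = 19
n_weights = alpha0.size + alpha.size + 1 + beta.size
```

For a binary classifier, the same function can be called with a different `g`, for example `g=sigma`.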
6. Neural networks
Training the weights
The back propagation algorithm is just gradient descent in numerical optimization terms, with the gradients obtained through the chain rule.
Randomly choose small values for all weights $w_i$. For each data point (observation) i:
• Calculate the neural net prediction $f_i$
• Calculate the error, for example for regression the squared error $(y_i - f_i)^2$
• Calculate the sum of all errors: $E = \sum_i (y_i - f_i)^2$
Adjust the weights w according to:
$w_i^{new} = w_i + \Delta w_i$, with $\Delta w_i = -\alpha \frac{\partial E}{\partial w_i}$
Here α is the learning rate (step size), not the hidden-layer weights α of the previous slide.
A run through all observations is called an epoch.
Stop if the error E is small enough.
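The training procedure can be sketched as follows, assuming a one-hidden-layer regression network with identity output and the summed squared error E. Everything here is illustrative: the function `train`, the synthetic data, the learning rate, and the stopping tolerance are assumptions, and each epoch uses all observations at once (full-batch gradient descent).

```python
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, y, M=3, learning_rate=0.001, epochs=200, tol=1e-3, seed=0):
    """Gradient descent on E = sum((y - f)^2) for one hidden layer."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Randomly choose small starting values for all weights.
    a0 = rng.normal(scale=0.1, size=M)
    A = rng.normal(scale=0.1, size=(M, p))
    b0 = 0.0
    b = rng.normal(scale=0.1, size=M)
    errors = []
    for _ in range(epochs):  # one pass over all observations = one epoch
        # Forward pass: predictions f and total error E.
        Z = sigma(a0 + X @ A.T)
        f = b0 + Z @ b
        E = float(np.sum((y - f) ** 2))
        errors.append(E)
        if E < tol:  # stop if the error is small enough
            break
        # Backward pass: chain rule from the output back to the inputs.
        dE_df = -2.0 * (y - f)                        # (n,)
        grad_b0 = dE_df.sum()
        grad_b = Z.T @ dE_df                          # (M,)
        dE_dpre = np.outer(dE_df, b) * Z * (1.0 - Z)  # (n, M), uses sigma' = Z(1 - Z)
        grad_a0 = dE_dpre.sum(axis=0)
        grad_A = dE_dpre.T @ X                        # (M, p)
        # Update: w_new = w - learning_rate * dE/dw.
        b0 -= learning_rate * grad_b0
        b -= learning_rate * grad_b
        a0 -= learning_rate * grad_a0
        A -= learning_rate * grad_A
    return (a0, A, b0, b), errors

# Synthetic data, for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1]

weights, errors = train(X, y)
```

The list `errors` holds E per epoch, so plotting it gives a quick check that the chosen learning rate is not too large.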
8. Deep learning LOOSELY DEFINED:
NEURAL NETWORK WITH MORE THAN 2 HIDDEN LAYERS
Don’t use deep learning for ‘simple’ business analytics problems: it is really overkill!
Keep it simple if you have ‘classical’ churn or response models: logistic regression, trees, or forests.
In this example all layers are fully connected (also called dense layers).
9. Convolutional networks
For computer vision, special structures are used. Usually not all layers are fully connected.
We have so-called convolutional layers and pooling layers.
Convolutional layer A takes its inputs only from a local window of the previous layer.
A ‘max’ pooling layer takes the maximum value of a group of inputs.
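A minimal NumPy sketch of both layer types, under assumptions: a single-channel image, one hypothetical 2x2 kernel, no padding, stride 1, and non-overlapping pooling windows. The function names `conv2d` and `max_pool` are made up for this illustration. As in most deep learning libraries, the "convolution" here slides the kernel without flipping it.

```python
import numpy as np

def conv2d(image, kernel):
    """Each output value depends only on a local window of the input."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = image[r:r + kh, c:c + kw]
            out[r, c] = np.sum(window * kernel)
    return out

def max_pool(feature_map, size=2):
    """Take the maximum over non-overlapping size x size blocks."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]  # drop rows/columns that do not fill a block
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

# Toy 6x6 "image" and a hypothetical 2x2 kernel.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

features = conv2d(image, kernel)  # shape (5, 5)
pooled = max_pool(features)       # shape (2, 2)
```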
10. But pictures are arrays…. No problem
These blocks of numbers are called “tensors” in linear algebra terms.
Calculations on these tensors can be done very fast in parallel on GPUs.
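To make the "blocks of numbers" concrete, here is a small NumPy sketch. The image sizes and the batch size are illustrative assumptions, and the arrays hold zeros rather than real pictures.

```python
import numpy as np

# A grayscale image is a 2-dimensional block: height x width.
gray = np.zeros((28, 28))

# A colour image adds a channel axis: height x width x (red, green, blue).
color = np.zeros((224, 224, 3))

# A batch of colour images adds one more axis in front: images x height x width x channels.
batch = np.zeros((8, 224, 224, 3))

# One operation is applied to every number in the block at once,
# which is the kind of work that parallel hardware handles well.
scaled = batch / 255.0
```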
11. Training images
VGG19 deep learning network structure
The model achieves 92.7% top-5 test accuracy on ImageNet, which is a dataset of over 14 million images belonging to 1000 classes. 143 million weights!
Target output: 1000 classes
13. Keras
• Keras is a high-level neural networks API, written in Python and
capable of running on top of either TensorFlow or Theano.
• It was developed with a focus on enabling fast experimentation.
• Being able to go from idea to result with the least possible delay is
key to doing good research.
• Specifying models in Keras is at a higher level than in TensorFlow, but you still have lots of options
• There is now also an R interface (of course created by RStudio)
14. Simple set-up “Architecture”
TensorFlow installed on a (Linux) machine, ideally with lots of GPUs
Python (Jupyter notebooks): pip install keras, and you’re good to go
R / RStudio: install_github("rstudio/keras"), and you’re good to go
15. Training from scratch: MNIST example
MNIST data:
70,000 handwritten digits, each with a label (“0”, “1”, …, “9”)
Each image has a resolution of 28 x 28 pixels, so it is a 28 by 28 matrix
16. First a simple neural network in R
Treat the image as a vector. It has length 784 (28 by 28), the number of pixels. One hidden layer (fully connected).
[Figure: inputs Pixel 1 to Pixel 784 feed a hidden layer of neuron 1 to neuron 256, which feeds the outputs Label 0 to Label 9]
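A small NumPy sketch of the data preparation and of the size of this network. It uses random stand-in images and made-up labels rather than the real MNIST data, and the helper `dense_params` is a name introduced only for this illustration. Scaling pixels to [0, 1] and one-hot encoding the labels are common choices, assumed here rather than taken from the slides.

```python
import numpy as np

# Stand-in for ten 28 x 28 grayscale images with pixel values 0..255 (not real MNIST).
images = np.random.default_rng(0).integers(0, 256, size=(10, 28, 28))
labels = np.array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4])  # hypothetical labels

# Treat each image as a vector of length 784 and scale the pixels to [0, 1].
x = images.reshape(len(images), 28 * 28) / 255.0  # shape (10, 784)

# One output node per label "0".."9": one-hot encode the labels.
y = np.eye(10)[labels]                            # shape (10, 10)

def dense_params(n_in, n_out):
    """Weights plus one intercept per node in a fully connected layer."""
    return n_in * n_out + n_out

n_hidden = 256
total_weights = dense_params(784, n_hidden) + dense_params(n_hidden, 10)
# 200,960 + 2,570 = 203,530 weights to train
```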
17. First a simple neural network in R
N of neurons   Time for 50 epochs   Test accuracy
5              39 s                 0.8988
15             51 s                 0.9486
25             44 s                 0.9626
50             51 s                 0.9741
100            73 s                 0.9751
256            125 s                0.9796
512            213 s                0.9825
1024           314 s                0.9830
21. Now compare with GPU
Some extra steps:
1. Spin up a Microsoft NC6 machine: 1 x Tesla K80 GPU ($1.084/hr)
2. Install the CUDA toolkit and cuDNN
3. pip install tensorflow-gpu
Run the same model as in the previous slide: now it takes 2.9 minutes
23. TensorBoard
TensorBoard is a visualization tool included with TensorFlow
It enables you to visualize dynamic graphs of your Keras training and
test metrics, as well as activation histograms for the different layers in
your model.
model %>% fit(
  x_train, y_train,
  batch_size = batch_size,
  epochs = epochs,
  verbose = 2,
  callbacks = callback_tensorboard(
    log_dir = "logs/run_1",
    write_images = TRUE
  ),
  validation_split = 0.2
)
24. Now open a shell and start TensorBoard, providing the log directory, for example: tensorboard --logdir logs/run_1
26. Using pre-trained models
Image classifiers have been trained on big GPU machines
for weeks with millions of pictures on very large networks
Not many people do that from scratch. Instead, one can
use pre-trained networks and start from there.
29. Images from Videos
Use ffmpeg: an open source tool for video processing. Example call for the trailer of the Dutch series Familie Kruys:
ffmpeg -i "FAMILIE_KRUYS_TRAILER.mp4" -s 600x400 -ss 00:00:05.000 -t 1200 -r 2.0 "FamKruys%03d.jpg"
34. RTL NIEUWS Image similarity
1024 RTL Nieuws sample pictures. Compute the 25,088 feature values for each image.
For each image, find the 10 closest images based on cosine similarity.
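The top-10 lookup can be sketched in NumPy as below. This is an illustration under assumptions: the function `top_k_similar` is a made-up name, and a small random matrix stands in for the real 1024 by 25,088 feature matrix.

```python
import numpy as np

def top_k_similar(features, k=10):
    """For each row, return the indices and cosine similarities of the k closest other rows."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / norms             # scale every feature vector to length 1
    sim = unit @ unit.T                 # cosine similarity between all pairs
    np.fill_diagonal(sim, -np.inf)      # an image should not be its own neighbour
    order = np.argsort(-sim, axis=1)[:, :k]  # most similar first
    scores = np.take_along_axis(sim, order, axis=1)
    return order, scores

# Stand-in features: 20 "images" with 64 values each (random, for illustration only).
rng = np.random.default_rng(0)
features = rng.random((20, 64))

neighbours, scores = top_k_similar(features, k=10)
```

Row i of `neighbours` lists the images most similar to image i, which is the kind of table a small lookup app could display.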
Little Shiny app
40. Take five Brad Pitt pictures
Run them through the pre-trained VGG16 and extract feature vectors.
This is a 5 by 25088 matrix.
The Brad Pitt index
Take other images and run them through the VGG16.
Calculate the distances with the five Brad Pitt pictures and average:
0.771195 0.802654 0.714752 0.792587 0.8291976 0.8096944 0.665990 0.9737212
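A sketch of how such an index could be computed. The slide does not name the distance measure, so this example assumes cosine similarity, as on the image similarity slide. The function names and the random stand-in vectors are illustrative; real feature vectors would have 25088 values each.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

def reference_index(candidates, references):
    """Average similarity of each candidate image to the reference images."""
    return cosine_similarity(candidates, references).mean(axis=1)

# Stand-in feature vectors (random, for illustration only).
rng = np.random.default_rng(1)
references = rng.random((5, 128))  # the five reference pictures
candidates = np.vstack([references[0], rng.random((3, 128))])  # one known match plus three others

index = reference_index(candidates, references)  # one value per candidate image
```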
41. 0.6273 0.5908 0.8231 0.7711 0.8839 0.8975 0.6934 0.9659
Focusing on only the face!!
43. Transfer learning or fine-tuning pre-trained models
Train new image classifiers on limited training cases
• Get a pretrained model, say VGG16
• Remove existing top layers
• Add your own (fully) connected layer(s)
• Fix all the parameters except for your layers
• Use your (limited) samples as train cases to train the
weights of your layers.
44. Python code example
from keras.applications.vgg16 import VGG16
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# load VGG16 with ImageNet weights, without its top (classification) layers
base_model = VGG16(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(256, activation='relu')(x)
# and a logistic layer -- 2 classes dogs and cats
predictions = Dense(2, activation='softmax')(x)
# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional layers
for layer in base_model.layers:
    layer.trainable = False
# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])