Introduction to Neural Networks and Deep Learning

1. Introduction to Neural Networks and Deep Learning
Dr. Vahid Mirjalili
CSE802 – Pattern Recognition Lecture
Michigan State University
March 12, 2018
8. Linear decision boundary
Assume we have two features: x1, x2 ∈ {0, 1}, giving 4 data points.
A perceptron can handle linear decision boundaries.

[Figure: a perceptron with inputs x1, x2 and output y, shown next to the truth table]

x1  x2  y
0   0   ..
1   1   ..
0   1   ..
1   0   ..
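A minimal sketch of the perceptron above, using numpy. The weights and bias here are illustrative assumptions (the slide leaves the y column elided); they realize AND(x1, x2), one linearly separable choice for y:

```python
import numpy as np

def perceptron(x, w, b):
    """Single perceptron: weighted sum of inputs followed by a step activation."""
    return int(np.dot(w, x) + b > 0)

# Illustrative weights (an assumption, not from the slides) realizing AND,
# a linearly separable function of x1, x2 in {0, 1}:
w, b = np.array([1.0, 1.0]), -1.5
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron(np.array([x1, x2]), w, b))
```

The decision boundary is the line x1 + x2 = 1.5, which separates (1, 1) from the other three points.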
9. Non-linear decision boundary
A perceptron can only handle linear decision boundaries.
How can we handle such a non-linear decision boundary?
10. Non-linear decision boundary
A multilayer perceptron (MLP) can handle such a non-linear decision boundary.

[Figure: the inputs x1, x2 are transformed into new features (e.g. AND(x1, x2)) by a hidden layer before the output y is computed]
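The feature labels in the figure above appear garbled in this export; one common decomposition (an assumption, not necessarily the slide's exact choice) expresses XOR, the classic non-linearly-separable function, through two perceptron-realizable hidden features:

```python
# Each of these gates is linearly separable, so a single perceptron can realize it.
def AND(a, b):  return int(a and b)
def OR(a, b):   return int(a or b)
def NAND(a, b): return int(not (a and b))

def xor_via_features(x1, x2):
    # Hidden layer: transform (x1, x2) into two new, linearly separable features
    h1, h2 = OR(x1, x2), NAND(x1, x2)
    # Output layer: AND of the new features yields XOR, which no single
    # perceptron on the raw inputs could compute
    return AND(h1, h2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_via_features(x1, x2))
```

This is the essence of the "transform to new features" step: in the hidden-feature space the problem becomes linearly separable.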
12. A Multi-Layer Perceptron (MLP)
• Weights are associated with each connection
• The neural network will learn these weights from the training data

[Figure: an MLP with inputs x1, x2 (plus bias units), a hidden layer with 4 hidden units, and output y]
14. Forward pass
• Information flows through the network, weighted by each connection, to compute the output vector

[Figure: a network with inputs x1, …, xn, bias units, a hidden layer, and outputs y1, …, ym]

h = f_h(W1^T x)
y = f_y(W2^T h)
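The two equations above can be sketched directly in numpy. The layer sizes and the sigmoid activation are assumptions for illustration (the slides do not fix f_h, f_y, and biases are omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_h, m = 3, 4, 2                  # input, hidden, output sizes (arbitrary choices)
W1 = rng.standard_normal((n, d_h))   # input-to-hidden weight matrix
W2 = rng.standard_normal((d_h, m))   # hidden-to-output weight matrix

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))  # example choice for f_h and f_y

x = rng.standard_normal(n)
h = sigmoid(W1.T @ x)   # h = f_h(W1^T x)
y = sigmoid(W2.T @ h)   # y = f_y(W2^T h)
print(y.shape)          # (2,)
```

Note how the shapes chain: W1^T maps an n-vector to a d_h-vector, and W2^T maps that to an m-vector.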
15. Back-propagation
• The error between the expected outputs and the computed ones is propagated back through the network to adjust the weights

A neural network learns from its mistakes!

(δ1, …, δm) = (ŷ1 − y1, …, ŷm − ym)

W1 = W1 + δW1
W2 = W2 + δW2
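A minimal gradient-descent sketch of this update for the two-layer network from the forward-pass slide, assuming sigmoid activations and a squared-error loss (both illustrative assumptions; here δW absorbs the negative learning-rate factor, matching the W = W + δW form above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_h, m = 3, 4, 2
W1 = rng.standard_normal((n, d_h))
W2 = rng.standard_normal((d_h, m))
x = rng.standard_normal(n)
t = np.array([0.0, 1.0])            # expected output (made up for illustration)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
losses = []
for step in range(200):
    # forward pass
    h = sigmoid(W1.T @ x)
    y = sigmoid(W2.T @ h)
    losses.append(0.5 * np.sum((y - t) ** 2))
    # backward pass: output error, propagated through sigmoid derivatives
    d_y = (y - t) * y * (1 - y)          # error at the output layer
    d_h = (W2 @ d_y) * h * (1 - h)       # error propagated to the hidden layer
    # weight updates: W = W + deltaW, with deltaW = -lr * gradient
    W2 += -lr * np.outer(h, d_y)
    W1 += -lr * np.outer(x, d_h)

print(np.round(y, 2))  # should move toward the target t as training proceeds
```

The loss decreases over the iterations, which is the "learning from its mistakes" in concrete form.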
20. Capacity of a neural network
The capacity of a neural network determines the complexity of the decision boundaries it can handle.

[Figure: a grid of decision-boundary plots for m = 0, 1, 2, 3 and n = 2, 4, 8, 16]

Credit: decision boundaries are plotted using mlxtend: http://rasbt.github.io/mlxtend/
21. Example: Hand-written digit recognition
Each sample is a 28x28 gray-scale image of hand-written digits 0–9. Ground-truth labels are shown in red. Predict the class label?

[Figure: sample digit images with labels 0–9]
22. A 2-layer neural network for hand-written digit recognition
Flatten the 2D 28x28 input into 784 pixels; the network produces one score per class 0–9, and the max score gives the predicted label.

[Figure: flattened 784-pixel input feeding a network with 10 output units, followed by a max operation]
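A shape-level sketch of this 2-layer classifier in numpy. The hidden-layer size of 128, the ReLU activation, and the random "images" are all assumptions (the slide specifies only the 784-pixel input and 10 outputs; real MNIST loading is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
# A fake batch of five 28x28 gray-scale "images" (random values for illustration)
batch = rng.random((5, 28, 28))

# Flatten each 2D image into a 784-dimensional vector
flat = batch.reshape(5, 784)

W1 = rng.standard_normal((784, 128))  # hidden size 128 is an assumption
W2 = rng.standard_normal((128, 10))   # 10 output units, one per digit class

h = np.maximum(0, flat @ W1)          # ReLU hidden activations
scores = h @ W2                       # one score per class
pred = scores.argmax(axis=1)          # "Max": predicted class label per image
print(pred.shape)  # (5,)
```

With untrained random weights the predictions are meaningless; training would adjust W1 and W2 as in the back-propagation slide.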
24. Convolutional Neural Networks (CNN)
• Neural networks for image recognition must deal with high-dimensional data (e.g. an image of 100x100 pixels)
• Fully-connected (FC) networks require too many parameters and compute inefficiently
• FC networks limit the depth of the network
• CNNs are computationally more efficient and can be used for deep neural networks, with state-of-the-art performance in computer vision applications
25. Building blocks of Conv. Neural Networks
• Convolutional layer
• Non-linear activation
• Pooling layer
• May or may not include a fully connected layer
• Recent developments also include:
  • dropout
  • normalization layers
  • residual blocks
  • etc.
28. Convolution in Neural Networks
• Usually, the kernel is smaller than the input matrix
• The kernel slides over the input matrix to produce the output
• Example: input size 8x8, output size 6x6 (a 3x3 kernel with no padding)
Visualization: https://ezyang.github.io/convolution-visualizer/index.html
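A naive version of this sliding-kernel operation (cross-correlation, which is what deep-learning libraries compute under the name "convolution"); the 3x3 averaging kernel is an illustrative choice:

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive 'valid' 2D convolution: slide kernel k over x with no padding."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # elementwise product of the kernel with the current window, summed
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(64, dtype=float).reshape(8, 8)  # 8x8 input from the slide
k = np.ones((3, 3)) / 9.0                     # 3x3 averaging kernel (example)
print(conv2d_valid(x, k).shape)  # (6, 6)
```

An 8x8 input with a 3x3 kernel yields 8 − 3 + 1 = 6 valid positions along each axis, matching the slide's 6x6 output.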
29. Convolution options
Zero-padding:
• Add zeros on each side of the input matrix
• Zero-padding is used to control the output size
• Different padding schemes: same, valid, full
Stride:
• The step between sliding windows

[Figure: the 8x8 input surrounded by a one-element border of zeros]
Padding: p = 1
Output size: 8x8
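Padding and stride combine into the standard output-size formula, sketched below; the 3x3 kernel size is an assumption consistent with the slide's 8x8 → 6x6 and 8x8 → 8x8 examples:

```python
def conv_output_size(n, k, p=0, s=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1,
    for input size n, kernel size k, padding p, and stride s."""
    return (n + 2 * p - k) // s + 1

# Slide examples with an 8x8 input and a 3x3 kernel:
print(conv_output_size(8, 3))        # 6  -> 'valid' (p = 0)
print(conv_output_size(8, 3, p=1))   # 8  -> 'same'  (p = 1)
```

With 'same' padding the output keeps the input's spatial size, which is why p = 1 suits a 3x3 kernel.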
30. Dealing with multiple feature maps
• The input can have multiple feature maps, called channels (e.g. an RGB image has 3 channels)
• Apply a convolution to each input channel separately, then compute the sum/average of the results
• Repeat this process to reach the desired number of output channels

[Figure: a Conv. layer mapping a 28x28x3 input to a 28x28x8 output]

Question: How many kernels are needed for this convolution layer to go from 3 input channels to 8 output channels?
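One way to count (the kernel's spatial size is an assumption; it does not affect the count): each output channel needs one 2D kernel per input channel, following the per-channel convolve-then-sum procedure described above.

```python
in_channels, out_channels = 3, 8   # from the slide's 28x28x3 -> 28x28x8 example
k = 3                              # kernel spatial size (an assumption)

# One 2D kernel per (input channel, output channel) pair:
n_2d_kernels = in_channels * out_channels
# Equivalently: 8 filters, each a stack of 3 2D kernels of size k x k
print(n_2d_kernels)  # 24
```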
34. Dealing with multiple feature maps
• A conv. layer can change the number of channels
• A pooling layer can only change the size of the feature maps; the number of channels stays the same

[Figure: Conv. maps 28x28x3 to 28x28x8; Pooling maps 28x28x8 to 14x14x8]
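A sketch of the pooling step in the figure, assuming 2x2 max pooling with stride 2 (the usual choice consistent with 28x28x8 → 14x14x8; the slide does not state the pooling type explicitly):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2: halves height and width, keeps channels."""
    H, W, C = x.shape
    # group pixels into 2x2 blocks, then take the max within each block
    return x.reshape(H // 2, 2, W // 2, 2, C).max(axis=(1, 3))

x = np.random.default_rng(0).random((28, 28, 8))  # the slide's 28x28x8 maps
print(max_pool_2x2(x).shape)  # (14, 14, 8)
```

The channel dimension passes through untouched, which is exactly the property the bullet points state.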
35. CNN: Stacking Convolution and Pooling Layers

[Figure: input (a 2D RGB image) → output feature maps → pooling output]
39. Visualizing feature maps

[Figure: Conv#2 and ReLU#2 produce 14 × 14 × 64 feature maps; MaxPooling#2 reduces them to 7 × 7 × 64]
These features will be flattened and passed to the fully-connected layer.
40. Advantages of CNN over densely-connected networks
• Reduced number of parameters
• Local connectivity: for example, nearby pixels on a face are more related to each other
• Parameter sharing: the same kernel is used across the entire input matrix (sliding)
• Max-pooling provides local invariance
These properties of CNNs made it feasible to train deep neural networks.
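The parameter reduction can be made concrete for the 100x100-pixel example from the earlier slide. The layer widths here (100 fully-connected units vs. 100 filters of 3x3) are illustrative assumptions, and biases are omitted:

```python
# Input: a 100x100 RGB image, i.e. 100 * 100 * 3 = 30,000 values
input_size = 100 * 100 * 3

# Fully connected layer with 100 units: every unit connects to every input
fc_params = input_size * 100

# Conv layer with 100 filters, each a 3x3 kernel spanning the 3 input channels:
# the same small kernel is shared across all spatial positions
conv_params = (3 * 3 * 3) * 100

print(fc_params, conv_params)  # 3000000 2700
```

Parameter sharing cuts the count by three orders of magnitude here, which is what makes deeper stacks of such layers trainable.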
41. Example: things we can do with conv. neural networks
Object Detection:
• pedestrians, cars, ...
Link to video: https://youtu.be/_zZe27JYi8Y
42. Example: things we can do with conv. neural networks
Object Segmentation
(semantic segmentation)
Link to video: https://youtu.be/PNzQ4PNZSzc
43. Summary
• Covered two types of feed-forward neural networks
• Multilayer perceptrons (MLP)
• Convolutional neural networks (CNN)
• Example Implementation in PyTorch
• CNNs have shown significant performance in computer vision
tasks