https://telecombcn-dl.github.io/2017-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks, or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course covers the basic principles of deep learning from both algorithmic and computational perspectives.
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
1. [course site]
Javier Ruiz Hidalgo
javier.ruiz@upc.edu
Associate Professor
Universitat Politècnica de Catalunya
Technical University of Catalonia
Loss functions
#DLUPC
2. Me
● Javier Ruiz Hidalgo
● Email: j.ruiz@upc.edu
● Office: UPC, Campus Nord, D5-008
● Teaching experience
  ○ Basic signal processing
  ○ Image processing & computer vision
● Research experience
  ○ Master on hierarchical image representations from UEA (UK)
  ○ PhD on video coding from UPC (Spain)
● Interests in image & video coding, 3D analysis and super-resolution
3. Outline
● Introduction
  ○ Definition, properties, training process
● Common types of loss functions
  ○ Regression
  ○ Classification
● Regularization
● Example
4. Definition
In a supervised deep learning context, the loss function measures the quality of a particular set of parameters based on how well the output of the network agrees with the ground-truth labels in the training data.
6. Loss function (1)
How well does our network perform on the training data?
(Figure: a Deep Network maps an input to an output; the error compares the network output with the labels (ground truth) and depends on the parameters (weights, biases).)
7. Loss function (2)
● The loss function is not meant to measure the entire performance of the network against a validation/test dataset.
● The loss function is used to guide the training process in order to find a set of parameters that reduce its value.
8. Training process
Stochastic gradient descent
● Find a set of parameters which make the loss as small as possible.
● Change the parameters at a rate determined by the partial derivatives of the loss function:
  w ← w − η ∂L/∂w   (η is the learning rate)
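As a minimal sketch of this update (not the course's code; the names sgd_step and grad_fn are ours), one gradient-descent step in NumPy:

    import numpy as np

    def sgd_step(w, grad_fn, lr=0.01):
        # Move the parameters against the gradient of the loss,
        # at a rate set by the learning rate lr.
        return w - lr * grad_fn(w)

    # Toy usage: minimize L(w) = ||w||^2, whose gradient is 2w.
    w = np.array([1.0, -2.0])
    for _ in range(100):
        w = sgd_step(w, lambda v: 2 * v, lr=0.1)
    print(w)  # close to [0, 0], the minimum of this loss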
9. Properties (1)
● Minimum (value 0) when the output of the network is equal to the ground-truth data.
● Increases in value as the output differs from the ground truth.
10. Properties (2)
● Ideally → a convex function
● In reality, with so many parameters (on the order of millions), it is not convex
● Varies smoothly with changes in the output
  ○ Better gradients for gradient descent
  ○ Easy to compute small changes in the parameters to get an improvement in the loss
11. Assumptions
● For backpropagation to work:
  ○ The loss function can be written as an average over the loss functions for individual training examples (the empirical risk):
    L = (1/n) Σx Lx
  ○ The loss function can be written as a function of the output activations from the neural network.
Neural Networks and Deep Learning
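A minimal sketch of the first assumption (per_example_loss is a hypothetical stand-in for any of the losses covered below): the batch loss is simply the mean of the per-example losses.

    import numpy as np

    def batch_loss(outputs, labels, per_example_loss):
        # Empirical risk: the average of the loss over individual examples.
        return np.mean([per_example_loss(o, y) for o, y in zip(outputs, labels)])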
12. Outline
● Introduction
  ○ Definition, properties, training process
● Common types of loss functions
  ○ Regression
  ○ Classification
● Regularization
● Example
13. Common types of loss functions (1)
● Loss functions depend on the type of task:
  ○ Regression: the network predicts continuous, numeric variables
    ○ Example: length of fishes in images, temperature from latitude/longitude
    ○ Absolute value, square error
14. Common types of loss functions (2)
● Loss functions depend on the type of task:
  ○ Classification: the network predicts categorical variables (a fixed number of classes)
    ○ Example: classify email as spam, predict student grades from essays
    ○ Hinge loss, cross-entropy loss
15. Absolute value, L1-norm
● Very intuitive loss function
● Produces sparser solutions
● Good in high-dimensional spaces
● Prediction speed
● Less sensitive to outliers
16. Square error, Euclidean loss, L2-norm
● Very common loss function
● Usually more precise than the L1-norm
● Penalizes large errors more strongly
● Sensitive to outliers
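An illustrative NumPy sketch of both regression losses (function names are ours), showing how a single outlier dominates the L2 loss but barely moves the L1 loss:

    import numpy as np

    def l1_loss(y_pred, y_true):
        # Absolute value (L1) loss: grows linearly with the error.
        return np.mean(np.abs(y_pred - y_true))

    def l2_loss(y_pred, y_true):
        # Square error (L2) loss: penalizes large errors more strongly.
        return np.mean((y_pred - y_true) ** 2)

    y_true = np.array([1.0, 2.0, 3.0])
    y_pred = np.array([1.1, 2.0, 9.0])   # one large outlier
    print(l1_loss(y_pred, y_true))       # ~2.03
    print(l2_loss(y_pred, y_true))       # ~12.0, dominated by the outlier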
17. Outline
● Introduction
  ○ Definition, properties, training process
● Common types of loss functions
  ○ Regression
  ○ Classification
● Regularization
● Example
18. Classification (1)
We want the network to classify the input into a fixed number of classes.
(Figure: example inputs assigned to class "1", class "2" and class "3")
19. Classification (2)
● Each input can have only one label
● One prediction per output class
● The network will have "k" outputs (the number of classes)
(Figure: input → Network → output scores / logits: 0.1 for class "1", 2 for class "2", 1 for class "3")
20. Classification (3)
● How can we create a loss function to improve the scores?
  ○ Somehow encode the labels (the ground truth of the data) into a vector → one-hot encoding
  ○ Non-probabilistic interpretation → hinge loss
  ○ Probabilistic interpretation: need to transform the scores into a probability distribution → softmax
21. Softmax (1)
● Convert scores into probabilities
  ○ Each between 0.0 and 1.0
  ○ Probabilities over all classes add up to 1.0
(Figure: input → Network → output scores / logits [0.1, 2, 1] → softmax → probabilities [0.1, 0.7, 0.2])
22. Softmax (2)
● Softmax function:
  softmax(s)_i = exp(s_i) / Σ_j exp(s_j)
(Figure: applying the softmax to the scores (logits) [0.1, 2, 1] yields the probabilities [0.1, 0.7, 0.2])
Neural Networks and Deep Learning (softmax)
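A minimal NumPy sketch of the softmax function, reproducing the numbers in the figure (subtracting the max is a standard numerical-stability trick, not something the slide shows):

    import numpy as np

    def softmax(scores):
        # Shift by the max for numerical stability; the result is unchanged.
        e = np.exp(scores - np.max(scores))
        return e / e.sum()

    print(softmax(np.array([0.1, 2.0, 1.0])))  # ~[0.10 0.66 0.24],
                                               # rounded to 0.1 / 0.7 / 0.2 above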
23. One-hot encoding
● Transform each label into a vector (with only 1s and 0s)
  ○ Length equal to the total number of classes "k"
  ○ Value of 1 for the correct class and 0 elsewhere
class "1" → [1, 0, 0]
class "2" → [0, 1, 0]
class "3" → [0, 0, 1]
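A one-line sketch of this encoding (one_hot is our name for it), with classes indexed from 0:

    import numpy as np

    def one_hot(label, k):
        # Vector of length k: 1 at the correct class, 0 elsewhere.
        v = np.zeros(k)
        v[label] = 1.0
        return v

    print(one_hot(1, 3))  # class "2" out of 3 -> [0. 1. 0.]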
27. Cross-entropy loss (4)
● In general, the cross-entropy loss works better than the square error loss:
  ○ The square error loss usually gives too much emphasis to incorrect outputs.
  ○ With the square error loss, as the output gets closer to either 0.0 or 1.0 the gradients get smaller, the change in the weights gets smaller, and training becomes slower.
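A small numeric check of the second point, assuming a sigmoid output unit a = sigmoid(z) (an assumption; the slide does not fix the output nonlinearity): the square error gradient with respect to z carries a factor a(1 − a) that vanishes as a approaches 0 or 1, while the cross-entropy gradient does not.

    a, y = 0.99, 0.0   # confident output, but the target is 0.0
    # Square error L = (a - y)^2: dL/dz = 2 (a - y) a (1 - a)
    grad_se = 2 * (a - y) * a * (1 - a)
    # Cross-entropy L = -[y log a + (1 - y) log(1 - a)]: dL/dz = a - y
    grad_ce = a - y
    print(grad_se, grad_ce)  # ~0.0196 vs 0.99: learning is much slower with L2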
28. Multi-label classification (1)
● Outputs can be matched to more than one label
  ○ "car", "automobile" and "motor vehicle" can all be applied to the same image of a car
● Use a sigmoid at each output independently instead of a softmax
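A minimal sketch of multi-label outputs (the scores and the third label are made up for illustration): each sigmoid is applied independently, so several labels can be active at once, unlike with softmax.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Scores for the labels "car", "automobile", "boat" on an image of a car.
    scores = np.array([2.0, 1.5, -1.0])
    probs = sigmoid(scores)
    print(probs)        # each in (0, 1); they need not sum to 1
    print(probs > 0.5)  # [ True  True False]: more than one label fires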
30. Outline
● Introduction
  ○ Definition, properties, training process
● Common types of loss functions
  ○ Regression
  ○ Classification
● Regularization
● Example
31. Regularization
● Control the capacity of the network to prevent overfitting
● L2-regularization (weight decay): L = L_data + λ Σ w²
● L1-regularization: L = L_data + λ Σ |w|
(λ is the regularization parameter)
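A sketch of the two penalty terms (names ours) that get added to the data loss during training; lam plays the role of the regularization parameter λ:

    import numpy as np

    def l2_penalty(weights, lam):
        # Weight decay: penalize large weights quadratically.
        return lam * sum(np.sum(w ** 2) for w in weights)

    def l1_penalty(weights, lam):
        # L1: encourages sparse weights.
        return lam * sum(np.sum(np.abs(w)) for w in weights)

    # total_loss = data_loss + l2_penalty(all_weight_matrices, lam)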
32. Outline
● Introduction
  ○ Definition, properties, training process
● Common types of loss functions
  ○ Regression
  ○ Classification
● Regularization
● Example
33. Example
Why you should use cross-entropy instead of classification error
Case 1 (computed probabilities and one-hot targets):
          class "1"  class "2"  class "3"   target
  item 1:    0.1        0.2        0.7      [1, 0, 0]
  item 2:    0.3        0.4        0.3      [0, 1, 0]
  item 3:    0.3        0.3        0.4      [0, 0, 1]
Classification accuracy = 2/3
Cross-entropy loss = 4.14

Case 2 (computed probabilities and one-hot targets):
          class "1"  class "2"  class "3"   target
  item 1:    0.3        0.4        0.3      [1, 0, 0]
  item 2:    0.1        0.7        0.2      [0, 1, 0]
  item 3:    0.1        0.2        0.7      [0, 0, 1]
Classification accuracy = 2/3
Cross-entropy loss = 1.92
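Both cases classify the same two of three items correctly, but the second assigns more probability to the correct classes, and only the cross-entropy loss notices. The loss values above use the natural log and are summed (not averaged) over the three items; a quick NumPy check:

    import numpy as np

    # Probability each network assigns to the *correct* class of items 1-3.
    case1 = np.array([0.1, 0.4, 0.4])
    case2 = np.array([0.3, 0.7, 0.7])
    print(-np.sum(np.log(case1)))  # ~4.14
    print(-np.sum(np.log(case2)))  # ~1.92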
34. References
● About loss functions
● Neural Networks and Deep Learning
● Are Loss Functions All the Same?
● Convolutional Neural Networks for Visual Recognition
● Deep Learning Book, MIT Press, 2016
● On Loss Functions for Deep Neural Networks in Classification