A long time ago in a
land close enough...
Who said video games were not cool?!
Representing reality. Representation learning
Image samples
"If less than two hours to eat you take, hasty you are" (Master Yoda, loyal and very noble)
Deep Learning Course
Session 1: Introduction to Artificial Neural Networks

Pablo J. Villacorta Iglesias
pvillacorta@stratio.com
July 2017
Contents
1. Artificial Neural Networks: concept and motivation
2. Gradient descent in logistic regression
3. The backpropagation algorithm
4. References and further reading
Review: learning a model from data
Features and target (the target exists only in supervised learning).
Example of a feature vector: x = (x1, x2, x3, x4)ᵀ = (5.1, 3.5, 1.4, 0.2)ᵀ
1. Artificial Neural Networks: concept and motivation
Motivation I: the need for non-linear decision boundaries

Decision boundary = set of points that are equally likely to belong to either of two classes.
Simple classifiers such as linear regression and logistic regression can only find linear boundaries.

● Trick: create new features as non-linear combinations of the existing features, and give them to the linear classifier (see the sketch after this list)
○ E.g.: use x1 and x2, but also x1·x2, x1² and x2²
○ Pros: we are still using a simple (white-box) classifier, understandable by business people (econometrics)
● What happens if we have n = 50 features?
○ Too many new features are generated
● Even worse: what if we have an image, where each pixel is a feature, and we want to learn the concept being displayed?
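A minimal sketch of the feature-expansion trick in Python (the data values are made up for illustration):

import numpy as np

def expand_features(X):
    """Turn columns (x1, x2) into (x1, x2, x1*x2, x1^2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

X = np.array([[1.0, 2.0],
              [0.5, -1.0]])
print(expand_features(X))  # 5 features per row instead of 2; feed these to a linear classifier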
Motivation II: the brain as a "universal" learning algorithm

● Neurons behave as computing units. Each neuron receives electric input signals (called spikes) through its dendrites, computes an output from them, and sends it through the axon to the other neurons connected to it
● The human brain can learn almost any task
○ Let's see how it is structured and try to imitate it if we want a really good learning machine
● Axon-dendrite transmission happens through a mechanism called the synapse
● This mechanism never changes, yet the brain is always learning
A computational model of a natural neuron

Inputs: x0 = 1 (the bias), plus the features x = (x1, x2, x3)ᵀ.

Output (called activation):

  g(βᵀx) = g(β0 + β1·x1 + β2·x2 + β3·x3)

The value of the parameters β = (β0, β1, β2, β3) depends on the neuron (they represent "what the neuron has learned up to now").

Function g(·) is known as the activation function, and it is a non-linear function of the input. Most often, it is the sigmoid function:

  g(z) = 1 / (1 + e^(-z))

so that

  g(βᵀx) = 1 / (1 + e^(-(β0 + β1·x1 + β2·x2 + β3·x3)))

Conclusion: with the sigmoid, each neuron actually learns a logistic regression for a (mysterious) sub-task which contributes appropriately to the network task.

SIMPLE PERCEPTRON
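A runnable sketch of this unit (the weight values below are arbitrary, chosen only for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, beta):
    """Activation g(beta^T x), with the bias input x0 = 1 prepended to x."""
    x = np.concatenate(([1.0], x))
    return sigmoid(beta @ x)

beta = np.array([0.5, -1.2, 0.8, 2.0])  # beta0..beta3 (assumed values)
x = np.array([5.1, 3.5, 1.4])           # features x1..x3
print(neuron(x, beta))                  # a number in (0, 1)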
Equation ("hypothesis") of a small neural network

[Diagram: inputs x1, x2, x3 (input layer, layer 1) feed neurons 1-3 of layer 2, with activations a1(2), a2(2), a3(2) (hidden layer, layer 2); these feed neuron 1 of layer 3, with activation a1(3) (output layer, layer 3), which produces h(x).]

ai(j) = activation of the i-th neuron of layer j
B(j) = matrix of parameters multiplied by the inputs (activations) from layer j to compute the activations of layer j+1

MULTILAYER PERCEPTRON (MLP)
Equation ("hypothesis") of a simple neural network

Input layer (layer 1): x1, x2, x3. Hidden layer (layer 2). Output layer (layer 3): h(x).

ai(j) = activation of the i-th neuron of layer j
B(j) = matrix of parameters to be multiplied by the inputs (activations) from layer j to compute the activations of layer j+1

         (β1(1))ᵀ     β1,0(1)  β1,1(1)  β1,2(1)  β1,3(1)
B(1)  =  (β2(1))ᵀ  =  β2,0(1)  β2,1(1)  β2,2(1)  β2,3(1)
         (β3(1))ᵀ     β3,0(1)  β3,1(1)  β3,2(1)  β3,3(1)

B(2)  =  (β1(2))ᵀ  =  β1,0(2)  β1,1(2)  β1,2(2)  β1,3(2)

(row βi(1) produces ai(2), for i = 1, 2, 3; row β1(2) produces a1(3))

a1(2) = g(B(1)10 + B(1)11·x1 + B(1)12·x2 + B(1)13·x3)
a2(2) = g(B(1)20 + B(1)21·x1 + B(1)22·x2 + B(1)23·x3)
a3(2) = g(B(1)30 + B(1)31·x1 + B(1)32·x2 + B(1)33·x3)

h(x) = a1(3) = g(B(2)10 + B(2)11·a1(2) + B(2)12·a2(2) + B(2)13·a3(2))

which is a logistic regression with new variables a1(2), a2(2), a3(2) created as non-linear transformations of x1, x2, x3.
A compact matrix form to compute the output

ai(j) = activation of the i-th neuron of layer j
B(j) = matrix of parameters multiplied by the inputs (activations) from layer j to compute the activations of layer j+1
(B(1) is the 3×4 matrix and B(2) the 1×4 matrix of the previous slide.)

a1(2) = g(B(1)10 + B(1)11·x1 + B(1)12·x2 + B(1)13·x3) = g(z1(2))
a2(2) = g(B(1)20 + B(1)21·x1 + B(1)22·x2 + B(1)23·x3) = g(z2(2))
a3(2) = g(B(1)30 + B(1)31·x1 + B(1)32·x2 + B(1)33·x3) = g(z3(2))

h(x) = a1(3) = g(B(2)10 + B(2)11·a1(2) + B(2)12·a2(2) + B(2)13·a3(2))

In matrix form:

z(2) = (z1(2), z2(2), z3(2))ᵀ = B(1)·a(1),  with a(1) = (1, x1, x2, x3)ᵀ
a(2) = (1, g(z(2)))ᵀ = (1, g(z1(2)), g(z2(2)), g(z3(2)))ᵀ   (apply g element-wise and prepend the bias 1)
z(3) = B(2)·a(2)
a(3) = g(z(3))   (apply g element-wise)
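This matrix form translates directly into NumPy; here is a runnable sketch (the parameter values are random, for illustration only):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, B1, B2):
    """Forward propagation: a(1) -> z(2) -> a(2) -> z(3) -> a(3) = h(x)."""
    a1 = np.concatenate(([1.0], x))             # a(1) = (1, x1, x2, x3)
    z2 = B1 @ a1                                # z(2) = B(1) a(1)
    a2 = np.concatenate(([1.0], sigmoid(z2)))   # a(2) = (1, g(z(2)))
    z3 = B2 @ a2                                # z(3) = B(2) a(2)
    return sigmoid(z3)                          # a(3) = g(z(3))

rng = np.random.default_rng(0)
B1 = rng.normal(size=(3, 4))  # 3 hidden neurons x (bias + 3 inputs)
B2 = rng.normal(size=(1, 4))  # 1 output neuron x (bias + 3 activations)
print(forward(np.array([5.1, 3.5, 1.4]), B1, B2))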
Two-class and multiclass classification with a neural network

● 2 classes = 1 neuron in the output layer. We see the whole network as a function hB: ℝᵖ → ℝ (p is the number of features).
● K classes = K neurons in the output layer (K > 2). We see the whole network as a function hB: ℝᵖ → ℝᴷ (p is the number of features, K is the number of classes).
2. Gradient descent in logistic regression
The concept of cost function in Machine Learning

● In any Machine Learning model, fitting the model means finding the best values of its parameters.
● The best model is the one whose parameter values minimize the total error with respect to the actual outputs.
● Since the error depends on the parameters chosen, it is a function of them, called the cost function J. The most common error measure is the MSE (Mean Squared Error):

  J(β0, β1, …, βR) = [1/(2m)] Σi=1..m (ŷi − yi)²
                   = [1/(2m)] Σi=1..m (hβ0,β1,…,βR(xi) − yi)²

  m = number of training examples
  hβ0,β1,…,βR = hypothesis (the equation of the model)

● Instead of minimizing the total sum, we minimize the average error; hence the factor (1/m).
● Finding the optimum of any function f is equivalent to finding the optimum of f/2.
● Hence we also divide by 2, because it eases further calculations.

Ideally the cost function is convex: for every pair of points, the curve always stays below the line (or hyper-plane) between them.

[Figures: cost function of one parameter; cost function of two parameters.]
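A sketch of this MSE cost in code, assuming a linear hypothesis h(x) = X·β (the data values are toy numbers for illustration):

import numpy as np

def mse_cost(beta, X, y):
    """J(beta) = [1/(2m)] * sum of (h(x_i) - y_i)^2, with h(x) = X @ beta."""
    residuals = X @ beta - y
    return (residuals @ residuals) / (2 * len(y))

X = np.array([[1.0, 2.0], [1.0, 4.0]])  # bias column + one feature
y = np.array([1.8, 3.2])
print(mse_cost(np.array([0.5, 0.6]), X, y))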
Gradient descent with one variable

● If the cost function is convex, there is only one optimum, which is the global minimum.
● Closed form of the minimum:
○ Compute the derivative of the cost function and find the point where it is 0 (exact solution).
○ Multiple variables: take partial derivatives, and solve an equation system to find where all of them are 0 simultaneously.
● This can be difficult if the model equation is complicated, has many variables (a large equation system) or is not differentiable.
● Solution: an approximate iterative algorithm called gradient descent (also valid for non-convex functions!)

GRADIENT DESCENT ALGORITHM
β(0) ← some initial value
α ← some fixed (small) constant (α is called the learning rate)
t ← 0
tolerance ← small value (e.g. 0.000001)
while |dJ/dβ evaluated at β = β(t)| > tolerance:
    β(t+1) ← β(t) − α·(dJ/dβ evaluated at β = β(t));  t ← t + 1

Where the derivative is negative, the update makes β(t+1) > β(t) (β increases, moving right); where the derivative is positive, it makes β(t+1) < β(t) (β decreases, moving left). At the minimum β*, the derivative is 0.
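A minimal runnable sketch of the one-variable loop (the quadratic cost below is a made-up toy, not from the slides):

def gradient_descent_1d(dJ, beta, alpha=0.1, tol=1e-6, max_iter=10_000):
    """Minimize a one-variable function given its derivative dJ."""
    for _ in range(max_iter):
        grad = dJ(beta)
        if abs(grad) <= tol:        # stopping criterion
            break
        beta = beta - alpha * grad  # move against the derivative
    return beta

# Toy cost J(beta) = (beta - 3)^2, whose derivative is 2*(beta - 3)
print(gradient_descent_1d(lambda b: 2 * (b - 3), beta=0.0))  # converges to ~3.0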
Gradient descent in general (variables β = (β0, β1, …, βp))

● With multiple variables: evaluate the norm (noted ||·||) of the gradient vector to test the stopping criterion

GRADIENT DESCENT ALGORITHM (p variables)
β(0) ← some initial vector
α ← some fixed (small) constant (α is called the learning rate)
t ← 0
tolerance ← small value (e.g. 0.000001)
while ||∇J evaluated at β = β(t)|| > tolerance:
    β(t+1) ← β(t) − α·(∇J evaluated at β = β(t));  t ← t + 1

NOTE: ∇J = (∂J/∂β0, ∂J/∂β1, …, ∂J/∂βp) ∈ ℝ^(p+1)

● In general, cost functions are not convex: many local optima. Gradient descent does not guarantee finding the global minimum
● The solution found depends on the starting point

In summary: we need to compute the partial derivatives at each parameter point.
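The same loop in vector form (a sketch; grad_J is whatever gradient function the model provides):

import numpy as np

def gradient_descent(grad_J, beta, alpha=0.1, tol=1e-6, max_iter=10_000):
    """Minimize J given its gradient grad_J: R^(p+1) -> R^(p+1)."""
    beta = np.asarray(beta, dtype=float)
    for _ in range(max_iter):
        g = grad_J(beta)
        if np.linalg.norm(g) <= tol:  # stopping criterion on ||grad J||
            break
        beta = beta - alpha * g       # subtract the gradient vector itself
    return beta

# Toy cost J(beta) = ||beta - (1, 2)||^2, with gradient 2*(beta - (1, 2))
print(gradient_descent(lambda b: 2 * (b - np.array([1.0, 2.0])), [0.0, 0.0]))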
Gradient descent with two variables

● Note that the error function J is determined uniquely by the dataset and the shape of the model being fitted.
● The function (and hence its derivative) does not change during the algorithm; we just evaluate the derivative function at different values of the model parameters, which are the variables of that function.
● E.g.: linear regression with 2 variables x1, x2. The model is

  hβ(x) = β0 + β1·x1 + β2·x2

Imagine this dataset (having only m = 2 examples):

  example 1: (x1, x2) = (2, 3), y = 1.8
  example 2: (x1, x2) = (4, 5), y = 3.2

Then

  J(β0, β1, β2) = (1/(2·2)) [ (β0 + 2β1 + 3β2 − 1.8)² + (β0 + 4β1 + 5β2 − 3.2)² ]

  ∂J/∂β0 = (1/2) ( (β0 + 2β1 + 3β2 − 1.8) + (β0 + 4β1 + 5β2 − 3.2) )
  ∂J/∂β1 = (1/2) ( (β0 + 2β1 + 3β2 − 1.8)·2 + (β0 + 4β1 + 5β2 − 3.2)·4 )
  ∂J/∂β2 = (1/2) ( (β0 + 2β1 + 3β2 − 1.8)·3 + (β0 + 4β1 + 5β2 − 3.2)·5 )

  ∇J = ( ∂J/∂β0, ∂J/∂β1, ∂J/∂β2 )

and now we can start evaluating ∇J at different points β(t), each point being a vector of ℝ³.
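A sketch that builds this very ∇J and runs the descent on it (same toy dataset as above):

import numpy as np

X = np.array([[1.0, 2.0, 3.0],   # example 1 with bias term: (1, x1, x2)
              [1.0, 4.0, 5.0]])  # example 2
y = np.array([1.8, 3.2])
m = len(y)

def grad_J(beta):
    """Gradient of J = [1/(2m)] * sum (x_i^T beta - y_i)^2, i.e. (1/m) X^T (X beta - y)."""
    return X.T @ (X @ beta - y) / m

beta = np.zeros(3)
for _ in range(5000):
    beta = beta - 0.05 * grad_J(beta)  # learning rate alpha = 0.05
print(beta, grad_J(beta))  # parameters near a minimizer; the gradient is close to 0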
Cost function of logistic regression and a neural network

● In logistic regression, hβ(x) = 1 / (1 + e^(−βᵀx)), so the squared-error cost function of a logistic regression,

  J(β0, β1, …, βR) = [1/(2m)] Σi=1..m ( 1 / (1 + e^(−βᵀxi)) − yi )²

is non-convex. A somewhat equivalent, convex cost function for logistic regression is

  J(β0, β1, …, βR) = −(1/m) Σi=1..m [ yi·log(hβ(xi)) + (1 − yi)·log(1 − hβ(xi)) ]   (we ignore any regularization term)

Recall that a neural network can be seen as an aggregation of logistic regressions. In a NN with K neurons in the output layer (no matter how many are inside), the cost function is

  J(B) = −(1/m) Σi=1..m Σk=1..K [ yik·log((hB(xi))k) + (1 − yik)·log(1 − (hB(xi))k) ],   B = {B(1), B(2), … }

which is again non-convex.

How can we compute the partial derivatives ∂J/∂Bij(ℓ) at each step… and not die along the way? The task seems a bit (computationally) heavy...
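A sketch of the convex (cross-entropy) cost for logistic regression, evaluated on assumed toy data:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_cost(beta, X, y):
    """J(beta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ], with h = sigmoid(X beta)."""
    h = sigmoid(X @ beta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Toy data: two examples with a bias column, labels 0/1 (assumed values)
X = np.array([[1.0, 0.5], [1.0, -1.5]])
y = np.array([1.0, 0.0])
print(cross_entropy_cost(np.array([0.1, 0.2]), X, y))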
3. The backpropagation algorithm
Backpropagation

An algorithm to compute the partial derivatives of the cost function with respect to each parameter.
● Intuition: compute the contribution of each neuron to the final error, and change its weights accordingly
○ Compute the contribution (deltas) of each neuron to the error of each example separately, and then accumulate over all examples to obtain the total contribution of each neuron to the total error
Example: we first apply forward propagation to compute every aj(ℓ) and the output of the network hB(x).

[Diagram: a 4-layer network with inputs x1, x2. Layer 2: parameters B10(1), B11(1), B12(1) give z1(2) → a1(2), and B20(1), B21(1), B22(1) give z2(2) → a2(2). Layer 3: parameters B10(2), B11(2), B12(2) give z1(3) → a1(3), and B20(2), B21(2), B22(2) give z2(3) → a2(3). Output layer (layer 4): parameters B10(3), B11(3), B12(3) give z1(4) → a1(4).]

E.g., for the first neuron of layer 3:

  z1(3) = B10(2) + B11(2)·a1(2) + B12(2)·a2(2)
  a1(3) = g(z1(3))
Backpropagation: contribution of a1(3) to the error

How wrong is a1(3)? In other words: how much did neuron 1 of layer 3 contribute to the network error on a given example (xi, yi)?

  δ1(4) = yi − a1(4)   (error of the output neuron)
  δ1(3) = B11(3)·δ1(4)

because a1(3) had contributed to a1(4) with the term a1(3)·B11(3)
(recall: a1(4) = g( B10(3) + a1(3)·B11(3) + a2(3)·B12(3) )).

[Diagram: the same network, with error δ1(4) attached to the output neuron z1(4) → a1(4), and error δ1(3) attached to z1(3) → a1(3).]
Backpropagation: contribution of a1(2) to the error

How wrong is a1(2)? In other words: how much did neuron 1 of layer 2 contribute to the network error on a given example (xi, yi)?

  δ1(2) = B11(2)·δ1(3) + B21(2)·δ2(3)

because a1(2) contributed to
● a1(3) with the term a1(2)·B11(2)  (recall: a1(3) = g( B10(2) + a1(2)·B11(2) + a2(2)·B12(2) ))
● a2(3) with the term a1(2)·B21(2)  (recall: a2(3) = g( B20(2) + a1(2)·B21(2) + a2(2)·B22(2) ))

[Diagram: the same network, now with errors δ1(2), δ1(3), δ2(3) and δ1(4) attached to the corresponding neurons.]
Backpropagation algorithm

INPUT: training set {(x1, y1), (x2, y2), …, (xm, ym)}

Initialize Δij(ℓ) ← 0 for every i, j, ℓ
For i = 1 to m:
    a(1) ← xi
    Perform forward propagation to compute every aj(ℓ) for ℓ = 2, …, L (recall the aj(L) are the outputs of the network)
    δ(L) ← a(L) − yi   (errors of the output layer)
    Compute δ(L−1), δ(L−2), …, δ(2) as δ(ℓ) = ((B(ℓ))ᵀ·δ(ℓ+1)) .* a(ℓ) .* (1 − a(ℓ))   (where .* means element-wise product)
    Δij(ℓ) ← Δij(ℓ) + aj(ℓ)·δi(ℓ+1)
Dij(ℓ) = (1/m)·Δij(ℓ)   (assuming no regularization)

Finally… the D's are the components of the gradient vector evaluated at the current values of the parameters:

  ∂J/∂Bij(ℓ) evaluated at Bij(ℓ) = Bij(ℓ)(t)  equals  Dij(ℓ)

and now we can use them to update the parameter values B(t) to obtain B(t+1), either as in gradient descent or with any other optimization method.
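The whole procedure as a runnable sketch for a 3-layer network with one output neuron (the network shape, data and learning rate are assumed for illustration; biases are handled by prepending a 1, as in the slides, and the bias column is skipped when propagating the deltas back):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(X, Y, B1, B2):
    """Return the gradients D1, D2 of the cross-entropy cost w.r.t. B1, B2."""
    m = X.shape[0]
    D1, D2 = np.zeros_like(B1), np.zeros_like(B2)
    for x, y in zip(X, Y):
        # Forward propagation
        a1 = np.concatenate(([1.0], x))             # a(1) with bias
        z2 = B1 @ a1
        a2 = np.concatenate(([1.0], sigmoid(z2)))   # a(2) with bias
        a3 = sigmoid(B2 @ a2)                       # a(3) = network output
        # Backward pass: deltas
        d3 = a3 - y                                 # delta(3) = a(3) - y
        d2 = (B2[:, 1:].T @ d3) * a2[1:] * (1.0 - a2[1:])  # delta(2), bias skipped
        # Accumulate contributions: Delta_ij += a_j * delta_i
        D2 += np.outer(d3, a2)
        D1 += np.outer(d2, a1)
    return D1 / m, D2 / m

def predict(x, B1, B2):
    a2 = sigmoid(B1 @ np.concatenate(([1.0], x)))
    return sigmoid(B2 @ np.concatenate(([1.0], a2)))

# Toy usage: 2 features, 3 hidden neurons, 1 output, learning the OR function
rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0], [1.0], [1.0], [1.0]])
B1 = rng.normal(size=(3, 3))   # hidden layer: 3 neurons x (bias + 2 inputs)
B2 = rng.normal(size=(1, 4))   # output layer: 1 neuron x (bias + 3 activations)
for _ in range(5000):
    D1, D2 = backprop(X, Y, B1, B2)
    B1, B2 = B1 - 2.0 * D1, B2 - 2.0 * D2   # plain gradient descent step
for x in X:
    print(x, predict(x, B1, B2).round(2))   # outputs should approach 0, 1, 1, 1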
Summary

● Neural Networks are a machine learning model inspired by the human brain
● They arose as a way to create highly non-linear features in an intelligent way
○ They are not the only model that can handle a non-linear boundary; e.g. Support Vector Machines can too
● Training a Neural Network requires a lot of training data
○ … because many examples are needed to obtain a good approximation of the gradient at each point of the parameter space (and because there are a lot of parameters, it is a high-dimensional space!)
● The backpropagation algorithm allows computing the gradient at each point much more efficiently than doing it directly
THANK YOU!