Artificial neural networks
1. A Brief Introduction to
Artificial Neural Networks
http://www.raymazza.com/uploads/1/0/6/2/10622510/4006097_orig.jpg
2. • A neuron is an electrically excitable cell that processes and transmits
information through electrical and chemical signals.
• It’s the basic unit of computation in
our nervous system.
• A typical neuron possesses:
• Dendrites: input
• Cell body (Soma): processor
• Axon: output
• Neurons communicate with one
another via synapses which can be
excitatory or inhibitory.
What is a neuron?
http://www.urbanchildinstitute.org/sites/all/files/databooks/2011/ch1-fg2-communication-between-neurons.jpg
3. Neural Network
• A neural network is an interconnected web of neurons transmitting elaborate
patterns of electrical signals.
• Neurons receive input signals through their dendrites and, based on those
inputs, fire an output signal along the axon.
• A neural network is an adaptive system, meaning it can change its internal
structure based on the information flowing through it.
• Typically, this is achieved by adjusting the weights.
http://www.fridayfonts.com/wp-content/uploads/2009/05/neurons.jpg
4. Artificial neuron
• An artificial neuron is a mathematical model of a biological neuron: it receives one or
more inputs and sums them to produce an output.
• Usually each input to a node is weighted, and the weighted sum is passed through a
non-linear function known as an activation function.
http://upload.wikimedia.org/wikipedia/commons/thumb/6/60/ArtificialNeuronModel_english.png/600px-ArtificialNeuronModel_english.png
5. Artificial Neural Network (ANN)
• An ANN is typically defined by three types of parameters:
• The interconnection pattern between the different layers of neurons
• The learning process for updating the weights of the interconnections
• The activation function that converts a neuron's weighted input to its output
http://wp.fobiss.com/wp-content/uploads/2013/06/Brain2.png
7. Perceptron
• Invented in 1957 by Frank Rosenblatt at the Cornell Aeronautical Laboratory.
• It is a computational model of a single neuron.
• It consists of one or more binary inputs, a processor, and a single binary output.
• Binary inputs are multiplied by weights [real numbers expressing the importance of the
respective inputs to the output] and then fed into the processor.
http://neuralnetworksanddeeplearning.com/chap1.html
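The perceptron described above can be sketched in a few lines of Python. The weights, bias, and the AND-gate example are illustrative choices, not taken from the slides:

```python
# Minimal perceptron: weighted sum of binary inputs passed through a step function.
def perceptron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias term.
    v = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step activation: fire (1) if the sum is positive, else output 0.
    return 1 if v > 0 else 0

# Example: a 2-input perceptron acting as an AND gate (illustrative weights).
weights, bias = [1.0, 1.0], -1.5
print(perceptron([1, 1], weights, bias))  # 1
print(perceptron([1, 0], weights, bias))  # 0
```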
8. Stochastic Model
• A stochastic neuron fires with probability

  p(x) = 1 / (1 + e^(−v/T))

  output = 1 with probability p(x)
  output = 0 with probability 1 − p(x)

  where T is the pseudo-temperature.
• For T → 0 the stochastic model reduces to the deterministic threshold neuron.
9. Sigmoidal Neuron
• A sigmoid neuron's inputs and output can take any value between 0 and 1.
• The sigmoid neuron's activation function is given as:

  σ(w·x + b) = 1 / (1 + e^(−w·x − b))

• The smoothness of σ allows us to write a small change Δoutput in terms of small
changes Δwj in the weights and Δb in the bias as:

  Δoutput ≈ Σj (∂output/∂wj) Δwj + (∂output/∂b) Δb
http://neuralnetworksanddeeplearning.com/chap1.html
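As a sketch, the sigmoid activation and the linear approximation of Δoutput can be checked numerically; all weights, inputs, and perturbations below are invented for illustration:

```python
import math

def sigmoid(z):
    # Sigmoid activation: sigma(z) = 1 / (1 + e^(-z)).
    return 1.0 / (1.0 + math.exp(-z))

def output(w, x, b):
    # Neuron output: sigma(w . x + b).
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

w, x, b = [0.4, -0.2], [0.5, 0.8], 0.1
base = output(w, x, b)

# Small changes in the weights and the bias...
dw, db = [0.001, -0.002], 0.0005
new = output([wj + d for wj, d in zip(w, dw)], x, b + db)

# ...are well predicted by the partial derivatives:
# d(output)/dwj = sigma'(z) * xj and d(output)/db = sigma'(z).
z = sum(wj * xj for wj, xj in zip(w, x)) + b
sprime = sigmoid(z) * (1.0 - sigmoid(z))
predicted = sum(sprime * xj * d for xj, d in zip(x, dw)) + sprime * db
print(abs((new - base) - predicted) < 1e-6)  # True
```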
10. Neural network architecture
• The leftmost layer in this network is called the input layer, and the neurons within the
layer are called input neurons. The rightmost or output layer contains the output neurons.
The middle layer is called a hidden layer, since the neurons in this layer are neither inputs
nor outputs.
Feed-forward network
http://neuralnetworksanddeeplearning.com/chap1.html
11. Design of Input and Output Layers
• The design of the input and output layers in a network is often straightforward.
• For example, suppose we're trying to determine whether a handwritten image depicts a
"9" or not.
• A natural way to design the network is to encode the intensities of the image pixels into
the input neurons.
• If the image is a 64 by 64 greyscale image, then we'd have 4,096=64×64 input neurons,
with the intensities scaled appropriately between 0 and 1.
• The output layer will contain just a single neuron, with output values less than 0.5
indicating "input image is not a 9", and values greater than 0.5 indicating "input image is
a 9".
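A sketch of the input encoding described above, assuming 8-bit greyscale pixels (values 0–255; the dummy image below is fabricated just to exercise the code):

```python
# Encode a 64x64 greyscale image (8-bit values, 0-255) as 4096 input activations in [0, 1].
width = height = 64
image = [[(r * width + c) % 256 for c in range(width)] for r in range(height)]  # dummy image

# Flatten row by row and scale each intensity into [0, 1].
inputs = [pixel / 255.0 for row in image for pixel in row]
print(len(inputs))  # 4096
print(min(inputs) >= 0.0 and max(inputs) <= 1.0)  # True
```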
12. Learning strategy
• Now we want an algorithm which lets us find weights and biases so that the output from
the network approximates the desired output for all training inputs x. For that purpose we
define a cost function:

  C(w, b) = (1/2n) Σx ‖d − y(x)‖²
• The aim of our training algorithm is to find a set of weights and biases which minimizes
the cost function.
• We'll do that using an algorithm known as gradient descent.
Where,
w : the collection of all weights in the network
b : all the biases
n : total number of training inputs,
d : desired output from network
y(x) : Actual output from network
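The quadratic cost above can be sketched directly. The tiny `predict` function stands in for the network's y(x), and the data pairs are invented for illustration:

```python
def quadratic_cost(data, predict):
    # C(w, b) = (1/2n) * sum over training inputs of ||d - y(x)||^2.
    n = len(data)
    return sum((d - predict(x)) ** 2 for x, d in data) / (2 * n)

# Toy stand-in for the network: y(x) = 0.5 * x.
predict = lambda x: 0.5 * x
data = [(2.0, 1.0), (4.0, 3.0)]  # (input x, desired output d) pairs
print(quadratic_cost(data, predict))  # (0 + 1) / 4 = 0.25
```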
13. Gradient Descent Algorithm
• Let's suppose we're trying to minimize some function C(v), where
v = (v1, v2, …, vm)ᵀ and v1, v2, …, vm ∈ ℝ.
• To visualize C(v) it helps to imagine C as a function of just two variables, which we'll call
v1 and v2.
What we'd like is to find where C achieves
its global minimum.
http://neuralnetworksanddeeplearning.com/chap1.html
14. • A small change in C(v) can be written as:

  ΔC ≈ (∂C/∂v1) Δv1 + (∂C/∂v2) Δv2 + … + (∂C/∂vm) Δvm

• The above expression can also be written as:

  ΔC ≈ ∇C · Δv

• Where,

  ∇C = (∂C/∂v1, ∂C/∂v2, …, ∂C/∂vm)ᵀ
  Δv = (Δv1, Δv2, …, Δvm)ᵀ

• Now, we want to choose Δv such that ΔC ≤ 0. If we choose:

  Δv = −η∇C  ⇒  ΔC = −η‖∇C‖² ≤ 0
15. • Now, the next point on the surface is given by:

  v′ = v + Δv = v − η∇C

• We'll use the above update rule to compute new values of v1, v2, …, vm:

  v1′ = v1 − η ∂C/∂v1
  v2′ = v2 − η ∂C/∂v2
  ⋮
  vm′ = vm − η ∂C/∂vm

• Then we'll use this update rule again to make another move. If we keep doing this, over
and over, we'll keep decreasing C until, we hope, we reach a global minimum.

  η = learning rate
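The update rule above can be sketched on a simple two-variable cost, C(v) = v1² + v2², chosen for illustration because its gradient is (2v1, 2v2) and its minimum sits at the origin:

```python
# Gradient descent on C(v) = v1^2 + v2^2, whose gradient is (2*v1, 2*v2).
def gradient(v):
    return [2.0 * vi for vi in v]

v = [3.0, -4.0]  # starting point
eta = 0.1        # learning rate

for _ in range(100):
    # v' = v - eta * grad C(v), applied component-wise.
    v = [vi - eta * gi for vi, gi in zip(v, gradient(v))]

# Each step shrinks every component by a factor (1 - 2*eta) = 0.8,
# so after 100 steps we are essentially at the minimum.
print(all(abs(vi) < 1e-6 for vi in v))  # True
```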
16. Update rule for weights & bias
• In the case of a neural network our variables are the weights wi and biases bi.
• Using the gradient descent algorithm, the update rules for weights and biases are:

  wi′ = wi − η ∂C/∂wi
  bi′ = bi − η ∂C/∂bi
• By repeatedly applying this update rule we can "roll down the hill", and hopefully find a
minimum of the cost function. In other words, this is a rule which can be used to learn in
a neural network.
17. Practical applications
• To recognize individual digits we will use a three-layer neural network:
http://neuralnetworksanddeeplearning.com/chap1.html
18. Fitting a Straight Line to the Data
• A linear model of a neuron is used to fit a straight line to our data.
• In the case of N dimensions there are N free parameters: (N−1) slope components and an
intercept.

  yn = wn0 + Σj=1…n−1 xj wnj

[Figure: linear neuron with bias input +1, inputs x1 … xn−1, weights wn0 … wn(n−1), and output yn]

• For each training pattern p, with target tp and network output yp, the total cost is the sum
of the per-pattern costs:

  C = Σp Cp ,  where  Cp = (1/2)(tp − yp)²
19. Update rule for weights
• Using the gradient descent algorithm, the update rule for the weights is:

  wni′ = wni − η ∂C/∂wni

• Now, the derivative in the above equation can be calculated using the chain rule as follows:

  ∂C/∂wni = ∂/∂wni Σp Cp = Σp ∂Cp/∂wni

  ∂Cp/∂wni = (∂Cp/∂yp) (∂yp/∂wni)

  ∂Cp/∂yp = ∂/∂yp [ (1/2)(tp − yp)² ] = −(tp − yp)

  ∂yp/∂wni = ∂/∂wni Σj=0…n−1 xj wnj = xi   (with x0 = +1 for the bias)
20. • Substituting these derivatives into the gradient descent rule, the update for the weights
becomes:

  ∂C/∂wni = Σp −xi (tp − yp)

  wni′ = wni + η Σp xi (tp − yp)
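Putting slides 18–20 together, here is a sketch of fitting a line with this update rule. The dataset (points on t = 1 + 2x), learning rate, and iteration count are all invented for illustration:

```python
# Fit y = w0 + w1*x to data generated from t = 1 + 2*x using the delta rule:
# w_i' = w_i + eta * sum_p x_i * (t_p - y_p), with x_0 = +1 as the bias input.
data = [(x, 1.0 + 2.0 * x) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]

w = [0.0, 0.0]  # [intercept w0, slope w1]
eta = 0.05      # learning rate

for _ in range(2000):
    # Accumulate the batch gradient over all training patterns p.
    grad = [0.0, 0.0]
    for x, t in data:
        y = w[0] + w[1] * x        # forward pass of the linear neuron
        grad[0] += (t - y) * 1.0   # x_0 = +1 (bias input)
        grad[1] += (t - y) * x
    w = [wi + eta * gi for wi, gi in zip(w, grad)]

print(round(w[0], 3), round(w[1], 3))  # close to 1.0 and 2.0
```

Because the data are exactly linear and the learning rate is small enough, the weights converge to the generating intercept and slope.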