International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.3, May 2014
DOI: 10.5121/ijdkp.2014.4301
BOUNDNESS OF A NEURAL NETWORK
WEIGHTS USING THE NOTION OF A LIMIT OF
A SEQUENCE
Dr. Hazem Migdady
Department of Mathematics and Computer Science, Tafila Technical University, P.O.
Box 179, Tafila 66110
ABSTRACT
A feed forward neural network with the backpropagation learning algorithm is considered a black box
learning classifier, since there is no certain interpretation or anticipation of the behavior of its
weights. The weights of a neural network are the learning tool of the classifier, and
the learning task is performed by repeatedly modifying those weights. This modification is
performed using the delta rule, which is mainly used in the gradient descent technique. In this article a
proof is provided that helps to understand and explain the behavior of the weights in a feed forward neural
network with the backpropagation learning algorithm. It also illustrates why a feed forward neural network is
not always guaranteed to converge to a global minimum. Moreover, the proof shows that the weights in the
neural network are upper bounded (i.e. they do not approach infinity).
KEYWORDS
Data Mining, Delta Rule, Machine Learning, Neural Networks, Gradient Descent.
1. INTRODUCTION
Mitchell (1997) argues that “Neural network learning methods provide a robust approach to
approximating target functions, and they are among the most effective learning methods currently
known”. Moreover, the author believes that the “backpropagation learning algorithm has proven
surprisingly successful in many practical problems”. LeCun, et al. (1989), Cottrell (1990) and
Lang, et al. (1990) provided experimental results that support the efficiency of the
backpropagation learning algorithm with neural networks.
A feed forward neural network is considered a black box since there is no certain interpretation
or anticipation of the behavior of its weights. The weights of a neural network are its
learning tool. During the training process of a neural network, the weights are repeatedly
modified, since the main characteristic of a neural network is that “those connections between
neurons leading to the right answer are strengthened by maximizing their corresponding weights,
while those leading to the wrong answer are weakened” (Negnevitsky, 2005).
“Each input node is connected with a real-valued constant (i.e. weight) that determines the
contribution of that input to the output. Learning a neural network involves choosing values for
the weights. Hence, the learning problem faced by Backpropagation is to search a large
hypotheses space defined by all possible weight values for all the units in the network” (Mitchell,
1997).
The gradient descent technique is among the most common techniques used to perform the
search process mentioned by Mitchell (1997), by optimizing the weights of a neural
classifier. This is achieved by applying the delta rule, which determines the amount by
which the current value of a weight will be updated.
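As a minimal illustration of this idea, the following Python sketch applies a generic gradient descent step to a single weight; the toy error function, learning rate, and starting value are assumptions made purely for illustration and are not part of the original paper.

```python
def error(w):
    # Assumed toy error surface: a single weight with its minimum at w = 3
    return (w - 3.0) ** 2

def gradient(w):
    # Derivative of the error with respect to w
    return 2.0 * (w - 3.0)

alpha, w = 0.1, 0.0                      # learning rate and starting weight (assumed values)
for step in range(50):
    delta_w = -alpha * gradient(w)       # the amount by which w is updated
    w = w + delta_w
print(round(w, 4))                       # approaches the minimizer w = 3
```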
Mathew (2013) believes that “The most popular neural network algorithm is backpropagation, a
kind of gradient descent method. Backpropagation iteratively processes the data set, comparing the
network’s prediction for each tuple with the actual known target value to find out an acceptable local
minimum in the NN weight space, which in turn achieves the least number of errors”. Moreover, in a
related context, Kumar, et al. (2012) mentioned that the backpropagation algorithm “learns by
recursively processing a set of training tuples, comparing the network’s observed output for each
tuple with the actual known class attribute value. For each training tuple, the weights are edited
so as to reduce the mean squared error between the network’s output and the actual class label or
value. These changes are propagated in backward direction through each hidden layer down to
the first hidden layer. After running the process repetitively, the weights will finally converge,
and the training process stops”.
Even though a neural classifier is considered a robust learning machine, it is not guaranteed to
converge to a global minimum; instead it may get stuck in a local minimum. In this article a
proof is provided that helps to understand the characteristics and the nature of the weights of a
neural network, which in turn can be used to interpret the overall behavior of a neural classifier.

The remainder of this paper is organized as follows: the next section introduces the gradient
descent technique and contains two subsections showing that the delta rules
for the input and hidden weights approach zero. The paper ends with the conclusion.
2. GRADIENT DESCENT TECHNIQUE AND THE DELTA RULE
In this section the gradient descent technique and the delta rule are introduced. We will show
that the delta rule approaches zero. Since the delta rule is used to
update the values of the weights in a neural classifier, this implies that the weights are upper
bounded and do not approach infinity.
The learning task in a feed forward neural network with the backpropagation training algorithm is
achieved by optimizing the neural network weights in order to fit the data points in a
dataset. This is done by applying the gradient descent technique, which is mainly based on
choosing the weights that minimize the error (i.e. the difference between the actual output and the
target). Russell and Norvig (2010) provide some details about the error function, which they call
the loss function.
According to Negnevitsky (2005), Equation (1) below shows the major equation used to update the
weights of a neural network.

$$w_{i+1} = w_i + \Delta w_i \qquad (1)$$

where $w_i$ is the weight at the current training iteration $i$, and $w_{i+1}$ is the weight at the next training
iteration $i+1$. $\Delta w_i$ is the amount of change in $w_i$ and is known as the delta rule, which is used to
update the weights in the neural network. The delta rule is not identical for all weights in the
neural network, since there are two kinds of weights: (1) input weights (which connect the input layer
to the hidden layer) and (2) hidden weights (which connect the hidden layer to the output layer).
(1) DELTA RULE FOR HIDDEN WEIGHTS
Equation (2) below illustrates the delta rule $\Delta w_j$ that is used to update the hidden weights in a
feed forward neural network.

$$\Delta w_j = \alpha \,(y_D - y_A)\, y_A (1 - y_A)\, y_j \qquad (2)$$

where $y_D$ and $y_A$ are the target and the actual outputs on the output layer respectively, with
$y_D \in \{0,1\}$. $y_j$ is the output of the hidden neuron $j$, and $w_j$ is the hidden weight that connects the
hidden neuron $j$ to the output layer in the neural network. $\alpha$ is a small positive value
known as the learning rate. $y_A$ is calculated using the logistic regression function, as equation (3)
illustrates.

$$y_A = \frac{1}{1 + e^{-z}} \qquad (3)$$

where $z$ is a multiple regression function ($z = w_0 + w_1 y_1 + \dots + w_n y_n$), $y_1 \dots y_n$ are the outputs of the hidden neurons, and
$w_1 \dots w_n$ are the weights that connect the hidden layer to the output layer. The major aim here is
to prove that the delta rule in (2) approaches zero, since this implies that the weights stabilize
at some level, which means that the weights are bounded and do not approach
infinity. Such a feature causes the neural classifier to get stuck in either a local or a global minimum.
As mentioned before, the desired output takes two values only: $y_D \in \{0,1\}$.
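Before walking through the cases, the following minimal Python sketch makes the quantities in equations (2) and (3) concrete for a single output neuron. The number of hidden neurons, the numeric values, and the helper names (`logistic`, `hidden_delta`) are assumptions made only for illustration.

```python
import numpy as np

def logistic(z):
    # Equation (3): y_A = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def hidden_delta(alpha, y_d, y_hidden, w_hidden, w0=0.0):
    # z is the multiple regression over the hidden outputs: z = w0 + sum_j w_j * y_j
    z = w0 + np.dot(w_hidden, y_hidden)
    y_a = logistic(z)
    # Equation (2): one delta per hidden-to-output weight w_j
    return alpha * (y_d - y_a) * y_a * (1.0 - y_a) * y_hidden

# Assumed illustrative values: 3 hidden neurons, target y_D = 1
alpha = 0.1
y_hidden = np.array([0.2, 0.7, 0.9])   # outputs y_1 ... y_l of the hidden neurons
w_hidden = np.array([0.5, -1.2, 2.0])  # hidden weights w_1 ... w_l
print(hidden_delta(alpha, 1, y_hidden, w_hidden))
```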
(1) When $y_D = 1$, there are three cases: $y_j = 0$, $y_j = 1$, and $y_j \in (0,1)$.

Now let $w_{j,i}$ be the weight of the hidden output $y_j$ at iteration $i$. Thus:

$$\Delta w_{j,i} = \alpha\, y_j\, (1 - y_A)\, y_A (1 - y_A) \qquad (4)$$
$$= \alpha\, y_j\, (1 - y_A)^2\, y_A$$
$$= \alpha\, y_j\, y_A\, (y_A - 1)^2$$
$$= C_1\, y_A\, (y_A - 1)^2 \quad \text{where } C_1 = \alpha\, y_j$$
$$= C_1 \left(\frac{1}{1 + e^{-z}}\right) \left(\frac{1}{1 + e^{-z}} - 1\right)^2$$

Case (I): when $y_j \in (0,1)$, $C_1 = \alpha\, y_j$ is positive (note that $\alpha$ is a small positive value such
as $\alpha = 0.1, 0.01, 0.001$).
- if $w_j \to +\infty$ then $z = y_j w_j + \tilde{z} \to +\infty$, where $\tilde{z} = w_0 + w_1 y_1 + \dots + w_{j-1} y_{j-1} + w_{j+1} y_{j+1} + \dots + w_n y_n$

$$e^{-z} \to 0$$
$$\frac{1}{1 + e^{-z}} \to 1$$
$$\left(\frac{1}{1 + e^{-z}} - 1\right) \to 0$$
$$\Delta w_{j,i} = C_1 \left(\frac{1}{1 + e^{-z}}\right) \left(\frac{1}{1 + e^{-z}} - 1\right)^2 \to 0$$
$$\Delta w_{j,i} \to 0$$
Thus, as $w_{j,i}$ increases ($w_{j,i} \to +\infty$), the sequence $\Delta w_{j,i}$ decreases and approaches 0
($\Delta w_{j,i} \to 0$). This result can be combined with the notion of the limit of a sequence, which is considered
the most basic among the different concepts of limit in real analysis; see Bartle and Sherbert (2000).

According to Bartle and Sherbert (2000), the notion of a limit of a sequence implies that when a
sequence converges to 0, then for each $\varepsilon > 0$ there exists $n^* \in \mathbb{N}$ such that if $i \geq n^*$ then
$|\Delta w_{j,i} - 0| < \varepsilon$. That means for any $\varepsilon > 0$ there is some number $n^*$ such that all the terms of the
sequence $\Delta w_{j,i}$ that are produced after the term $\Delta w_{j,n^*}$ (i.e. $\Delta w_{j,n^*+1}, \Delta w_{j,n^*+2}, \dots$) will be less
than $\varepsilon$. Hence, if $\varepsilon = 1 \times 10^{-10}$ for instance, then there is $n^* \in \mathbb{N}$ such that all the terms
$\Delta w_{j,n^*+1}, \Delta w_{j,n^*+2}, \dots$ are less than $\varepsilon$. In other words, $\Delta w_j$ will not exceed $1 \times 10^{-10}$ for any term
that is produced after the term $n^*$.
- if $w_j \to -\infty$ then $z = y_j w_j + \tilde{z} \to -\infty$, where $\tilde{z} = w_0 + w_1 y_1 + \dots + w_{j-1} y_{j-1} + w_{j+1} y_{j+1} + \dots + w_n y_n$

$$e^{-z} \to +\infty$$
$$\frac{1}{1 + e^{-z}} \to 0$$
$$\left(\frac{1}{1 + e^{-z}} - 1\right) \to -1$$
$$\Delta w_{j,i} = C_1 \left(\frac{1}{1 + e^{-z}}\right) \left(\frac{1}{1 + e^{-z}} - 1\right)^2 \to 0$$
$$\Delta w_{j,i} \to 0$$
So as $w_{j,i}$ decreases ($w_{j,i} \to -\infty$), the sequence $\Delta w_{j,i}$ again approaches 0
($\Delta w_{j,i} \to 0$). This result also satisfies the notion of the limit of a sequence mentioned
earlier for $w_j \to +\infty$, and it shows that $\Delta w_j$ is a sequence that approaches 0.
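A quick numerical check of Case (I), under assumed values for $\alpha$, $y_j$, and the residual $\tilde{z}$ (chosen here purely for illustration), confirms that as $|w_j|$ grows in either direction, the value of $\Delta w_{j,i}$ computed from equation (4) shrinks toward zero.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed illustrative constants: alpha = 0.1, y_j = 0.7, target y_D = 1, and
# z_tilde = 0.3 for the contribution of the bias and the remaining hidden neurons.
alpha, y_j, z_tilde = 0.1, 0.7, 0.3
for w_j in [1, 5, 10, 20, 40, -10, -20, -40]:
    y_a = logistic(y_j * w_j + z_tilde)
    delta = alpha * y_j * y_a * (1.0 - y_a) ** 2   # equation (4)
    print(f"w_j = {w_j:>4}: delta_w_j = {delta:.3e}")
```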
Case (II): when $y_j = 1$, $C_1 = \alpha\, y_j = \alpha \in (0,1)$ (note that $\alpha$ is a small positive value such
as $\alpha = 0.1, 0.01, 0.001$).

- if $w_j \to +\infty$ then $z = y_j w_j + \tilde{z} \to +\infty$, where $\tilde{z} = w_0 + w_1 y_1 + \dots + w_{j-1} y_{j-1} + w_{j+1} y_{j+1} + \dots + w_n y_n$

$$e^{-z} \to 0$$
$$\frac{1}{1 + e^{-z}} \to 1$$
$$\left(\frac{1}{1 + e^{-z}} - 1\right) \to 0$$
$$\Delta w_{j,i} = C_1 \left(\frac{1}{1 + e^{-z}}\right) \left(\frac{1}{1 + e^{-z}} - 1\right)^2 \to 0$$
$$\Delta w_{j,i} \to 0$$
So as $w_{j,i}$ increases ($w_{j,i} \to +\infty$), the sequence $\Delta w_{j,i}$ decreases and approaches 0
($\Delta w_{j,i} \to 0$). This result also satisfies the notion of the limit of a sequence mentioned
earlier.
- if $w_j \to -\infty$ then $z = y_j w_j + \tilde{z} \to -\infty$

$$e^{-z} \to +\infty$$
$$\frac{1}{1 + e^{-z}} \to 0$$
$$\left(\frac{1}{1 + e^{-z}} - 1\right) \to -1$$
$$\Delta w_{j,i} = C_1 \left(\frac{1}{1 + e^{-z}}\right) \left(\frac{1}{1 + e^{-z}} - 1\right)^2 \to 0$$
$$\Delta w_{j,i} \to 0$$
So as $w_{j,i}$ decreases ($w_{j,i} \to -\infty$), the sequence $\Delta w_{j,i}$ again approaches 0
($\Delta w_{j,i} \to 0$). This result also satisfies the notion of the limit of a sequence mentioned
earlier.
Case (III): when $y_j = 0$, the delta rule is also 0. Recall the delta rule from
equation (2) with $y_D = 1$:

$$\Delta w_{j,i} = \alpha\, y_j\, (1 - y_A)\, y_A (1 - y_A) = 0 \quad \text{since } y_j = 0$$

And this also satisfies the notion of a limit of a sequence.
(2) When $y_D = 0$, there are also three cases: $y_j = 0$, $y_j = 1$, and $y_j \in (0,1)$.

Now let $w_{j,i}$ be the weight of the hidden output $y_j$ at iteration $i$. Thus:

$$\Delta w_{j,i} = \alpha\, y_j\, (0 - y_A)\, y_A (1 - y_A)$$
$$= \alpha\, y_j\, y_A^2\, (y_A - 1)$$
$$= C_1\, y_A^2\, (y_A - 1) \quad \text{where } C_1 = \alpha\, y_j$$
$$= C_1 \left(\frac{1}{1 + e^{-z}}\right)^2 \left(\frac{1}{1 + e^{-z}} - 1\right)$$
Hence, by using the same argument as above for $y_D = 1$, we end up with: when $y_j \in (0,1)$ or
$y_j \in \{0,1\}$, then $\Delta w_{j,i} \to 0$ as $w_j \to +\infty$ or $w_j \to -\infty$.

Since $\Delta w_{j,i} \to 0$ for large $i$, and $w_{j,i+1} = w_{j,i} + \Delta w_{j,i}$, then:

$$(w_{j,i+1} - w_{j,i}) = \Delta w_{j,i} \to 0$$
$$w_{j,i+1} \to w_{j,i} \quad \text{for large } i.$$
(2) DELTA RULE FOR INPUT WEIGHTS
Equation (5) below illustrates the delta rule $\Delta w_k$ that is used to update the input weights.

$$\Delta w_k = \alpha \cdot x_k \left(y_j (1 - y_j)\right) \left(\sum_{j=1}^{l} (\Delta w_j)(w_j)\right) \qquad (5)$$

where $x_k$ is an input variable, $y_j$ is the output of the hidden neuron $j$, and $l$ is the number of
hidden neurons in the neural classifier.
Since the delta rule $\Delta w_k$ in equation (5) depends on the amount $\Delta w_j$, which approaches zero for
large $i$ (as proved earlier), the delta rule $\Delta w_k$ also approaches zero for large $i$. Moreover, the
hidden output $y_j$ affects the value of the delta rule $\Delta w_k$; but $y_j$ is the output of the logistic
regression function at hidden neuron $j$, so the values of $y_j$ are either $y_j \in \{0,1\}$ or $y_j \in (0,1)$. Thus,
by applying the former argument used for the delta rule of the hidden weights, it is easy to show that
when $y_j \in \{0,1\}$ then $\Delta w_k \to 0$.
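The following sketch evaluates the input-weight delta rule of equation (5) directly; the toy dimensions, numeric values, and helper name (`input_delta`) are assumptions made only for illustration.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_delta(alpha, x, w_in_j, delta_w_hidden, w_hidden, w0=0.0):
    # Output y_j of hidden neuron j for the input vector x
    y_j = logistic(w0 + np.dot(w_in_j, x))
    # Equation (5): the common backpropagated term sum_j (delta w_j)(w_j)
    backprop_term = np.dot(delta_w_hidden, w_hidden)
    # One delta per input weight w_k (one component per input variable x_k)
    return alpha * x * y_j * (1.0 - y_j) * backprop_term

# Assumed toy values: 2 input variables and 3 hidden neurons
alpha = 0.1
x = np.array([0.4, 1.5])                          # input variables x_1, x_2
w_in_j = np.array([0.8, -0.3])                    # input weights feeding hidden neuron j
w_hidden = np.array([0.5, -1.2, 2.0])             # hidden weights w_1 ... w_l
delta_w_hidden = np.array([0.01, -0.02, 0.005])   # hidden-weight deltas from equation (2)
print(input_delta(alpha, x, w_in_j, delta_w_hidden, w_hidden))
```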
It remains to show that $\Delta w_k$ approaches 0 when $y_j \in (0,1)$. Let $w_{k,i}$ be the weight of the input
$x_k$ at iteration $i$. Thus:

$$\Delta w_{k,i} = C_1\, y_j (1 - y_j) \quad \text{where } C_1 = \alpha\, x_k \left(\sum_{j=1}^{l} (\Delta w_j)(w_j)\right)$$
$$\Delta w_{k,i} = C_1 \left(\frac{1}{1 + e^{-z}}\right) \left(1 - \frac{1}{1 + e^{-z}}\right)$$
- If $w_k \to +\infty$ then $z = x_k w_k + \tilde{z} \to +\infty$, where $\tilde{z} = w_0 + w_1 x_1 + \dots + w_{k-1} x_{k-1} + w_{k+1} x_{k+1} + \dots + w_n x_n$

$$e^{-z} \to 0$$
$$\frac{1}{1 + e^{-z}} \to 1$$
$$\left(1 - \frac{1}{1 + e^{-z}}\right) \to 0$$
$$\Delta w_{k,i} = C_1 \left(\frac{1}{1 + e^{-z}}\right) \left(1 - \frac{1}{1 + e^{-z}}\right) \to 0$$
$$\Delta w_{k,i} \to 0$$
Thus as $w_{k,i}$ increases ($w_{k,i} \to +\infty$), the sequence $\Delta w_{k,i}$ decreases and approaches 0
($\Delta w_{k,i} \to 0$). This result satisfies the notion of the limit of a sequence mentioned earlier.
- If $w_k \to -\infty$ then $z = x_k w_k + \tilde{z} \to -\infty$, where $\tilde{z} = w_0 + w_1 x_1 + \dots + w_{k-1} x_{k-1} + w_{k+1} x_{k+1} + \dots + w_n x_n$

$$e^{-z} \to +\infty$$
$$\frac{1}{1 + e^{-z}} \to 0$$
$$\left(1 - \frac{1}{1 + e^{-z}}\right) \to 1$$
$$\Delta w_{k,i} = C_1 \left(\frac{1}{1 + e^{-z}}\right) \left(1 - \frac{1}{1 + e^{-z}}\right) \to 0$$
$$\Delta w_{k,i} \to 0$$
So as $w_{k,i}$ decreases ($w_{k,i} \to -\infty$), the sequence $\Delta w_{k,i}$ again approaches 0
($\Delta w_{k,i} \to 0$). This result also satisfies the notion of the limit of a sequence mentioned in
the previous case when $w_k \to +\infty$.

Thus, we have shown that $\Delta w_k$ is a sequence that approaches 0 for large $i$, since it satisfies the notion of
the limit of a sequence.
Since $\Delta w_{k,i} \to 0$ for large $i$, and $w_{k,i+1} = w_{k,i} + \Delta w_{k,i}$, then:

$$(w_{k,i+1} - w_{k,i}) = \Delta w_{k,i} \to 0$$
$$w_{k,i+1} \to w_{k,i} \quad \text{for large } i.$$
Therefore, both sequences in (2) and (5) approach zero, which implies that the weights in the
neural network are bounded.
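As an informal check of this boundedness claim, one can train a small network of the kind analysed above and track the largest weight update per epoch. The sketch below does so under assumed choices (the logical AND dataset, the architecture, no bias terms, and the delta rules accumulated over the whole dataset and applied once per epoch), so that whether the updates die out, as the argument predicts, can be observed directly; it is a minimal illustration, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
logistic = lambda z: 1.0 / (1.0 + np.exp(-z))

# Assumed toy setting: AND function, 2 inputs, 3 hidden neurons, one output.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0, 0, 0, 1], dtype=float)
alpha, n_hidden = 0.1, 3
W_in = rng.normal(scale=0.5, size=(n_hidden, 2))   # input weights
W_hid = rng.normal(scale=0.5, size=n_hidden)       # hidden-to-output weights

for epoch in range(2001):
    d_hid_total = np.zeros_like(W_hid)
    d_in_total = np.zeros_like(W_in)
    for x, y_d in zip(X, Y):
        y_hidden = logistic(W_in @ x)            # hidden outputs y_j
        y_a = logistic(W_hid @ y_hidden)         # actual output, equation (3)
        d_hid = alpha * (y_d - y_a) * y_a * (1 - y_a) * y_hidden            # equation (2)
        backprop = d_hid @ W_hid                 # sum_j (delta w_j)(w_j) in equation (5)
        d_in = alpha * np.outer(y_hidden * (1 - y_hidden) * backprop, x)    # equation (5)
        d_hid_total += d_hid
        d_in_total += d_in
    W_hid += d_hid_total                         # equation (1): w_{i+1} = w_i + delta w_i
    W_in += d_in_total
    if epoch % 500 == 0:
        largest = max(np.abs(d_hid_total).max(), np.abs(d_in_total).max())
        print(f"epoch {epoch:4d}: largest weight update = {largest:.2e}")
```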
3. CONCLUSION
In this article a proof was introduced that establishes the boundedness of the weights of a neural
network. This was achieved by applying the notion of a limit of a sequence to the delta rule,
which is part of the gradient descent technique. The result showed that the weights in a neural
classifier are upper bounded (i.e. they do not approach infinity), since the amount given by the delta rule
decreases through the training epochs and approaches zero. This, in turn, reduces the amount by
which the weights change, which satisfies the notion of a limit of a sequence. Such a result helps to
understand the behavior of a neural network classifier and explains why a neural network is not
always guaranteed to find the global minimum in the solution surface, and may instead get stuck in a local
minimum.
REFERENCES
[1] Bartle, R. G., & Sherbert, D. R. (2000). Introduction to Real Analysis. 3rd ed., Wiley. ISBN:
0471321486.
[2] Cottrell, G. W. (1990). Extracting features from faces using compression networks: Face, identity,
trol, signals, and systems, 2, 303-314.
[3] Kumar, P., Sehgal, K., V. & N., Chauhan, S., D. (2012). A Benchmark to Select Data Mining Based
Classification Algorithms for Business Intelligence and Decision Support Systems. International
Journal of Data Mining & Knowledge Management Process (IJDKP), Vol.2, No.5.
[4] Lang, K. J., Waibel, A. H., & Hinton, G. E. (1990). A time-delay neural network architecture for
isolated word recognition. Neural Networks, 3, 33-43.
[5] LeCun, Y., Boser, B., Denker, J. S., & Solla, S. A. (1990). Optimal Brain Damage. In D. Touretzky
(Ed.), Advances in Neural Information Processing Systems (Vol. 2, pp. 598 - 605).
[6] Mathew, L., S. (2013). Integrated Associative Classification and Neural Network Model Enhanced
By Using A statistical Approach. International Journal of Data Mining & Knowledge Management
Process (IJDKP), Vol.3, No.4.
[7] Mitchell, T. (1997). Machine learning. Singapore: McGRAW-HILL.
[8] Negnevitsky, M. (2005). Artificial intelligence: a guide to intelligent systems (2nd ed.). Britain:
Addison-Wesley.
[9] Russell, S., & Norvig, P. (2010). Artificial intelligence: a modern approach (3rd ed.). USA: Prentice
Hall.
AUTHORS
Dr. HAZEM MIGDADY
• A PhD in data mining and machine learning, with an emphasis on inductive
learning from large datasets and patterns.
• Lecturer in the Department of Mathematics and Computer Science.