13 Intelligent Systems: Neural Networks

1
© Copyright 2010 Dieter Fensel and Reto Krummenacher
Intelligent Systems
Neural Networks
2
Where are we?
# Title
1 Introduction
2 Propositional Logic
3 Predicate Logic
4 Reasoning
5 Search Methods
6 CommonKADS
7 Problem-Solving Methods
8 Planning
9 Software Agents
10 Rule Learning
11 Inductive Logic Programming
12 Formal Concept Analysis
13 Neural Networks
14 Semantic Web and Services
3
Agenda
• Motivation
• Technical Solution
– (Artificial) Neural Networks
– Perceptron
– Artificial Neural Network Structures
– Learning and Generalization
– Expressiveness of Multi-Layer Perceptrons
• Illustration by Larger Examples
• Summary
4
MOTIVATION
5
Motivation
• A main motivation behind neural networks is the fact that
symbolic rules do not reflect reasoning processes performed
by humans.
• Biological neural systems can capture highly parallel
computations based on representations that are distributed
over many neurons.
• They learn and generalize from training data; no need for
programming it all...
• They are very noise tolerant – better resistance than symbolic
systems.
6
Motivation
• Neural networks are strong in:
– Learning from a set of examples
– Optimizing solutions via constraints and cost functions
– Classification: grouping elements in classes
– Speech recognition, pattern matching
– Non-parametric statistical analysis and regressions
7
TECHNICAL SOLUTIONS
8
(Artificial) Neural Networks
9
What are Neural Networks?
• A biological neural network is composed of groups of
chemically connected or functionally associated neurons.
• A single neuron may be connected to many other neurons
and the total number of neurons and connections in a network
may be extensive.
• Neurons are highly specialized cells that transmit impulses
within animals to cause a change in a target cell such as a
muscle effector cell or a glandular cell.
• Impulses are noisy “spike trains” of electrical potential.
• Apart from the electrical signaling, neurotransmitter diffusion
(endogenous chemicals) can effectuate the impulses.
10
What are Neural Networks?
• Connections, called synapses, are usually formed from
axons to dendrites; also other connections are possible.
• The axon is the primary conduit through which the neuron
transmits impulses to neurons downstream in the signal chain
• A human has about 10^11 neurons of > 20 types, 10^14 synapses, and a
1ms–10ms cycle time.
• For comparison: there are about 10^11 stars in a galaxy, and 10^11
galaxies in the universe.
• Not only the impulse itself carries a value, but also the frequency of
oscillation; hence neural networks have an even higher complexity.
11
What are Neural Networks?
• Metaphorically speaking, a thought is a specific self-oscillation
of a network of neurons.
• The topology of the network determines its resonance.
• However, it is the resonance in the brain's interaction with the
environment and with itself that creates, reinforces or
decouples interaction patterns.
• The brain is not a static device, but a device that is created
through usage…
12
What are Neural Networks?
• What we refer to as Neural Networks, in this course, are
mostly Artificial Neural Networks (ANN).
• ANN are approximations of biological neural networks and are
built of physical devices, or simulated on computers.
• ANN are parallel computational entities that consist of multiple
simple processing units that are connected in specific ways in
order to perform the desired tasks.
• Remember: ANN are computationally primitive
approximations of the real biological brains.
13
Perceptron
14
Artificial Neurons - McCulloch-Pitts „Unit“
• Output is a „squashed“ linear function of the inputs:
• A clear oversimplification of real neurons, but its purpose is to
develop understanding of what networks of simple units can
do.
15
Activation Functions
(a) is a step function or threshold function (signum function)
(b) is a sigmoid function 1/(1 + e^(-x))
• Changing the bias weight W0,i moves the threshold location
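As a minimal sketch of such a unit, the following Python function computes the squashed weighted sum with either activation; the weights, bias weight and inputs in the example call are assumptions made for illustration, not values from the slides.

```python
import math

def mcculloch_pitts_unit(inputs, weights, bias_weight, activation="step"):
    """Weighted sum of the inputs minus the bias weight, squashed by g."""
    in_j = sum(w * x for w, x in zip(weights, inputs)) - bias_weight
    if activation == "step":                      # (a) step / threshold function
        return 1 if in_j >= 0 else 0
    return 1.0 / (1.0 + math.exp(-in_j))          # (b) sigmoid 1/(1 + e^-x)

# Illustrative call: two inputs, both weights 1, bias weight 1.5 (an AND-like unit)
print(mcculloch_pitts_unit([1, 1], [1, 1], 1.5))             # step output: 1
print(mcculloch_pitts_unit([1, 0], [1, 1], 1.5, "sigmoid"))  # roughly 0.38
```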
16
Perceptron
• McCulloch-Pitts neurons can be connected together in any
desired way to build an artificial neural network.
• A construct of one input layer of neurons that feed forward to
one output layer of neurons is called a Perceptron.
(Figures: a perceptron network with input units I_j, weights W_j,i and output units O_i; and a single perceptron with input units I_j, weights W_j and one output unit O.)
17
Expressiveness of Perceptrons
• A perceptron with g = step function can model Boolean
functions and linear classification:
– As we will see, a perceptron can represent AND, OR, NOT,
but not XOR
• A perceptron represents a linear separator for the input space
Σ_j W_j a_j > 0
(Figure: plots in the a1–a2 plane illustrating linear separation of Boolean functions.)
18
Expressiveness of Perceptrons (2)
• Threshold perceptrons can represent only linearly separable
functions (i.e. functions for which such a separation
hyperplane exists)
• Such perceptrons have limited expressivity.
• There exists an algorithm that can fit a threshold perceptron to
any linearly separable training set.
19
Example: Logical Functions
• McCulloch and Pitts: Boolean functions can be implemented with an
artificial neuron (but not XOR):
– AND: W0 = 1.5, W1 = 1, W2 = 1 (inputs a1, a2; bias input a0)
– OR: W0 = 0.5, W1 = 1, W2 = 1 (inputs a1, a2; bias input a0)
– NOT: W0 = -0.5, W1 = -1 (input a1; bias input a0)
20
Example: Finding Weights for AND Operation
• There are two input weights W1 and W2 and a threshold W0. For each
training pattern the perceptron needs to satisfy the following
equation:
out = g(W1*a1 + W2*a2 – W0) = sign(W1*a1 + W2*a2 – W0)
• For a binary AND there are four training data items available that
lead to four inequalities:
– W1*0 + W2*0 – W0 < 0 ⇒ W0 > 0
– W1*0 + W2*1 – W0 < 0 ⇒ W2 < W0
– W1*1 + W2*0 – W0 < 0 ⇒ W1 < W0
– W1*1 + W2*1 – W0 ≥ 0 ⇒ W1 + W2 ≥ W0
• There are obviously infinitely many solutions that realize a logical
AND, e.g. W1 = 1, W2 = 1 and W0 = 1.5.
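One way to check such a solution is to evaluate the perceptron on all four training patterns; the small sketch below assumes a step activation and the weights W1 = W2 = 1, W0 = 1.5 named on this slide.

```python
def perceptron_and(a1, a2, w1=1.0, w2=1.0, w0=1.5):
    # out = g(W1*a1 + W2*a2 - W0), with g = step function
    return 1 if w1 * a1 + w2 * a2 - w0 >= 0 else 0

for a1, a2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a1, a2), "->", perceptron_and(a1, a2))
# Only (1, 1) yields 1, i.e. these weights satisfy all four inequalities.
```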
21
Limitations of Simple Perceptrons
• XOR:
– W1*0 + W2*0 – W0 < 0 ⇒ W0 > 0
– W1*0 + W2*1 – W0 ≥ 0 ⇒ W2 ≥ W0
– W1*1 + W2*0 – W0 ≥ 0 ⇒ W1 ≥ W0
– W1*1 + W2*1 – W0 < 0 ⇒ W1 + W2 < W0
• The 2nd and 3rd inequalities are not compatible with inequality 4,
and there is no solution to the XOR problem.
• XOR requires two separation hyperplanes!
• There is thus a need for more complex networks that combine
simple perceptrons to address more sophisticated classification
tasks.
22
Artificial Neural Network Structures
23
Neural Network Structures
• Mathematically artificial neural networks are represented by
weighted directed graphs.
• In more practical terms, a neural network has activations
flowing between processing units via one-way connections.
• There are three common artificial neural network architectures
known:
– Single-Layer Feed-Forward (Perceptron)
– Multi-Layer Feed-Forward
– Recurrent Neural Network
24
Single-Layer Feed-Forward
• A Single-Layer Feed-Forward Structure is a simple
perceptron, and thus has
– one input layer
– one output layer
– NO feed-back connections
• Feed-forward networks implement functions, have no
internal state (of course also valid for multi-layer
perceptrons).
25
Single-Layer Feed-Forward: Example
• Output units all operate separately – no shared weights
• Adjusting the weights moves the location, orientation, and
steepness of the cliff (i.e., the separation hyperplane).
(Figure: input units a1, a2 connected to output units via weights W_j,i.)
26
Multi-Layer Feed-Forward
• Multi-Layer Feed-Forward Structures
have:
– one input layer
– one output layer
– one or many hidden layers of processing units
• The hidden layers are between the input and the output layer,
and thus hidden from the outside world: no input from the
world, no output to the world.
27
Multi-Layer Feed-Forward (2)
• Multi-Layer Perceptrons (MLP) have fully connected layers.
• The number of hidden units is typically chosen by hand; the
more layers, the more complex the network (see Step 2 of
Building Neural Networks).
• Hidden layers enlarge the space of hypotheses that the
network can represent.
• Learning done by back-propagation algorithm → errors are
back-propagated from the output layer to the hidden layers.
28
Simple MLP Example
• XOR Problem: Recall that XOR cannot be modeled with a
Single-Layer Feed-Forward perceptron.
(Figure: a small multi-layer network, units numbered 1–3, that solves the XOR problem.)
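For illustration, here is a hand-wired two-layer sketch in Python that computes XOR from an OR hidden unit and an AND hidden unit; the particular weights are one possible choice, not necessarily those of the figure.

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_mlp(a1, a2):
    h_or  = step(a1 + a2 - 0.5)        # hidden unit 1: OR
    h_and = step(a1 + a2 - 1.5)        # hidden unit 2: AND
    return step(h_or - h_and - 0.5)    # output: OR and not AND = XOR

print([xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```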
29
Expressiveness of MLPs
• 1 hidden layer can represent all continuous functions
• 2 hidden layers can represent all functions
• Combining two opposite-facing threshold functions makes a
ridge.
• Combining two perpendicular ridges makes a bump.
• Add bumps of various sizes and locations to fit any surface.
• The more hidden units, the more bumps.
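The ridge-and-bump construction can be checked numerically; the following sketch (grid size, steepness and offsets are arbitrary assumptions) combines two opposite-facing sigmoids into a ridge and combines two perpendicular ridges into a localized bump.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = np.meshgrid(np.linspace(-5, 5, 101), np.linspace(-5, 5, 101))

# Two opposite-facing soft thresholds along x make a ridge around x = 0.
ridge_x = sigmoid(4 * (x + 1)) - sigmoid(4 * (x - 1))
# A perpendicular ridge along y, combined with the first, makes a bump.
ridge_y = sigmoid(4 * (y + 1)) - sigmoid(4 * (y - 1))
bump = ridge_x * ridge_y            # localized around the origin

print(bump[50, 50], bump[0, 0])     # high in the middle, near zero at the corner
```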
30
Number of Hidden Layers
• Rule of Thumb 1
Even if the function to learn is slightly non-linear, the
generalization may be better with a simple linear model than
with a complicated non-linear model, if there is too little data
or too much noise to estimate the non-linearities accurately.
• Rule of Thumb 2
If there is only one input, there seems to be no advantage to
using more than one hidden layer; things get much more
complicated when there are two or more inputs.
31
Example: Number of Hidden Layers
1st layer draws linear boundaries.
2nd layer combines the boundaries.
3rd layer can generate arbitrary boundaries.
32
Recurrent Network
• Recurrent networks have at least one feedback connection:
– They have directed cycles with delays: they have internal
states (like flip flops), can oscillate, etc.
– The response to an input depends on the initial state which
may depend on previous inputs.
– This creates an internal state of the network which allows it
to exhibit dynamic temporal behaviour; this offers a means to
model short-term memory
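As a minimal illustration of internal state, the sketch below runs a single recurrent unit whose output is fed back with a delay of one step; all weights are made-up values, chosen only to show that the same input can produce different outputs depending on the state.

```python
import math

def run_recurrent_unit(inputs, w_in=1.0, w_back=0.8, bias=0.0):
    """One sigmoid unit with a delayed feedback connection (internal state h)."""
    h = 0.0                      # initial internal state
    outputs = []
    for x in inputs:
        h = 1.0 / (1.0 + math.exp(-(w_in * x + w_back * h - bias)))
        outputs.append(round(h, 3))
    return outputs

# The same input value 1.0 yields different outputs over time,
# because the response also depends on the previous state.
print(run_recurrent_unit([1.0, 1.0, 1.0, 0.0, 0.0]))
```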
33
Building Neural Networks
• Building a neural network for particular problems requires
multiple steps:
1. Determine the input and outputs of the problem;
2. Start from the simplest imaginable network, e.g. a single
feed-forward perceptron;
3. Find the connection weights to produce the required
output from the given training data input;
4. Ensure that the training data is classified successfully, and
test the network with other training/testing data;
5. Go back to Step 3 if performance is not good enough;
6. Repeat from Step 2 if Step 5 still lacks performance; or
7. Repeat from Step 1 if the network from Step 6 still does not
perform well enough.
34
Learning and Generalization
35
Learning and Generalization
• Neural networks have two important aspects to fulfill:
– They must learn decision surfaces from training data, so that
training data (and test data) are classified correctly;
– They must be able to generalize based on the learning process,
in order to classify data sets they have never seen before.
• Note that there is an important trade-off between the learning
behavior and the generalization of a neural network (called over-
fitting)
• The better a network learns to successfully classify a training
sequence (that might contain errors) the less flexible it is with
respect to arbitrary data.
36
Learning vs. Generalization
• Noise in the actual data is never a good thing, since it limits the
accuracy of generalization that can be achieved no matter how
extensive the training set is.
• Non-perfect learning is better in this case!
• However, injecting artificial noise (so-called jitter) into the inputs
during training is one of several ways to improve generalization
„Perfect“ learning achieves the dotted separation, while the desired one
is in fact given by the solid line (figure panels: Regression, Classification).
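As a small illustration of jitter, the sketch below adds fresh Gaussian noise to each training input every time it is presented; the noise level 0.05 and the toy data are arbitrary assumptions.

```python
import random

def jittered(training_set, sigma=0.05):
    """Yield (input, target) pairs with fresh Gaussian noise on every input value."""
    for x, y in training_set:
        yield [xi + random.gauss(0.0, sigma) for xi in x], y

data = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 0), ([1.0, 1.0], 1)]
for x, y in jittered(data):
    print(x, y)   # each pass sees slightly different inputs, which smooths the fit
```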
37
Estimation of Generalization Error
• There are many methods for estimating generalization errors.
• Single-sample statistics
– Statistical theory provides estimators for the generalization error
in non-linear models with a "large" training set.
• Split-sample or hold-out validation.
– The most commonly used method: reserve some data as a "test
set”, which must not be used during training.
– The test set must represent the cases that the ANN should
generalize to.
– A test run on the randomly chosen test set provides an unbiased estimate
of the generalization error.
– The disadvantage of split-sample validation is that it reduces the
amount of data available for both training and validation.
38
Estimation of Generalization Error
• Cross-validation (e.g., leave one out).
– In k-fold cross-validation the data is divided into k subsets, and
the network is trained k times, each time leaving one subset out
for computing the error.
– The “crossing” makes this method an improvement over split-
sampling; it allows all data to be used for training.
– The disadvantage of cross-validation is that the network must be
re-trained many times (k times in k-fold crossing).
• Bootstrapping.
– Bootstrapping works on random sub-samples (random shares)
that are chosen from the full data set.
– Any data item may be selected any number of times for
validation.
– The sub-samples are repeatedly analyzed.
• No matter which method is applied, the estimate of the
generalization error of the best network will be optimistic.
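A minimal sketch of k-fold cross-validation is given below; the `train` and `error` callables and the toy data are illustrative stand-ins for whatever training procedure and error measure the network at hand would use.

```python
def k_fold_cv(data, k, train, error):
    """Estimate the generalization error: train k times, each time
    holding out one of k subsets and measuring the error on it."""
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        held_out = folds[i]
        training = [item for j, fold in enumerate(folds) if j != i for item in fold]
        model = train(training)
        errors.append(sum(error(model, item) for item in held_out) / len(held_out))
    return sum(errors) / k

# Illustrative stand-ins: the "model" is just the mean target, the error is squared deviation.
data = [(x, 2 * x) for x in range(10)]
train = lambda items: sum(y for _, y in items) / len(items)
error = lambda model, item: (item[1] - model) ** 2
print(k_fold_cv(data, k=5, train=train, error=error))
```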
39
Algorithm for ANN Learning
• Learning is based on training data, and aims at appropriate
weights for the perceptrons in a network.
• Direct computation is in the general case not feasible.
• An initial random assignment of weights simplifies the learning
process that becomes an iterative adjustment process.
• In the case of single perceptrons, learning becomes the
process of moving hyperplanes around; parametrized over
time t: Wi(t+1) = Wi(t) + ΔWi(t)
40
Perceptron Learning
• The squared error for an example with input x and desired output y
is:
E = ½ Err² = ½ (y – g(in))², with in = Σ_{j=0..n} W_j x_j
• Perform optimization search by gradient descent:
∂E/∂W_j = Err × ∂Err/∂W_j = Err × ∂/∂W_j (y – g(Σ_{j=0..n} W_j x_j)) = –Err × g'(in) × x_j
• Simple weight update rule:
W_j ← W_j + α × Err × g'(in) × x_j
• Positive error ⇒ increase network output:
– increase weights on positive inputs,
– decrease weights on negative inputs
41
Perceptron Learning (2)
• The weight updates need to be applied repeatedly for each weight
Wj in the network, and for each training example in the training set.
• One such cycle through all weights is called an epoch of training.
• Eventually, mostly after many epochs, the weight changes converge
towards zero and the training process terminates.
• The perceptron learning process always finds a set of weights that
solves a problem correctly within a finite number of epochs, if such
a set of weights exists.
• If a problem can be solved with a separation hyperplane, then the
set of weights is found in finitely many iterations and solves the
problem correctly.
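As a hedged illustration of the update rule W_j ← W_j + α × Err × g'(in) × x_j from the previous slide, the following Python sketch performs one weight update for a sigmoid unit; the example weights, inputs, target and learning rate are assumptions chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def update_weights(w, x, y, alpha=0.1):
    """One application of W_j <- W_j + alpha * Err * g'(in) * x_j."""
    in_ = sum(wj * xj for wj, xj in zip(w, x))
    out = sigmoid(in_)
    err = y - out                      # Err = y - g(in)
    g_prime = out * (1.0 - out)        # derivative of the sigmoid at in_
    return [wj + alpha * err * g_prime * xj for wj, xj in zip(w, x)]

# Assumed example: w[0] is the bias weight W0, paired with a fixed bias input of -1.
w = [0.2, -0.4, 0.7]
x = [-1.0, 0.5, -1.0]
print(update_weights(w, x, y=1))   # positive error: the weight on the positive
                                   # input grows, the one on the negative input shrinks
```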
42
Perceptron Learning (3)
• Perceptron learning rule converges to a consistent function for
any linearly separable data set
• Perceptron learns majority function easily, Decision-Tree is
hopeless
• Decision-Tree learns restaurant function easily, perceptron
cannot represent it.
43
Example Functions
• Majority Function: the output is false when n/2 or more inputs are
false, and true otherwise.
• Restaurant Function: decide whether to wait for a table or not:
– How many guests (patron) are in the restaurant?
– What is the estimated waiting time?
– Any alternative restaurant nearby?
– Are we hungry?
– Do we have a reservation?
– Is it Friday or Saturday?
– Is there a comfortable bar area
to wait in?
– Is it raining outside?
45
Back-Propagation Learning
• The errors (and therefore the learning) propagate backwards from
the output layer to the hidden layers.
• Learning at the output layer is the same as for the single-layer
perceptron:
W_j ← W_j + α × Err × g'(in) × x_j
• Hidden layer neurons get a "blame" assigned for the error (back-
propagation of error), giving greater responsibility to neurons
connected by stronger weights.
• Back-propagation of error updates the weights of the hidden layers;
the principle thus stays the same.
46
Back-Propagation Learning (2)
• Training curve for 100 restaurant examples converges to a
perfect fit to the training data
• MLPs are quite good for complex pattern recognition tasks,
but resulting hypotheses cannot be understood easily
• Typical problems: slow convergence, local minima
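To make the blame assignment concrete, here is a minimal numerical sketch of one back-propagation step for a tiny 2-input, 2-hidden-unit, 1-output sigmoid network; the network size, the initial weights and the learning rate are assumptions for illustration, not values from the lecture.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, y, W_hid, W_out, alpha=0.5):
    """One back-propagation step. W_hid[i] are the weights of hidden unit i,
    W_out the output unit's weights; index 0 of every weight vector is the
    bias weight, paired with a bias input fixed to -1."""
    xb = [-1.0] + list(x)
    h = [sigmoid(sum(w * v for w, v in zip(W_hid[i], xb))) for i in range(len(W_hid))]
    hb = [-1.0] + h
    out = sigmoid(sum(w * v for w, v in zip(W_out, hb)))

    # Output layer: same rule as for the single-layer perceptron.
    delta_out = (y - out) * out * (1.0 - out)
    # Hidden layer: each unit is "blamed" in proportion to its outgoing weight.
    delta_hid = [h[i] * (1.0 - h[i]) * W_out[i + 1] * delta_out for i in range(len(h))]

    W_out = [w + alpha * delta_out * v for w, v in zip(W_out, hb)]
    W_hid = [[w + alpha * delta_hid[i] * v for w, v in zip(W_hid[i], xb)]
             for i in range(len(W_hid))]
    return W_hid, W_out

# Illustrative initial weights (assumed, not from the slides).
W_hid = [[0.1, 0.4, -0.3], [-0.2, 0.2, 0.5]]
W_out = [0.3, 0.6, -0.4]
print(backprop_step((1.0, 0.0), 1, W_hid, W_out))
```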
47
ILLUSTRATION BY AN
EXAMPLE
48
Perceptron Learning Example
• The applied algorithm is as follows
– Initialize the weights and threshold to small random numbers.
– Present a vector x to the neuron inputs and calculate the
output.
– Update the weights according to the error.
• Applied learning function:
• Example with two inputs x1, x2
W_j(t+1) = W_j(t) + α × (y – g(Σ_j W_j(t) x_j)) × x_j
(Figure: example points in the x1–x2 plane, the two classes marked o and +.)
49
Perceptron Learning Example
• Data: (0,0)0, (1,0)0, (0,1)1, (1,1)1
• Initialization: W1(0) = 0.9286, W2(0) = 0.62, W0(0) = 0.2236, α = 0.1
• Training – epoch 1:
out1 = sign(0.92*0 + 0.62*0 – 0.22) = sign(-0.22) = 0
out2 = sign(0.92*1 + 0.62*0 – 0.22) = sign(0.7) = 1
W1(1) = 0.92 + 0.1 * (0 – 1) * 1 = 0.82
W2(1) = 0.62 + 0.1 * (0 – 1) * 0 = 0.62
W0(1) = 0.22 + 0.1 * (0 – 1) * (-1)= 0.32
out3 = sign(0.82*0 + 0.62*1 – 0.32) = sign(0.3) = 1
out4 = sign(0.82*1 + 0.62*1 – 0.32) = 1
50
Perceptron Learning Example
• Training – epoch 2:
out1 = sign(0.82*0 + 0.62*0 – 0.32) = sign(-0.32) = 0
out2 = sign(0.82*1 + 0.62*0 – 0.32) = sign(0.5) = 1
W1(2) = 0.82 + 0.1 * (0 – 1) * 1 = 0.72
W2(2) = 0.62 + 0.1 * (0 – 1) * 0 = 0.62
W0(2) = 0.32 + 0.1 * (0 – 1) * (-1)= 0.42
out3 = sign(0.72*0 + 0.62*1 – 0.42) = sign(0.2) = 1
out4 = sign(0.72*1 + 0.62*1 – 0.42) = 1
• Training – epoch 3:
out1 = sign(0.72*0 + 0.62*0 – 0.42) = 0
out2 = sign(0.72*1 + 0.62*0 – 0.42) = 1
W1(3) = 0.72 + 0.1 * (0 – 1) * 1 = 0.62
W2(3) = 0.62 + 0.1 * (0 – 1) * 0 = 0.62
W0(3) = 0.42 + 0.1 * (0 – 1) * (-1)= 0.52
51
Perceptron Learning Example
out3 = sign(0.62*0 + 0.62*1 – 0.52) = 1
out4 = sign(0.62*1 + 0.62*1 – 0.52) = 1
• Training – epoch 4:
out1 = sign(0.62*0 + 0.62*0 – 0.52) = 0
out2 = sign(0.62*1 + 0.62*0 – 0.52) = 1
W1(4) = 0.62 + 0.1 * (0 – 1) * 1 = 0.52
W2(4) = 0.62 + 0.1 * (0 – 1) * 0 = 0.62
W0(4) = 0.52 + 0.1 * (0 – 1) * (-1)= 0.62
out3 = sign(0.52*0 + 0.62*1 – 0.62) = 0
W1(4) = 0.52 + 0.1 * (1 – 0) * 0 = 0.52
W2(4) = 0.62 + 0.1 * (1 – 0) * 1 = 0.72
W0(4) = 0.62 + 0.1 * (1 – 0) * (-1)= 0.52
out4 = sign(0.52*1 + 0.72*1 – 0.52) = 1
52
Perceptron Learning Example
• Finally:
out1 = sign(0.12*0 + 0.82*0 – 0.42) = 0
out2 = sign(0.12*1 + 0.82*0 – 0.42) = 0
out3 = sign(0.12*0 + 0.82*1 – 0.42) = 1
out4 = sign(0.12*1 + 0.82*1 – 0.42) = 1
(Figures: the four training points in the x1–x2 plane, classes marked o and +, with the separating line learned by the perceptron.)
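For reference, here is a small Python sketch of the same online training procedure, using the data, initial weights and learning rate from this example; because of the rounding on the slides and the tie-breaking at a weighted sum of exactly 0, the final weights of a run need not match the numbers above, but the learned line separates the same data.

```python
def step(z):
    # 1 if the weighted sum exceeds the threshold, else 0
    # (a sum of exactly 0 counts as 0 here, as in epoch 4 above).
    return 1 if z > 0 else 0

def train_perceptron(data, w1, w2, w0, alpha=0.1, max_epochs=20):
    for epoch in range(max_epochs):
        errors = 0
        for (x1, x2), y in data:
            out = step(w1 * x1 + w2 * x2 - w0)
            if out != y:
                errors += 1
                w1 += alpha * (y - out) * x1
                w2 += alpha * (y - out) * x2
                w0 += alpha * (y - out) * (-1)   # the bias input is fixed to -1
        if errors == 0:                          # all four patterns classified correctly
            return w1, w2, w0, epoch
    return w1, w2, w0, max_epochs

data = [((0, 0), 0), ((1, 0), 0), ((0, 1), 1), ((1, 1), 1)]
print(train_perceptron(data, w1=0.9286, w2=0.62, w0=0.2236))
```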
53
Further Application Examples
• There are endless further examples:
– Handwriting Recognition
– Time Series Prediction
– Kernel Machines (Support Vector Machines)
– Data Compression
– Financial Prediction
– Speech Recognition
– Computer Vision
– Protein Structures
– ...
54
SUMMARY
55
Summary
• Most brains have lots of neurons; each neuron approximates a
linear-threshold unit.
• Perceptrons (one-layer networks) approximate neurons, but
are as such insufficiently expressive.
• Multi-layer networks are sufficiently expressive and can be trained
to generalize beyond the training data, e.g. via error back-
propagation.
• Multi-layer networks allow for the modeling of arbitrary
separation boundaries, while single-layer perceptrons only
provide linear boundaries.
• Endless number of applications: Handwriting Recognition,
Time Series Prediction, Bioinformatics, Kernel Machines
(Support Vector Machines), Data Compression, Financial
Prediction, Speech Recognition, Computer Vision, Protein
Structures...
56
REFERENCES
57
References
Mandatory Readings:
• McCulloch, W.S. & Pitts, W. (1943): A Logical Calculus of the Ideas
Immanent in Nervous Activity. Bulletin of Mathematical Biophysics 5, pp.
115-133.
• Rosenblatt, F. (1958): The perceptron: a probabilistic model for information
storage and organization in the brain. Psychological Review 65, pp. 386-
408.
Further Readings:
• Elman, J. L. (1990): Finding structure in time. Cognitive Science 14,
pp.179–211.
• Gallant, S. I. (1990): Perceptron-based learning algorithms. IEEE
Transactions on Neural Networks 1 (2), pp. 179-191.
• Rumelhart, D.E., Hinton, G. E. & Williams, R. J. (1986): Learning
representations by back-propagating errors. Nature 323, pp. 533-536.
• Supervised learning demo (perceptron learning rule) at
http://lcn.epfl.ch/tutorial/english/perceptron/html/
58
References
Wikipedia References:
• http://en.wikipedia.org/wiki/Biological_neural_networks
• http://en.wikipedia.org/wiki/Artificial_neural_network
• http://en.wikipedia.org/wiki/Perceptron
• http://en.wikipedia.org/wiki/Feedforward_neural_network
• http://en.wikipedia.org/wiki/Recurrent_neural_networks
• http://en.wikipedia.org/wiki/Back_propagation
59
Next lecture
# Title
1 Introduction
2 Propositional Logic
3 Predicate Logic
4 Reasoning
5 Search Methods
6 CommonKADS
7 Problem-Solving Methods
8 Planning
9 Agents
10 Rule Learning
11 Inductive Logic Programming
12 Formal Concept Analysis
13 Neural Networks
14 Semantic Web and Services
60
Questions?