Artificial Intelligence
Amit Purohit
Evidence of Artificial Intelligence folklore can be traced back to ancient Egypt, but it was the development of the
electronic computer in 1941 that finally made the technology available to create machine intelligence. The term
artificial intelligence was first coined in 1956, at the Dartmouth conference, and since then Artificial Intelligence has
expanded because of the theories and principles developed by its dedicated researchers. Although, through its short
modern history, advancement in the field of AI has been slower than first estimated, progress continues to be made.
Since its birth four decades ago, there have been a variety of AI programs, and they have impacted other technological
advancements.
Definition
AI is the science and engineering of making intelligent machines, especially intelligent computer
programs. It is related to the similar task of using computers to understand human intelligence,
but AI does not have to confine itself to methods that are biologically observable.
Intelligence is the computational part of the ability to achieve goals in the world. Varying kinds
and degrees of intelligence occur in people, many animals and some machines.
Objectives
1). To formally define AI.
2). To discuss the characteristic features of AI.
3). To acquaint the student with the essence of AI.
4). To be able to distinguish between human intelligence and AI.
5). To give an overview of the applications where AI technology can be used.
6). To impart knowledge about representation schemes like Production System and Problem
Reduction.
Turing Test
Alan Turing's 1950 article Computing Machinery and Intelligence [Tur50] discussed conditions
for considering a machine to be intelligent. He argued that if the machine could successfully
pretend to be human to a knowledgeable observer then you certainly should consider it
intelligent. This test would satisfy most people but not all philosophers. The observer could
interact with the machine and a human by teletype (to avoid requiring that the machine imitate
the appearance or voice of the person), and the human would try to persuade the observer that it
was human and the machine would try to fool the observer.
The Turing test is a one-sided test. A machine that passes the test should certainly be considered
intelligent, but a machine could still be considered intelligent without knowing enough about
humans to imitate a human.
Daniel Dennett's book Brainchildren [Den98] has an excellent discussion of the Turing test and
the various partial Turing tests that have been implemented, i.e. with restrictions on the
observer's knowledge of AI and the subject matter of questioning. It turns out that some people
are easily led into believing that a rather dumb program is intelligent.
Background and History
In 1941 an invention revolutionized every aspect of the storage and processing of information.
That invention, developed in both the US and Germany was the electronic computer. The first
computers required large, separate air-conditioned rooms, and were a programmer's nightmare,
involving the separate configuration of thousands of wires to even get a program running.
The 1949 innovation, the stored program computer, made the job of entering a program easier,
and advancements in computer theory led to computer science, and eventually Artificial
intelligence. With the invention of an electronic means of processing data, came a medium that
made AI possible.
Although the computer provided the technology necessary for AI, it was not until the early
1950's that the link between human intelligence and machines was really observed. Norbert
Wiener was one of the first Americans to make observations on the principle of feedback
theory. The most familiar example of feedback theory is the thermostat: It controls the
temperature of an environment by gathering the actual temperature of the house, comparing it to
the desired temperature, and responding by turning the heat up or down. What was so important
about his research into feedback loops was that Wiener theorized that all intelligent behavior
was the result of feedback mechanisms, mechanisms that could possibly be simulated by
machines. This discovery influenced much of the early development of AI.
In late 1955, Newell and Simon developed The Logic Theorist, considered by many to be the
first AI program. The program, representing each problem as a tree model, would attempt to
solve it by selecting the branch that would most likely result in the correct conclusion. The
impact that the Logic Theorist made on both the public and the field of AI has made it a crucial
stepping stone in developing the AI field.
In 1956 John McCarthy, regarded as the father of AI, organized a conference to draw on the
talent and expertise of others interested in machine intelligence for a month of brainstorming. He
invited them to New Hampshire for "The Dartmouth summer research project on artificial
intelligence." From that point on, because of McCarthy, the field would be known as Artificial
Intelligence. Although not a huge success, the Dartmouth conference did bring together the
founders of AI, and served to lay the groundwork for the future of AI research.
In the seven years after the conference, AI began to pick up momentum. Although the field was
still undefined, ideas formed at the conference were re-examined, and built upon. Centers for AI
research began forming at Carnegie Mellon and MIT, and new challenges were faced: first,
creating systems that could efficiently solve problems by limiting the search, as the Logic
Theorist did; and second, making systems that could learn by themselves.
In 1957, the first version of a new program, the General Problem Solver (GPS), was tested. The
program was developed by the same pair that developed the Logic Theorist. The GPS was an
extension of Wiener's feedback principle, and was capable of solving a wider range of common
sense problems. A couple of years after the GPS, IBM contracted a team to research artificial
intelligence; Herbert Gelernter spent 3 years working on a program for solving geometry
theorems.
While more programs were being produced, McCarthy was busy developing a major
breakthrough in AI history. In 1958 McCarthy announced his new development; the LISP
language, which is still used today. LISP stands for LISt Processing, and was soon adopted as the
language of choice among most AI developers.
During the 1970's, many new methods in the development of AI were tested, notably Minsky's
frames theory. David Marr also proposed new theories about machine vision, for example, how
it would be possible to distinguish an image based on the shading of an image, basic information
on shapes, color, edges, and texture. With analysis of this information, frames of what an image
might be could then be referenced. Another development during this time was the PROLOG
language, which was proposed in 1972.
During the 1980's, AI was moving at a faster pace, and further into the corporate sector. In 1986,
US sales of AI-related hardware and software surged to $425 million. Expert systems were in
particular demand because of their efficiency. Companies such as Digital Equipment Corporation
were using XCON, an expert system designed to configure the large VAX computers. DuPont,
General Motors, and Boeing relied heavily on expert systems. Indeed, to keep up with the demand
for computer experts, companies such as Teknowledge and Intellicorp, which specialized in
creating software to aid in producing expert systems, were formed. Other expert systems were
designed to find and correct flaws in existing expert systems.
Overview of AI Application Areas
Game Playing
You can buy machines that can play master level chess for a few hundred dollars. There is some
AI in them, but they play well against people mainly through brute force computation--looking at
hundreds of thousands of positions. To beat a world champion by brute force and known reliable
heuristics requires being able to look at 200 million positions per second.
Speech Recognition
In the 1990s, computer speech recognition reached a practical level for limited purposes. Thus
United Airlines has replaced its keyboard tree for flight information by a system using speech
recognition of flight numbers and city names. It is quite convenient. On the other hand, while
it is possible to instruct some computers using speech, most users have gone back to the
keyboard and the mouse as still more convenient.
Understanding Natural Language
Just getting a sequence of words into a computer is not enough. Parsing sentences is not enough
either. The computer has to be provided with an understanding of the domain the text is about,
and this is presently possible only for very limited domains.
Computer Vision
The world is composed of three-dimensional objects, but the inputs to the human eye and
computers' TV cameras are two dimensional. Some useful programs can work solely in two
dimensions, but full computer vision requires partial three-dimensional information that is not
just a set of two-dimensional views. At present there are only limited ways of representing three-
dimensional information directly, and they are not as good as what humans evidently use.
Expert Systems
A "knowledge engineer" interviews experts in a certain domain and tries to embody their
knowledge in a computer program for carrying out some task. How well this works depends on
whether the intellectual mechanisms required for the task are within the present state of AI.
When this turned out not to be so, there were many disappointing results. One of the first expert
systems was MYCIN in 1974, which diagnosed bacterial infections of the blood and suggested
treatments. It did better than medical students or practicing doctors, provided its limitations were
observed. Namely, its ontology included bacteria, symptoms, and treatments and did not include
patients, doctors, hospitals, death, recovery, and events occurring in time. Its interactions
depended on a single patient being considered. Since the experts consulted by the knowledge
engineers knew about patients, doctors, death, recovery, etc., it is clear that the knowledge
engineers forced what the experts told them into a predetermined framework. In the present state
of AI, this has to be true. The usefulness of current expert systems depends on their users having
common sense.
Heuristic Classification
One of the most feasible kinds of expert system given the present knowledge of AI is to put some
information in one of a fixed set of categories using several sources of information. An example
is advising whether to accept a proposed credit card purchase. Information is available about the
owner of the credit card, his record of payment and also about the item he is buying and about
the establishment from which he is buying it (e.g., about whether there have been previous credit
card frauds at this establishment).
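As a sketch of this idea, each source of information can contribute evidence to a score, and the total selects one of a fixed set of categories. All field names and thresholds below are invented assumptions for illustration, not taken from any real credit system:

```python
# A minimal sketch of heuristic classification: several information
# sources each contribute evidence, and the combined score places the
# purchase into one of a fixed set of categories.
# All fields and thresholds are illustrative assumptions.

def classify_purchase(purchase):
    """Return 'accept', 'review', or 'reject' for a proposed purchase."""
    score = 0
    # Source 1: the cardholder's record of payment
    if purchase["missed_payments"] == 0:
        score += 2
    elif purchase["missed_payments"] > 2:
        score -= 2
    # Source 2: the item being bought (unusually large purchase?)
    if purchase["amount"] > purchase["typical_amount"] * 5:
        score -= 1
    # Source 3: the establishment's fraud history
    if purchase["merchant_fraud_reports"] > 0:
        score -= 2
    if score >= 2:
        return "accept"
    if score >= 0:
        return "review"
    return "reject"

print(classify_purchase({"missed_payments": 0, "amount": 40.0,
                         "typical_amount": 50.0,
                         "merchant_fraud_reports": 0}))   # accept
```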
Production System
Production systems are applied to problem solving programs that must perform a wide range of
searches. Production systems are symbolic AI systems. The difference between these two terms is
only one of semantics. A symbolic AI system may not be restricted to the very definition of
production systems, but it cannot be much different either.
Production systems are composed of three parts, a global database, production rules and a control
structure.
The global database is the system's short-term memory. These are collections of facts that are to
be analyzed. A part of the global database represents the current state of the system's
environment. In a game of chess, the current state could represent all the positions of the pieces
for example.
Production rules (or simply productions) are conditional if-then branches. In a production system,
whenever a condition in the system is satisfied, the system is allowed to execute or perform a
specific action, which may be specified under that rule. If the rule is not fulfilled, it may perform
another action. This can be simply paraphrased:
WHEN (condition) IS SATISFIED, PERFORM (action)
A Production System Algorithm
DATA (bound to the initial global database)
until DATA satisfies the halting condition do
begin
select some rule R that can be applied to DATA
DATA (bound to the result of applying R to DATA)
end
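The algorithm above can be sketched in Python. The facts, rules, and halting condition here are invented for illustration; a real system would have a far richer database and control strategy:

```python
# A minimal production system sketch (facts, rules, and goal are
# invented for illustration): the global database is a set of facts,
# each production rule is a (condition, action) pair, and the control
# structure repeatedly fires the first applicable rule that changes
# the database, until the halting condition holds.

rules = [
    (lambda db: "wet" in db and "cold" in db, lambda db: db | {"freezing"}),
    (lambda db: "raining" in db,              lambda db: db | {"wet"}),
    (lambda db: "winter" in db,               lambda db: db | {"cold"}),
]

def run(database, goal):
    while goal not in database:            # halting condition
        for condition, action in rules:    # control structure: first match
            if condition(database):
                new_db = action(database)
                if new_db != database:     # the rule changed something
                    database = new_db
                    break
        else:
            return None                    # no rule applies: failure
    return database

print(sorted(run({"raining", "winter"}, "freezing")))
# ['cold', 'freezing', 'raining', 'wet', 'winter']
```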
Types of Production System
There are two basic types of production System:
• Commutative Production System
• Decomposable Production System
Commutative Production System
A production system is commutative if it has the following properties with respect to a database
D:
1. Each member of the set of rules applicable to D is also applicable to any database produced by
applying an applicable rule to D.
2. If the goal condition is satisfied by D, then it is also satisfied by any database produced by
applying any applicable rule to D.
3. The database that results by applying to D any sequence composed of rules that are applicable
to D is invariant under permutations of the sequence.
Decomposable Production System
Initial database can be decomposed or split into separate components that can be processed
independently.
Search Process
Searching is defined as a sequence of steps that transforms the initial state to the goal state. To
do a search process, the following are needed:
• The initial state description of the problem
• A set of legal operators that changes the state.
• The final or goal state.
The searching process in AI can be classified into two types:
1. Uninformed Search / Blind Search
2. Heuristic Search / Informed Search
Uninformed / Blind Search
An uninformed search algorithm is one that does not have any domain-specific knowledge. It uses
information like the initial state, the final state and a set of legal operators. This search should
proceed in a systematic way by exploring nodes in some predetermined order. It can be classified
into two search techniques:
1. Breadth First Search
2. Depth First Search
Depth First Search
Depth first search works by taking a node, checking its neighbors, expanding the first node it
finds among the neighbors, checking if that expanded node is our destination, and if not,
continuing to explore more nodes.
The above explanation is probably confusing if this is your first exposure to depth first search. I
hope the following demonstration will help more. Using our same search tree, let's find a path
between nodes A and F:
Step 0
Let's start with our root/goal node:
We will be using two lists to keep track of what we are doing - an Open list and a Closed List.
An Open list keeps track of what you need to do, and the Closed List keeps track of what you
have already done. Right now, we only have our starting point, node A. We haven't done
anything to it yet, so let's add it to our Open list.
Open List: A
Closed List: <empty>
Step 1
Now, let's explore the neighbors of our A node. To put it another way, let's take the first item from
our Open list and explore its neighbors:
Node A's neighbors are the B and C nodes. Because we are now done with our A node, we can
remove it from our Open list and add it to our Closed List. You aren't done with this step though.
You now have two new nodes B and C that need exploring. Add those two nodes to our Open
list.
Our current Open and Closed Lists contain the following data:
Open List: B, C
Closed List: A
Step 2
Our Open list contains two items. For depth first search and breadth first search, you always
explore the first item from our Open list. The first item in our Open list is the B node. B
is not our destination, so let's explore its neighbors:
Because I have now expanded B, I am going to remove it from the Open list and add it to the
Closed List. Our new nodes are D and E, and we add these nodes to the beginning of our Open
list:
Open List: D, E, C
Closed List: A, B
Step 3
You should start to see a pattern forming. Because D is at the beginning of our Open List, we
expand it. D isn't our destination, and it does not contain any neighbors. All you do in this step is
remove D from our Open List and add it to our Closed List:
Open List: E, C
Closed List: A, B, D
Step 4
We now expand the E node from our Open list. E is not our destination, so we explore its
neighbors and find out that it contains the neighbors F and G. Remember, F is our target, but we
don't stop here. Despite F being on our path, we only stop when we are about to expand
our target node, F in this case:
Our Open list will have the E node removed and the F and G nodes added. The removed E node
will be added to our Closed List:
Open List: F, G, C
Closed List: A, B, D, E
Step 5
We now expand the F node. Since it is our intended destination, we stop:
We remove F from our Open list and add it to our Closed List. Since we are at our destination,
there is no need to expand F in order to find its neighbors. Our final Open and Closed Lists
contain the following data:
Open List: G, C
Closed List: A, B, D, E, F
The final path taken by our depth first search method is the final value of our Closed List:
A, B, D, E, F.
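The walkthrough above can be sketched in Python. The tree below is assumed to match the example (A's children are B and C, B's are D and E, E's are F and G, and the rest are leaves):

```python
# A sketch of the walkthrough above: depth first search with explicit
# Open and Closed lists. New neighbors go on the FRONT of the Open list.
# The tree is the assumed example: A-(B, C), B-(D, E), E-(F, G).

tree = {"A": ["B", "C"], "B": ["D", "E"], "C": [],
        "D": [], "E": ["F", "G"], "F": [], "G": []}

def depth_first_search(start, goal):
    open_list, closed_list = [start], []
    while open_list:
        node = open_list.pop(0)              # take the first item
        closed_list.append(node)
        if node == goal:                     # stop when expanding the goal
            return closed_list
        open_list = tree[node] + open_list   # neighbors go to the beginning
    return None

print(depth_first_search("A", "F"))          # ['A', 'B', 'D', 'E', 'F']
```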
Breadth First Search
In depth first search, newly explored nodes were added to the beginning of your Open list. In
breadth first search, newly explored nodes are added to the end of your Open list.
For example, here is our original search tree:
The above explanation is probably confusing if this is your first exposure to breadth first search. I
hope the following demonstration will help more. Using our same search tree, let's find a path
between nodes A and F:
Step 0
Let's start with our root/goal node:
We will be using two lists to keep track of what we are doing - an Open list and a Closed List.
An Open list keeps track of what you need to do, and the Closed List keeps track of what you
have already done. Right now, we only have our starting point, node A. We haven't done
anything to it yet, so let's add it to our Open list.
Open List: A
Closed List: <empty>
Step 1
Now, let's explore the neighbors of our A node. To put it another way, let's take the first item from
our Open list and explore its neighbors:
Node A's neighbors are the B and C nodes. Because we are now done with our A node, we can
remove it from our Open list and add it to our Closed List. You aren't done with this step though.
You now have two new nodes B and C that need exploring. Add those two nodes to our Open
list.
Our current Open and Closed Lists contain the following data:
Open List: B, C
Closed List: A
Step 2
Our Open list contains two items. For breadth first search, you always explore the first item
from our Open list. The first item in our Open list is the B node. B is not our destination, so
let's explore its neighbors:
Because I have now expanded B, I am going to remove it from the Open list and add it to the
Closed List. Our new nodes are D and E, and in breadth first search we add these nodes to the
end of our Open list:
Open List: C, D, E
Closed List: A, B
Step 3
Here the two searches diverge. Because C is now at the front of our Open List, we expand it.
C isn't our destination, and it does not contain any neighbors, so all we do in this step is
remove C from our Open List and add it to our Closed List:
Open List: D, E
Closed List: A, B, C
Step 4
We now expand the D node. D isn't our destination either, and it also has no neighbors:
Open List: E
Closed List: A, B, C, D
Step 5
We now expand the E node. E is not our destination, so we explore its neighbors and find out
that it contains the neighbors F and G, which we add to the end of our Open list. Remember,
F is our target, but we don't stop here; we only stop when we are about to expand our target
node:
Open List: F, G
Closed List: A, B, C, D, E
Step 6
We now expand the F node. Since it is our intended destination, we stop:
We remove F from our Open list and add it to our Closed List. Since we are at our destination,
there is no need to expand F in order to find its neighbors. Our final Open and Closed Lists
contain the following data:
Open List: G
Closed List: A, B, C, D, E, F
The order in which our breadth first search visited the nodes is the final value of our Closed
List: A, B, C, D, E, F. Notice that, unlike depth first search, breadth first search expands C
before D and E, because it works level by level.
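A sketch of the same search in Python: the only change from depth first search is that new neighbors are appended to the end of the Open list. The tree is the same assumed example:

```python
# Breadth first search with explicit Open and Closed lists. Identical
# to the depth first sketch except that new neighbors go on the END of
# the Open list. Same assumed tree: A-(B, C), B-(D, E), E-(F, G).

tree = {"A": ["B", "C"], "B": ["D", "E"], "C": [],
        "D": [], "E": ["F", "G"], "F": [], "G": []}

def breadth_first_search(start, goal):
    open_list, closed_list = [start], []
    while open_list:
        node = open_list.pop(0)              # take the first item
        closed_list.append(node)
        if node == goal:                     # stop when expanding the goal
            return closed_list
        open_list = open_list + tree[node]   # neighbors go to the end
    return None

print(breadth_first_search("A", "F"))        # ['A', 'B', 'C', 'D', 'E', 'F']
```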
Iterative Deepening Depth-First Search
Iterative deepening depth-first search (IDDFS) is a state space search strategy in which a depth-
limited search is run repeatedly, increasing the depth limit with each iteration until it reaches d,
the depth of the shallowest goal state. On each iteration, IDDFS visits the nodes in the search
tree in the same order as depth-first search, but the cumulative order in which nodes are first
visited, assuming no pruning, is effectively breadth-first.
IDDFS combines depth-first search's space-efficiency and breadth-first search's completeness
(when the branching factor is finite). It is optimal when the path cost is a non-decreasing
function of the depth of the node.
The space complexity of IDDFS is O(bd), where b is the branching factor and d is the depth of
shallowest goal. Since iterative deepening visits states multiple times, it may seem wasteful, but
it turns out to be not so costly, since in a tree most of the nodes are in the bottom level, so it does
not matter much if the upper levels are visited multiple times.
The main advantage of IDDFS in game tree searching is that the earlier searches tend to improve
the commonly used heuristics, such as the killer heuristic and alpha-beta pruning, so that a more
accurate estimate of the score of various nodes at the final depth search can occur, and the search
completes more quickly since it is done in a better order. For example, alpha-beta pruning is
most efficient if it searches the best moves first.
A second advantage is the responsiveness of the algorithm. Because early iterations use small
values for d, they execute extremely quickly. This allows the algorithm to supply early
indications of the result almost immediately, followed by refinements as d increases. When used
in an interactive setting, such as in a chess-playing program, this facility allows the program to
play at any time with the current best move found in the search it has completed so far. This is
not possible with a traditional depth-first search.
The time complexity of IDDFS in well-balanced trees works out to be the same as depth-first
search: O(b^d).
In an iterative deepening search, the nodes on the bottom level are expanded once, those on the
next-to-bottom level are expanded twice, and so on, up to the root of the search tree, which is
expanded d + 1 times.[1] So the total number of expansions in an iterative deepening search is
(d + 1) + (d)b + (d - 1)b^2 + ... + (3)b^(d-2) + (2)b^(d-1) + (1)b^d
All together, an iterative deepening search from depth 1 to depth d expands only about 11%
more nodes than a single breadth-first or depth-limited search to depth d, when b = 10. The
higher the branching factor, the lower the overhead of repeatedly expanded states, but even when
the branching factor is 2, iterative deepening search only takes about twice as long as a complete
breadth-first search. This means that the time complexity of iterative deepening is still O(b^d), and
the space complexity is O(bd). In general, iterative deepening is the preferred search method
when there is a large search space and the depth of the solution is not known.
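A minimal sketch of IDDFS in Python, reusing the assumed example tree from the earlier walkthroughs: a depth-limited depth-first search is run with limits 0, 1, 2, ... until the goal is found:

```python
# Iterative deepening depth-first search: run depth-limited DFS with
# increasing depth limits until the goal is reached. The tree is the
# same assumed example as in the earlier walkthroughs.

tree = {"A": ["B", "C"], "B": ["D", "E"], "C": [],
        "D": [], "E": ["F", "G"], "F": [], "G": []}

def depth_limited(node, goal, limit):
    """DFS that refuses to descend below the given depth limit."""
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for child in tree[node]:
        path = depth_limited(child, goal, limit - 1)
        if path is not None:
            return [node] + path
    return None

def iddfs(start, goal, max_depth=10):
    # the final limit reached is d, the depth of the shallowest goal
    for limit in range(max_depth + 1):
        path = depth_limited(start, goal, limit)
        if path is not None:
            return path
    return None

print(iddfs("A", "F"))                   # ['A', 'B', 'E', 'F']
```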
Informed Search
It is not difficult to see that uninformed search will pursue options that lead away from the goal
as easily as it pursues options that lead towards the goal. For any but the smallest problems this
leads to searches that take unacceptable amounts of time and/or space. Informed search tries to
reduce the amount of search that must be done by making intelligent choices for the nodes that
are selected for expansion. This implies the existence of some way of evaluating the likelihood
that a given node is on the solution path. In general this is done using a heuristic function.
Hill Climbing
Hill climbing is a mathematical optimization technique which belongs to the family of local
search. It is relatively simple to implement, making it a popular first choice. Although more
advanced algorithms may give better results, in some situations hill climbing works just as well.
Hill climbing can be used to solve problems that have many solutions, some of which are better
than others. It starts with a random (potentially poor) solution, and iteratively makes small
changes to the solution, each time improving it a little. When the algorithm cannot see any
improvement anymore, it terminates. Ideally, at that point the current solution is close to optimal,
but it is not guaranteed that hill climbing will ever come close to the optimal solution.
For example, hill climbing can be applied to the traveling salesman problem. It is easy to find a
solution that visits all the cities, but it will likely be very poor compared to the optimal solution. The
algorithm starts with such a solution and makes small improvements to it, such as switching the
order in which two cities are visited. Eventually, a much better route is obtained.
Hill climbing is used widely in artificial intelligence, for reaching a goal state from a starting
node. Choice of next node and starting node can be varied to give a list of related algorithms.
Mathematical description
Hill climbing attempts to maximize (or minimize) a function f(x), where x ranges over discrete
states. These states are typically represented by vertices in a graph, where edges in the graph
encode the nearness or similarity of the states. Hill climbing will follow the graph from vertex to
vertex, always locally increasing (or decreasing) the value of f, until a local maximum (or local
minimum) xm is reached. Hill climbing can also operate on a continuous space: in that case, the
algorithm is called gradient ascent (or gradient descent if the function is minimized).
Variants
In simple hill climbing, the first closer node is chosen, whereas in steepest ascent hill climbing
all successors are compared and the closest to the solution is chosen. Both forms fail if there is
no closer node, which may happen if there are local maxima in the search space which are not
solutions. Steepest ascent hill climbing is similar to best-first search, which tries all possible
extensions of the current path instead of only one.
Stochastic hill climbing does not examine all neighbors before deciding how to move. Rather, it
selects a neighbour at random, and decides (based on the amount of improvement in that
neighbour) whether to move to that neighbour or to examine another.
Random-restart hill climbing is a meta-algorithm built on top of the hill climbing algorithm. It is
also known as Shotgun hill climbing. It iteratively does hill-climbing, each time with a random
initial condition x0. The best xm is kept: if a new run of hill climbing produces a better xm than
the stored state, it replaces the stored state.
Random-restart hill climbing is a surprisingly effective algorithm in many cases. It turns out that
it is often better to spend CPU time exploring the space, than carefully optimizing from an initial
condition.
Local Maxima
A problem with hill climbing is that it will find only local maxima. Unless the heuristic is
convex, it may not reach a global maximum. Other local search algorithms try to overcome this
problem, such as stochastic hill climbing, random walks and simulated annealing.
Ridges
A ridge is a curve in the search space that leads to a maximum, but the orientation of the ridge
compared to the available moves that are used to climb is such that each move will lead to a
smaller point. In other words, each point on a ridge looks to the algorithm like a local maximum,
even though the point is part of a curve leading to a better optimum.
Plateau
Another problem with hill climbing is that of a plateau, which occurs when we get to a "flat" part
of the search space, i.e. we have a path where the heuristics are all very close together. This kind
of flatness can cause the algorithm to cease progress and wander aimlessly.
Pseudocode
Hill Climbing Algorithm
currentNode = startNode
loop do
    L = NEIGHBORS(currentNode)
    nextEval = -INF
    nextNode = NULL
    for all x in L
        if EVAL(x) > nextEval
            nextNode = x
            nextEval = EVAL(x)
    if nextEval <= EVAL(currentNode)
        // Return current node since no better neighbors exist
        return currentNode
    currentNode = nextNode
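The pseudocode above can be made runnable on an assumed toy state space where the states are integers, the neighbors of x are x - 1 and x + 1, and f(x) = -(x - 7)^2 has its single maximum at x = 7:

```python
# A minimal sketch of steepest ascent hill climbing matching the
# pseudocode above. The state space, neighbor function, and objective
# are illustrative assumptions.

def evaluate(x):
    return -(x - 7) ** 2                 # single global maximum at x = 7

def neighbors(x):
    return [x - 1, x + 1]

def hill_climb(start):
    current = start
    while True:
        # steepest ascent: compare all successors, take the best
        best = max(neighbors(current), key=evaluate)
        if evaluate(best) <= evaluate(current):
            return current               # no better neighbor exists
        current = best

print(hill_climb(0))                     # 7
```

Because the objective here has a single peak, the climb always ends at 7; on a multi-peaked objective the same loop would stop at whichever local maximum it reaches first, which is exactly the limitation discussed above.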
Best-First Search
Best-first search is a search algorithm which explores a graph by expanding the most promising
node chosen according to a specified rule.
Judea Pearl described best-first search as estimating the promise of node n by a "heuristic
evaluation function f(n) which, in general, may depend on the description of n, the description of
the goal, the information gathered by the search up to that point, and most important, on any
extra knowledge about the problem domain."
Some authors have used "best-first search" to refer specifically to a search with a heuristic that
attempts to predict how close the end of a path is to a solution, so that paths which are judged to
be closer to a solution are extended first. This specific type of search is called greedy best-first
search.
Efficient selection of the current best candidate for extension is typically implemented using a
priority queue.
Examples of best-first search algorithms include the A* search algorithm, and in turn, Dijkstra's
algorithm (which can be considered a specialization of A*). Best-first algorithms are often used
for path finding in combinatorial search.
Code
OPEN = { initial state }
while OPEN is not empty
do
1. Pick the best node on OPEN.
2. Generate that node's successors.
3. For each successor do:
a. If it has not been generated before: evaluate it, add it to OPEN, and record its parent.
b. Otherwise: change the recorded parent if this new path is better than the previous one.
done
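A sketch of greedy best-first search in Python using a priority queue, as the text suggests. The graph and the heuristic values h(n) are invented for illustration:

```python
# Greedy best-first search: always expand the node with the lowest
# heuristic estimate h(n). The graph and h values are assumptions made
# up for this example; G is the goal.
import heapq

graph = {"S": ["A", "B"], "A": ["G"], "B": ["G"], "G": []}
h = {"S": 5, "A": 2, "B": 4, "G": 0}     # assumed heuristic estimates

def greedy_best_first(start, goal):
    # the priority queue orders candidates by their heuristic estimate
    open_heap = [(h[start], start, [start])]
    visited = set()
    while open_heap:
        _, node, path = heapq.heappop(open_heap)   # pick the best node
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for succ in graph[node]:
            heapq.heappush(open_heap, (h[succ], succ, path + [succ]))
    return None

print(greedy_best_first("S", "G"))       # ['S', 'A', 'G']
```

Note that A (h = 2) is expanded before B (h = 4) purely because its estimate looks more promising; A* would add the path cost so far to this estimate.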
Syntax of Propositional Logic
Logic is used to represent properties of objects in the world about which we are going to reason.
When we say Miss Piggy is plump we are talking about the object Miss Piggy and a property
plump. Similarly when we say Kermit's voice is high-pitched then the object is Kermit's voice
and the property is high-pitched. It is normal to write these in logic as:
plump(misspiggy)
highpitched(voiceof(kermit))
So misspiggy and kermit are constants representing objects in our domain. Notice that plump and
highpitched are different from voiceof:
plump and highpitched represent properties and so are boolean-valued functions. They are
often called predicates or relations.
voiceof is a function that returns an object (not true/false). To help us differentiate we shall use
``of'' at the end of a function name.
The predicates plump and highpitched are unary predicates but of course we can have binary or
n-ary predicates; e.g. loves(misspiggy, voiceof(kermit))
Simple Sentences
The fundamental components of logic are
• object constants; e.g. misspiggy, kermit
• function constants; e.g. voiceof
• predicate constants; e.g. plump, highpitched, loves
Predicate and function constants take arguments which are objects in our domain. Predicate
constants are used to describe relationships concerning the objects and return the value true/false.
Function constants return values that are objects.
More Complex Sentences
We need to apply operators to construct more complex sentences from atoms.
Negation
¬ applied to an atom negates the atom:
¬loves(kermit, voiceof(misspiggy))
''Kermit does not love Miss Piggy's voice''
Conjunction
combines two conjuncts:
loves(misspiggy, kermit) ∧ loves(misspiggy, voiceof(kermit))
''Miss Piggy loves Kermit and Miss Piggy loves Kermit's voice''
Notice it is not correct syntax to write in logic
loves(misspiggy, kermit) ∧ voiceof(kermit)
because we have tried to conjoin a sentence (truth valued) with an object. Logic operators must
apply to truth-valued sentences.
Disjunction
combines two disjuncts:
loves(misspiggy, kermit) ∨ loves(misspiggy, voiceof(kermit))
''Miss Piggy loves Kermit or Miss Piggy loves Kermit's voice''
Implication
combines a condition and conclusion
loves(misspiggy, voiceof(kermit)) → loves(misspiggy, kermit)
''If Miss Piggy loves Kermit's voice then Miss Piggy loves Kermit''
The language we have described so far contains atoms and the connectives ¬, ∧, ∨ and →.
This defines the syntax of propositional Logic. It is normal to represent atoms in propositional
logic as single upper-case letters but here we have used a more meaningful terminology for the
atoms that extends easily to Predicate Logic.
Semantics of Propositional Logic
We have defined the syntax of propositional Logic. However, this is of no use without talking
about the meaning, or semantics, of the sentences. Suppose our logic contained only atoms, i.e.
no logical connectives. This logic is very silly because any subset of these atoms is consistent;
e.g. beautiful(misspiggy) and ugly(misspiggy) are consistent because we cannot represent
¬[ugly(misspiggy) ∧ beautiful(misspiggy)]. So we now need a way in our logic to define which
sentences are true.
Example: Models Define Truth
Suppose a language contains only one object constant misspiggy and two relation constants ugly
and beautiful. The following models define different facts about Miss Piggy.
M=ø: In this model Miss Piggy is neither ugly nor beautiful.
M={ugly(misspiggy)}: In this model Miss Piggy is ugly and not beautiful.
M={beautiful(misspiggy)}: In this model Miss Piggy is beautiful and not ugly.
M={ugly(misspiggy), beautiful(misspiggy)}: In this model Miss Piggy is both ugly and
beautiful. The last statement is intuitively wrong, but the chosen model determines the truth of the
atoms in the language.
Compound Sentences
So far we have restricted our attention to the semantics of atoms: an atom is true if it is a member
of the model M; otherwise it is false. Extending the semantics to compound sentences is easy.
Notice that in the definitions below p and q do not need to be atoms because these definitions
work recursively until atoms are reached.
Conjunction
p ∧ q is true in M iff p and q are true in M individually.
So the conjunct
loves(misspiggy, kermit) ∧ loves(misspiggy, voiceof(kermit))
is true only when both
Miss Piggy loves Kermit; and
Miss Piggy loves Kermit's voice
Disjunction
p ∨ q is true in M iff at least one of p or q is true in M.
So the disjunct
loves(misspiggy, kermit) ∨ loves(misspiggy, voiceof(kermit))
is true whenever
Miss Piggy loves Kermit;
Miss Piggy loves Kermit's voice; or
Miss Piggy loves both Kermit and his voice.
Therefore the disjunction is weaker than either disjunct and than the conjunction of these disjuncts.
Negation
¬p is true in M iff p is not true in M.
Implication
p → q is true in M iff p is not true in M or q is true in M.
We have been careful about the definition of →. When people use an implication p → q they
normally imply that p causes q. So if p is true we are happy to say that p → q is true iff q is true.
But if p is false the causal link causes confusion because we can't tell whether q should be true or
not. Logic requires that the connectives are truth functional and so the truth of the compound
sentence must be determined from the truth of its component parts. Logic defines that if p is false
then p → q is true regardless of the truth of q.
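The truth-functional definitions above can be sketched as a small recursive evaluator. The sentence representation (nested tuples) and the model encoding (a set of true atoms) are my own, not from the text:

```python
# A model M is the set of atoms that are true; everything else is false.
def holds(sentence, M):
    op = sentence[0]
    if op == 'atom':                      # an atom is true iff it is in the model
        return sentence[1] in M
    if op == 'not':                       # ~p is true iff p is not true
        return not holds(sentence[1], M)
    if op == 'and':                       # p ^ q: both conjuncts true
        return holds(sentence[1], M) and holds(sentence[2], M)
    if op == 'or':                        # p v q: at least one disjunct true
        return holds(sentence[1], M) or holds(sentence[2], M)
    if op == 'implies':                   # p -> q: p not true, or q true
        return (not holds(sentence[1], M)) or holds(sentence[2], M)
    raise ValueError(op)

loves_k = ('atom', 'loves(misspiggy,kermit)')
loves_v = ('atom', 'loves(misspiggy,voiceof(kermit))')
M = {'loves(misspiggy,voiceof(kermit))'}
print(holds(('implies', loves_v, loves_k), M))  # False: the antecedent holds, the conclusion fails
print(holds(('or', loves_v, loves_k), M))       # True
```

The definitions work recursively until atoms are reached, exactly as the text notes for compound p and q.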
So both of the following implications are true (provided you believe pigs do not fly!):
fly(pigs) → beautiful(misspiggy)
fly(pigs) → ¬beautiful(misspiggy)
Example: Implications and Models
In which of the following models is
ugly(misspiggy) → ¬beautiful(misspiggy) true?
M=Ø
Miss Piggy is not ugly and so the antecedent fails. Therefore the implication holds. (Miss Piggy
is also not beautiful in this model.)
M={beautiful(misspiggy)}
Again, Miss Piggy is not ugly and so the implication holds.
M={ugly(misspiggy)}
Miss Piggy is not beautiful and so the conclusion is valid and hence the implication holds.
M={ugly(misspiggy), beautiful(misspiggy)}
Miss Piggy is ugly and so the antecedent holds. But she is also beautiful and so
¬beautiful(misspiggy) is not true. Therefore the conclusion does not hold and so the implication
fails in this (and only this) case.
Truth Tables
Truth tables are often used to calculate the truth of complex propositional sentences. A truth
table represents all possible combinations of truths of the atoms and so contains all possible
models. A column is created for each of the atoms in the sentence, and all combinations of truth
values for these atoms are assigned one per row. So if there are n atoms then there are n
initial columns and 2^n rows. The final column contains the truth of the sentence for each
combination of truths for the atoms. Intervening columns can be added to store intermediate truth
calculations. Below is a sample truth table for ¬p, and one for p → q:

p | ¬p
--+---
T | F
F | T

p  q | p → q
-----+------
T  T |   T
T  F |   F
F  T |   T
F  F |   T
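The truth-table construction just described (n atoms, 2^n rows, a final column for the sentence) can be generated mechanically. This is a sketch; the helper names are mine:

```python
# Enumerate all 2**n assignments to the atoms and evaluate the formula in each.
from itertools import product

def truth_table(atoms, formula):
    rows = []
    for values in product([True, False], repeat=len(atoms)):
        env = dict(zip(atoms, values))         # one model (row) per assignment
        rows.append(values + (formula(env),))  # final column: truth of the sentence
    return rows

# p -> q, defined truth-functionally as (not p) or q
table = truth_table(['p', 'q'], lambda e: (not e['p']) or e['q'])
for row in table:
    print(row)
# 2 atoms -> 4 rows; the only False row is p=True, q=False
```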
Equivalence
Two sentences are equivalent if they hold in exactly the same models.
Therefore we can determine equivalence by drawing truth tables that represent the sentences in
the various models. If the initial and final columns of the truth tables are identical then the
sentences are equivalent. Examples of equivalences include p → q ≡ ¬p ∨ q,
¬(p ∧ q) ≡ ¬p ∨ ¬q and ¬(p ∨ q) ≡ ¬p ∧ ¬q.
Unlike ∧ and ∨, → is not commutative:
loves(misspiggy, voiceof(kermit)) → loves(misspiggy, kermit)
is very different from
loves(misspiggy, kermit) → loves(misspiggy, voiceof(kermit))
Similarly → is not associative.
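Equivalence checking by comparing truth tables can be sketched directly: enumerate every model and compare the final columns. Helper names are mine:

```python
# Two sentences are equivalent iff they agree in every model.
from itertools import product

def equivalent(f, g, atoms):
    for values in product([True, False], repeat=len(atoms)):
        e = dict(zip(atoms, values))
        if f(e) != g(e):                 # final columns differ in some model
            return False
    return True

implies = lambda p, q: (not p) or q
# p -> q is equivalent to ~p v q ...
print(equivalent(lambda e: implies(e['p'], e['q']),
                 lambda e: (not e['p']) or e['q'], ['p', 'q']))   # True
# ... but -> is not commutative: p -> q differs from q -> p
print(equivalent(lambda e: implies(e['p'], e['q']),
                 lambda e: implies(e['q'], e['p']), ['p', 'q']))  # False
```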
Syntax & Semantics for Predicate Logic
Syntax of Predicate Logic
Propositional logic is fairly powerful but we must add variables and quantification to be able to
reason about objects in atoms and express properties of a set of objects without listing the atom
corresponding to each object.
We shall adopt the Prolog convention that variables have an initial capital letter. (This is contrary
to many Mathematical Logic books where variables are lower case and constants have an initial
capital.)
When we include variables we must specify their scope or quantification. The first quantifier we
want is the universal quantifier ∀ (for all).
∀X.loves(misspiggy, X)
This allows X to range over all the objects and asserts that Miss Piggy loves each of them. We
have introduced one variable but any number is allowed:
∀X∀Y.loves(X, Y)
Each of the objects loves all of the objects, even itself! Therefore ∀X∀Y. is the same as ∀X.∀Y.
Quantifiers, like connectives, act on sentences. So if Miss Piggy loves all cute things (not just
Kermit!) we would write
∀C.[cute(C) → loves(misspiggy, C)]
rather than
loves(misspiggy, ∀C.cute(C))
because the second argument to loves must be an object, not a sentence.
When the world contains a finite set of objects then a universally quantified sentence can be
converted into a sentence without the quantifier; e.g. ∀X.loves(misspiggy, X) becomes
loves(misspiggy, misspiggy) ∧ loves(misspiggy, kermit) ∧
loves(misspiggy, animal) ∧ ...
Contrast this with the infinite set of positive integers and the sentence
∀N.[odd(N) ∨ even(N)]
The other quantifier is the existential quantifier ∃ (there exists).
∃X.loves(misspiggy, X)
This allows X to range over all the objects and asserts that Miss Piggy loves (at least) one of
them. Similarly
∃X∃Y.loves(X, Y)
asserts that there is at least one loving couple (or self-loving object).
We shall be using First Order Predicate Logic where quantified variables range over object
constants only. We are defining Second Order Predicate Logic if we allow quantified variables to
range over functions or predicates as well; e.g.
∃X.loves(misspiggy, X(kermit)) includes loves(misspiggy, voiceof(kermit))
∃X.X(misspiggy, kermit) (there exists some relationship linking Miss Piggy and Kermit!)
Semantics of First Order Predicate Logic
Now we must deal with quantification.
∀: ∀X.p(X) holds in a model iff p(z) holds for all objects z in our domain.
∃: ∃X.p(X) holds in a model iff there is some object z in our domain such that p(z) holds.
Example: Available Objects affects Quantification
If misspiggy is the only object in our domain then
ugly(misspiggy) → ¬beautiful(misspiggy) is equivalent to
∀X.[ugly(X) → ¬beautiful(X)]
If there were other objects then there would be more atoms and so the set of models would be
larger; e.g. with objects misspiggy and kermit the possible models are all combinations of the
atoms ugly(misspiggy), beautiful(misspiggy), ugly(kermit), beautiful(kermit). Now the two
sentences are no longer equivalent.
1). In every model in which
∀X.[ugly(X) → ¬beautiful(X)] holds,
ugly(misspiggy) → ¬beautiful(misspiggy) also holds.
2). There are models in which ugly(misspiggy) → ¬beautiful(misspiggy) holds,
but ∀X.[ugly(X) → ¬beautiful(X)] does not hold; e.g.
M = {ugly(kermit), beautiful(kermit)}.
What about M = {ugly(misspiggy), beautiful(misspiggy)}?
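Over a finite domain a quantified sentence can be evaluated by enumeration, as the text describes. A sketch; the helper functions and atom encoding are mine:

```python
# A model is again the set of true ground atoms.
def ugly(x, M):      return f'ugly({x})' in M
def beautiful(x, M): return f'beautiful({x})' in M

def forall_ugly_implies_not_beautiful(domain, M):
    # AX.[ugly(X) -> ~beautiful(X)] over a finite domain is just a conjunction
    # of the ground instances, one per object.
    return all((not ugly(x, M)) or (not beautiful(x, M)) for x in domain)

domain = ['misspiggy', 'kermit']
M = {'ugly(kermit)', 'beautiful(kermit)'}
# The misspiggy instance holds (she is not ugly in this model) ...
print((not ugly('misspiggy', M)) or (not beautiful('misspiggy', M)))  # True
# ... but the universal sentence fails because of kermit
print(forall_ugly_implies_not_beautiful(domain, M))                   # False
```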
Clausal Form for Predicate Calculus
In order to prove a formula in the predicate calculus by resolution, we
1. Negate the formula.
2. Put the negated formula into CNF, by doing the following:
i. Get rid of all → operators (using p → q ≡ ¬p ∨ q).
ii. Push the ¬ operators in as far as possible.
iii. Rename variables as necessary (see the step below).
iv. Move all of the quantifiers to the left (the outside) of the expression using rules of the form
Qx.F(x) ∧ G ≡ Qx.[F(x) ∧ G] and Qx.F(x) ∨ G ≡ Qx.[F(x) ∨ G]
(where Q is either ∀ or ∃ and G is a formula that does not contain x).
This leaves the formula in what is called prenex form, which consists of a series of quantifiers
followed by a quantifier-free formula, called the matrix.
v. Remove all quantifiers from the formula. First we remove the existentially quantified variables
by using Skolemization. Each existentially quantified variable, say x, is replaced by a function
term which begins with a new, n-ary function symbol, say f, where n is the number of universally
quantified variables that occur before x is quantified in the formula. The arguments to the
function term are precisely these variables. For example, if we have the formula
∀x.∀y.∃z.p(x, y, z)
then z would be replaced by a function term f(x, y), where f is a new function symbol. The result
is:
∀x.∀y.p(x, y, f(x, y))
This new formula is satisfiable if and only if the original formula is satisfiable.
The new function symbol is called a Skolem function. If the existentially quantified variable has
no preceding universally quantified variables, then the function is a 0-ary function and is often
called a Skolem constant.
After removing all existential quantifiers, we simply drop all the universal quantifiers as we
assume that any variable appearing in a formula is universally quantified.
vi. The remaining formula (the matrix) is put in CNF by distributing ∨ over ∧, so that the ∧
operators end up outside the ∨ operators.
3. Finally, the CNF formula is written in clausal format by writing each conjunct as a set of
literals (a clause), and the whole formula as a set of clauses (the clause set).
For example, if we begin with the proposition
we have:
1. Negate the theorem:
i. Push the ¬ operators in. No change.
ii. Rename variables if necessary:
iii. Move the quantifiers to the outside: First, we have
Then we get
iv. Remove the quantifiers, first by Skolemizing the existentially quantified variables. As these
have no universally quantified variables to their left, they are replaced by Skolem constants:
Drop the universal quantifiers:
v. Put the matrix into CNF. No change.
2. Write the formula in clausal form:
Inference Rules
Complex deductive arguments can be judged valid or invalid based on whether or not the steps in
that argument follow the nine basic rules of inference. These rules of inference are all relatively
simple, although when presented in formal terms they can look overly complex.
Conjunction:
1. P
2. Q
3. Therefore, P and Q.
1. It is raining in New York.
2. It is raining in Boston.
3. Therefore, it is raining in both New York and Boston.
Simplification
1. P and Q.
2. Therefore, P.
1. It is raining in both New York and Boston.
2. Therefore, it is raining in New York.
Addition
1. P
2. Therefore, P or Q.
1. It is raining.
2. Therefore, either it is raining or the sun is shining.
Absorption
1. If P, then Q.
2. Therefore, if P then P and Q.
1. If it is raining, then I will get wet.
2. Therefore, if it is raining, then it is raining and I will get wet.
Modus Ponens
1. If P then Q.
2. P.
3. Therefore, Q.
1. If it is raining, then I will get wet.
2. It is raining.
3. Therefore, I will get wet.
Modus Tollens
1. If P then Q.
2. Not Q. (~Q).
3. Therefore, not P (~P).
1. If it had rained this morning, I would have gotten wet.
2. I did not get wet.
3. Therefore, it did not rain this morning.
Hypothetical Syllogism
1. If P then Q.
2. If Q then R.
3. Therefore, if P then R.
1. If it rains, then I will get wet.
2. If I get wet, then my shirt will be ruined.
3. Therefore, if it rains, then my shirt will be ruined.
Disjunctive Syllogism
1. Either P or Q.
2. Not P (~P).
3. Therefore, Q.
1. Either it rained or I took a cab to the movies.
2. It did not rain.
3. Therefore, I took a cab to the movies.
Constructive Dilemma
1. (If P then Q) and (If R then S).
2. P or R.
3. Therefore, Q or S.
1. If it rains, then I will get wet and if it is sunny, then I will be dry.
2. Either it will rain or it will be sunny.
3. Therefore, either I will get wet or I will be dry.
The above rules of inference, when combined with the rules of replacement, mean that
propositional calculus is "complete." Propositional calculus is the simplest branch of formal
logic.
Resolution
Resolution is a rule of inference leading to a refutation theorem-proving technique for sentences
in propositional logic and first-order logic. In other words, iteratively applying the resolution rule
in a suitable way allows one to tell whether a propositional formula is satisfiable and to prove
that a first-order formula is unsatisfiable; the method may fail to establish the satisfiability of a
satisfiable first-order formula, as is the case for all proof methods for first-order logic.
Resolution was introduced by John Alan Robinson in 1965.
Resolution in propositional logic
The resolution rule in propositional logic is a single valid inference rule that produces a new
clause implied by two clauses containing complementary literals. A literal is a propositional
variable or the negation of a propositional variable. Two literals are said to be complements if
one is the negation of the other (in the following, ai is taken to be the complement to bj). The
resulting clause contains all the literals that do not have complements. Formally:

a1 ∨ ... ∨ ai ∨ ... ∨ an,   b1 ∨ ... ∨ bj ∨ ... ∨ bm
----------------------------------------------------------------------
a1 ∨ ... ∨ a(i-1) ∨ a(i+1) ∨ ... ∨ an ∨ b1 ∨ ... ∨ b(j-1) ∨ b(j+1) ∨ ... ∨ bm

where
all the a's and b's are literals,
ai is the complement to bj, and
the dividing line stands for "entails".
The clause produced by the resolution rule is called the resolvent of the two input clauses.
When the two clauses contain more than one pair of complementary literals, the resolution rule
can be applied (independently) for each such pair. However, only the pair of literals that are
resolved upon can be removed: all other pairs of literals remain in the resolvent clause.
A resolution technique
When coupled with a complete search algorithm, the resolution rule yields a sound and complete
algorithm for deciding the satisfiability of a propositional formula, and, by extension, the validity
of a sentence under a set of axioms.
This resolution technique uses proof by contradiction and is based on the fact that any sentence
in propositional logic can be transformed into an equivalent sentence in conjunctive normal
form. The steps are as follows:
1).All sentences in the knowledge base and the negation of the sentence to be proved (the
conjecture) are conjunctively connected.
2).The resulting sentence is transformed into a conjunctive normal form with the conjuncts
viewed as elements in a set, S, of clauses.
For example, the formula (a ∨ b) ∧ (¬a ∨ c)
would give rise to the set S = {a ∨ b, ¬a ∨ c}.
3).The resolution rule is applied to all possible pairs of clauses that contain complementary
literals. After each application of the resolution rule, the resulting sentence is simplified by
removing repeated literals. If the sentence contains complementary literals, it is discarded (as a
tautology). If not, and if it is not yet present in the clause set S, it is added to S, and is considered
for further resolution inferences.
4).If after applying a resolution rule the empty clause is derived, the complete formula is
unsatisfiable (or contradictory), and hence it can be concluded that the initial conjecture follows
from the axioms.
5).If, on the other hand, the empty clause cannot be derived, and the resolution rule cannot be
applied to derive any more new clauses, the conjecture is not a theorem of the original
knowledge base.
One instance of this algorithm is the original Davis–Putnam algorithm that was later refined into
the DPLL algorithm that removed the need for explicit representation of the resolvents.
This description of the resolution technique uses a set S as the underlying data-structure to
represent resolution derivations. Lists, Trees and Directed Acyclic Graphs are other possible and
common alternatives. Tree representations are more faithful to the fact that the resolution rule is
binary. Together with a sequent notation for clauses, a tree representation also makes it clear to
see how the resolution rule is related to a special case of the cut-rule, restricted to atomic cut-
formulas. However, tree representations are not as compact as set or list representations, because
they explicitly show redundant subderivations of clauses that are used more than once in the
derivation of the empty clause. Graph representations can be as compact in the number of clauses
as list representations and they also store structural information regarding which clauses were
resolved to derive each resolvent.
Example
(a ∨ b), (¬a ∨ c) ⊢ (b ∨ c)
In English: if a or b is true, and a is false or c is true, then either b or c is true.
If a is true, then for the second premise to hold, c must be true. If a is false, then for the first
premise to hold, b must be true.
So regardless of a, if both premises hold, then b or c is true.
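The refutation loop described in steps 1) to 5) can be sketched for propositional clauses. The clause representation here, frozensets of literals with negation written as a leading '~', is my own:

```python
def negate(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolvents(c1, c2):
    out = []
    for lit in c1:
        if negate(lit) in c2:                       # complementary pair found
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

def resolution_refutes(clauses):
    S = set(clauses)
    while True:
        new = set()
        for a in S:
            for b in S:
                if a == b:
                    continue
                for r in resolvents(a, b):
                    if not r:                       # empty clause: unsatisfiable
                        return True
                    if not any(negate(l) in r for l in r):  # discard tautologies
                        new.add(frozenset(r))
        if new <= S:                                # no new clauses: not refutable
            return False
        S |= new

# (a v b) ^ (~a v c) ^ ~b ^ ~c is unsatisfiable,
# so (b v c) follows from the two premises of the example.
S = [frozenset({'a', 'b'}), frozenset({'~a', 'c'}),
     frozenset({'~b'}), frozenset({'~c'})]
print(resolution_refutes(S))  # True
```

Negating the conjecture (b ∨ c) adds the unit clauses ~b and ~c, exactly as step 1) prescribes; deriving the empty clause shows the conjecture follows.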
Unification
We also need some way of binding variables to values in a consistent way so that components of
sentences can be matched. This is the process of Unification.
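A minimal sketch of unification follows. The term representation is mine (variables are capitalised strings, compound terms are tuples), and a full implementation would also need an occurs check, omitted here:

```python
def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def walk(t, subst):
    while is_var(t) and t in subst:      # follow existing bindings
        t = subst[t]
    return t

def unify(t1, t2, subst=None):
    subst = dict(subst or {})
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        subst[t1] = t2                   # bind the variable consistently
        return subst
    if is_var(t2):
        subst[t2] = t1
        return subst
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):         # unify argument by argument
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None                          # clash: no consistent binding exists

print(unify(('loves', 'misspiggy', 'X'), ('loves', 'Y', 'kermit')))
# binds Y to misspiggy and X to kermit
```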
Knowledge Representation
Network Representations
Networks are often used in artificial intelligence as schemes for representation. One of the
advantages of using a network representation is that theorists in computer science have studied
such structures in detail and there are a number of efficient and robust algorithms that may be
used to manipulate the representations.
Trees and Graphs
A tree is a collection of nodes in which each node may be expanded into one or more unique
subnodes until termination occurs. There may be no termination, in which case an infinite tree
results. A graph generalizes a tree by allowing a node to be reached along more than one path;
in other words, a tree is a graph with no loops. The representation of the nodes and links is
arbitrary. In a computer chess player,
for example, nodes might represent individual board positions and the links from each node the
legal moves from that position. This is a specific instance of a problem space. In general,
problem spaces are graphs in which the nodes represent states and the connections between
states are represented by operators that make the state transformations.
IS-A Links and Semantic Networks
In constructing concept hierarchies, often the most important means of showing inclusion in a set
is to use what is called an IS-A link, in which X is a member in some more general set Y. For
example, a DOG ISA MAMMAL. As one travels up the link, the more general concept is
defined. This is generally the simplest type of link between concepts in concept or semantic
hierarchies. The combination of instances and classes connected by ISA links in a graph or tree
is generally known as a semantic network. Semantic networks are useful, in part, because they
provide a natural structure for inheritance. For instance, if a DOG ISA MAMMAL then those
properties that are true for MAMMALs and DOGs need not be specified for the DOG; instead
they may be derived via an inheritance procedure. This greatly reduces the amount of
information that must be stored explicitly although there is an increase in the time required to
access knowledge through the inheritance mechanism. Frames are a special type of semantic
network representation.
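The inheritance procedure described above can be sketched as a lookup that walks up the IS-A links; the toy network here is my own:

```python
# IS-A links and property tables for a tiny semantic network.
isa = {'ostrich': 'bird', 'bird': 'animal', 'dog': 'mammal', 'mammal': 'animal'}
properties = {'animal': {'alive': True}, 'bird': {'flies': True}, 'mammal': {'legs': 4}}

def lookup(concept, prop):
    # Walk up the IS-A chain until some ancestor supplies the property.
    while concept is not None:
        if prop in properties.get(concept, {}):
            return properties[concept][prop]
        concept = isa.get(concept)       # inherit from the parent class
    return None

print(lookup('dog', 'alive'))  # True, inherited from animal via mammal
print(lookup('dog', 'legs'))   # 4, stored at mammal
```

Nothing about DOG is stored explicitly, which is the storage saving the text describes; the price is the extra traversal time on each access.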
Associative Network
A means of representing relational knowledge as a labeled directed graph. Each vertex of the
graph represents a concept and each label represents a relation between concepts. Access and
updating procedures traverse and manipulate the graph. A semantic network is sometimes
regarded as a graphical notation for logical formulas.
Conceptual Graphs
A conceptual graph (CG) is a graph representation for logic based on the semantic networks of
artificial intelligence.
A conceptual graph consists of concept nodes and relation nodes.
• The concept nodes represent entities, attributes, states, and events
• The relation nodes show how the concepts are interconnected
Conceptual Graphs are finite, connected, bipartite graphs.
Finite: because any graph (in 'human brain' or 'computer storage') can only have a finite number
of concepts and conceptual relations.
Connected: because two parts that are not connected would simply be called two conceptual
graphs.
Bipartite: because there are two different kinds of nodes: concepts and conceptual relations, and
every arc links a node of one kind to a node of another kind
Example
The following is the CG display form for John is going to Boston by bus.
The conceptual graph in Figure represents a typed or sorted version of logic. Each of the four
concepts has a type label, which represents the type of entity the concept refers to: Person, Go,
Boston, or Bus. Two of the concepts have names, which identify the referent: John or Boston.
Each of the three conceptual relations has a type label that represents the type of relation: agent
(Agnt), destination (Dest), or instrument (Inst). The CG as a whole indicates that the person John
is the agent of some instance of going, the city Boston is the destination, and a bus is the
instrument. Figure 1 can be translated to the following formula:
As this translation shows, the only logical operators used in Figure 1 are conjunction and the
existential quantifier. Those two operators are the most common in translations from natural
languages, and many of the early semantic networks could not represent any others.
Structured Representation
Structure representation can be done in various ways like:
• Frames
• Scripts
Frames
A frame is a method of representation in which a particular class is defined by a number of
attributes (or slots) with certain values (the attributes are filled in for each instance). Thus,
frames are also known as slot-and-filler structures. Frame systems are also somewhat equivalent
to semantic networks although frames are usually associated with more defined structure than the
networks.
Like a semantic network, one of the chief properties of frames is that they provide a natural
structure for inheritance. ISA-Links connect classes to larger parent classes and properties of the
subclasses may be determined at both the level of the class itself and from parent classes.
This leads into the idea of defaults. Frames may indicate specific values for some attributes or
instead indicate a default. This is especially useful when values are not always known but can
generally be assumed to be true for most of the class. For example, the class BIRD may have a
default value of FLIES set to TRUE even though instances below it (say, for example, an
OSTRICH) have FLIES values of FALSE.
In addition, the value of a particular attribute need not necessarily be filled in with a value but
may instead indicate a procedure to run to obtain a value. This is known as an attached procedure.
Attached procedures are especially useful when there is a high cost associated with computing a
particular value, when the value changes with time or when the expected access frequency is
low. Instead of computing the value for each instance, the values are computed only when
needed. However, this computation is run during execution (rather than during the establishment
of the frame network) and may be costly.
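The slot-and-filler behaviour described above, inherited defaults, overriding, and attached procedures computed only on access, can be sketched as follows; the class design is my own:

```python
class Frame:
    def __init__(self, name, parent=None, **slots):
        self.name, self.parent, self.slots = name, parent, slots

    def get(self, slot):
        frame = self
        while frame is not None:
            if slot in frame.slots:
                value = frame.slots[slot]
                # A callable filler is an attached procedure, run on access.
                return value() if callable(value) else value
            frame = frame.parent          # defaults inherited from parent frames
        return None

bird = Frame('BIRD', flies=True)                       # default: birds fly
ostrich = Frame('OSTRICH', parent=bird, flies=False)   # overrides the default
tweety = Frame('TWEETY', parent=bird,
               age=lambda: 2024 - 2020)                # computed only when needed

print(ostrich.get('flies'))  # False: the local value shadows the default
print(tweety.get('flies'))   # True: inherited default
print(tweety.get('age'))     # 4: attached procedure run on access
```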
Scripts
A script is a remembered precedent, consisting of tightly coupled, expectation-suggesting
primitive-action and state-change frames.
A script is a structured representation describing a stereotyped sequence of events in a particular
context. That is, extend frames by explicitly representing expectations of actions and state
changes.
Why represent knowledge in this way?
1) Because real-world events do follow stereotyped patterns. Human beings use previous
experiences to understand verbal accounts; computers can use scripts instead.
2) Because people, when relating events, do leave large amounts of assumed detail out of their
accounts. People don't find it easy to converse with a system that can't fill in missing
conversational detail.
Min Max Algorithm
There are plenty of applications for AI, but games are the most interesting to the public.
Nowadays every major OS comes with some games. So it is no surprise that there are some
algorithms that were devised with games in mind.
The Min-Max algorithm is applied in two-player games, such as tic-tac-toe, checkers, chess, go,
and so on. All these games have at least one thing in common: they are logic games. This means
that they can be described by a set of rules and premises. With them, it is possible to know, from
a given point in the game, what the next available moves are. They also share another
characteristic: they are 'full information games'. Each player knows everything about the
possible moves of the adversary.
Before explaining the algorithm, a brief introduction to search trees is required. Search trees are
a way to represent searches. The squares are known as nodes and they represent points of the
decision in the search. The nodes are connected with branches. The search starts at the root node,
the one at the top of the figure. At each decision point, nodes for the available search paths are
generated, until no more decisions are possible. The nodes that represent the end of the search
are known as leaf nodes.
There are two players involved, MAX and MIN. A search tree is generated, depth-first, starting
with the current game position up to the end game positions. Then, the final game positions are
evaluated from MAX's point of view, as shown in Figure 1. Afterwards, the inner node values of
the tree are filled bottom-up with the evaluated values. The nodes that belong to the MAX player
receive the maximum value of their children. The nodes for the MIN player will select the
minimum value of their children.
MinMax (GamePosition game) {
  return MaxMove (game);
}
MaxMove (GamePosition game) {
  if (GameEnded(game)) {
    return EvalGameState(game);
  }
  else {
    best_move <- {};
    moves <- GenerateMoves(game);
    ForEach move in moves {
      move <- MinMove(ApplyMove(game, move));
      if (Value(move) > Value(best_move)) {
        best_move <- move;
      }
    }
    return best_move;
  }
}
MinMove (GamePosition game) {
  best_move <- {};
  moves <- GenerateMoves(game);
  ForEach move in moves {
    move <- MaxMove(ApplyMove(game, move));
    if (Value(move) < Value(best_move)) {
      best_move <- move;
    }
  }
  return best_move;
}
So what is happening here? The values represent how good a game move is. So the MAX player
will try to select the move with the highest value in the end. But the MIN player also has something
to say about it, and he will try to select the moves that are better for him, thus minimizing MAX's
outcome.
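The bottom-up filling can be sketched on an explicit toy tree, abstracting away the move generation; the tree and its leaf values are my own illustration:

```python
# Leaves hold values from MAX's point of view; inner nodes are lists of children.
def minimax(node, maximizing):
    if isinstance(node, int):            # leaf: an evaluated game position
        return node
    values = [minimax(child, not maximizing) for child in node]
    # MAX nodes take the maximum of their children, MIN nodes the minimum.
    return max(values) if maximizing else min(values)

# MAX chooses between two subtrees; MIN picks the worst option in each.
tree = [[3, 12], [2, 8]]
print(minimax(tree, True))  # 3: MIN forces 3 on the left and 2 on the right, MAX takes 3
```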
Optimisation
However, only very simple games can have their entire search tree generated in a short time. For
most games this isn't possible; the universe would probably vanish first. So there are a few
optimizations to add to the algorithm.
First a word of caution: optimization comes with a price. When optimizing we are trading the full
information about the game's events for probabilities and shortcuts. Instead of knowing the full
path that leads to victory, the decisions are made with the path that might lead to victory. If the
optimization isn't well chosen, or it is badly applied, then we could end up with a dumb AI, and it
would have been better to use random moves.
One basic optimization is to limit the depth of the search tree. Why does this help? Generating
the full tree could take ages. If a game has a branching factor of 3, which means that each node
has three children, the tree will have the following number of nodes per depth: 1, 3, 9, 27, 81,
243, and so on.
The sequence shows that at depth n the tree will have 3^n nodes. To know the total number of
generated nodes, we need to sum the node count at each level. So the total number of nodes for a
tree with depth n is the sum of 3^i for i = 0 to n. For many games, like chess, that have a very big
branching factor, this means that the tree might not fit into memory. Even if it did, it would take
too long to generate. If each node took 1 s to be analyzed, each search tree would take that many
seconds. For a search tree with depth 5, that would mean 1+3+9+27+81+243 = 364 nodes, and
so 364 s, roughly 6 minutes! This is too long for a game. The player would give up playing the
game if he had to wait 6 minutes for each move from the computer.
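The node-count arithmetic can be checked directly:

```python
# Total nodes in a tree of branching factor 3 and depth 5: sum of 3**i, i = 0..5.
branching, depth = 3, 5
total = sum(branching ** i for i in range(depth + 1))
print(total)   # 364 nodes: 1 + 3 + 9 + 27 + 81 + 243
# At 1 second per node, that is 364 s, roughly 6 minutes per move.
```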
The second optimization is to use a function that evaluates the current game position from the
point of view of some player. It does this by giving a value to the current state of the game, like
counting the number of pieces on the board, for example. Or the number of moves left to the end
of the game, or anything else that we might use to give a value to the game position.
Instead of evaluating the current game position, the function might calculate how the current
game position might help ending the game. Or in other words, how probable it is that, given the
current game position, we might win the game. In this case the function is known as an estimation
function.
This function will have to take into account some heuristics. Heuristics are knowledge that we
have about the game, and it can help generate better evaluation functions. For example, in
checkers, pieces at corners and sideways positions can’t be eaten. So we can create an evaluation
function that gives higher values to pieces that lie on those board positions thus giving higher
outcomes for game moves that place pieces in those positions.
One of the reasons that the evaluation function must be able to evaluate game positions for both
players is that you don't know to which player the depth limit belongs.
However, having two functions can be avoided if the game is symmetric. This means that the loss
of a player equals the gain of the other. Such games are also known as ZERO-SUM games. For
these games one evaluation function is enough; one of the players just has to negate the return
value of the function.
The revised algorithm is:
MinMax (GamePosition game) {
  return MaxMove (game);
}
MaxMove (GamePosition game) {
  if (GameEnded(game) || DepthLimitReached()) {
    return EvalGameState(game, MAX);
  }
  else {
    best_move <- {};
    moves <- GenerateMoves(game);
    ForEach move in moves {
      move <- MinMove(ApplyMove(game, move));
      if (Value(move) > Value(best_move)) {
        best_move <- move;
      }
    }
    return best_move;
  }
}
MinMove (GamePosition game) {
  if (GameEnded(game) || DepthLimitReached()) {
    return EvalGameState(game, MIN);
  }
  else {
    best_move <- {};
    moves <- GenerateMoves(game);
    ForEach move in moves {
      move <- MaxMove(ApplyMove(game, move));
      if (Value(move) < Value(best_move)) {
        best_move <- move;
      }
    }
    return best_move;
  }
}
Even so, the algorithm has a few flaws; some of them can be fixed, while others can only be
solved by choosing another algorithm.
One of the flaws is that if the game is too complex, the answer will take too long even with a
depth limit. One solution is to limit the search time: if the time runs out, choose the best move
found so far.
A bigger flaw is the limited horizon problem. A game position that appears to be very good might
turn out to be very bad. This happens when the algorithm is unable to see that, a few moves
ahead, the adversary will be able to make a move that brings him a great outcome. The
algorithm misses that fatal move because it is blinded by the depth limit.
Speeding the Algorithm
There are a few things can still be done to reduce the search time. Take a look at figure 2. The
value for node A is 3, and the first found value for the subtree starting at node B is 2. So since
the B node is at a MIN level, we know that the selected value for the B node must be less or
equal than 2. But we also know that the A node has the value 3, and both A and B nodes share
the same parent at a MAX level. This means that the game path starting at the B node wouldn’t
be selected because 3 is better than 2 for the MAX node. So it isn’t worth to pursue the search
for children of the B node, and we can safely ignore all the remaining children.
This all means that sometimes the search can be aborted because we find out that the search
subtree won’t lead us to any viable answer.
This optimization is know as alpha-beta cuttoffs and the algorithm is as follows:
1. Two values are passed around the tree nodes:
i) the alpha value, which holds the best MAX value found so far;
ii) the beta value, which holds the best MIN value found so far.
2. At a MAX level, after evaluating each child path, update alpha with the returned value if it is
larger. If alpha becomes greater than or equal to beta, abort the search for the current node:
the MIN ancestor would never let the game reach it.
3. At a MIN level, after evaluating each child path, update beta with the returned value if it is
smaller. If beta becomes less than or equal to alpha, abort the search for the current node.
The full pseudocode for MinMax with alpha-beta cutoffs:
MinMax (GamePosition game) {
    return MaxMove (game, -INFINITY, +INFINITY);
}

MaxMove (GamePosition game, Integer alpha, Integer beta) {
    if (GameEnded(game) || DepthLimitReached()) {
        return EvalGameState(game, MAX);
    }
    else {
        best_move <- {};
        moves <- GenerateMoves(game);
        ForEach moves {
            move <- MinMove(ApplyMove(game), alpha, beta);
            if (Value(move) > Value(best_move)) {
                best_move <- move;
                alpha <- max(alpha, Value(move));
            }
            // Cutoff: ignore remaining moves
            if (alpha >= beta)
                return best_move;
        }
        return best_move;
    }
}

MinMove (GamePosition game, Integer alpha, Integer beta) {
    if (GameEnded(game) || DepthLimitReached()) {
        return EvalGameState(game, MIN);
    }
    else {
        best_move <- {};
        moves <- GenerateMoves(game);
        ForEach moves {
            move <- MaxMove(ApplyMove(game), alpha, beta);
            if (Value(move) < Value(best_move)) {
                best_move <- move;
                beta <- min(beta, Value(move));
            }
            // Cutoff: ignore remaining moves
            if (beta <= alpha)
                return best_move;
        }
        return best_move;
    }
}
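The cutoff logic can also be written as a runnable Python sketch, again on an invented hand-built tree, with a visit list added so the pruning is visible:

```python
# Runnable alpha-beta on a hand-built game tree. Inner lists are choice
# points; integers are evaluated leaves. (Tree values are invented.)

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf"), visited=None):
    if visited is not None:
        visited.append(node)           # record every node we actually examine
    if isinstance(node, int):
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)
            if beta <= alpha:          # remaining children cannot matter
                break
        return best
    else:
        best = float("inf")
        for child in node:
            best = min(best, alphabeta(child, True, alpha, beta, visited))
            beta = min(beta, best)
            if beta <= alpha:
                break
        return best

tree = [[3, 12], [2, 4], [1, 14]]
visited = []
print(alphabeta(tree, True, visited=visited))  # 3, same answer as plain MinMax
print(len(visited))  # 8: fewer than the 10 nodes plain MinMax would visit
```

The answer is identical to plain MinMax; only the amount of work changes, and how much it changes depends on the order in which children are generated.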
How much better does MinMax with alpha-beta cutoffs behave compared with plain
MinMax? It depends on the order in which the tree is searched. If the way the game positions are
generated does not create situations where the algorithm can take advantage of alpha-beta cutoffs,
then the improvement will not be noticeable. However, if the evaluation function and the
generation of game positions lead to frequent alpha-beta cutoffs, then the improvement can be great.
Alpha-Beta Cutoff
With all this talk about search speed, many of you might be wondering what this is all about.
Search speed is very important in AI because, if an algorithm takes too long to give a
good answer, the algorithm may not be usable at all.
For example, a good MinMax implementation with an evaluation function capable of
giving very good estimates might be able to search 1000 positions a second. In tournament chess
each player has around 150 seconds to make a move, so such a program would be able to analyze
about 150,000 positions in that period. But in chess each position has around 35 possible branches! In
the end the program would only be able to look about 3 to 4 moves ahead in the game. Even
humans with very little practice in chess can do better than this.
But if we use MinMax with alpha-beta cutoffs (again, a decent implementation with a good
evaluation function), the resulting behaviour might be much better. In this case, the program might
be able to double the number of analyzed positions and thus become a much tougher
adversary.
Example
Example of a board with the values estimated for each position.
The game uses MinMax with alpha-beta cutoffs for the computer moves. The evaluation function
is a weighted average of the positions occupied by the checker pieces. The figure shows the
values for each board position. The value of each board position is multiplied by the type of the
piece that rests on that position, as described in the first table.
Rule based Expert System
Expert System !
"An expert system is an interactive computer-based decision tool that uses both facts and
heuristics to solve difficult decision problems based on knowledge acquired from an expert."
An expert system is a computer program that simulates the thought process of a human expert to
solve complex decision problems in a specific domain. This chapter addresses the characteristics
of expert systems that make them different from conventional programming and traditional
decision support tools. The growth of expert systems is expected to continue for several years.
With the continuing growth, many new and exciting applications will emerge. An expert system
operates as an interactive system that responds to questions, asks for clarification, makes
recommendations, and generally aids the decision-making process. Expert systems provide
expert advice and guidance in a wide variety of activities, computer diagnosis among them.
An expert system may be viewed as a computer simulation of a human expert. Expert systems
are an emerging technology with many areas of potential application. Past applications range
from MYCIN, used in the medical field to diagnose infectious blood diseases, to XCON, used to
configure computer systems. These expert systems have proven to be quite successful. Most
applications of expert systems will fall into one of the following categories:
• Interpreting and identifying
• Predicting
• Diagnosing
• Designing
• Planning
• Monitoring
• Debugging and testing
• Instructing and training
• Controlling
Applications that are computational or deterministic in nature are not good candidates for expert
systems. Traditional decision support systems such as spreadsheets are very mechanistic in the
way they solve problems. They operate under mathematical and Boolean operators in their
execution and arrive at one and only one static solution for a given set of data. Calculation
intensive applications with very exacting requirements are better handled by traditional decision
support tools or conventional programming. The best application candidates for expert systems
are those dealing with expert heuristics for solving problems. Conventional computer programs
are based on factual knowledge, an indisputable strength of computers. Humans, by contrast,
solve problems on the basis of a mixture of factual and heuristic knowledge. Heuristic
knowledge, composed of intuition, judgment, and logical inferences, is an indisputable strength
of humans. Successful expert systems will be those that combine facts and heuristics and thus
merge human knowledge with computer power in solving problems. To be effective, an expert
system must focus on a particular problem domain, as discussed below.
Domain Specificity
Expert systems are typically very domain specific. For example, a diagnostic expert system for
troubleshooting computers must actually perform all the necessary data manipulation as a human
expert would. The developer of such a system must limit his or her scope of the system to just
what is needed to solve the target problem. Special tools or programming languages are often
needed to accomplish the specific objectives of the system.
Special Programming Languages
Expert systems are typically written in special programming languages. The use of languages
like LISP and PROLOG in the development of an expert system simplifies the coding process.
The major advantage of these languages, as compared to conventional programming languages,
is the simplicity of the addition, elimination, or substitution of new rules and memory
management capabilities. Some of the distinguishing characteristics of programming languages
needed for expert systems work are:
• Efficient mix of integer and real variables
• Good memory-management procedures
• Extensive data-manipulation routines
• Incremental compilation
• Tagged memory architecture
• Optimization of the systems environment
• Efficient search procedures
Architecture of Expert System !
Expert systems typically contain the following four components:
• Knowledge-Acquisition Interface
• User Interface
• Knowledge Base
• Inference Engine
This architecture differs considerably from traditional computer programs, resulting in several
characteristics of expert systems.
# Expert System Components #
Knowledge-Acquisition Interface
The knowledge-acquisition interface controls how the expert and knowledge engineer interact
with the program to incorporate knowledge into the knowledge base. It includes features to assist
experts in expressing their knowledge in a form suitable for reasoning by the computer.
This process of expressing knowledge in the knowledge base is called knowledge acquisition.
Knowledge acquisition turns out to be quite difficult in many cases--so difficult that some
authors refer to the knowledge acquisition bottleneck to indicate that it is this aspect of expert
system development which often requires the most time and effort.
Debugging faulty knowledge bases is facilitated by traces (lists of rules in the order they were
fired), probes (commands to find and edit specific rules, facts, and so on), and bookkeeping
functions and indexes (which keep track of various features of the knowledge base such as
variables and rules). Some rule-based expert system shells for personal computers monitor data
entry, checking the syntactic validity of rules. Expert systems are typically validated by testing
their predictions for several cases against those of human experts. Case facilities--permitting a file
of such cases to be stored and automatically evaluated after the program is revised--can greatly
speed the validation process. Many features that are useful for the user interface, such as on-
screen help and explanations, are also of benefit to the developer of expert systems and are also
part of knowledge-acquisition interfaces.
Expert systems in the literature demonstrate a wide range of modes of knowledge acquisition
(Buchanan, 1985). Expert system shells on microcomputers typically require the user to either
enter rules explicitly or enter several examples of cases with appropriate conclusions, from
which the program will infer a rule.
User Interface
The user interface is the part of the program that interacts with the user. It prompts the user for
information required to solve a problem, displays conclusions, and explains its reasoning.
Features of the user interface often include:
• Doesn't ask "dumb" questions
• Explains its reasoning on request
• Provides documentation and references
• Defines technical terms
• Permits sensitivity analyses, simulations, and what-if analyses
• Detailed report of recommendations
• Justifies recommendations
• Online help
• Graphical displays of information
• Trace or step through reasoning
The user interface can be judged by how well it reproduces the kind of interaction one might
expect between a human expert and someone consulting that expert.
Knowledge Base
The knowledge base consists of specific knowledge about some substantive domain. A
knowledge base differs from a data base in that the knowledge base includes both explicit
knowledge and implicit knowledge. Much of the knowledge in the knowledge base is not stated
explicitly, but inferred by the inference engine from explicit statements in the knowledge base.
This makes knowledge bases have more efficient data storage than data bases and gives them the
power to exhaustively represent all the knowledge implied by explicit statements of knowledge.
There are several important ways in which knowledge is represented in a knowledge base. For
more information, see knowledge representation strategies.
Knowledge bases can contain many different types of knowledge and the process of acquiring
knowledge for the knowledge base (this is often called knowledge acquisition) often needs to be
quite different depending on the type of knowledge sought.
Types of Knowledge
There are many different kinds of knowledge considered in expert systems. Many of these form
dimensions of contrasting knowledge:
• explicit knowledge
• implicit knowledge
• domain knowledge
• common sense or world knowledge
• heuristics
• algorithms
• procedural knowledge
• declarative or semantic knowledge
• public knowledge
• private knowledge
• shallow knowledge
• deep knowledge
• metaknowledge
Inference Engine
The inference engine uses general rules of inference to reason from the knowledge base and
draw conclusions which are not explicitly stated but can be inferred from the knowledge base.
Inference engines are capable of symbolic reasoning, not just mathematical reasoning. Hence,
they expand the scope of fruitful applications of computer programs.
The specific forms of inference permitted by different inference engines varies, depending on
several factors, including the knowledge representation strategies employed by the expert
system.
Expert System Development !
Most expert systems are developed by a team of people, with the number of members varying
with the complexity and scope of the project. Of course, a single individual can develop a very
simple system. But usually at least two people are involved.
There are two essential roles that must be filled by the development team: the knowledge
engineer and the substantive expert.
• The Knowledge Engineer
• The Substantive Expert
The Knowledge Engineer
Criteria for selecting the Knowledge Engineer
• Competent
• Organized
• Patient
Problem with Knowledge Engineer
• Technician with little social skill
• Sociable with low technical skill
• Disorganized
• Unwilling to challenge the expert to produce clarity
• Unable to listen carefully to expert
• Undiplomatic when discussing flaws in system or expert's knowledge
• Unable to quickly understand diverse substantive areas
The Substantive Expert
Criteria for selecting the expert
• Competent
• Available
• Articulate
• Self-Confident
• Open-Minded
Varieties of experts
• No expert
• Multiple experts
• Book knowledge only
• The knowledge engineer is also the expert
Problem Experts
• The unavailable expert
• The reluctant expert
• The cynical expert
• The arrogant expert
• The rambling expert
• The uncommunicative expert
• The too-cooperative expert
• The would-be-knowledge-engineer expert
Development Process
The systems development process often used for traditional software such as management
information systems often employs a process described as the "System Development Life Cycle"
or "Waterfall" Model. While this model identifies a number of important tasks in the
development process, many developers of expert systems have found it to be inadequate for
expert systems for a number of important reasons. Instead, many expert systems are developed
using a process called "Rapid Prototyping and Incremental Development."
System Development Life-Cycle
Problem Analysis
Is the problem solvable? Is it feasible with this approach? Cost-benefit analysis.
Requirement Specification
What are the desired features and goals of the proposed system? Who are the users? What
constraints must be considered? What development and delivery environments will be used?
Design
Preliminary Design - overall structure, data flow diagram, perhaps language
Detailed Design - details of each module
Implementation
Writing and debugging code, integrating modules, creating interfaces
Testing
Comparing system to its specifications and assessing validity
Maintenance
Corrections, modifications, enhancements
Managing Uncertainty in Expert Systems
Sources of uncertainty in Expert System
• Weak implication
• Imprecise language
• Unknown data
• Difficulty in combining the views of different experts
Uncertainty in AI
• Information is partial
• Information is not fully reliable
• Representation language is inherently imprecise
• Information comes from multiple sources and it is conflicting
• Information is approximate
• Non-absolute cause-effect relationship exist
Representing uncertain information in Expert System
• Probabilistic
• Certainty factors
• Theory of evidence
• Fuzzy logic
• Neural Network
• GA
• Rough set
Bayesian Probability Theory
Bayesian probability is one of the most popular interpretations of the concept of probability. The
Bayesian interpretation of probability can be seen as an extension of logic that enables reasoning
with uncertain statements. To evaluate the probability of a hypothesis, the Bayesian probabilist
specifies some prior probability, which is then updated in the light of new relevant data. The
Bayesian interpretation provides a standard set of procedures and formulae to perform this
calculation.
Bayesian probability interprets the concept of probability as "a measure of a state of knowledge",
in contrast to interpreting it as a frequency or a physical property of a system. Its name is derived
from the 18th century statistician Thomas Bayes, who pioneered some of the concepts. Broadly
speaking, there are two views on Bayesian probability that interpret the state of knowledge
concept in different ways. According to the objectivist view, the rules of Bayesian statistics can
be justified by requirements of rationality and consistency and interpreted as an extension of
logic. According to the subjectivist view, the state of knowledge measures a "personal belief".
Many modern machine learning methods are based on objectivist Bayesian principles. One of the
crucial features of the Bayesian view is that a probability is assigned to a hypothesis, whereas
under the frequentist view, a hypothesis is typically rejected or not rejected without directly
assigning a probability.
The probability of a hypothesis given the data (the posterior) is proportional to the product of the
likelihood times the prior probability (often just called the prior). The likelihood brings in the
effect of the data, while the prior specifies the belief in the hypothesis before the data was
observed.
More formally, Bayesian inference uses Bayes' formula for conditional probability:
P(H | D) = P(D | H) P(H) / P(D)
where
H is a hypothesis, and D is the data.
P(H) is the prior probability of H: the probability that H is correct before the data D was seen.
P(D | H) is the conditional probability of seeing the data D given that the hypothesis H is true.
P(D | H) is called the likelihood.
P(D) is the marginal probability of D.
P(H | D) is the posterior probability: the probability that the hypothesis is true, given the data and
the previous state of belief about the hypothesis.
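A numeric sketch of the formula, with invented disease-test numbers (not from the text):

```python
# Illustration of Bayes' formula P(H|D) = P(D|H) * P(H) / P(D).
# The disease/test numbers below are invented for the example.

def posterior(prior, likelihood, marginal):
    """P(H | D) = P(D | H) * P(H) / P(D)."""
    return likelihood * prior / marginal

# Hypothesis H: patient has the disease (prior 1%).
# Data D: the test is positive; it fires for 90% of sick patients
# and for 5% of healthy ones.
p_h = 0.01
p_d_given_h = 0.9
p_d = p_d_given_h * p_h + 0.05 * (1 - p_h)   # marginal probability P(D)
print(round(posterior(p_h, p_d_given_h, p_d), 3))  # 0.154
```

The posterior of about 15% shows how the likelihood brings in the effect of the data while the low prior keeps the updated belief modest.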
Stanford Certainty Factor !
Uncertainty is represented as a degree of belief in two steps:
• Express the degree of belief
• Manipulate the degrees of belief during the use of knowledge based systems
It is also based on evidence (or the expert’s assessment).
Form of certainty factors in ES
IF <evidence>
THEN <hypothesis> {cf }
cf represents the belief in hypothesis H given that evidence E has occurred.
It is based on two functions:
i) Measure of belief MB(H, E)
ii) Measure of disbelief MD(H, E)
These indicate the degree to which belief or disbelief in hypothesis H is increased if evidence E
is observed.
Uncertain terms and their interpretation
Total strength of belief and disbelief in a hypothesis:
CF(H, E) = MB(H, E) - MD(H, E)
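A small sketch of the two measures and of the usual rule for combining two positive certainty factors for the same hypothesis. The 0.8/0.2 numbers are invented; the difference MB - MD and the combination rule are the standard Stanford forms:

```python
# Sketch of Stanford certainty factors: CF = MB - MD, plus the standard
# rule for combining two positive CFs that support the same hypothesis.

def certainty_factor(mb, md):
    """Total strength of belief: measure of belief minus measure of disbelief."""
    return mb - md

def combine_positive(cf1, cf2):
    """Combine two positive CFs from independent pieces of evidence."""
    return cf1 + cf2 * (1 - cf1)

cf = certainty_factor(mb=0.8, md=0.2)
print(round(cf, 2))                          # 0.6
print(round(combine_positive(0.6, 0.5), 2))  # 0.8
```

Note that the combined factor never exceeds 1: each new piece of supporting evidence closes part of the remaining gap rather than adding linearly.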
Nonmonotonic logic and Reasoning with Beliefs
A non-monotonic logic is a formal logic whose consequence relation is not monotonic. Most
studied formal logics have a monotonic consequence relation, meaning that adding a formula to a
theory never produces a reduction of its set of consequences. Intuitively, monotonicity indicates
that learning a new piece of knowledge cannot reduce the set of what is known. A monotonic
logic cannot handle various reasoning tasks such as reasoning by default (consequences may be
derived only because of lack of evidence of the contrary), abductive reasoning (consequences are
only deduced as most likely explanations) and some important approaches to reasoning about
knowledge (the ignorance of a consequence must be retracted when the consequence becomes
known) and similarly belief revision (new knowledge may contradict old beliefs).
Default reasoning
An example of a default assumption is that the typical bird flies. As a result, if a given animal is
known to be a bird, and nothing else is known, it can be assumed to be able to fly. The default
assumption must however be retracted if it is later learned that the considered animal is a
penguin. This example shows that a logic that models default reasoning should not be
monotonic. Logics formalizing default reasoning can be roughly divided in two categories:
logics able to deal with arbitrary default assumptions (default logic, defeasible logic/defeasible
reasoning/argument (logic), and answer set programming) and logics that formalize the specific
default assumption that facts that are not known to be true can be assumed false by default
(closed world assumption and circumscription).
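A toy Python sketch of the bird/penguin default above. The animal names and the rule encoding are invented for illustration; real default logics are far more general:

```python
# Toy default reasoning: "a bird flies" holds by default, unless an
# exception (penguin) is known. Adding knowledge can retract a
# conclusion, which is exactly what makes the reasoning non-monotonic.

known_facts = {"tweety": {"bird"}, "opus": {"bird", "penguin"}}

def can_fly(animal):
    props = known_facts.get(animal, set())
    if "penguin" in props:     # the exception defeats the default
        return False
    return "bird" in props     # default: a bird is assumed to fly

print(can_fly("tweety"))  # True
print(can_fly("opus"))    # False
```

If we later learn that tweety is also a penguin, the earlier conclusion that tweety flies must be retracted, so no monotonic logic can model this behavior.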
Abductive reasoning
Abductive reasoning is the process of deriving the most likely explanations of the known facts.
An abductive logic should not be monotonic because the most likely explanations are not
necessarily correct. For example, the most likely explanation for seeing wet grass is that it
rained; however, this explanation has to be retracted when learning that the real cause of the
grass being wet was a sprinkler. Since the old explanation (it rained) is retracted because of the
addition of a piece of knowledge (a sprinkler was active), any logic that models explanations is
non-monotonic.
Reasoning about knowledge
If a logic includes formulae that mean that something is not known, this logic should not be
monotonic. Indeed, learning something that was previously not known leads to the removal of
the formula specifying that this piece of knowledge is not known. This second change (a removal
caused by an addition) violates the condition of monotonicity. A logic for reasoning about
knowledge is the autoepistemic logic.
Belief revision
Belief revision is the process of changing beliefs to accommodate a new belief that might be
inconsistent with the old ones. In the assumption that the new belief is correct, some of the old
ones have to be retracted in order to maintain consistency. This retraction in response to an
addition of a new belief makes any logic for belief revision to be non-monotonic. The belief
revision approach is alternative to paraconsistent logics, which tolerate inconsistency rather than
attempting to remove it.
What makes belief revision non-trivial is that several different ways for performing this
operation may be possible. For example, if the current knowledge includes the three facts “A is
true”, “B is true” and “if A and B are true then C is true”, the introduction of the new information
“C is false” can be done preserving consistency only by removing at least one of the three facts.
In this case, there are at least three different ways for performing revision. In general, there may
be several different ways for changing knowledge.
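The three-fact example can be checked mechanically. The sketch below is an invented encoding, not a real belief-revision engine; it just enumerates the single-removal revisions:

```python
# The facts A, B, and "A and B imply C" are jointly inconsistent with the
# new information "C is false"; dropping any one of them restores
# consistency. Enumerate every way of removing exactly one fact.
from itertools import combinations

facts = ["A", "B", "A&B->C"]

def consistent(kept):
    """Inconsistent only if all three facts survive while C is false."""
    return set(kept) != set(facts)

revisions = [kept for kept in combinations(facts, 2) if consistent(kept)]
print(len(revisions))  # 3
```

All three single-removal revisions are admissible, which is precisely why belief revision is non-trivial: the logic alone does not say which one to choose.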
Fuzzy Logic
The concept of Fuzzy Logic (FL) was conceived by Lotfi Zadeh, a professor at the University of
California at Berkeley, and presented not as a control methodology, but as a way of processing
data by allowing partial set membership rather than crisp set membership or non-membership.
This approach to set theory was not applied to control systems until the 70's due to insufficient
small-computer capability prior to that time. Professor Zadeh reasoned that people do not require
precise, numerical information input, and yet they are capable of highly adaptive control. If
feedback controllers could be programmed to accept noisy, imprecise input, they would be much
more effective and perhaps easier to implement. Unfortunately, U.S. manufacturers have not
been so quick to embrace this technology while the Europeans and Japanese have been
aggressively building real products around it.
WHAT IS FUZZY LOGIC?
In this context, FL is a problem-solving control system methodology that lends itself to
implementation in systems ranging from simple, small, embedded micro-controllers to large,
networked, multi-channel PC or workstation-based data acquisition and control systems. It can
be implemented in hardware, software, or a combination of both. FL provides a simple way to
arrive at a definite conclusion based upon vague, ambiguous, imprecise, noisy, or missing input
information. FL's approach to control problems mimics how a person would make decisions,
only much faster.
HOW IS FL DIFFERENT FROM CONVENTIONAL CONTROL METHODS?
FL incorporates a simple, rule-based IF X AND Y THEN Z approach to solving a control
problem rather than attempting to model a system mathematically. The FL model is empirically-
based, relying on an operator's experience rather than their technical understanding of the
system. For example, rather than dealing with temperature control in terms such as "SP =500F",
"T <1000F", or "210C <TEMP <220C", terms like "IF (process is too cool) AND (process is
getting colder) THEN (add heat to the process)" or "IF (process is too hot) AND (process is
heating rapidly) THEN (cool the process quickly)" are used. These terms are imprecise and yet
very descriptive of what must actually happen. Consider what you do in the shower if the
temperature is too cold: you will make the water comfortable very quickly with little trouble. FL
is capable of mimicking this type of behavior, but at a very high rate.
HOW DOES FL WORK?
FL requires some numerical parameters in order to operate such as what is considered significant
error and significant rate-of-change-of-error, but exact values of these numbers are usually not
critical unless very responsive performance is required in which case empirical tuning would
determine them. For example, a simple temperature control system could use a single
temperature feedback sensor whose data is subtracted from the command signal to compute
"error" and then time-differentiated to yield the error slope or rate-of-change-of-error, hereafter
called "error-dot". Error might have units of degs F and a small error considered to be 2F while a
large error is 5F. The "error-dot" might then have units of degs/min with a small error-dot being
5F/min and a large one being 15F/min. These values don't have to be symmetrical and can be
"tweaked" once the system is operating in order to optimize performance. Generally, FL is so
forgiving that the system will probably work the first time without any tweaking.
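A minimal sketch of such a controller. The 2F/5F error breakpoints and 5F-per-minute/15F-per-minute error-dot breakpoints come from the text; the linear ramp shapes, the min-as-AND choice, and the rule itself are standard fuzzy-logic conventions assumed for illustration:

```python
# Sketch of the fuzzy temperature controller described in the text.
# Membership functions are simple linear ramps between the breakpoints.

def membership_small(x, small, large):
    """Degree (0..1) to which x counts as 'small'."""
    if x <= small:
        return 1.0
    if x >= large:
        return 0.0
    return (large - x) / (large - small)   # linear ramp in between

def heat_command(error, error_dot):
    """Rule: IF (error is large) AND (error-dot is large) THEN (add heat)."""
    too_cool = 1.0 - membership_small(error, 2.0, 5.0)          # large error
    getting_colder = 1.0 - membership_small(error_dot, 5.0, 15.0)
    return min(too_cool, getting_colder)   # fuzzy AND = min

print(heat_command(error=5.0, error_dot=15.0))  # 1.0: full heating
print(heat_command(error=3.5, error_dot=10.0))  # 0.5: moderate heating
```

The exact breakpoint values are not critical, which matches the observation above that FL is forgiving: the ramps can be tweaked later once the system is running.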
Dempster/Shafer Theory
The Dempster-Shafer theory, also known as the theory of belief functions, is a generalization of
the Bayesian theory of subjective probability. Whereas the Bayesian theory requires probabilities
for each question of interest, belief functions allow us to base degrees of belief for one question
on probabilities for a related question. These degrees of belief may or may not have the
mathematical properties of probabilities; how much they differ from probabilities will depend on
how closely the two questions are related.
The Dempster-Shafer theory owes its name to work by A. P. Dempster (1968) and Glenn Shafer
(1976), but the kind of reasoning the theory uses can be found as far back as the seventeenth
century. The theory came to the attention of AI researchers in the early 1980s, when they were
trying to adapt probability theory to expert systems. Dempster-Shafer degrees of belief resemble
the certainty factors in MYCIN, and this resemblance suggested that they might combine the
rigor of probability theory with the flexibility of rule-based systems. Subsequent work has made
clear that the management of uncertainty inherently requires more structure than is available in
simple rule-based systems, but the Dempster-Shafer theory remains attractive because of its
relative flexibility.
The Dempster-Shafer theory is based on two ideas: the idea of obtaining degrees of belief for one
question from subjective probabilities for a related question, and Dempster's rule for combining
such degrees of belief when they are based on independent items of evidence.
To illustrate the idea of obtaining degrees of belief for one question from subjective probabilities
for another, suppose I have subjective probabilities for the reliability of my friend Jon. My
probability that he is reliable is 0.9, and my probability that he is unreliable is 0.1. Suppose he
tells me a limb fell on my car. This statement, which must be true if he is reliable, is not
necessarily false if he is unreliable. So his testimony alone justifies a 0.9 degree of belief that a
limb fell on my car, but only a zero degree of belief (not a 0.1 degree of belief) that no limb fell
on my car. This zero does not mean that I am sure that no limb fell on my car, as a zero
probability would; it merely means that Jon's testimony gives me no reason to believe that no
limb fell on my car. The 0.9 and the zero together constitute a belief function.
Knowledge Acquisition
Knowledge Acquisition is concerned with the development of knowledge bases based on the
expertise of a human expert. This requires expressing knowledge in a formalism suitable for
automatic interpretation. Within this field, research at UNSW focuses on incremental
knowledge acquisition techniques, which allow a human expert to provide explanations of their
decisions that are automatically integrated into sophisticated knowledge bases.
Types of Learning
Learning is acquiring new knowledge, behaviors, skills, values, preferences or understanding,
and may involve synthesizing different types of information. The ability to learn is possessed by
humans, animals and some machines. Progress over time tends to follow learning curves.
Human learning may occur as part of education or personal development. It may be goal-
oriented and may be aided by motivation. The study of how learning occurs is part of
neuropsychology, educational psychology, learning theory, and pedagogy.
Learning may occur as a result of habituation or classical conditioning, seen in many animal
species, or as a result of more complex activities such as play, seen only in relatively intelligent
animals and humans. Learning may occur consciously or without conscious awareness. There is
evidence for human behavioral learning prenatally, in which habituation has been observed as
early as 32 weeks into gestation, indicating that the central nervous system is sufficiently
developed and primed for learning and memory to occur very early on in development.
Play has been approached by several theorists as the first form of learning. Children play,
experiment with the world, learn the rules, and learn to interact. Vygotsky agrees that play is
pivotal for children's development, since they make meaning of their environment through play.
Types of Learning
Habituation
In psychology, habituation is an example of non-associative learning in which there is a
progressive diminution of behavioral response probability with repetition of a stimulus. It is
another form of integration. An animal first responds to a stimulus, but if it is neither rewarding
nor harmful the animal reduces subsequent responses. One example of this can be seen in small
song birds - if a stuffed owl (or similar predator) is put into the cage, the birds initially react to it
as though it were a real predator. Soon the birds react less, showing habituation. If another
stuffed owl is introduced (or the same one removed and re-introduced), the birds react to it again
as though it were a predator, demonstrating that it is only a very specific stimulus that is
habituated to (namely, one particular unmoving owl in one place). Habituation has been shown
in essentially every species of animal, including the large protozoan Stentor coeruleus.
Sensitization
Sensitization is an example of non-associative learning in which the progressive amplification of
a response follows repeated administrations of a stimulus (Bell et al., 1995). An everyday
example of this mechanism is the repeated tonic stimulation of peripheral nerves that will occur
if a person rubs his arm continuously. After a while, this stimulation will create a warm sensation
that will eventually turn painful. The pain is the result of the progressively amplified synaptic
response of the peripheral nerves warning the person that the stimulation is harmful.
Sensitization is thought to underlie both adaptive as well as maladaptive learning processes in the
organism.
Associative learning
Associative learning is the process by which an element is learned through association with a
separate, pre-occurring element.
Operant conditioning
Operant conditioning is the use of consequences to modify the occurrence and form of behavior.
Operant conditioning is distinguished from Pavlovian conditioning in that operant conditioning