Semantic Nets
Frames
Slots Exceptions
Slot – Values as Objects
Handling Uncertainties
Probabilistic Reasoning
Use of Certainty factor
Fuzzy Logic
Unit 6
Structured Knowledge Representation
Semantic Net
Learning Objectives
After reading this unit you should appreciate the following:
• Semantic Nets
• Frames
• Slots Exceptions
• Handling Uncertainties
• Probabilistic Reasoning
• Use of Certainty Factors
• Fuzzy Logic
Semantic Nets
In a semantic net, information is represented as a set of nodes connected to each other by a set
of labelled arcs, which represent relationships among the nodes. A fragment of a typical
semantic net is shown in Figure 6.1.
Figure 6.1: A Semantic Network
This network contains examples of both the isa and instance relations, as well as some other, more
domain-specific relations like team and uniform-color. In this network we would use inheritance to
derive the additional relation.
has-part (Pee-Wee-Reese, Nose)
Intersection Search
Intersection search is one of the early techniques used with semantic nets to find relationships among objects: activation is spread out from each of two nodes, and we see where the activation meets. Using this process, it is possible to use the network of Figure 6.1 to answer questions such as "What is the connection between the Brooklyn Dodgers and blue?" This kind of reasoning exploits one of the important advantages that slot-and-filler structures have over purely logical representations: it takes advantage of the entity-based organization of knowledge that slot-and-filler representations provide.
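To make the idea concrete, here is a minimal Python sketch of intersection search over a hand-built fragment of Figure 6.1; the NET dictionary and its node names are illustrative only, and link labels are dropped for brevity.

# A tiny, undirected fragment of Figure 6.1 (link labels omitted for brevity).
NET = {
    "Pee-Wee-Reese":    ["Person", "Brooklyn-Dodgers", "Blue"],
    "Brooklyn-Dodgers": ["Pee-Wee-Reese", "Team"],
    "Blue":             ["Pee-Wee-Reese"],
    "Person":           ["Pee-Wee-Reese", "Mammal"],
    "Team":             ["Brooklyn-Dodgers"],
    "Mammal":           ["Person"],
}

def intersection_search(a, b):
    # Spread activation outward from both nodes one step at a time and
    # report the first node(s) where the two activation waves meet.
    seen_a, seen_b = {a}, {b}
    frontier_a, frontier_b = {a}, {b}
    while frontier_a or frontier_b:
        meet = seen_a & seen_b
        if meet:
            return meet
        frontier_a = {n for f in frontier_a for n in NET.get(f, [])} - seen_a
        frontier_b = {n for f in frontier_b for n in NET.get(f, [])} - seen_b
        seen_a |= frontier_a
        seen_b |= frontier_b
    return seen_a & seen_b

print(intersection_search("Brooklyn-Dodgers", "Blue"))   # {'Pee-Wee-Reese'}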
To answer more structured questions, however, requires networks that are themselves more
highly structured. In the next few sections we expand and refine our notion of a network in order
to support more sophisticated reasoning.
Representing Nonbinary Predicates
Semantic nets are a natural way to represent relationships that would appear as ground instances of binary predicates in predicate logic. For example, some of the arcs from Figure 6.1 could be represented in logic as:
Figure 6.2: A Semantic Net for an n-Place Predicate
isa (Person, Mammal)
instance (Pee-Wee-Reese, Person)
team (Pee-Wee-Reese, Brooklyn-Dodgers)
uniform-color(Pee-Wee-Reese,Blue)
But the knowledge expressed by predicates of other arities can also be expressed in semantic nets. We have already seen that many unary predicates in logic can be thought of as binary predicates using some very general-purpose predicates, such as isa and instance. So for
example,
man(Marcus)
could be rewritten as
instance (Marcus, Man)
thereby making it easy to represent in a semantic net.
Three or more place predicates can also be converted to a binary form by creating one new
object representing the entire predicate statement and then introducing binary predicates to
describe the relationship to this new object of each of the original arguments. For example,
suppose we know that
score (Cubs, Dodgers, 5-3)
This can be represented in a semantic net by creating a node to represent the specific game and
then relating each of the three pieces of information to it. Doing this produces the network shown
in Figure 6.2.
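As a rough Python sketch, the reified fact could be stored as one new node plus three binary links; the node name G23 and the slot names home-team, visiting-team, and score are invented for illustration, mirroring the construction of Figure 6.2.

# score(Cubs, Dodgers, 5-3) reified as a game node with binary relations.
facts = [
    ("G23", "instance",      "Game"),     # G23 is a made-up name for the specific game
    ("G23", "home-team",     "Cubs"),
    ("G23", "visiting-team", "Dodgers"),
    ("G23", "score",         "5-3"),
]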
This technique is particularly useful for representing the contents of a typical declarative sentence
that describes several aspects of a particular event. The sentence below could be represented by the network shown in Figure 6.3.
John gave the book to Mary
Figure 6.3: A Semantic Net Representing a Sentence
Making Some Important Distinctions
In the networks we have described so far, we have glossed over some distinctions that are
important in reasoning. For example, there should be a difference between a link that defines a
new entity and one that relates two existing entities. Consider the net
Both nodes represent objects that exist independent of their relationship to each other. But now
suppose we want to represent the fact that John is taller than Bill, using the net.
The nodes H1 and H2 are new concepts representing John's height and Bill's height,
respectively. They are defined by their relationships to the nodes John and Bill. Using these
defined concepts, it is possible to represent such facts as that John’s height increased, which we
could not do before. (The number 72 increased.)
Sometimes it is useful to introduce the arc value to make this distinction clear. Thus we might use
the following net to represent the fact that John is 6 feet tall and that he is taller than Bill:
The procedures that operate on nets such as this can exploit the fact that some arcs, such as
height, define new entities, while others, such as greater-than and value, merely describe
relationships among existing entities.
Yet another distinction we have missed is the difference between the properties of a node itself and the properties that a node simply holds and passes on to its instances. For example, it is a property of the node Person that it is a subclass of the node Mammal. But the node Person does
not have as one of its parts a nose. Instances of the node Person do, and we want them to inherit
it.
It is difficult to capture these distinctions without assigning more structure to our
notions of node, link, and value. But first, we discuss a network-oriented solution to a simpler
problem; this solution illustrates what can be done in the network model but at what price in
complexity.
Partitioned Semantic Nets
Suppose we want to represent simple quantified expressions in semantic nets. One way to do this is to partition the semantic net into a hierarchical set of spaces, each of which corresponds to the scope of one or more variables. To see how this works, consider first the simple net shown in Figure 6.4. This net corresponds to the statement:
The dog bit the mail carrier.
The nodes Dogs, Bite, and Mail-Carrier represent the classes of dogs, bitings, and mail carriers,
respectively, while the nodes d, b, and m represent a particular dog, a particular biting, and a
particular mail carrier. A single net with no partitioning can easily represent this fact.
But now suppose that we want to represent the fact
Every dog has bitten a mail carrier.
Figure 6.4: Using Partitioned Semantic Nets
∀x : Dog(x) → ∃y : Mail-Carrier(y) ∧ Bite(x, y)
It is necessary to encode the scope of the universally quantified variable x in order to represent this fact. This can be done using partitioning as shown in Figure 6.4(b). The node g stands for the assertion given above. Node g is an instance of the special class GS of general statements about the world (i.e., those with universal quantifiers). Every element of GS has at least two attributes: a form, which states the relation that is being asserted, and one or more ∀ connections, one for each of the universally quantified variables. In this example, there is only one such variable, d, which can stand for any element of the class Dogs. The other two variables in the form, b and m, are understood to be existentially quantified. In other words, for every dog d, there exists a biting event b and a mail carrier m, such that d is the assailant of b and m is the victim.
To see how partitioning makes variable quantification explicit, consider next the similar sentence:
Every dog in town has bitten the constable.
The representation of this sentence is shown in Figure 6.4 (c). In this net, the node representing
the victim lies outside the form of the general statement. Thus it is not viewed as an existentially
quantified variable whose value may depend on the value of d. Instead it is interpreted as
standing for a specific entity (in this case, a particular constable), just as do other nodes in a standard, nonpartitioned net. Figure 6.4(d) shows how yet another similar sentence
Every dog has bitten every mail carrier.
should be represented. In this case, g has two ∀ links, one pointing to d, which represents any dog, and one pointing to m, representing any mail carrier.
The spaces of a partitioned semantic net are related to each other by an inclusion hierarchy. For example, in Figure 6.4(d), space S1 is included in space SA. Whenever a search process operates in a partitioned semantic net, it can explore nodes and arcs in the space from which it starts and in other spaces that contain the starting point, but it does not go downward, except in special circumstances, such as when a form arc is being traversed. So, returning to Figure 6.4(d), from node d it can be determined that d must be a dog. But if we were to start at the node Dogs and search for all known instances of dogs by traversing isa links, we would not find d, since it and the link to it are in space S1, which is at a lower level than space SA, which contains Dogs. This is consistent, since d does not stand for a particular dog; it is merely a variable that can be instantiated with a value that represents a dog.
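A dictionary-based Python sketch of the partitioning in Figure 6.4(b) might record the spaces like this; the structure is illustrative only, not a fixed notation from the text.

# "Every dog has bitten a mail carrier", partitioned into spaces.
spaces = {
    "SA": {                      # the outer space: the general statement itself
        "g": {"instance": "GS", "form": "S1", "forall": ["d"]},
    },
    "S1": {                      # the inner space holding the form of the assertion
        "d": {"instance": "Dogs"},            # universally quantified
        "b": {"instance": "Bite", "assailant": "d", "victim": "m"},
        "m": {"instance": "Mail-Carrier"},    # existentially quantified
    },
}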
Student Activity 6.1
Before reading next section, answer the following questions.
1. How are semantic nets useful in knowledge representation?
2. Draw semantic nets for the following facts:
a. Every player hits the ball.
b. Every dog eats a cat.
If your answers are correct, then proceed to the next section.
The Evolution into Frames
The idea of a semantic net started out simply as a way to represent labelled connections among
entities. But, as we have just seen, as we expand the range of problem-solving tasks that the
representation must support, the representation itself necessarily begins to become more
complex. In particular, it becomes useful to assign more structure to nodes as well as to links.
Although there is no clear distinction between a semantic net and a frame system, the more
structure the system has, the more likely it is to be termed a frame system. In the next section we
continue our discussion of structured slot-and-filler representations by describing some of the
most important capabilities that frame systems offer.
Frames
A frame is a collection of attributes called slots and associated values (and possibly constraints
on values) that describe some entity in the world. Sometimes a frame describes an entity in some
absolute sense; sometimes it represents the entity from a particular point of view. A single frame
taken alone is rarely useful. Instead, we build frame systems out of collections of frames that are
connected to each other by virtue of the fact that the value of an attribute of one frame may be
another frame. In the rest of this section, we expand on this simple definition and explore ways
that frame systems can be used to encode knowledge and support reasoning.
Default Frames
Set theory provides a good basis for understanding frame systems. Although not all frame
systems are defined this way, we do so here. In this view, each frame represents either a class (a
set) or an instance (an element of a class). To see how this works, consider the frame system
shown in Figure 6.5. In this example, the frames Person, Adult-Male, ML-Baseball-Player (corresponding to major league baseball players), Fielder, and ML-Baseball-Team (for major league baseball teams) are all classes. The frames Pee-Wee-Reese and Brooklyn-Dodgers are instances. The isa relation that we have been using without a precise definition is in fact the subset relation. The set of adult males is a subset of the set of people. The set of major league baseball players is a subset of the set of adult males, and so forth. Our instance relation corresponds to the relation element-of: Pee Wee Reese is an element of the set of fielders. Thus he is also an element of all of the supersets of Fielder, including major league baseball players and people. The transitivity
of isa that we have taken for granted in our description of property inheritance follows directly
from the transitivity of the subset relation.
Both the isa and instance relations have inverse attributes, which we call subclasses and all-instances. We do not bother to write them explicitly in our examples unless we need to refer to them. We assume that the frame system maintains them automatically, either explicitly or by computing them if necessary.
Since a class represents a set, there are two kinds of attributes that can be associated with it.
There are attributes about the set itself, and there are attributes that are to be inherited by each
element of the set. We indicate the difference between these two by prefixing the latter with an
asterisk (*). For example, consider the class ML-Baseball-Player. We have shown only two
properties of it as a set: It is a subset of the set of adult males. And it has cardinality 624 (i.e.,
there are 624 major league baseball players). We have listed five properties that all major league
baseball players have (height, bats, batting-average, team, and uniform-colour), and we have
specified default values for the first three of them. By providing both kinds of slots, we allow a
class both to define a set of objects and to describe a prototypical object of the set.
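As an illustrative Python sketch of this convention, the starred slots below are the ones a frame system would hand down to each member of the class, while unstarred slots describe the set itself; the helper function is an assumption of this sketch, not part of the text.

ML_Baseball_Player = {
    "isa": "Adult-Male",
    "cardinality": 624,          # a property of the set itself
    "*height": "6-1",            # default properties inherited by each element
    "*bats": "equal to handed",
    "*batting-average": 0.252,
}

def member_defaults(frame):
    # Collect the starred slots that instances of this class inherit.
    return {k.lstrip("*"): v for k, v in frame.items() if k.startswith("*")}

print(member_defaults(ML_Baseball_Player))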
Sometimes, the distinction between a set and an individual instance may not be seen clearly. For example, the team Brooklyn-Dodgers, which we have described as an instance of the class of major league baseball teams, could be thought of as a set of players; in fact, notice that the value of the slot players is a set. Suppose, instead, that we want to represent the Dodgers as a class instead of an instance. Then its instances would be the individual players. It cannot stay where it is in the isa hierarchy; it cannot be a subclass of ML-Baseball-Team, because if it were, then its elements, namely the players, would also, by the transitivity of subclass, be elements of ML-Baseball-Team, which is not what we want to say. We have to put it somewhere else in the isa hierarchy. For example, we could make it a subclass of major league baseball players. Then its elements, the players, are also elements of ML-Baseball-Player, Adult-Male, and Person. That is acceptable. But if we do that, we lose the ability to inherit properties of the Dodgers from general information
about baseball teams. We can still inherit attributes for the elements of the team, but we cannot
inherit properties of the team as a whole, i.e., of the set of players. For example, we might like to
know what the default size of the team is, that it has a manager, and so on. The easiest way to
allow for this is to go back to the idea of the Dodgers as an instance of ML-Baseball-Team, with
the set of players given as a slot value.
Person
isa: Mammal
cardinality: 6,000,000,000
*handed: Right
Adult-Male
isa: Person
cardinality: 2,000,000,000
*height: 5-10
ML-Baseball-Player
isa: Adult-Male
cardinality: 624
*height: 6-1
*bats: equal to handed
*batting-average: .252
*team:
*uniform-color:
Fielder
isa: ML-Baseball-Player
cardinality: 36
*batting-average: .262
Johan
instance: Fielder
height: 5-10
bats: Right
batting-average: .309
team: Brooklyn-Dodgers
uniform-color: Blue
ML-Baseball-Team
isa: Team
cardinality: 26
*team-size: 24
*manager:
Brooklyn-Dodgers
instance: ML-Baseball-Team
team-size: 24
manager: Leo-Durocher
players: (Johan,Pee-Wee-Reese,…)
Figure 6.5: A Simplified Frame System
But what we have encountered here is an example of a more general problem. A class is a set,
and we want to be able to talk about properties that its elements possess. We want to use
inheritance to infer those properties from general knowledge about the set. But a class is also an
entity in itself. It may possess properties that belong not to the individual instances but rather to
the class as a whole. In the case of Brooklyn-Dodgers, such properties included team size and
the existence of a manager. We may even want to inherit some of these properties from a more
general kind of set. For example, the Dodgers can inherit a default team size from the set of all
major league baseball teams. To support this, we need to view a class as two things
simultaneously: a subset (isa) of a larger class that also contains its elements and an instance
(instance) of a class of sets, from which it inherits its set-level properties.
To make this distinction clear, it is useful to distinguish between regular classes, whose elements
are individual entities, and metaclasses, which are special classes whose elements are
themselves classes. A class is now an element of (instance) some class (or classes) as well as a
subclass (isa) of one or more classes. A class inherits properties from the class of which it is an
instance, just as any instance does. In addition, a class passes inheritable properties down from
its superclasses to its instances.
Let’s consider an example. Figure 6.6 shows how we would represent teams as classes using
this distinction. Figure 6.7 shows a graphic view of the same classes. The most basic metaclass is the class Class, which represents the set of all classes. All classes are instances of it, either directly or through one of its subclasses. In the example, Team is a subclass (subset) of Class and ML-Baseball-Team is a subclass of Team. Class introduces the attribute cardinality, which is to be inherited by all instances of Class (including itself). This makes sense since all the instances of Class are sets and all sets have a cardinality.
Team represents a subset of the set of all sets, namely those whose elements are sets of players on a team. It inherits the property of having a cardinality from Class. Team introduces the attribute team-size, which all its elements possess. Notice that team-size is like cardinality in that it measures the size of a set. But it applies to something different: cardinality applies to sets of sets and is inherited by all elements of Class, while team-size applies to the elements of those sets that happen to be teams. Those elements are sets of individuals.
ML-Baseball-Team is also an instance of Class, since it is a set. It inherits the property of having
cardinality from the set of which it is an instance, namely Class. But it is a subset of Team. All of
its instances will have the property of having a team-size since they are also instances of the
superclass Team. We have added at this level the additional fact that the default team size is 24, so all instances of ML-Baseball-Team will inherit that as well. In addition, we have added the
inheritable slot manager.
Brooklyn-Dodgers is an instance of ML-Baseball-Team. It is not an instance of Class because its elements are individuals, not sets. Brooklyn-Dodgers is a subclass of ML-Baseball-Player since all of its elements are also elements of that set. Since it is an instance of ML-Baseball-
Team, it inherits the properties team-size and manager, as well as their default values. It specifies
a new attribute uniform-colour, which is to be inherited by all of its instances (who will be
individual players).
Class
instance : Class
isa : Class
*cardinality :
Team
instance : Class
isa : Class
cardinality : {the number of teams that exist}
*team-size : {each team has a size}
ML-Baseball-Team
instance : Class
isa : Team
cardinality : 26 {the number of baseball teams that exist}
*team-size : 24 {default of 24 players on a team}
*manager :
Brooklyn-Dodgers
instance : ML-Baseball-Team
isa : ML-Baseball-Player
team-size : 24
manager : Leo-Durocher
*uniform-color : Blue
Pee-Wee-Reese
instance : Brooklyn-Dodgers
instance : Fielder
uniform-color : Blue
batting-average : .309
Figure 6.6: Representing the Class of All Teams as a Metaclass
Figure 6.7: Classes and Metaclasses
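One way to sketch the instance/isa machinery of Figure 6.6 in Python is with plain dictionaries and a lookup routine that tries a local value first and then an inheritable (starred) value up the links; the get helper below is a simplified illustration, not a full inheritance algorithm.

frames = {
    "Class":            {"*cardinality": None},
    "Team":             {"instance": "Class", "isa": "Class", "*team-size": None},
    "ML-Baseball-Team": {"instance": "Class", "isa": "Team",
                         "cardinality": 26, "*team-size": 24, "*manager": None},
    "Brooklyn-Dodgers": {"instance": "ML-Baseball-Team", "isa": "ML-Baseball-Player",
                         "manager": "Leo-Durocher", "*uniform-color": "Blue"},
    "Pee-Wee-Reese":    {"instance": "Brooklyn-Dodgers"},
}

def get(frame, slot):
    # Look for the slot locally first, then for an inheritable (*) value by
    # following instance and isa links upward (each kind of link is followed
    # transitively on its own; a real system would interleave them).
    f = frames[frame]
    if slot in f:
        return f[slot]
    for link in ("instance", "isa"):
        parent = f.get(link)
        while parent is not None:
            pf = frames.get(parent, {})
            if pf.get("*" + slot) is not None:
                return pf["*" + slot]
            parent = pf.get(link)
    return None

print(get("Brooklyn-Dodgers", "team-size"))   # 24, inherited from ML-Baseball-Team
print(get("Pee-Wee-Reese", "uniform-color"))  # Blue, inherited from Brooklyn-Dodgers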
Finally, Pee-Wee-Reese is an instance of Brooklyn-Dodgers. That makes him also, by transitivity up isa
links, an instance of ML-Baseball-Player. But recall that in an earlier example we also used the class
Fielder, to which we attached the fact that fielders have above average batting averages. To allow
that here, we simply make Pee Wee an instance of Fielder as well. He will thus inherit properties
from both Brooklyn-Dodgers and from Fielder, as well as from the classes above these. We need to guarantee that when multiple inheritance occurs, as it does here, it works correctly. Specifically, in this case we need to ensure that batting-average gets inherited from Fielder and not from ML-Baseball-Player through Brooklyn-Dodgers.
In all the frame systems we illustrate, all classes are instances of the metaclass Class. As a result, they all have the attribute cardinality. We leave cardinality out of our descriptions of our examples, though, unless there is some particular reason to include it.
Every class is a set. But not every set should be described as a class. A class describes a set of entities that share significant properties. In particular, the default information associated with a class can be used as a basis for inferring values for the properties of its individual elements. So there is an advantage to representing as a class those sets for which membership serves as a basis for nonmonotonic inheritance. Typically, these are sets in which membership is not highly ephemeral. Instead, membership is based on some fundamental structural or functional properties. To
see the difference, consider the following sets:
• People
• People who are major league baseball players
• People who are on my plane to New York
The first two sets can be advantageously represented as classes, with which a substantial number of inheritable attributes can be associated. The last, though, is different. The only
properties that all the elements of that set probably share are the definition of the set itself and
some other properties that follow from the definition (e.g. they are being transported from one
place to another). A simple set, with some associated assertions, is adequate to represent these
facts: non-monotonic inheritance is not necessary.
Other Ways of Relating Classes to Each Other
We have talked up to this point about two ways in which classes (sets) can be related to each
other. Class1 can be a subset of Class2. Or, if Class2 is a metaclass, then Class1 can be an
instance of Class2. But there are other ways that classes can be related to each other,
corresponding to ways that sets of objects in the world can be related.
One such relationship is mutually-disjoint-with, which relates a class to one or more other classes
that are guaranteed to have no elements in common with it. Another important relationship is is-
covered-by, which relates a class to a set of subclasses, the union of which is equal to it. If a class is covered by a set S of mutually disjoint classes, then S is called a partition of the class.
For examples of these relationships, consider the classes shown in Figure 6.8, which represent two orthogonal ways of decomposing the class of major league baseball players. Everyone is either a pitcher, a catcher, or a fielder (and no one is more than one of these). In addition, everyone plays in either the National League or the American League but not both.
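A quick Python sketch of what it means for a set of classes to partition another; the member sets below are invented for illustration.

def is_partition(whole, parts):
    # parts is a partition of whole if the parts are pairwise disjoint
    # and their union equals whole.
    union = set().union(*parts)
    pairwise_disjoint = all(a.isdisjoint(b)
                            for i, a in enumerate(parts) for b in parts[i + 1:])
    return pairwise_disjoint and union == whole

ml_players = {"Reese", "Brown", "Berra"}                  # illustrative members only
by_position = [{"Brown"}, {"Berra"}, {"Reese"}]           # Pitcher, Catcher, Fielder
print(is_partition(ml_players, by_position))              # True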
Slots Exceptions
We have provided a way to describe sets of objects and individual objects, both in terms of
attributes and values. Thus we have made extensive use of attributes, which we have
represented as slots attached to frames. But it turns out that there are several reasons why we would like to be able to represent attributes explicitly and describe their properties. Some of the properties we would like to be able to represent and reason with include:
• The classes to which the attribute can be attached, i.e. for what classes does it make
sense? For example, weight makes sense for physical objects but not for conceptual ones
(except in some metaphorical sense).
• A value that all instances of a class must have by the definition of the class.
• A default value for the attribute.
• Rules for inheriting values for the attribute. The usual rule is to inherit down isa and
instance links. But some attributes inherit in other ways. For example, last-name inherits down the child-of link.
ML-Baseball-Player
is-covered-by : {Pitcher, Catcher, Fielder}
{American-Leaguer, National-Leaguer}
Pitcher
isa : ML-Baseball-Player
mutually-disjoint-with : {Catcher, Fielder}
Catcher
isa : ML-Baseball-Player
mutually-disjoint-with : {Pitcher, Fielder}
Fielder
isa : ML-Baseball-Player
mutually-disjoint-with : {Pitcher, Catcher}
American-Leaguer
isa : ML-Baseball-Player
mutually-disjoint-with : {National-Leaguer}
National-Leaguer
isa : ML-Baseball-Player
mutually-disjoint-with : {American-Leaguer}
Three-Finger-Brown
instance : Pitcher
instance : National-Leaguer
Figure 6.8: Representing Relationships among Classes
• Rules for computing a value separately from inheritance. One extreme form of such a rule
is a procedure written in some procedural programming language such as LISP.
• An inverse attribute.
• Whether the slot is single-valued or multivalued.
In order to be able to represent these attributes of attributes, we need to describe attributes (slots)
as frames. These frames will be organized into an isa hierarchy, just as any other frames are, and that hierarchy can then be used to support inheritance of values for attributes of slots. Before we can describe such a hierarchy in detail, we need to formalize our
notion of a slot.
A slot is a relation. It maps from elements of its domain (the classes for which it makes sense) to
elements of its range (its possible values). A relation is a set of ordered pairs. Thus it makes sense to say that one relation (R1) is a subset of another (R2). In that case, R1 is a specialization of R2, so in our terminology isa(R1, R2). Since a slot is a set, the set of all slots, which we will call Slot, is a
metaclass. Its instances are slots, which may have subslots.
Figures 6.9 and 6.10 illustrate several examples of slots represented as frames. Slot is a metaclass; its instances are slots (each of which is a set of ordered pairs). Associated with the metaclass are attributes that each instance (i.e., each actual slot) will inherit. Each slot, since it is a relation, has a domain and a range. We represent the domain in the slot labelled domain. We break up the representation of the range into two parts: range gives the class from which values must come, and range-constraint contains logical expressions that further constrain the range. The advantage to breaking the description apart into these two pieces is that
type checking is much cheaper than arbitrary constraint checking, so it is useful to be able to do
it separately and early during some reasoning processes.
The other slots do what you would expect from their names. If there is a value for definition, it
must be propagated to all instances of the slot. If there is a value for default, that value is
inherited by all instances of the slot unless there is an overriding value. The attribute transfers-through lists other slots from which values for this slot can be derived through inheritance. The to-compute slot contains a procedure for deriving the slot's value. Inverse states the slot's inverse; although in principle all slots have inverses, sometimes they are not useful enough in reasoning to be worth representing. And single-valued is used to mark the special cases in which
the slot is a function and can have only one value.
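In code, the manager slot of Figure 6.9 might itself be sketched as a Python frame carrying the attributes just listed; the has_baseball_experience helper is an illustrative stand-in for the domain-specific test assumed by the figure.

def has_baseball_experience(person):
    # Illustrative stand-in for the domain-specific baseball-experience test.
    return person in {"Leo-Durocher"}

manager_slot = {
    "instance": "Slot",
    "domain": "ML-Baseball-Team",
    "range": "Person",
    "range-constraint": has_baseball_experience,
    "default": None,
    "inverse": "manager-of",
    "single-valued": True,
}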
Of course, there is no advantage in representing these properties of slots unless there is a reasoning mechanism that exploits them. In the rest of our discussion we assume that the frame-system interpreter knows how to reason with all of these slots of slots as part of its built-in reasoning capability. In particular, we assume that it is capable of performing the following reasoning actions (a small sketch of the first of these checks appears after the list):
• Consistency checking to verify that when a slot value is added to a frame
- The slot makes sense for the frame. This relies on the domain attribute of the slot.
Slot
isa : Class
instance : Class
domain :
range :
range-constraint :
definition :
default :
transfers-through :
to-compute :
inverse :
single-valued :
manager
instance : Slot
domain : ML-Baseball-Team
range : Person
range-constraint : λx (baseball-experience x.manager)
default :
inverse : manager-of
single-valued : TRUE
Figure 6.9: Representing Slots as Frames, I
My-manager
instance : Slot
domain : ML-Baseball-Player
range : Person
range-constraint : λx (baseball-experience x.my-manager)
to-compute : λx (x.team).manager
single-valued : TRUE
Colour
instance : Slot
domain : Physical-Object
range : Colour-Set
transfers-through : top-level-part-of
visual-salience : High
single-valued : FALSE
Uniform-colour
instance : Slot
isa : Colour
domain : Team-Player
range : Colour-Set
range-constraint : non-Pink
visual-salience : High
single-valued : FALSE
Bats
instance : Slot
domain : ML-Baseball-Player
range : (Left, Right, Switch)
to-compute : λx x.handed
single-valued : TRUE
Figure 6.10: Representing Slots as Frames, II
- The value is a legal value for the slot. This relies on the range and range-constraint attributes.
• Maintenance of consistency between the values for slots and their inverses whenever one
is updated.
• Propagation of definition values along isa and instance links.
• Inheritance of default values along isa and instance links.
• Computation of a value of a slot as needed. This relies on the to-compute and transfers-through attributes.
• Checking that only a single value is asserted for single-valued slots. This is usually done by replacing an old value with the new one when it is asserted. An alternative is to force
explicit retraction of the old value and to signal a contradiction if a new value is asserted
when another is already there.
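The Python sketch below illustrates the first of these checks, roughly as a frame-system interpreter might perform it before storing a value; the slot description, the error messages, and the stand-in range constraint are all illustrative assumptions, not the text's own implementation.

# Illustrative slot description; a real system would build this from Figure 6.9.
slot_frames = {
    "manager": {
        "domain": "ML-Baseball-Team",
        "range": "Person",
        "range-constraint": lambda v: v in {"Leo-Durocher"},  # stand-in for baseball-experience
        "single-valued": True,
    },
}

def assert_value(frame, frame_class, slot, value, value_class):
    # Check a new slot value before storing it (isa inheritance on the
    # domain and range is ignored here for brevity).
    meta = slot_frames[slot]
    if meta["domain"] != frame_class:
        raise ValueError(f"slot {slot} makes no sense for {frame_class}")
    if meta["range"] != value_class:
        raise ValueError(f"{value} is not in the range {meta['range']}")
    if not meta["range-constraint"](value):
        raise ValueError(f"{value} violates the range constraint of {slot}")
    if meta["single-valued"]:
        frame[slot] = value                    # replace any old value
    else:
        frame.setdefault(slot, []).append(value)

dodgers = {}
assert_value(dodgers, "ML-Baseball-Team", "manager", "Leo-Durocher", "Person")
print(dodgers)   # {'manager': 'Leo-Durocher'}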
There is something slightly counterintuitive about this way of defining slots. We have defined the properties range-constraint and default as parts of a slot. But we often think of them as being properties of a slot associated with a particular class. For example, in Figure 6.5 we listed two defaults for the batting-average slot, one associated with major league baseball players and one associated with fielders. Figure 6.11 shows how this can be represented correctly by creating a specialization of batting-average that can be associated with a specialization of ML-Baseball-Player to represent the more specific information that is known about the specialized class. This seems cumbersome. It is natural, though, given our definition of a slot as a relation. There are really two relations here, one a specialization of the other. And below we will define inheritance so that it looks for values of either the slot it is given or any of that slot's generalizations.
Unfortunately, although this model of slots is simple and internally consistent, it is not easy to use. So we introduce some notational shorthand that allows the four most important properties of a slot (domain, range, definition, and default) to be defined implicitly by how the slot is used in the definitions of the classes in its domain. We describe the domain implicitly to be the class where the slot appears. We describe the range and any range constraints with the clause MUST BE, given as the value of the slot. Figure 6.12 shows an example of this notation. And we describe the definition and the default, if they are present, by inserting them as the value of the slot where one appears. The two will be distinguished by prefixing a definitional value with an asterisk (*). We then let the underlying bookkeeping of the frame system create the frames to represent slots as they are needed.
Now let's look at examples of how these slots can be used. The slots bats and my-manager illustrate the use of the to-compute attribute of a slot. The variable x will be bound to the frame to which the slot is attached. We use the dot notation to specify the value of a slot of a frame: specifically, x.y describes the value(s) of the y slot of frame x. So we know that to compute a frame's value for my-manager, it is necessary to find the frame's value for team, then find the resulting team's manager. We have simply composed two slots to form a new one. Computing the value of the bats slot is even simpler: just get the value of the handed slot.
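A Python sketch of those two to-compute rules, with the frames given as plain dictionaries and the player and team names taken from the earlier figures:

frames = {
    "Pee-Wee-Reese":    {"team": "Brooklyn-Dodgers", "handed": "Right"},
    "Brooklyn-Dodgers": {"manager": "Leo-Durocher"},
}

# my-manager: follow the team slot, then the manager slot (x.team.manager).
my_manager = lambda x: frames[frames[x]["team"]]["manager"]

# bats: simply take the value of the handed slot (x.handed).
bats = lambda x: frames[x]["handed"]

print(my_manager("Pee-Wee-Reese"))  # Leo-Durocher
print(bats("Pee-Wee-Reese"))        # Right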
Batting-average
instance : Slot
domain : ML-Baseball-Player
range : Number
range-constraint : λx (0 < x < 1)
default : .252
single-valued : TRUE
Fielder-batting-average
instance : Slot
isa : Batting-average
domain : Fielder
range : Number
range-constraint : λx (0 < x < 1)
default : .262
single-valued : TRUE
Figure 6.11: Associating Defaults with Slots
ML-Baseball-Player
Bats : MUST BE (Left, Right, Switch)
Figure 6.12: A Shorthand Notation for Slot-Range Specification
The manager slot illustrates the use of a range constraint. It is stated in terms of a variable x, which is bound to the frame whose manager slot is being described. It requires that any manager be not only a person but someone with baseball experience. It relies on the domain-specific function baseball-experience, which must be defined somewhere in the system.
The slots colour and uniform-colour illustrate the arrangement of slots in an isa hierarchy. The relation colour is a fairly general one that holds between physical objects and colours. The attribute uniform-colour is a restricted form of colour that applies only to team players and to colours that are allowed for team uniforms (anything but pink). Arranging slots in a hierarchy is useful for the same reason that arranging anything else in a hierarchy is: it supports inheritance. In this example, the general slot colour is known to have high visual salience. The more specific slot uniform-colour then inherits this property, so it too is known to have high visual salience.
The slot colour also illustrates the use of the transfers-through attribute, which defines a way of computing a slot's value by retrieving it from the same slot of a related object. In this example, we used transfers-through to capture the fact that if you take an object and chop it up into several top-level parts (in other words, parts that are not contained in each other) then they will all be the same colour. For example, the arm of a sofa is the same colour as the sofa. Formally, what transfers-through means in this example is
John
Height : 72
Bill
Height :
Figure 6.13: Representing Slot-Values
colour(x, y) ∧ top-level-part-of(z, x) → colour(z, y)
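A small Python sketch of that rule: a part with no colour of its own takes the colour of the object it is a top-level part of. The sofa frames below are invented for illustration.

frames = {
    "Sofa-1":     {"colour": "Green"},
    "Sofa-1-Arm": {"top-level-part-of": "Sofa-1"},
}

def colour(frame):
    # Return the frame's own colour, or transfer it through top-level-part-of.
    f = frames[frame]
    if "colour" in f:
        return f["colour"]
    whole = f.get("top-level-part-of")
    return colour(whole) if whole else None

print(colour("Sofa-1-Arm"))  # Green, transferred from Sofa-1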
In addition to these domain-independent slot attributes, slots may have domain-specific properties that support problem solving in a particular domain. Since these attributes are not treated explicitly by the frame-system interpreter, they will be useful precisely to the extent that the domain problem
solver exploits them.
Slot-Values as Objects
We have already reified the notion of a slot by making it an explicit object that we could make assertions about. In some sense this was not necessary. A finite relation can be completely described by listing its elements. But in a practical knowledge-based system one often does not have that list. So, it can be very important to be able to make assertions about the list without knowing all of its elements. Reification gave us a way to do this.
The next step along this path is to do the same thing to a particular attribute-value (an instance of
a relation) that we did to the relation itself. We can reify it and make it an object about which assertions can be made. To see why we might want to do this, let us return to the example of John and Bill's heights that we discussed in a previous section. Figure 6.13 shows a frame-based representation of some of the facts. We could easily record Bill's height if we knew it. Suppose that we do not know it. All we know is that John is taller than Bill. We need a way to make an assertion about the slot and its value as an object.
We could attempt to do this the same way we made slots themselves into objects, namely by representing them explicitly as frames. There seems little advantage to doing that in this case, though, because the main advantage of frames does not apply to slot values: frames are organized into an isa hierarchy and thus support inheritance. There is no basis for such an
organization of slot values. So instead we augment our value representation language to allow
the value of a slot to be stated as either or both:
• A value of the type required by the slot.
• A logical constraint on the value. This constraint may relate the slot’s value to the values
of other slots or to domain constants.
John
Height : 72 ; λx (x.height > Bill.height)
Bill
Height : λx (x.height < John.height)
Figure 6.14: Representing Slot-Values with Lambda Notation
If we do this to the frames of Figure 6.13 then we get the frames of Figure 6.14. We again use the
lambda notation as a way to pick up the name of the frame that is being described.
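Mirroring Figure 6.14, a slot value can be sketched in Python as a pair of an optional literal value and a constraint; the checks below are illustrative only.

# Each slot value is a pair: (literal value or None, logical constraint on the value).
john_height = (72, None)
bill_height = (None, lambda h: h < john_height[0])   # "Bill is shorter than John"

# Even without knowing Bill's height we can test candidate values against the constraint.
print(bill_height[1](70))  # True  -- consistent with John being taller
print(bill_height[1](75))  # False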
Student Activity 6.2
Before reading next section, answer the following questions.
1. Describe the database of a cricket team using frames.
2. What is the difference between frames and meta-frames?
3. Write short note on slot exceptions.
If your answers are correct, then proceed to the next section.
Handling Uncertainties
It was implicitly assumed that a sufficient amount of reliable knowledge (facts, rules, and the like)
was available to defend confident conclusions. While this form of reasoning is important, it suffers
from several limitations.
• It is not possible to describe many envisioned or real-world concepts; that is, it is limited in
expressive power.
• There is no way to express uncertain, imprecise, hypothetical or vague knowledge, or the
truth or falsity of such statements.
• Available inference methods are known to be inefficient.
• There is no way to produce new knowledge about the world. It is only possible to add what
is derivable from the axioms and theorems in the knowledge base.
In other words, strict classical logic formalisms do not provide realistic representations of the
world in which we live. On the contrary, intelligent beings are continuously required to make
decisions under a veil of uncertainty.
Uncertainty can arise from a variety of sources. For one thing, the information we have available
may be incomplete or highly volatile. Important facts and details that have bearing on the
problems at hand may be missing or may change rapidly. In addition, many of the "facts" available may be imprecise, vague, or fuzzy. Indeed, some of the available information may be contradictory or even unbelievable. However, despite these shortcomings, we humans
miraculously deal with uncertainties on a daily basis and usually arrive at reasonable solutions. If
it were otherwise, we would not be able to cope with the continually changing situations of our
world.
In this and the following chapter, we shall discuss methods with which to accurately represent and
deal with different forms of inconsistency, uncertainty, possibility, and beliefs. In other words, we
shall be interested in representations and inference methods related to what is known as
commonsense reasoning.
Consider the following real-life situation. Timothy enjoys shopping in the mall only when the
stores are not crowded. He has agreed to accompany Sue there on the following Friday evening
since this is normally a time when few people shop. Before the given date, several of the larger
stores announce a one-time special sale starting on that Friday evening. Timothy, fearing large crowds, now retracts his offer to accompany Sue, promising to go on some future date. On the Thursday before the sale was to commence, weather forecasts predicted heavy snow. Now, believing the weather would discourage most shoppers, Timothy once again agreed to join Sue.
But, unexpectedly, on the given Friday, the forecasts proved to be false, so Timothy once again
declined to go.
This anecdote illustrates how one's beliefs can change in a dynamic environment. And, while
one’s beliefs may not fluctuate as much as Timothy’s in most situations, this form of belief
revision is not uncommon. Indeed it is common enough that we label it as a form of
commonsense reasoning, that is, reasoning with uncertain knowledge.
Non-monotonic Reasoning
The logics we studied in the previous chapter are known as monotonic logics. The conclusions
derived using such logics are valid deductions, and they remain so. Adding new axioms increases the amount of knowledge contained in the knowledge base. Therefore, the set of facts and inferences in such systems can only grow larger; it cannot be reduced. That is, it increases monotonically. The form of reasoning performed above by Timothy, on the other hand, is non-monotonic. New facts became known which contradicted and invalidated old knowledge. The old knowledge was
retracted, causing other dependent knowledge to become invalid, thereby requiring further
retractions. The retractions led to a shrinkage or non-monotonic growth in the knowledge at times.
More formally, let KBL be a formal first-order system consisting of a knowledge base and some logic L. Then, if KB1 and KB2 are knowledge bases where
KB1 = KBL
KB2 = KBL ∪ {F}, for some wff F, then
KB1 ⊂ KB2
In other words, a first-order KB system can only grow monotonically with added knowledge.
When building knowledge-based systems, it is not reasonable to expect that all the knowledge
needed for a set of tasks could be acquired, validated, and loaded into the system at the outset.
More typically, the initial knowledge will be incomplete, contain redundancies, inconsistencies,
and other sources of uncertainty. Even if it were possible to assemble complete, valid knowledge
initially, it probably would not remain valid forever, not in a continually changing environment.
In an attempt to model real-world, commonsense reasoning, researchers have proposed
extensions and alternatives to traditional logics such as PL and FOPL.
The extensions accommodate different forms of uncertainty and non-monotonicity. In some cases,
the proposed methods have been implemented. In other cases they are still topics of research. In
this and the following chapter, we will examine some of the more important of these methods.
We begin in the next section with a description of truth maintenance
systems (TMS), systems that have been implemented to permit a form of non-monotonic
reasoning by permitting the addition of changing (even contradictory) statements to a knowledge
base. This is followed by a description of other methods that accommodate non-monotonic reasoning through default assumptions for incomplete knowledge bases. The
assumptions are plausible most of the time, but may have to be retracted if other conflicting facts
are learned. Methods to constrain the knowledge that must be considered for a given problem are
considered next. These methods also relate to non-monotonic reasoning. Section 5.4 gives a
brief treatment of modal and temporal logics, which extend the expressive power of classical
logics to permit representations and reasoning about necessary and possible situations, temporal,
and other related situations. The chapter is concluded with a brief presentation of a relatively new
method for dealing with vague and imprecise information, namely fuzzy logic and language
computation.
Truth Maintenance Systems
Truth maintenance systems (also known as belief revision or revision maintenance systems) are
companion components to inference systems. The main job of the TMS is to maintain consistency
of the knowledge being used by the problem solver and not to perform any inference functions.
As such, it frees the problem solver from any concerns of consistency and allows it to concentrate
on the problem solution aspects. The TMS also gives the inference component the latitude to
perform non-monotonic inferences. When new discoveries are made, this more recent information
can displace previous conclusions that are no longer valid. In this way, the set of beliefs available
to the problem solver will continue to be current and consistent.
Figure 6.15 illustrates the role played by the TMS as part of the problem solver. The inference
engine (IE) solves domain problems based on its current belief set, while the TMS maintains the
currently active belief set. The updating process is incremental. After each inference, information
is exchanged between the two components. The IE tells the TMS what deductions it has
made. The TMS, in turn, asks questions about current beliefs and reasons for failures. It
maintains a consistent set of beliefs for the IE to work with even if new knowledge is added and
removed.
For example, suppose the knowledge base (KB) contained only the propositions P, P → Q, and
modus ponens. From this, the IE would rightfully conclude Q and add this conclusion to the KB.
Later, if it was learned that ~P was appropriate, it would be added to the KB resulting in a
contradiction. Consequently, it would be necessary to remove P to eliminate the inconsistency.
But, with P now removed, Q is no longer a justified belief. It too should be removed. This type of
belief revision is the job of the TMS.
Actually, the TMS does not discard conclusions like Q as suggested. That could be wasteful,
since P may again become valid, which would require that Q and facts justified by Q be rederived.
Instead, the TMS maintains dependency records for all such conclusions. These records
determine which sets of beliefs are current (which are to be used by the IE). Thus, Q would be
removed from the current belief set by making appropriate updates to the records and not by
erasing Q. Since Q would not be lost, its rederivation would not be necessary if P became valid
once again.
The TMS maintains complete records of reasons or justifications for beliefs. Each proposition or
statement having at least one valid justification is made a part of the current belief set.
Statements lacking acceptable justifications are excluded from this set. When a contradiction is
discovered, the statements responsible for the contradiction are identified and an appropriate one
is retracted. This in turn may result in other retractions and additions. The procedure used to
perform this process is called dependency-directed backtracking. This process is described later.
Figure 6.15: Architecture of the problem solver with a TMS
The TMS maintains records to reflect retractions and additions so that the IE will always know its
current belief set. The records are maintained in the form of a dependency network. The nodes in the network represent
KB entries such as premises, conclusions, inference rules, and the like. Attached to the nodes are
justifications that represent the inference steps from which the node was derived. Nodes in the
belief set must have valid justifications. A premise is a fundamental belief that is assumed to be
always true. Premises need no justifications. They form a base from which all other currently
active nodes can be explained in terms of valid justifications.
There are two types of justification records maintained for nodes: support lists (SL) and conditional proofs (CP). SLs are the most common type. They provide the supporting justifications for nodes. The data structure used for the SL contains two lists of other dependent node names, an in-list and an out-list. It has the form
(SL <in-list> <out-list>)
In order for a node to be active and hence labelled as IN the belief set, its SL must have at least one valid node in its in-list, and all nodes named in its out-list, if any, must be marked OUT
of the belief set. For example, a current belief set that represents Cybil as a nonflying bird (an
ostrich) might have the nodes and justifications listed in Table 6.1.
Each IN node given in Table 6.1 is part of the current belief set. Nodes n1 and n5 are premises. They have empty support lists since they do not require justifications. Node n2, the belief that Cybil can fly, is OUT because n3, a valid node, is in the out-list of n2.
Suppose it is discovered that Cybil is not an ostrich, thereby causing n5 to be retracted (marking its status as OUT). Node n3, which depends on n5, must also be retracted. This, in turn, changes the status of n2 to a justified node. The resultant belief set is now that the bird Cybil can fly.
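A small Python sketch of the IN/OUT labelling rule for support lists, using the Cybil nodes of Table 6.1; node n4, the retracted premise, is left out of the table and simply treated as OUT.

# Each node maps to its (in-list, out-list). A node is IN if every node in its
# in-list is IN and every node in its out-list is OUT; premises have empty lists.
sl = {
    "n1": ([], []),          # Cybil is a bird (premise)
    "n2": (["n1"], ["n3"]),  # Cybil can fly
    "n3": (["n5"], ["n4"]),  # Cybil cannot fly
    "n5": ([], []),          # Cybil is an ostrich (premise)
}

def status(node, retracted=frozenset()):
    if node in retracted or node not in sl:
        return "OUT"
    ins, outs = sl[node]
    ok = all(status(n, retracted) == "IN" for n in ins) and \
         all(status(n, retracted) == "OUT" for n in outs)
    return "IN" if ok else "OUT"

print(status("n2"))                      # OUT: n3 is currently IN
print(status("n2", retracted={"n5"}))    # IN:  retracting n5 makes n3 OUT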
To represent a belief network, the symbol conventions shown in Figure 6.16 are sometimes used.
The meanings of the nodes shown in the figure are (1) a premise is a true proposition requiring no justification, (2) an assumption is a current belief that could change, (3) a datum is either a currently assumed or IE-derived belief, and (4) justifications are the supports for beliefs (nodes), consisting of supporting antecedent node links and a consequent node link.
Table 6.1 : Example Nodes In A Dependency Network
Node Status Meaning Support list Comments
n1 IN Cybil is a bird (SL ( ) ( )) a premise
n2 OUT Cybil can fly (SL (n1)(n3)) unjustified belief
n3 IN Cybil cannot fly (SL (n5)(n4)) justified belief
n4 OUT Cybil has wings (SL ( ) ( ) ) retracted premise
n5 IN Cybil is an Ostrich (SL ( ) ( ) ) a premise
Figure 6.16: Belief network node meanings
An example of a typical network representation is given in Figure 6.17. The nodes T, U, and W are OUT. If the node labelled P is made IN for some reason, the TMS would update the network by propagating the "INness" support provided by node P to make T and W IN.
As noted earlier, when a contradiction is discovered, the TMS locates the source of the
contradiction and corrects it by retracting one of the contributing sources. It does this by checking
the support lists of the contradictory node and going directly to the source of the contradiction. It
goes directly to the source by examining the dependency structure supporting the justification and
determining the offending nodes. This is in contrast to the naive backtracking approach that would search a deduction tree sequentially, node by node, until the contradictory node is reached. Backtracking directly to the node causing the contradiction is known as dependency-directed
backtracking (DDB). This is clearly a more efficient search strategy than chronological
backtracking.
Figure 6.17: Typical Fragment of a Belief Network
Probabilistic Reasoning
Here we will examine methods that use probabilistic representations for all knowledge and which
reason by propagating the uncertainties from evidence and assertions to conclusions. As before,
the uncertainties can arise from an inability to predict outcomes due to unreliable, vague,
incomplete, or inconsistent knowledge.
The probability of an uncertain event A is a measure of the degree of likelihood of occurrence of that event. The set of all possible events is called the sample space, S. A probability measure is a function P(·) which maps event outcomes E1, E2, ... from S into real numbers and which satisfies the following axioms of probability:
1. 0 ≤ P(A) ≤ 1 for any event A ⊆ S.
2. P(S) = 1, a certain outcome.
3. For Ei ∩ Ej = ∅ for all i ≠ j (the Ei are mutually exclusive),
P(E1 ∪ E2 ∪ E3 ∪ ...) = P(E1) + P(E2) + P(E3) + ...
From these three axioms and the rules of set theory, the basic law of probability can be derived.
Of course, the axioms are not sufficient to compute the probability of an outcome. That requires
an understanding of the underlying distributions that must be established through one of the
following approaches:
1. Use of a theoretical argument that accurately characterizes the processes.
2. Using one’s familiarity and understanding of the basic processes to assign subjective
probabilities, or
3. Collecting experimental data from which statistical estimates of the underlying distributions
can be made.
Since much of the knowledge we deal with is uncertain in nature, a number of our beliefs must be
tenuous. Our conclusions are often based on available evidence and past experience, which is
often far from complete. The conclusions are, therefore, no more than educated guesses. In a
great many situations it is possible to obtain only partial knowledge concerning the possible
outcome of some event. But, given that knowledge, one’s ability to predict the outcome is
certainly better than with no knowledge at all. We manage quite well in drawing plausible
conclusions from incomplete knowledge and past experiences.
Probabilistic reasoning is sometimes used when outcomes are unpredictable. For example, when
a physician examines a patient, the patient’s history, symptoms, and test results provide some,
but not conclusive, evidence of possible ailments. This knowledge, together with the physician’s
experience with previous patients, improves the likelihood of predicting the unknown (disease)
event, but there is still much uncertainty in most diagnoses. Likewise, weather forecasters “guess”
at tomorrow’s weather based on available evidence such as temperature, humidity, barometric
pressure, and cloud coverage observations. The physical relationships that govern these
phenomena are not fully understood; so predictability is far from certain. Even a business
manager must make decisions based on uncertain predictions when the market for a new product
is considered. Many interacting factors influence the market, including the target consumer’s
lifestyle, population growth, potential consumer income, the general economic climate, and many
other dependent factors.
In all of the above cases, the level of confidence placed in the hypothesized conclusions is
dependent on the availability of reliable knowledge and the experience of the human
prognosticator. Our objective in this chapter is to describe some approaches taken in AI systems
to deal with reasoning under similar types of uncertain conditions.
Bayesian Probabilistic Inference
The form of probabilistic reasoning described in this section is based on the Bayesian method
introduced by the clergyman Bayes in the eighteenth century. This form of reasoning depends on
the use of conditional probabilities of specified events when it is known that other events have
occurred. For two events H and E with the probability P(E) > 0 , the conditional probability of
event H, given that event E has occurred, is defined as
P(H | E) = P(H & E) / P(E)    (6.1)
This expression can be given a frequency interpretation by considering a random experiment which is repeated a large number of times, n. The number of occurrences of the event E, say No.(E), and of the joint event H and E, No.(H & E), are recorded and their relative frequencies rf computed as
rf(H & E) = No.(H & E) / n,    rf(E) = No.(E) / n    (6.2)
When n is large, the two expressions (6.2) approach the corresponding probabilities respectively,
and the ratio
rf(H & E) / rf(E) = P(H & E) / P(E)
then represents the proportion of times event H occurs relative to the occurrence of E, that is, the
approximate conditional occurrence of H with E.
The conditional probability of event E given that event H occurred can likewise be written as
P(E | H) = P(H & E) / P(H)    (6.3)
Solving 6.3 for P(H & E) and substituting this in equation 6.1, we obtain one form of Bayes' Rule:
P(H | E) = P(E | H) P(H) / P(E)    (6.4)
This equation expresses the notion that the probability of event H occurring when it is known that event E occurred is the same as the probability that E occurs when it is known that H
occurred, multiplied by the ratio of the probabilities of the two events H and E occurring. As an
example of the use of equation 6.4, consider the problem of determining the probability that a
patient has a certain disease D1, given that a symptom E was observed. We wish to find
P(D1 | E).
Suppose now it is known from previous experience that the prior (unconditional) probabilities
P(D1) and P(E) for randomly chosen patients are P(D1) = 0.05, and P(E) = 0.15, respectively.
Also, we assume that the conditional probability of the observed symptom given that a patient has
disease D1 is known from experience to be P(E| D1) = 0.95. Then, we easily determine the value
of P(D1| E) as
P(D1 | E) = P(E | D1) P(D1) / P(E) = (0.95 × 0.05) / 0.15 ≈ 0.32
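The same computation, written out in a couple of lines of Python:

p_d1, p_e, p_e_given_d1 = 0.05, 0.15, 0.95

# Bayes' Rule, equation 6.4: P(D1 | E) = P(E | D1) P(D1) / P(E)
p_d1_given_e = p_e_given_d1 * p_d1 / p_e
print(round(p_d1_given_e, 2))   # 0.32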
It may be the case that the probability P(E) is difficult to obtain. If that is the case, a different
form of Bayes' Rule may be used. To see this, we write equation 6.4 with ¬H substituted in place
of H to obtain

P(¬H | E) = P(E | ¬H) P(¬H) / P(E)
Next, we divide equation 6.4 by this result to eliminate P(E) and get

P(H | E) / P(¬H | E) = [P(E | H) P(H)] / [P(E | ¬H) P(¬H)]        (6.5)
Note that equation 6.5 has two terms that are ratios of the probability of an event to the probability
of its negation, P(H | E) / P(¬H | E) and P(H) / P(¬H). The ratio of the probability of an event E
to the probability of its negation is called the odds of the event and is denoted O(E).
The remaining ratio P(E | H) / P(E | ¬H) in equation 6.5 is known as the likelihood ratio of E with
respect to H. We denote this quantity by L(E | H). Using these two new terms, the odds-likelihood
form of Bayes' Rule for equation 6.5 may be written as

O(H | E) = L(E | H) · O(H)
This form of Bayes' Rule suggests how to compute the posterior odds O(H | E) from the prior odds
on H, O(H). That value is proportional to the likelihood ratio L(E | H). When L(E | H) is equal to one, the
knowledge that E is true has no effect on the odds of H. Values of L(E | H) less than or greater
than one decrease or increase the odds correspondingly. When L(E | H) cannot be computed,
estimates may still be made by an expert having some knowledge of H and E. Estimating the ratio
rather than the individual probabilities appears to be easier for individuals in such cases. This is
sometimes done when developing expert systems where more reliable probabilities are not
available.
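As an illustration of the odds-likelihood form, the following sketch (again with invented helper names, and with P(E | ¬D1) derived from the figures of the disease example above) computes the posterior odds and converts them back to a probability:

def odds(p):
    # Convert a probability to odds: O(E) = P(E) / P(not E)
    return p / (1.0 - p)

def posterior_from_odds(p_h, likelihood_ratio):
    # Odds-likelihood form of Bayes' Rule: O(H | E) = L(E | H) * O(H)
    o_h_given_e = likelihood_ratio * odds(p_h)
    return o_h_given_e / (1.0 + o_h_given_e)   # back to a probability

# P(E | not D1) follows from P(E) = P(E | D1) P(D1) + P(E | not D1) P(not D1)
p_e_given_not_d1 = (0.15 - 0.95 * 0.05) / 0.95    # about 0.108
l_ratio = 0.95 / p_e_given_not_d1                  # likelihood ratio L(E | D1)
print(round(posterior_from_odds(0.05, l_ratio), 2))   # 0.32, matching equation 6.4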
In the example cited above, D1 is either true or false, and P(D1 | E) is interpreted as a measure of
confidence that D1 is true when it is known that E is true. There is a similarity between this use of
E and P(D1 | E) and modus ponens. For example, when E is known to be true and D1 and E are
known to be related, one concludes the truth of D1 with a confidence level P(D1 | E).
One might wonder if it would not be simpler to assign probabilities to as many ground atoms E1,
E2, ..., Ek as possible, and compute inferred probabilities (such as P(Ej | H) and P(H)) directly
from these. The answer is that in general this is not possible. To compute an inferred probability
requires knowledge of the joint distributions of the ground predicates participating in the
inference. From the joint distributions, the required marginal distributions may then be computed.
The distributions and computations required for that approach are, in general, much more
complex than the computations involving the use of Bayes Rule.
Consider now two events A and ¬A which are mutually exclusive (A ∩ ¬A = ∅) and exhaustive
(A ∪ ¬A = S). The probability of an arbitrary event B can always be expressed as

P(B) = P(B & A) + P(B & ¬A) = P(B | A) P(A) + P(B | ¬A) P(¬A)
Using this result, equation 6.4 can be written as

P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | ¬H) P(¬H)]        (6.6)
Equation 6.6 can be generalized for an arbitrary number of hypotheses Hi, i = 1,..., k. Thus,
suppose the Hi partition the universe; that is, the Hi are mutually exclusive and exhaustive. Then
for any evidence E, we have
P(E) = ∑ P(E & Hi) = ∑ P(E | Hi) P(Hi),    with the sums running over i = 1, ..., k,
and hence,
P(Hi | E) = P(E | Hi) P(Hi) / ∑ P(E | Hj) P(Hj)        (6.7)

where the sum in the denominator runs over j = 1, ..., k.
Finally, to be more realistic and to accommodate multiple sources of evidence E1, E2, ..., Em, we
generalize equation 6.7 further to obtain

P(Hi | E1, E2, ..., Em) = P(E1, E2, ..., Em | Hi) P(Hi) / ∑ P(E1, E2, ..., Em | Hj) P(Hj)        (6.8)

where, again, the sum in the denominator runs over j = 1, ..., k.
If there are several plausible hypotheses and a number of evidence sources, equation 6.8 can be
fairly complex to compute. This is one of the serious drawbacks of the Bayesian approach. A
large number of probabilities must be known in advance in order to apply an equation such as
6.8. If there were k hypotheses, Hi, and m sources of evidence Ej, then k + m prior probabilities
must be known in addition to the k likelihood probabilities. The real question then is where does
one obtain such a large number of reliable probabilities?
To simplify equation 6.8, it is sometimes assumed that the Ej are statistically independent given
each hypothesis. In that case, the numerator and denominator probability terms P(E1, E2, ..., Em | Hj)
factor into

P(E1 | Hj) P(E2 | Hj) ... P(Em | Hj)
resulting in a somewhat simpler form. But, even though the computations are straightforward,
the number of probabilities required in a moderately large system can still be prohibitive, and one
may be forced to simply use subjective probabilities when more reliable values are not available.
Furthermore, the Ej are almost never completely independent. Consequently, this assumption
may introduce intolerable errors.
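To make equation 6.8 under the independence assumption concrete, here is a small illustrative sketch; the hypothesis names, prior probabilities, and likelihood values are invented for the example and do not come from any real system:

def posterior(priors, likelihoods, evidence):
    # Equation 6.8 with the independence assumption:
    # P(Hi | E1..Em) is proportional to P(Hi) times the product of the P(Ej | Hi).
    unnormalized = {}
    for h, p_h in priors.items():
        p = p_h
        for e in evidence:
            p *= likelihoods[h][e]        # P(Ej | Hi), assumed independent given Hi
        unnormalized[h] = p
    total = sum(unnormalized.values())    # the denominator of equation 6.8
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical two-hypothesis, two-evidence example
priors = {"H1": 0.3, "H2": 0.7}
likelihoods = {"H1": {"E1": 0.8, "E2": 0.4},
               "H2": {"E1": 0.1, "E2": 0.5}}
print(posterior(priors, likelihoods, ["E1", "E2"]))   # H1 becomes the more probable hypothesis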
The formulas presented above suggest how probabilistic evidence would be combined to produce
a likelihood estimate for any given hypothesis. When a number of individual hypotheses are
possible and several sources of evidence are available, it may be necessary to compute two or
more alternative probabilities and select among them. This may mean that none, one, or possibly
more than one of the hypotheses could be chosen. Normally, the one having the largest
probability of occurrence would be selected, provided no other values were close. Before
accepting such a choice, however, it may be desirable to require that the value exceed some
threshold to avoid selecting weakly supported hypotheses. In a previous section we described
a typical system that combines similar values and chooses only those conclusions that exceed a
threshold of 0.2.
Bayesian Networks
Network representations of knowledge have been used to graphically exhibit the
interdependencies that exist between related pieces of knowledge. Much work has been done in
this area to develop a formal syntax and semantics for such representations, for example when we
consider associative networks and conceptual graphs. Here, however, we are more interested in network
representations that depict the degrees of belief of propositions and the causal dependencies that
exist between them. Inferencing in such a network amounts to propagating the probabilities of given
and related information through the network to one or more conclusion nodes.
Network representations for uncertain dependencies are further motivated by observations made
earlier. If we wish to represent uncertain knowledge related to a set of propositional variables
x1, ..., xn by their joint distribution P(x1, ..., xn), it will require some 2^n entries to store the distribution
explicitly. Furthermore, a determination of any of the marginal probabilities P(xi) requires summing
P(x1, ..., xn) over the remaining n − 1 variables. Clearly, the time and storage requirements for
such computations quickly become impractical. Inferring with such large numbers of probabilities
does not appear to model the human process either. On the contrary, humans tend to single out
only a few propositions that are known to be causally linked when reasoning with uncertain
beliefs. This metaphor leads quite naturally to a form of network representation.
One useful way to portray the problem domain is with a network of nodes that represent
propositional variables xi, connected by arcs that represent causal influences or dependencies
among the nodes. The strengths of the influences are quantified by conditional probabilities of
each variable. For example, to represent causal relationships between the propositional variables
x1, ..., x6 as illustrated in Figure 6.18, one can write the joint probability P(x1, ..., x6) by inspection as a
product of (chain) conditional probabilities

P(x1, ..., x6) = P(x6 | x5) P(x5 | x2, x3) P(x4 | x1, x2) P(x3 | x1) P(x2 | x1) P(x1)
Figure 6.18: Example of a Bayesian Belief Network
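The chain factorization above can be evaluated directly from locally stored conditional probabilities. The following sketch assumes Boolean variables and invents the probability values, since Figure 6.18 supplies only the structure of the network:

# Hypothetical conditional probability tables for the network of Figure 6.18.
# Each entry gives P(variable = True | parents); the numbers are invented for illustration.
p_x1 = 0.6
p_x2_given_x1 = {True: 0.7, False: 0.2}
p_x3_given_x1 = {True: 0.4, False: 0.1}
p_x4_given_x1_x2 = {(True, True): 0.9, (True, False): 0.5,
                    (False, True): 0.6, (False, False): 0.1}
p_x5_given_x2_x3 = {(True, True): 0.8, (True, False): 0.3,
                    (False, True): 0.4, (False, False): 0.05}
p_x6_given_x5 = {True: 0.75, False: 0.2}

def prob(p_true, value):
    # P(variable = value), given the probability that the variable is True
    return p_true if value else 1.0 - p_true

def joint(x1, x2, x3, x4, x5, x6):
    # P(x1, ..., x6) as the chain product shown above
    return (prob(p_x6_given_x5[x5], x6) *
            prob(p_x5_given_x2_x3[(x2, x3)], x5) *
            prob(p_x4_given_x1_x2[(x1, x2)], x4) *
            prob(p_x3_given_x1[x1], x3) *
            prob(p_x2_given_x1[x1], x2) *
            prob(p_x1, x1))

print(joint(True, True, False, True, False, False))   # probability of one full assignment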
Once such a network is constructed, an inference engine can use it to maintain and propagate
beliefs. When new information is received, the effects can be propagated throughout the network
until equilibrium probabilities are reached. Pearl (1986, 1987) has proposed simplified methods
for updating networks (trees and, more generally, graphs) of this type by fusing and propagating
the effects of new evidence and beliefs such that equilibrium is reached in time proportional to the
longest path through the network. At equilibrium, all propositions will have consistent probability
assignments. Methods for graphs are more difficult. They require the use of dummy variables to
transform them to equivalent tree structures that are then easier to work with.
To use the type of probabilistic inference we have been considering, it is first necessary to assign
probabilities to all basic facts in the knowledge base. This requires the definition of an appropriate
sample space and the assignment of a priori and conditional probabilities. In addition, some
method (such as Pearl's) must be chosen to compute the combined probabilities when pooling
evidence in a sequence of inference steps. Finally, when the outcome of
an inference chain results in one or more proposed conclusions, the alternatives must be
compared, and one or more selected on the basis of its likelihood.
Top
Use of Certainty Factors
MYCIN uses measures of both belief and disbelief to represent degrees of confirmation and
disconfirmation, respectively, in a given hypothesis. The basic measure of belief, denoted by
MB(H, E), is actually a measure of the increased belief in hypothesis H due to the evidence E. This
is roughly equivalent to the estimated increase in the probability P(H | E) over P(H) given by an
expert as a result of the knowledge gained by E. A value of 0 corresponds to no increase in belief
and 1 corresponds to maximum increase or absolute belief. Likewise, MD(H, E) is a measure of
the increased disbelief in hypothesis H due to evidence E. MD ranges from 0 to +1, with +1
representing maximum increase in disbelief (total disbelief) and 0 representing no increase. In
both measures, the evidence E may be absent or may be replaced with another hypothesis, as in
MB(H1, H2). This represents the increased belief in H1 given that H2 is true.
In an attempt to formalize the uncertainty measure in MYCIN, definitions of MB and MD have
been given in terms of prior and conditional probabilities. It should be remembered, however, that
the actual values are often subjective probability estimates provided by a physician. The
definitions are as follows.
MB(H, E) = 1                                                  if P(H) = 1
         = [max(P(H | E), P(H)) − P(H)] / [1 − P(H)]          otherwise        (6.11)

MD(H, E) = 1                                                  if P(H) = 0
         = [min(P(H | E), P(H)) − P(H)] / [0 − P(H)]          otherwise        (6.12)
Note that when 0 < P(H) < 1, and E and H are independent (so P(H | E) = P(H)), then MB = MD
= 0. This would be the case if E provided no useful information.
The two measures MB and MD are combined into a single measure called the certainty factor (CF),
defined by
CF(H, E) = MB(H, E) − MD(H, E)        (6.13)
Note that the value of CF ranges from –1 (certain disbelief) to +1 (certain belief). Furthermore, a
value of CF = 0 will result if E neither confirms nor disconfirms H (E and H are independent).
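A minimal sketch of equations 6.11 to 6.13 in Python (our own function names, not MYCIN code) is:

def mb(p_h, p_h_given_e):
    # Measure of increased belief, equation 6.11
    if p_h == 1.0:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)

def md(p_h, p_h_given_e):
    # Measure of increased disbelief, equation 6.12
    if p_h == 0.0:
        return 1.0
    return (min(p_h_given_e, p_h) - p_h) / (0.0 - p_h)

def cf(p_h, p_h_given_e):
    # Certainty factor, equation 6.13: CF = MB - MD
    return mb(p_h, p_h_given_e) - md(p_h, p_h_given_e)

print(cf(0.05, 0.32))   # evidence raised P(H) from 0.05 to 0.32, so CF > 0 (about 0.28)
print(cf(0.05, 0.05))   # independent evidence: CF = 0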
Top
Fuzzy Logic
In the techniques we have discussed so far, we have not modified the mathematical
underpinnings provided by set theory and logic. We have instead augmented those ideas with
additional constructs provided by probability theory. In this section, we take a different approach
and briefly consider what happens if we make fundamental changes to our idea of set
membership and corresponding changes to our definitions of logical operations.
The motivation for fuzzy sets is provided by the need to represent such propositions as:
John is very tall.
Mary is slightly ill.
Sue and Linda are close friends.
Exceptions to the rule are nearly impossible.
Most Frenchmen are not very tall.
While traditional set theory defines set membership as a Boolean predicate, fuzzy set theory
allows us to represent set membership as a possibility distribution. Once set membership has
been redefined in this way, it is possible to define a reasoning system based on techniques for
combining distributions (see, for example, the papers in the journal Fuzzy Sets and Systems). Such reasoners
have been applied in control systems for devices as diverse as trains and washing machines.
Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the
concept of partial truth -- truth-values between "completely true" and "completely false". Dr. Lotfi
Zadeh of UC/Berkeley introduced it in the 1960's as a means to model the uncertainty of natural
language. Zadeh says that rather than regarding fuzzy theory as a single theory, we should
regard the process of "fuzzification" as a methodology to generalize ANY specific theory from a
crisp (discrete) to a continuous (fuzzy) form.
Fuzzy Subsets
Just as there is a strong relationship between Boolean logic and the concept of a subset, there is
a similar strong relationship between fuzzy logic and fuzzy subset theory.
In classical set theory, a subset U of a set S can be defined as a mapping from the elements of S
to the elements of the set {0, 1},
U: S → {0, 1}
This mapping may be represented as a set of ordered pairs, with exactly one ordered pair present
for each element of S. The first element of the ordered pair is an element of the set S, and the
second element is an element of the set {0, 1}. The value zero is used to represent non-
membership, and the value one is used to represent membership. The truth or falsity of the
statement "x is in U" is determined by finding the ordered pair whose first element is x.
The statement is true if the second element of the ordered pair is 1, and the statement is false if it
is 0. Similarly, a fuzzy subset F of a set S can be defined as a set of ordered pairs, each with the
first element from S, and the second element from the interval [0,1], with exactly one ordered pair
present for each element of S. This defines a mapping between elements of the set S and values
in the interval [0,1]. The value zero is used to represent complete non-membership, the value
one is used to represent complete membership, and values in between are used to represent
intermediate DEGREES OF MEMBERSHIP. The set S is referred to as the UNIVERSE OF
DISCOURSE for the fuzzy subset F. Frequently, the mapping is described as a function, the
MEMBERSHIP FUNCTION of F. The degree to which the statement "x is in F" is true is determined
by finding the ordered pair whose first element is x. The DEGREE OF TRUTH of the statement is
the second element of the ordered pair.
In practice, the terms "membership function" and "fuzzy subset" get used interchangeably.
That's a lot of mathematical baggage, so here's an example. Let's talk about people and
"tallness". In this case the set S (the universe of discourse) is the set of people. Let's define a
fuzzy subset TALL, which will answer the question "to what degree is person x tall?" Zadeh
describes TALL as a LINGUISTIC VARIABLE, which represents our cognitive category of
"tallness". To each person in the universe of discourse, we have to assign a degree of
membership in the fuzzy subset TALL. The easiest way to do this is with a membership function
based on the person's height.
tall(x) = { 0, if height(x) < 5 ft.,
(height(x)-5ft.)/2ft., if 5 ft. <= height (x) <= 7 ft.,
1, if height (x) > 7 ft. }
A graph of tall(x) against height (in feet) is a ramp: 0 below 5 ft, rising linearly through 0.5 at 6 ft
to 1.0 at 7 ft, and flat at 1.0 thereafter.
Given this definition, here are some example values:
Person Height degree of tallness
------------------------------------------------------------------
Billy 3' 2" 0.00 [I think]
Yoke 5' 5" 0.21
Drew 5' 9" 0.38
Erik 5' 10" 0.42
Mark 6' 1" 0.54
Kareem 7' 2" 1.00 [depends on who you ask]
Expressions like "A is X" can be interpreted as degrees of truth,
e.g., "Drew is TALL" = 0.38.
Note: Membership functions used in most applications almost never have as simple a shape as
tall(x). At minimum, they tend to be triangles pointing up, and they can be much more complex
than that. Also, the discussion characterizes membership functions as if they always are based
on a single criterion, but this isn't always the case, although it is quite common. One could, for
example, want to have the membership function for TALL depend on both a person's height and
their age (he's tall for his age). This is perfectly legitimate, and occasionally used in practice.
It's referred to as a two-dimensional membership function, or a "fuzzy relation". It's also possible
to have even more criteria, or to have the membership function depend on elements from two
completely different universes of discourse.
Logic Operations
Now that we know what a statement like "X is LOW" means in fuzzy logic, how do we interpret a
statement like "X is LOW and Y is HIGH or (not Z is MEDIUM)"?
The standard definitions in fuzzy logic are:
truth (not x) = 1.0 - truth (x)
truth (x and y) = minimum (truth(x), truth(y))
truth (x or y) = maximum (truth(x), truth(y))
Some researchers in fuzzy logic have explored the use of other interpretations of the AND and
OR operations, but the definition for the NOT operation seems to be safe.
Note that if you plug just the values zero and one into these definitions, you get the same truth
tables as you would expect from conventional Boolean logic. This is known as the EXTENSION
PRINCIPLE, which states that the classical results of Boolean logic are recovered from fuzzy
logic operations when all fuzzy membership grades are restricted to the traditional set {0, 1}. This
effectively establishes fuzzy subsets and logic as a true generalization of classical set theory and
logic. In fact, by this reasoning all crisp (traditional) subsets are fuzzy subsets of this very special
type; and there is no conflict between fuzzy and crisp methods.
Some examples
Assume the same definition of TALL as above, and in addition, assume that we have a fuzzy
subset OLD defined by the membership function:
old (x) = { 0, if age(x) < 18 yr.
(age(x)-18 yr.)/42 yr., if 18 yr. <= age(x) <= 60 yr.
1, if age(x) > 60 yr. }
And for compactness, let
a = X is TALL and X is OLD
b = X is TALL or X is OLD
c = not (X is TALL)
Then we can compute the following values (for a height of 6.0 ft and an age of 50 years):
height    age    X is TALL    X is OLD    a      b      c
6.0       50     0.5          0.76        0.5    0.76   0.5
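Using the standard NOT, AND, and OR definitions given earlier, these values can be reproduced with a few lines of Python (tall(x) and old(x) are compact transcriptions of the definitions above; the helper names are ours):

def tall(height_ft):
    if height_ft < 5.0:
        return 0.0
    return min((height_ft - 5.0) / 2.0, 1.0)

def old(age_yr):
    if age_yr < 18.0:
        return 0.0
    return min((age_yr - 18.0) / 42.0, 1.0)

f_not = lambda x: 1.0 - x    # truth(not x)
f_and = min                  # truth(x and y)
f_or = max                   # truth(x or y)

x_tall, x_old = tall(6.0), old(50)    # 0.5 and about 0.76
a = f_and(x_tall, x_old)              # 0.5
b = f_or(x_tall, x_old)               # about 0.76
c = f_not(x_tall)                     # 0.5
print(round(x_old, 2), a, round(b, 2), c)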
Uses of fuzzy logic
Fuzzy logic is used directly in very few applications. The Sony Palmtop apparently uses a fuzzy
logic decision tree algorithm to perform handwritten (well, computer light pen) Kanji character
recognition.
A fuzzy expert system
A fuzzy expert system is an expert system that uses a collection of fuzzy membership functions
and rules, instead of Boolean logic, to reason about data. The rules in a fuzzy expert system are
usually of a form similar to the following: if x is low and y is high then z = medium, where x and y
are input variables (names for known data values), z is an output variable (a name for a data value
to be computed), low is a membership function (fuzzy subset) defined on x, high is a membership
function defined on y, and medium is a membership function defined on z.
The antecedent (the rule's premise) describes to what degree the rule applies, while the
conclusion (the rule's consequent) assigns a membership function to each of one or more output
variables. Most tools for working with fuzzy expert systems allow more than one conclusion per
rule. The set of rules in a fuzzy expert system is known as the rule base or knowledge base.
The general inference process proceeds in three (or four) steps.
1. Under FUZZIFICATION, the membership functions defined on the input variables are
applied to their actual values, to determine the degree of truth for each rule premise.
2. Under INFERENCE, the truth-value for the premise of each rule is computed, and applied
to the conclusion part of each rule. This results in one fuzzy subset to be assigned to
each output variable for each rule. Usually only MIN or PRODUCT are used as inference
rules. In MIN inferencing, the output membership function is clipped off at a height
corresponding to the rule premise's computed degree of truth (fuzzy logic AND). In
PRODUCT inferencing, the output membership function is scaled by the rule premise's
computed degree of truth.
3. Under COMPOSITION, all of the fuzzy subsets assigned to each output variable are
combined together to form a single fuzzy subset for each output variable. Again, usually
MAX or SUM are used. In MAX composition, the combined output fuzzy subset is
constructed by taking the pointwise maximum over all of the fuzzy subsets assigned to the output
variable by the inference rule (fuzzy logic OR). In SUM composition, the combined output
fuzzy subset is constructed by taking the pointwise sum over all of the fuzzy subsets
assigned to the output variable by the inference rule.
4. Finally is the (optional) DEFUZZIFICATION, which is used when it is useful to convert the
fuzzy output set to a crisp number. There are more defuzzification methods than you can
shake a stick at (at least 30). Two of the more common techniques are the CENTROID
and MAXIMUM methods. In the CENTROID method, the crisp value of the output variable
is computed by finding the variable value of the center of gravity of the membership
function for the fuzzy value. In the MAXIMUM method, one of the variable values at which
the fuzzy subset has its maximum truth-value is chosen as the crisp value for the output
variable.
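As a rough sketch of steps 3 and 4, the following Python fragment (our own illustrative code, with the output universe discretized rather than handled analytically) applies MAX composition to two clipped output subsets and then defuzzifies the result with the CENTROID method:

def max_composition(fuzzy_subsets, z_values):
    # Combine the per-rule output subsets by taking a pointwise maximum (fuzzy OR)
    return [max(f(z) for f in fuzzy_subsets) for z in z_values]

def centroid(memberships, z_values):
    # Crisp output = centre of gravity of the combined membership function
    total = sum(memberships)
    if total == 0:
        return None          # no rule fired
    return sum(m * z for m, z in zip(memberships, z_values)) / total

# Two clipped output subsets, as MIN inferencing might produce them (values invented)
rule_a = lambda z: min(z / 10.0, 0.68)          # "z is high", clipped at 0.68
rule_b = lambda z: min(1.0 - z / 10.0, 0.32)    # "z is low", clipped at 0.32

zs = [i / 10.0 for i in range(0, 101)]          # discretize the interval [0, 10]
combined = max_composition([rule_a, rule_b], zs)
print(round(centroid(combined, zs), 2))         # crisp value of the output variable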
Extended Example:
Assume that the variables x, y, and z all take on values in the interval [0,10], and that the
following membership functions and rules are defined:
low(t) = 1 - ( t / 10 )
high(t) = t / 10
rule 1: if x is low and y is low then z is high
rule 2: if x is low and y is high then z is low
rule 3: if x is high and y is low then z is low
rule 4: if x is high and y is high then z is high
Notice that instead of assigning a single value to the output variable z, each rule assigns an entire
fuzzy subset (low or high).
Notes:
1. In this example, low(t)+high(t)=1.0 for all t. This is not required, but it is fairly common.
2. The value of t at which low(t) is maximum is the same as the value of t at which high(t) is
minimum, and vice-versa. This is also not required, but fairly common.
3. The same membership functions are used for all variables. This isn’t required, and is also
not common.
In the fuzzification subprocess, the membership functions defined on the input variables are
applied to their actual values, to determine the degree of truth for each rule premise. The degree
of truth for a rule’s premise is sometimes referred to as its ALPHA. If a rule's premise has a
nonzero degree of truth (if the rule applies at all...) then the rule is said to FIRE. For example,
x y low(x) high(x) low(y) high(y) alpha1 alpha2 alpha3 alpha4
0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0
0.0 3.2 1.0 0.0 0.68 0.32 0.68 0.32 0.0 0.0
0.0 6.1 1.0 0.0 0.39 0.61 0.39 0.61 0.0 0.0
0.0 10.0 1.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0
3.2 0.0 0.68 0.32 1.0 0.0 0.68 0.0 0.32 0.0
6.1 0.0 0.39 0.61 1.0 0.0 0.39 0.0 0.61 0.0
10.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0
3.2 3.1 0.68 0.32 0.69 0.31 0.68 0.31 0.32 0.31
3.2 3.3 0.68 0.32 0.67 0.33 0.67 0.33 0.32 0.32
10.0 10.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0
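The fuzzification step for these four rules is easy to express directly; the following sketch (function names ours) reproduces rows of the table above:

def low(t):
    return 1.0 - t / 10.0

def high(t):
    return t / 10.0

def alphas(x, y):
    # Premise degrees of truth for rules 1-4 (fuzzy AND = min)
    return (min(low(x), low(y)),     # rule 1: x is low and y is low
            min(low(x), high(y)),    # rule 2: x is low and y is high
            min(high(x), low(y)),    # rule 3: x is high and y is low
            min(high(x), high(y)))   # rule 4: x is high and y is high

print([round(a, 2) for a in alphas(0.0, 3.2)])   # [0.68, 0.32, 0.0, 0.0]
print([round(a, 2) for a in alphas(3.2, 3.1)])   # [0.68, 0.31, 0.32, 0.31]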
In the inference subprocess, the truth-value for the premise of each rule is computed, and applied
to the conclusion part of each rule. This results in one fuzzy subset to be assigned to each output
variable for each rule.
MIN and PRODUCT are two INFERENCE METHODS or INFERENCE RULES. In MIN
inferencing, the output membership function is clipped off at a height corresponding to the rule
premise's computed degree of truth. This corresponds to the traditional interpretation of the fuzzy
logic AND operation. In PRODUCT inferencing, the output membership function is scaled by the
rule premise's computed degree of truth.
For example, let's look at rule 1 for x = 0.0 and y = 3.2. As shown in the table above, the premise
degree of truth works out to 0.68. For this rule, MIN inferencing will assign z the fuzzy subset
defined by the membership function:
rule1(z) = { z / 10, if z <= 6.8
0.68, if z >= 6.8 }
For the same conditions, PRODUCT inferencing will assign z the fuzzy subset defined by the
membership function:
rule1(z) = 0.68 * high(z)
= 0.068 * z
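Both forms can be written as small Python functions; this sketch uses the alpha of 0.68 computed above for rule 1:

alpha1 = 0.68                      # premise degree of truth for rule 1 at x = 0.0, y = 3.2

def rule1_min(z):
    # MIN inferencing: clip high(z) = z/10 at the premise's degree of truth
    return min(z / 10.0, alpha1)

def rule1_product(z):
    # PRODUCT inferencing: scale high(z) = z/10 by the premise's degree of truth
    return alpha1 * (z / 10.0)

print(rule1_min(5.0), rule1_min(9.0))          # 0.5 and 0.68
print(rule1_product(5.0), rule1_product(9.0))  # 0.34 and about 0.61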
Note: The terminology used here is slightly nonstandard. In most texts, the term "inference
method" is used to mean the combination of the things referred to separately here as "inference"
and "composition." Thus you'll see such terms as "MAX-MIN inference" and "SUM-PRODUCT
inference" in the literature. They are the combination of MAX composition and MIN inference, or
SUM composition and PRODUCT inference, respectively.
You'll also see the reverse terms "MIN-MAX" and "PRODUCT-SUM" – these mean the same
things as the reverse order. It seems clearer to describe the two processes separately.
In the composition subprocess, all of the fuzzy subsets assigned to each output variable are
combined together to form a single fuzzy subset for each output variable.
MAX composition and SUM composition are two COMPOSITION RULES. In MAX composition,
the combined output fuzzy subset is constructed by taking the pointwise maximum over all of the
fuzzy subsets assigned to the output variable by the inference rule. In SUM composition, the