For four years in the late 1990's and early 2000's I worked at Stanford University’s Section on Medical Informatics doing research in Artificial Intelligence. I was one of the primary architects on the Protege project (an open-sourced knowledge representation system) and spent quite a bit of time thinking about how to represent knowledge, the logical structure of knowledge, how to define constraints on information, and how to classify algorithms (a.k.a. “problem-solving methods”).
This talk, from 2001, describes the underlying architecture and formal knowledge model used in Protege, explains how "slot widgets" fit into the system, and goes on to describe PAL: the Protege Axiom Language. It's long, and really only for knowledge representation aficionados, but it's pretty complete.
1.
Formal Aspects Of Protege
William Grosso
Stanford Medical Informatics
Stanford University
2. Knowledge Models, William Grosso, Fourth International Protege Users Group Meeting
Overview
• Interoperability is important
– HPKB: DARPA project with many participants
– Protégé-2000: Lots of developers in many
locations
• Ray can’t write code fast enough !
– Interoperability requires common ground
• Shared semantics for common constructs
• The new Knowledge Model
3.
Proposed HPKB Scenario
Knowledge Base(s)
in a KB Server
Shared
Ontologies
Situation
Data
PSM
PSM
PSM
4.
Knowledge Bases in HPKB
• Ontologies are ways to share well-defined
information
– Define knowledge structure
– Useful as a coupling mechanism
• Knowledge Bases serve multiple roles
– Repositories of shared knowledge
– Community blackboards (with semantics).
5.
Interoperability requires
Semantics
• As long as all the developers are in the
same building, things can be
underspecified
– Rely on “group knowledge” and “established
practice”
• Larger working groups (over time, space,
or in numbers of people) can require more
precise specifications
6.
Knowledge Models
• Formal specification of the way knowledge
is represented
– Precise, human-readable definitions of
structures in a language
• Frequently unwritten
– Implied by the documentation
– Deduced via experience
7.
Knowledge Models at SMI
• Work spurred by the OKBC Specification
– Defining the Protégé Knowledge Model
– Comparing it to other knowledge models
• Goal: Enable Protégé tools to interoperate with
knowledge-based systems from other labs
– Goal is knowledge reuse
• Implicit Hypothesis: understanding knowledge
models will facilitate interoperation
8.
Example: Protégé and Loom
• Protégé: A suite of tools to simplify
knowledge base design and construction
• Design ontologies, create KA tools to acquire
instances
• Explicitly adopts notion of external PSMs in order
to focus on KA
• Loom : An environment for knowledge-
based system construction
• Everything done inside the Loom environment
9.
Frame-Based Knowledge Models
• Both Protégé and Loom use frame-based
knowledge models
– Classes, instances, slots, facets, …
• We expect differences over things like
default values and models of time
• But the knowledge models differ on more
mundane notions as well
10.
What’s a Slot ?
• Protégé/Win
– Slots are not part of
the global namespace
• Define attributes of a
frame
• Cannot be referred to
independently of either
a class or an instance
– Which slots are
attached to an
instance is part of the
class definition
• Loom
– Slots are part of the
global namespace
• Defined by defrelation
construct
• Have attributes
– domain, range, …
– Slots can be reified
• Instances of a slot class
correspond to a specific
relation (between two
instances)
11.
What’s an Instance ?
• Protégé/Win
– Every instance is a
direct instance of a
single specified class
• Automatically has the
own slots defined by
the class
• No other slots allowed
– Direct instance typing
cannot change.
• To change type at all,
need to do explicit
operations on the class
• Loom
– Type of an instance
does not have to be
specified
– Classifier deduces
instance types
• Types of instances can
change (without being
explicitly set)
– Instances can be
direct instances of
more than one class
12.
Interoperation ?
• Two different development environments
– Two different user models
– Two different approaches to KA
– Two different knowledge models
• Both “frame based”
• Disagree on the definitions of commonly used
structures
• Solution: ad{o,a}pt the OKBC knowledge
model
13.
Protégé-2000 Is Like HPKB
• Ray can’t write the code fast enough
– Therefore someone else has to write it
– Protégé-2000 allows everyone to customize it
using Java components
• If we glue together components written at
multiple labs, and knowledge bases
produced by many different people, we
might inadvertently introduce the same
issues
14.
Components
[Diagram: a Central Framework with two Storage Models and several Widgets attached]
Central Framework: Provided by SMI. “Plumbing” that cannot be replaced or augmented.
Storage Model: Every running application uses a storage model for persistence. SMI currently provides two (CLIPS format and RDBMS format).
Widget: Widgets mediate between the knowledge base and the user. They display small pieces of the knowledge base in a way that the user can understand and manipulate. SMI provides a generic set of default widgets.
15.
Widgets
• Widgets can be added to the platform
(using JavaBeans)
• There is a well-defined Widget API for
building new widgets and adding them to
a project
• Widgets can now be arbitrarily complex
– Dialogs are used to configure widgets
– State is stored into a separate knowledge
base (the project knowledge base)
16.
Storage Models
• Protege/Win stored knowledge bases in a
CLIPS-compatible format
• The goal for Protege-2000 is to use a
wide variety of persistence mechanisms
– CLIPS-format is still useful
– OKBC servers are important
– Relational databases could be useful
• To do this, we need to isolate out the
persistence mechanism as a component
17.
Axioms and Constraints
• Protege/Win used a frame-based language
• Protege-2000 keeps the emphasis on
frames, but adds in a constraint language
– Based on KIF
– Compatible with OKBC
19.
Knowledge Models
• Formal specification of the way knowledge
is represented
– Precise, human-readable definitions of
structures in a language
• Gives guarantees of what must hold in the
knowledge base
– Other things may be true, in addition to what the
knowledge model guarantees
• Protege ad{a,o}pts the OKBC knowledge model
20.
The Role of Logic
• Frames are intuitive for humans
– Concept / instance distinction dates back to
Plato
• But they’re not very well-defined
– What Minsky meant by frame is not what
Winograd meant by frame (and is certainly
not what Plato meant by form)
• We use logic to formalize the definitions
– Make the underlying assumptions explicit
21.
KIF
• Knowledge Interchange Format
• Developed in the early 1990s as a standard
syntax for first order logic
– entirely ASCII and somewhat LISPy
• (forall ?x (exists ?y (......)))
• Currently a “draft standard”
• http://logic.stanford.edu/kif/dpans.html
• Slight peculiarity: relations are multiple
arity
22.
Frames
• A Frame is simply a symbol
– A symbol is simply a 0-ary relation
• That is, it can be an argument to a
function or a predicate
– That is, it is something we can make
assertions about
• Types of frames include most of the
traditional modelling constructs (classes,
instances, slots, ...)
23.
Classes
• Classes are frames (are symbols ....)
• Classes are also unary predicates
– KIF allows multiple arity predicates
– That is, classes are sets (the set of instances)
– Members of the set == instances of the class.
• You can assert things about the class
(using the fact that the class is a frame)
• You can reason about the elements of the
associated set
24.
Defining Subclasses
• “Subclass” usually means two things:
– All instances of the subclass are instances of
the superclass
– Anything that is true of the superclass (as a
class) is true of the subclass
• The first of these is simply “subset”
(=> (subclass-of ?S ?P)
(forall ?F (=> (?S ?F)
(?P ?F))))
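
The "subset" reading of subclass-of can be illustrated with a small Python sketch that treats each class as the set of its instances. The class names, the instances table, and the helper functions are hypothetical illustrations and are not part of Protege or OKBC.

```python
# Hypothetical toy knowledge base: each class is the set of its instances.
instances = {
    "Person": {"alice", "bob", "carol"},
    "Patient": {"alice", "bob"},
}

# Asserted (subclass-of ?S ?P) facts, written as (subclass, superclass) pairs.
subclass_of = [("Patient", "Person")]


def subset_axiom_holds(sub, sup):
    """Check (forall ?F (=> (?S ?F) (?P ?F))) for one subclass-of pair."""
    return instances[sub] <= instances[sup]


def violations():
    """Return the subclass-of pairs whose instance sets are not nested."""
    return [(s, p) for s, p in subclass_of if not subset_axiom_holds(s, p)]
```

Adding an instance to "Patient" that is missing from "Person" would make `violations()` report the pair, which is the situation the axiom rules out.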
25.
Multiple Inheritance
• Easy to define in this model
• For Set-aspects, simply use “subclass ==
subset”
– A set can be a subset of more than one class
• As frames, enforce substitutability
– Any sentence that can be asserted about the
superclass, as a class, ought to be true of the
subclass
– Winds up being union of logical statements
26.
Slots
• Slots are frames (are symbols ...)
• Slots are also binary predicates (taking a
frame and a value)
• Slots also have associated predicates:
– binary (take a slot and a frame, formalize the
notion of attachment): template-slot-of, slot-of
– ternary (take a slot, a frame, and a value):
template-slot-value, slot-value
27.
Attaching a Slot
• Slots are frames that get attached to other
frames
– Attaching a slot to a class, for example
• You can attach a slot as either a
template slot or an own slot
– template slots define information that can be
propagated to elements of a class (and via
inheritance)
– own slots are strictly local information
28.
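Slot Propagation
[Diagram: propagation of template (T) and own (O) slots along subclass-of and instance-of links. A template slot stays a template slot on subclasses and becomes an own slot on instances; own slots propagate to /dev/null along both links.]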
29.
Restating this in KIF
(=> (template-slot-value ?S ?C ?V)
(and (template-slot-of ?S ?C)
(=> (instance-of ?I ?C)
(holds ?S ?I ?V))
(=> (subclass-of ?X ?C)
(template-slot-value ?S ?X ?V))))
30.
Restating this in English
“If V is a template slot value of S on the class C,
then we know the following three things:
1. S has been attached to C as a template slot
2. V is an own slot value for all instances I of C
3. V is a template slot value for all subclasses X of C”
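
The three consequences listed here can also be sketched procedurally. The following Python is only an illustration under assumed names: the toy hierarchy, the `propagate` function, and the two value tables are hypothetical and do not reflect how Protege stores slots internally.

```python
from collections import defaultdict

# Hypothetical toy hierarchy; the names are illustrative only.
subclass_of = {"Patient": "Person"}                   # subclass -> direct superclass
instance_of = {"alice": "Patient", "bob": "Person"}   # instance -> direct class


def propagate(slot, cls, value, template_values, own_values):
    """Apply the consequences of (template-slot-value slot cls value)."""
    # 1. The slot is attached to the class as a template slot (with this value).
    template_values[(slot, cls)].add(value)
    # 2. The value becomes an own slot value on every direct instance.
    for inst, direct_class in instance_of.items():
        if direct_class == cls:
            own_values[(slot, inst)].add(value)
    # 3. The value is a template slot value on every subclass (recursively,
    #    which also reaches the indirect instances).
    for sub, sup in subclass_of.items():
        if sup == cls:
            propagate(slot, sub, value, template_values, own_values)


template_values = defaultdict(set)
own_values = defaultdict(set)
propagate("species", "Person", "human", template_values, own_values)
```

After the call, both "alice" (an instance of the subclass) and "bob" carry "human" as an own slot value, and "Patient" carries it as a template slot value.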
31.
Restating this in Swedish
“Om V är värdet på en mallegenskap S på klassen C,
så vet vi följande tre saker:
1. S har kopplats till C som en mallegenskap
2. V är ett eget värde på egenskapen för alla instanser I av C
3. V är värdet på mallegenskapen för alla underklasser X av C”
32.
Instances
• An instance is a frame
• The idea of “instance” is, more or less, a
GUI notion (and has no implications for
the knowledge model)
33.
Facets
• Facets are frames (and symbols ...)
• Facets are also ternary predicates (taking
a frame, a slot, and a value)
• Facets also have associated predicates:
– ternary (take a slot, a frame, and a facet;
formalize the notion of attachment):
template-facet-of, facet-of
– 4-ary (take a slot, a frame, a facet and a
value)
34.
Facet Restrictions
• Template facets can only be attached to
template slots
• Having a value implies attachment
• Similarly for own slots
(=> (template-facet-of ?F ?S ?C)
(template-slot-of ?S ?C))
(=> (template-facet-value ?F ?S ?C ?V)
(template-facet-of ?F ?S ?C))
35.
Facet Propagation
• Facets are attached to
(frame, slot) pairs
• Whenever a slot
propagates, from one
frame to another, the
facets are carried
along
[Diagram: facets carried along with template (T) and own (O) slots across a subclass-of link; own slots propagate to /dev/null.]
36.
Canonical Facets
• The standard facets are local constraints (i.e., they apply at a
single (frame, slot) pair)
:VALUE-TYPE
:CARDINALITY
:NUMERIC-MINIMUM
:NUMERIC-MAXIMUM
(=> (:VALUE-TYPE ?S ?F ?C)
(and (class ?C)
(=> (holds ?S ?F ?V)
(instance-of ?V ?C))))
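
As an illustration of what a local facet check involves, here is a minimal Python sketch of a :VALUE-TYPE test. The data tables and function names are hypothetical; the sketch only assumes that instance-of should respect the subclass hierarchy.

```python
# Hypothetical data: the direct class of each value, and a subclass table.
instance_of = {"aspirin": "Drug", "alice": "Person"}
subclass_of = {"Drug": "Substance"}   # subclass -> direct superclass


def is_instance(value, cls):
    """True if value is a direct or inherited instance of cls."""
    current = instance_of.get(value)
    while current is not None:
        if current == cls:
            return True
        current = subclass_of.get(current)
    return False


def value_type_violations(slot_values, value_type):
    """Find slot values that break a :VALUE-TYPE facet.

    slot_values: {(slot, frame): [values]}
    value_type:  {(slot, frame): required class}
    Returns a list of (slot, frame, value) triples.
    """
    bad = []
    for key, cls in value_type.items():
        for value in slot_values.get(key, []):
            if not is_instance(value, cls):
                bad.append((*key, value))
    return bad
```

Because the facet is attached to a single (frame, slot) pair, the check never has to look beyond the values stored at that pair.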
37.
OKBC Revisited
• Protégé-2000 knowledge-bases are OKBC-
compliant
• Protégé-2000 is not OKBC generic
– There are OKBC knowledge bases that
Protégé-2000 cannot handle
– It’s close, though !
• Differences are KA related
– Protégé instances have exactly one class
– The role slot
39.
Overview
• Examples of Constraints
• Design Desiderata
• The Constraint Language
• Implementation Decisions
• The Default Implementation
• Dimensions for Evolution
41.
The Big Modular Picture of Protege
[Diagram: the Core Protege Framework connected to the Storage Model, the Widgets, the Constraint Engine, and the Actual KB]
42.
Full and formal semantics
• Widgets can include “widgets for acquiring
specific types of constraints”
• Multiple constraint engines are possible
– Performing different checks at different times
– Replacing one engine with another
• The entire kb gets stored out to some
server
• Without formal semantics (a logical
theory), this is just not possible
43.
Compatibility with the OKBC
knowledge model
• OKBC does not specify an axiom language
• OKBC is specified as a set of relations in
KIF
– Classes are unary predicates, slots are binary
predicates, ...
• All of these relations should immediately
be accessible from within the constraint
language
– And the constraint engine should give them
the right semantics
44.
Ease of Translation
• Important goal: we want to be able to use
Protege as a front-end to a wide variety of
knowledge base servers
• This means that the constraint language
ought to be easily translated into a wide
variety of constraint languages
– At the very least, figuring out what can be
translated ought to be easy
45.
Supported by a reasonable
default implementation
• KMG will provide a default implementation
of the constraint language
– Not very efficient
– But good semantics for KA
– Good enough to bootstrap the process
• As we learn more about constraints, and
how they are used, we hope that people
with real expertise will step forward
46.
A Deficient Syllogism
Major Premise:
Interoperability requires formal semantics (and
knowledge models based on mathematical
logic)
Minor Premise:
Humans don’t easily adapt to formal languages
Conclusion:
Widgets !!!!!!!
47.
Human Readability is a Red
Herring
• The casual user interacts with forms
– The expert user knows about classes and
instances
– Very few users know about the underlying
logical formalism
• If we design widgets for acquiring
constraints, then the user will never see
the constraint language
49.
A Single Constraint Language
• Constraint language is really an interlingua
for communication
– Between widgets and the framework
– Between the framework and the storage
model
• If we want all the components to evolve
independently and communicate
gracefully, we need to fix a single
constraint language
50.
Logic
• We decided on a variant of KIF
• We use the KIF connectives and the KIF
syntax
• Not all the KIF constants and predicates
are included
– Our theory of arithmetic is much smaller
• (defrelation ...) is omitted
– For now ?
51.
Sorted Logic
• Two new constructs in the language
– defset: allows the user to define a “bag” of
values.
• Similar to notion of class, but with no support in
the ontology tab
• Useful for enumerated types
– defrange: all variables must have their types
declared
• “types” can include things like “is a target of [slot
name]”
52.
Reified Constraints
• There is a knowledge-base for constraints
– Acquiring a constraint is really “acquiring an
instance of :Constraint”
– You can annotate sentences and relations
with useful information
• You can store constraints out to a vanilla
frame-based system
– To a simple KB server, a constraint is just
another frame
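
A rough Python sketch of what "a constraint is just another frame" could look like follows. The `Frame` class and helper functions are hypothetical; only the names :Constraint, :sentence, and :pragmatics are taken from this talk.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical minimal frame: a name, a direct type, and own slot values."""
    name: str
    direct_type: str
    own_slots: dict = field(default_factory=dict)


def make_constraint(name, sentence, pragmatics=""):
    """Acquire a constraint by creating an instance of :Constraint."""
    return Frame(name, ":Constraint",
                 {":sentence": sentence, ":pragmatics": pragmatics})


def frames_of_type(frames, type_name):
    """A simple KB server can list constraints like any other instances."""
    return [f for f in frames if f.direct_type == type_name]
```

A server that knows nothing about logic can still store and retrieve such a frame; only a constraint engine needs to interpret the text held in the :sentence slot.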
53.
The Constraint KB
• To use constraints, you must include the
constraint knowledge base
– Will also contain default implementation of
engine (as a tab widget)
– Will also include java code for the standard
relations
– Will also include widgets for constraint
acquisition
– Won’t include any instances
55.
Constraints and Axioms
• Constraints and Axioms use the syntax of
logic but have different semantics
– Axioms can be used to assert new knowledge
– Constraints are restrictions on existing
knowledge
• (forall ?x (exists ?y (rel-name ?x ?y)))
– Asserted as an axiom: it’s reasonable to
create a skolem constant and bind it to ?y
– Asserted as a constraint: might not want to
skolemize
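
The difference between the two readings can be sketched in Python for a finite relation. This is an assumption-laden illustration: the function names and the "sk1, sk2, ..." naming of skolem constants are invented for the example.

```python
import itertools

# Counter used to mint fresh (hypothetical) skolem constant names.
_skolem_ids = itertools.count(1)


def check_as_constraint(domain, relation):
    """Constraint reading: report every x that has no y with (x, y) in relation."""
    have_partner = {x for x, _ in relation}
    return [x for x in domain if x not in have_partner]


def assert_as_axiom(domain, relation):
    """Axiom reading: invent a skolem constant for every x lacking a partner."""
    extended = set(relation)
    for x in check_as_constraint(domain, relation):
        extended.add((x, f"sk{next(_skolem_ids)}"))
    return extended
```

Read as a constraint, the sentence flags missing data and leaves the knowledge base untouched; read as an axiom, it adds new facts so that the sentence becomes true.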
56.
Multiple Interpretations of a
Single Theory:
• No engine can return “true” when “OKBC”
would return “false”
• Model theoretic terms: If an engine thinks
there is a model, then there must be one
– But engines are free to overlook models
57.
New functions and predicates
are implemented procedurally
• KIF has the (defrelation ...) construct to
define new relations
• Our point of view: A relation is, almost
always, something that should be defined
in the ontology
• The exceptions (mostly n-ary relations)
should be annotated explicitly and defined
procedurally
60.
The Language is defined in a
Knowledge-Base
• PAL: Protege Axiom Language
• The PAL knowledge-base contains
– The constraint ontology
– The default relations
• And the java code that implements them
– The default implementation
• Once again, taking advantage of
knowledge-base inclusion
61.
Enforcement of constraints is not
necessarily real-time
• When the user loads (or saves) a
knowledge-base, it should be consistent
• It’s not always possible for the user to
have a consistent KB while editing
– And, even if it were possible, it might be
inconvenient.
• Therefore, the user should decide when to
check constraints
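
One way to picture user-triggered checking is an editor that never blocks an edit and runs constraints only on request. The Python below is a hypothetical sketch; the `KnowledgeBase` class and the sample constraint are not Protege APIs.

```python
class KnowledgeBase:
    """Hypothetical editor-side KB: edits are never blocked, checks run on demand."""

    def __init__(self, constraints):
        self.frames = {}
        # Each constraint is a (name, predicate over the frames dict) pair.
        self.constraints = constraints

    def set_slot(self, frame, slot, value):
        """Record an edit without validating it."""
        self.frames.setdefault(frame, {})[slot] = value

    def check(self):
        """Return the names of the constraints that currently fail."""
        return [name for name, holds in self.constraints
                if not holds(self.frames)]


def every_frame_has_name(frames):
    """Sample constraint: every frame must have a 'name' slot."""
    return all("name" in slots for slots in frames.values())
```

The knowledge base may pass through inconsistent states between two calls to `check()`, which is the convenience the slide argues for.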
62.
Enforcement via plug-ins (and
tabs)
• The basic way users will interact with
constraint engines will be via tabs and
widgets
– We want to enable special types and
categories of constraints to be annotated
• Basic mechanism: subclassing :Constraint
– We want to have multiple possible engines,
depending on context and user preference
• Constraint tabs are just another way of
interacting with the KB .
64.
What is a knowledge base ?
• Used to be classes and instances
• Now also includes widgets
– Java code !
• Now also includes constraints
– Instances with an “interpretation” beyond the
standard meaning associated to frames
– Custom pieces of java code that implement
new relations (possibly domain specific) for
the constraint language
65.
We have evolved from OKBC to
some extent
• If we use the ontology as a type system, it
is convenient to have the types be
mutually exclusive (instances are instances
of a single class)
• The “role” predicate
67.
Model-checking, rather than
theorem proving
• Make strong “closed world” assumptions
• Main goals:
– Detect incomplete entry of information
– Check entered information for inconsistencies
68.
Envisioned: Constraints are
mostly Local
• The “more false” this assumption is, the
worse the engine will perform (the better a
traditional theorem prover would perform?)
70.
Richer axiom ontology
• Subclassing our ontology to provide more
detailed information
• “Hints” to enforcement engines
– “This is best validated using [subroutine x]” or
“This statement is complexity level gamma”
• Statement could be generated by a widget
• Your widget, in your domain, generating PAL
statements for my engine to check
– Formal Semantics necessary
– Engines might let the user check a subset of
the theory
71.
More Predicates and Functions
• Not many are included in the default
implementation
– Mostly for reasoning about types, arithmetic,
and slot values (taking transitive closures)
• Over time, we hope that people will
implement predicates and pass the code
to us (for inclusion as part of the Protege
distribution)
• Note also that relations don’t have to be
general -- you can add knowledge-base
72.
Other engines
• In particular, a theorem prover ?
• Can GSAT be used as a preprocessing step
?
– How about the work on ALL ?
73.
Support for Knowledge-
Acquisition
• The knowledge-model is done
• The axiom language is done (as a spec)
• Engines are “a mere matter of
programming” (similar things have been
done for 25 years now)
• What’s left ?
74.
Subclassing the PAL Ontology to
provide hooks for widgets ?
• :CONSTRAINT only provides two slots
(:pragmatics and :sentence)
• How about other slots
– Evaluation cost (for different engines) ?
– Evaluation hints ?
– What widget generated the axiom ?
75.
“No A is a B”
• A statement that is often enforced by
defining separate classes
• But often not:
– No hemophiliac should be taking Lasix
– Do we really want “Hemophiliac” as a subclass
of “Person” ?
– Do we really want “Lasix_Taker” as a subclass
of “Patient” ?
76.
Let’s write it in PAL
(forall ?P (=> (and (Person ?P)
(has-disease ?P Hemophilia))
(not (taking-drug ?P Lasix))))
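
A model-checking reading of this statement can be sketched in Python: scan the instances and report those that violate the rule from the previous slide. The toy knowledge base and the `violators` function are hypothetical, and the disease and drug are parameters so that the sketch does not depend on a particular spelling.

```python
# Hypothetical instances, keyed by name, with slot values stored as sets.
kb = {
    "alice": {"class": "Person", "has-disease": {"Hemophilia"}, "taking-drug": {"Lasix"}},
    "bob": {"class": "Person", "has-disease": {"Hemophilia"}, "taking-drug": set()},
    "carol": {"class": "Person", "has-disease": set(), "taking-drug": {"Lasix"}},
}


def violators(kb, disease="Hemophilia", drug="Lasix"):
    """Return the names of Persons who have the disease and take the drug."""
    return sorted(
        name
        for name, frame in kb.items()
        if frame["class"] == "Person"
        and disease in frame["has-disease"]
        and drug in frame["taking-drug"]
    )
```

No "Hemophiliac" or "Lasix_Taker" class is needed: the check works directly on slot values, which is the point of stating the rule as a constraint.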
77.
This is really a Venn Diagram
[Diagram: two "Person" forms, each labeled "Partially filled out instance defines matching", with an "Empty Intersection" between them]
78.
Widgets play a role here:
• Widget is placed on screen to mediate
between humans and KB
• Widget generates PAL statements
• Engine interprets PAL statements
• User may or may not ever see PAL
79.
Things that are done:
• The knowledge model is done
• The constraint language is done
• The default implementation is designed
(and partially implemented)
80.
Things that we will do:
• Finish the default implementation
• Publish a full spec (as a Tech Report) ?
• Serve as a clearinghouse for engines and
widgets