Gellish
         A standard data and knowledge representation
                    language and ontology

                   Are Data Models becoming Superfluous?


                                            by
                                Ir. Andries van Renssen
                          Shell Global Solutions International
                              Andries.vanRenssen@shell.com


Abstract
Data storage and data communication lack a common standard universal data model as well
as a common data language and knowledge base with a taxonomy of concepts and a
grammar for data exchange messages. This article presents a solution to this problem in the
form of the new Gellish language and knowledge base, as an extension of the standard data
models and ontology of two new ISO standards. The article presents Gellish as a language
for neutral data exchange between systems that can replace data models and that provides
an extendable ontology with standard reference data for customization and harmonization of
systems. The definition of Gellish includes the public domain (“open data”) Gellish
knowledge base with definitions of a large number of concepts and product models.
The article illustrates that a single Gellish Table in a database or data exchange file is
sufficient to express a wide range of kinds of facts about classes as well as facts about
individual objects.
Keywords: knowledge representation, data exchange, language, data models,
standards, ontology, semantic web, knowledge base, classification system


Knowledge versus Data Models                    1                                   13/04/2010
Introduction
Currently, each software system stores its data using its own data model and usually
communicates with other systems through a dedicated interface data structure, which means
that it applies a dedicated interface data model. This large variety of data models makes
data exchange between systems costly, because the data have to be converted from the
semantics of one data model to those of the other. This demonstrates the urgent need for
widely applicable common standard data models.
Systems can often be ‘customized’ by adding ‘reference data’ as instances, such as the
definitions of equipment types, document types, activity types, property types, pick lists, etc.
However, reference data usually differ per implementation, even when the database
structures are identical, as is the case with several implementations of the same CAD, CAE,
PDM, PLM, ERP or CRM system. The consequence is that data in those implementations
still cannot be compared, integrated or exchanged without costly data conversion
processes. This illustrates the urgent need for a common dictionary, classification system
or taxonomy of reference data, because there is currently no standard user data language.
In current systems there is a separation between the world of data models and the world
of instances. Data models are developed by IT specialists (data modelers) who document
them using either proprietary tools or a standard data modeling language, such as
EXPRESS (ISO 10303-11) or UML, languages that are designed specifically to define data
models. Once a data model is defined in such a language, the data model acts as another
language in which the reference data as well as the user data have to be expressed. The use
of two different languages, one for the model and one for the user data, illustrates the
barrier between the two worlds. It is as if the definition of the English language were
expressed in Chinese. On top of this, each programmer and each reference data producer is
free to define their own terminology using those data definition languages!
The result of the current state of the art is that data storage is done in a Babylonian mix of
data models and reference data ‘languages’ with the consequence that exchange of data
between systems is impossible, except where dedicated bilateral translators are created not
only for each pair of data models, but also for the data content ‘languages’.
The current situation is sketched by Smith and Welty (2001) as follows: “Out of the
apparent chaos, some coherence is beginning to emerge. Gradually, computer scientists are
beginning to recognize that the provision, once for all, of a common, robust reference
ontology – a shared taxonomy of entities – might provide significant advantages over the
ad-hoc, case-by-case methods previously used”.
Several attempts have been made to develop an ‘upper ontology’, such as SUMO by Niles and
Pease (2001), the IEEE Standard Upper Ontology, SUO (2001), the Cyc ontology by Lenat
(1995) and GOL by Degen et al (2001). However, none of them integrates the upper level
ontology with a lower level ontology of reference data. In other words, they do not integrate
a generic data model with reference data and a language for the description of knowledge
and of individual objects and processes.
This article presents a solution to the above-mentioned issues in the form of the Gellish
language. Gellish satisfies the criteria for proper ontologies as expressed by Degen et al
(2001 par 6.1), but is not limited to an upper ontology. It includes and extends concept
definitions that also appear in other sources such as ISO standards and IEC standards, and
knowledge stemming from industry standards and proprietary sources. It is extendable just
as any natural language. Its taxonomy and knowledge base use unique identifiers for
concepts, thus allowing for synonyms and multiple names in various languages. The latter
enables the expression of propositions about facts in one natural language and their
automatic translation and presentation in any other natural language.
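The role of unique identifiers in translation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary `NAMES` and the function `render` are hypothetical, the UIDs are those used for these concepts in the example tables of this article, and the Dutch names are illustrative translations rather than entries copied from the Gellish dictionary.

```python
# UID -> {language code -> name}. The Dutch names are illustrative
# translations, not entries copied from the Gellish dictionary.
NAMES = {
    130206: {"en": "pump", "nl": "pomp"},
    730044: {"en": "physical object", "nl": "fysiek object"},
    1146: {"en": "is a specialization of", "nl": "is een specialisatie van"},
}

# A fact stored language-independently as (left UID, relation type UID, right UID).
fact = (130206, 1146, 730044)


def render(fact, language):
    """Present a UID-based fact using the names of the requested language."""
    left, relation, right = fact
    return " ".join(NAMES[uid][language] for uid in (left, relation, right))


print(render(fact, "en"))  # pump is a specialization of physical object
print(render(fact, "nl"))  # pomp is een specialisatie van fysiek object
```

Because the fact itself only contains UIDs, adding a further language would only require adding names to the dictionary, not changing the stored facts.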
Gellish eliminates the traditional barrier between the data model definitions of classes and
the data instances. The Gellish language demonstrates that this barrier is not necessary and
that there are clear advantages when class definitions, reference data and user data are
expressed in one and the same language.
Standard data models, ontologies and reference data
There are several developments of standard lower level ontologies and reference data
libraries, stimulated among others by requirements of the e-commerce ‘market places’ and
the developments around The Semantic Web promoted by Lee et al (2000) and the Web
Ontology Language OWL.
Examples include the UNSPSC code (http://www.unspsc.org/), Ecl@ss (http://www.eclass.de/)
and Trade Ranger (http://www.trade-ranger.com/EN/Pages/ContentStandards.asp). These
standards have their value mainly in the standardization of terminology, but do not provide
a standard language or a standard data model for general use. Their semantic expression
power is limited, because they apply only a few relation types and lack integration with a
rich upper ontology.
There have also been several attempts to develop standard data models for data exchange
or for data storage. Some of them are proprietary, but others are in the public domain. Those
standard data models are defined independently of a particular system, and are therefore
called ‘neutral’. They are usually developed for a particular application
domain instead of being limited to a particular system.
Examples of standard data models are the STEP family of standards in ISO 10303, such as a
graphics data model AP203, a data model for the automotive industry (AP214), one for
piping systems (AP227), one under development for the defense industry (AP239, PLCS),
etc. The integration of all those data models into one overall data model is not yet fully
achieved. Although the scopes of these valuable standard data models are wide, they are still
limited to particular application areas and do not yet provide a general ‘common language’.
A further step towards a data model with a generic scope was the development and
publication of the Epistle Core Data Model (2001), in which development the author of this
article participated. From that, two new ISO standards were derived, ISO 15926-2 and its
counterpart within the STEP family (AP221). Although these generic data models stem
from the process industries, they have the generic nature of an upper level ontology, which
makes them applicable in other application domains as well.
To become practically applicable in a particular application domain, these generic data
models need a standard ‘reference data library’ or lower level ontology, in order to add
standard definitions of application domain specific concepts and to specialize the generic
data model. The author coordinated the development of such a standard reference data
library, called STEPlib. This is a main source for the common standard library ISO
15926-4.
Then it was discovered that the top of the specialization hierarchy of standard data in the
library coincided with the entities, attributes and relations in the generic data model. This
led to the inclusion of the data model in the library. In other words, the upper level ontology
was combined with a lower level ontology. The insight that information should be contained
in relations and not in objects, led to the birth of the Gellish language, which is based on
standard relation types, expressed by natural language ‘phrases’.




The Gellish language and ontology
Gellish is a public domain standard data and knowledge representation language and
ontology that is defined in STEPlib. It does not have the barrier between the user data
and the IT data model data. It contains the concepts of the above-mentioned
generic data models and integrates and extends them with standard reference data and a
knowledge base with product and process models. The ontology also includes the definition
of a large number of standard fact types (or relation types) that define the grammar of the
Gellish language. It contains the definition of over 20,000 concepts arranged in a
specialization hierarchy of classes. These concepts can be interpreted as entity types,
attribute types and relationship types or as a classification system or taxonomy. This makes
Gellish equivalent to a very large data model.
In addition, STEPlib contains a large number of relations between the concepts. They
define the content of the knowledge base of product models and process models.
Gellish is not object oriented, but fact oriented. The basic Gellish object is therefore a fact.
Each (atomic) fact is expressed as a relation between (two) objects.
For example, fact 1 is expressed by a particular relation between the objects with unique
identifiers (UIDs) 100 and 101. This expression (1, 100, 101) illustrates the structure of
each basic Gellish expression. Gellish requires that both the objects and the fact be
classified explicitly by standard classes, including standard relation types. The standard
classes are predefined in the Gellish ontology. In addition, objects may have a name.
The explicit classification enables software to interpret the expression correctly.
Gellish and the above mentioned ISO standards are both based on the understanding that
there appears to exist a limited set of application independent standard relation types that are
sufficient to model all kinds of products and processes. Gellish standardizes these relation
types. The relation types also define the role types that the related objects play in the
relations with each other. The variety and extendibility of standard relation types define the
semantic expression capabilities of Gellish.
A large part of the Gellish relation types is defined in the ISO standards and an extended set
is defined in the TOP part of the Gellish language definition (STEPlib).
A standard implementation of Gellish is defined as a Gellish Table. In a Gellish Table the
basic Gellish expression becomes:
Left hand    Left hand     Fact   Relation   Relation type    Right hand    Right hand
object UID   object name   UID    type UID   name             object UID    object name
100          thing-1       1      2850       is related to    101           thing-2


In a Gellish Table one (atomic) fact is represented by one record, which holds the relation
between two object UIDs, the names of the objects and the classification of the fact. The
classification of the objects is done via separate classification facts in additional records.
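As a rough illustration of the record structure described above, a Gellish Table row could be modelled as follows. This is a minimal Python sketch and not part of the Gellish definition; the class name `GellishRow` and its field and method names are hypothetical and simply mirror the column headings of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GellishRow:
    """One atomic fact: a classified relation between two objects."""
    left_uid: int        # UID of the left hand object
    left_name: str       # name of the left hand object
    fact_uid: int        # UID of the fact itself
    rel_type_uid: int    # UID of the relation type that classifies the fact
    rel_type_name: str   # phrase that names the relation type
    right_uid: int       # UID of the right hand object
    right_name: str      # name of the right hand object

    def as_triple(self):
        """Return the bare (fact, left, right) UID structure of the expression."""
        return (self.fact_uid, self.left_uid, self.right_uid)

    def phrase(self):
        """Render the fact as a readable phrase."""
        return f"{self.left_name} {self.rel_type_name} {self.right_name}"


# The basic expression from the table above.
row = GellishRow(100, "thing-1", 1, 2850, "is related to", 101, "thing-2")
print(row.as_triple())  # (1, 100, 101)
print(row.phrase())     # thing-1 is related to thing-2
```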
Some examples of facts from a particular application domain, which illustrate the use of
standard Gellish relation types, are:
Left hand    Left hand       Fact   Relation   Relation type name       Right hand   Right hand        Scale
object UID   object name     UID    type UID                            object UID   object name
130091       diesel engine   2      1146       is a specialization of   130108       engine
104          M-1             3      1225       is classified as a       130091       diesel engine
130802       cylinder        4      1146       is a specialization of   730063       artifact
107          C-1             5      1225       is classified as a       130802       cylinder
107          C-1             6      1190       is part of               104          M-1
107          C-1             7      1727       has aspect               108          volume of C-1
108          volume of C-1   8      1225       is classified as a       550140       internal volume
108          volume of C-1   9      2044       is quantified as         922235       1800              cm3
104          M-1             10     4760       is subject of            110          order-1


Note: for human readability, the relation type UID is omitted from the tables below.
The above table illustrates:
   •     Standard Gellish relation types, which classify the facts and determine the
            expression capabilities and semantics of Gellish.
   •     Examples from the large number of standard object types that are predefined in
            Gellish, for example: engine, diesel engine, cylinder, artifact, internal volume,
            1800 and cm3.
   •     The way in which new object types can be added, as in facts 2 and 4. Diesel engine
            and cylinder already exist in Gellish, but if they had not existed, they could
            have been added in this way.
   •     That facts such as the volume of C-1 can be expressed in Gellish without the need
            for such a fact to be pre-modeled in a data model. Such a fact type can
            nevertheless be defined in Gellish, after which this particular instance can be
            verified against that definition. It can also be defined as obligatory in a
            particular context, after which the instances can be validated for completeness
            and compliance.
   •     That one table is suitable for expressing many kinds of facts.
Note: The table above presents just an example of some of the capabilities of Gellish. For
      example, Gellish also allows expressing in which language the facts are expressed,
      whether the objects are real or imaginary, what the communicative intent is, who the
      author and the addressee of a proposition are, etc.
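To make the point that one table can hold many kinds of facts more concrete, the sketch below loads the example rows (facts 2 to 10) into a plain Python list and answers a few questions by filtering on the relation type UID. The tuple layout follows the table columns; the helper names (`select`, `classes_of`, `parts_of`, `quantities_of`) are hypothetical and only serve this illustration.

```python
# Each row: (left UID, left name, fact UID, relation type UID,
#            relation type name, right UID, right name, scale)
ROWS = [
    (130091, "diesel engine", 2, 1146, "is a specialization of", 130108, "engine", None),
    (104, "M-1", 3, 1225, "is classified as a", 130091, "diesel engine", None),
    (130802, "cylinder", 4, 1146, "is a specialization of", 730063, "artifact", None),
    (107, "C-1", 5, 1225, "is classified as a", 130802, "cylinder", None),
    (107, "C-1", 6, 1190, "is part of", 104, "M-1", None),
    (107, "C-1", 7, 1727, "has aspect", 108, "volume of C-1", None),
    (108, "volume of C-1", 8, 1225, "is classified as a", 550140, "internal volume", None),
    (108, "volume of C-1", 9, 2044, "is quantified as", 922235, "1800", "cm3"),
    (104, "M-1", 10, 4760, "is subject of", 110, "order-1", None),
]


def select(rel_type_uid, left_uid=None, right_uid=None):
    """Return the rows of one relation type, optionally filtered on either side."""
    return [
        r for r in ROWS
        if r[3] == rel_type_uid
        and (left_uid is None or r[0] == left_uid)
        and (right_uid is None or r[5] == right_uid)
    ]


def classes_of(uid):
    """Names of the classes by which an object is classified (relation type 1225)."""
    return [r[6] for r in select(1225, left_uid=uid)]


def parts_of(uid):
    """Names of the objects that are part of the given object (relation type 1190)."""
    return [r[1] for r in select(1190, right_uid=uid)]


def quantities_of(uid):
    """(aspect name, value, scale) for each quantified aspect of an object."""
    result = []
    for aspect in select(1727, left_uid=uid):
        for q in select(2044, left_uid=aspect[5]):
            result.append((aspect[6], q[6], q[7]))
    return result


print(classes_of(104))     # ['diesel engine']
print(parts_of(104))       # ['C-1']
print(quantities_of(107))  # [('volume of C-1', '1800', 'cm3')]
```

All three questions are answered from the same list of rows; no separate entity or attribute definitions were needed.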
Storage and exchange of data as well as semantics in Gellish
In this section I will describe how knowledge, data and semantics are represented in
Gellish.
The generic nature of Gellish allows expressing any complex network of facts. For example,
it allows expressing that:
- physical objects (of any kind) have properties (of any kind),
- properties have values,
- physical objects have parts,
- physical objects participate in activities or processes in particular roles,
- etc.
But for clarity I will use a specific example, namely the fact that:
- a particular pump (‘P-1’) is pumping a particular stream (‘S-1’).



In a conventional database it is required to declare entity types and attribute types that
define the semantics in the form of a data model. For the example, the data model
could consist of the entity types ‘pump’, ‘process’ and ‘stream’, each with
some attributes.
In Gellish, the concepts ‘pump’, ‘process’ and ‘stream’ are not entity types; they are
concepts that are defined via facts that are expressed as relations in a generic knowledge
base.
The knowledge base has a structure that only ‘knows’ the minimum number of ‘basic
semantic axioms’ and contains the definition of a large number of concepts. The minimum
set of ‘basic semantic axioms’ comprises the fundamental ontological concepts of Gellish
that should be known and understood and which are sufficient for the definition of
additional semantic concepts.
For the definition of a new concept it is required to define a coherent set of elementary
facts, expressed as relations between the new concept and the existing concepts. In other
words, each new concept requires the creation of a structure as presented in figure 1.

[Figure 1: diagram in which ‘anything’ (object-1, object-2) plays a ‘role (of something in
relation)’ (role-1, role-2) via a ‘playing a role’ relation, and a ‘relation’ (relation-1)
requires each role via a ‘requirement of role’ relation; each box is a ‘kind of thing’.]
                              Figure 1, Basic semantic concepts
The minimum set of ‘basic semantic concepts’ that are the axioms of Gellish, and whose
meaning should be understood, is:
        - anything
        - role
        - relation / relations
             - plays role
             - requires role
             - is / is a         (is classified as a)
        - individual thing / individual things
        - kind of thing / kind of things
        - single thing / plural thing


The structure of figure 1 holds for facts about classes as well as facts about individual
objects (instances) or relations, but also for single objects as well as for plural objects. In
other words, object-1 and object-2 in figure 1 can be either a single or plural individual
object, relation or class. The lines in the top left corners of the boxes indicate that the
structure is a typical instance.



Any other ‘atomic fact’ is expressed as such a structure. In other words, any atomic fact is
expressed as an ‘atomic relation’ between two or more ‘objects’ and by the classification of
the ‘objects’, the ‘roles’ and the ‘relation’. This implies that an atomic fact is expressed by a
structure of nine (9) relations, formed by the blue boxes in figure 1 (note that 4 of the 5
boxes appear twice in an atomic fact).
For example the fact that impeller O1 is part of centrifugal pump O2 is expressed in Gellish
by the following 4 elementary relations:
   - O1     plays role            R1
   - R1     is required by        C1
   - C1     requires role         R2
   - R2     is played by          O2
These 4 relations relate 5 objects. To interpret them correctly the following 5 additional
classification relations are required:
   - O1     is classified as an   impeller
   - R1     is classified as a    part
   - C1     is classified as a    composition relation (“is part of”)
   - R2     is classified as a    whole
   - O2     is classified as a    centrifugal pump
In practical implementations the explicit identification of the roles and their
classification can be omitted, because they follow from the classification of the relation
and the definition of the relation type.
Therefore the above relations are usually summarized in 3 Gellish atomic expressions as
follows:
   - O1     is classified as an impeller
   - O1     is part of          O2
   - O2     is classified as a centrifugal pump
From this example it can be seen that the 5 kinds of things with which the 5 objects are
classified need to be present in or added to the semantics of the Gellish knowledge base in
order to ensure that the fact can be interpreted correctly.
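The relationship between the compact and the fully expanded form can be sketched in code. The following Python fragment is only an illustration under stated assumptions: the lookup table `ROLE_TYPES`, the function `expand` and the fixed identifiers R1, C1 and R2 are hypothetical, and the role names ‘part’ and ‘whole’ are taken from the example above.

```python
# Role types required by each relation type: (role of left object, role of right object).
# Only the composition relation from the example is listed here.
ROLE_TYPES = {"is part of": ("part", "whole")}


def expand(left, rel_type, right, left_class, right_class):
    """Expand one compact expression into 4 elementary and 5 classification relations."""
    role1_type, role2_type = ROLE_TYPES[rel_type]
    r1, c1, r2 = "R1", "C1", "R2"  # identifiers for the implied roles and relation
    elementary = [
        (left, "plays role", r1),
        (r1, "is required by", c1),
        (c1, "requires role", r2),
        (r2, "is played by", right),
    ]
    classification = [
        (left, "is classified as a", left_class),
        (r1, "is classified as a", role1_type),
        (c1, "is classified as a", rel_type),
        (r2, "is classified as a", role2_type),
        (right, "is classified as a", right_class),
    ]
    return elementary + classification


relations = expand("O1", "is part of", "O2", "impeller", "centrifugal pump")
for rel in relations:
    print(*rel)
print(len(relations))  # 9
```

The roles never have to be supplied by the caller: they are derived from the relation type, which is why the compact form can leave them out.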
The awareness that a knowledge base of predefined concepts is required for a correct
interpretation of Gellish expressions resulted in the development of the top-down
hierarchical definition of the Gellish knowledge base of concepts, including also relation
types, as available in STEPlib.
Knowledge representation: relations between classes
Any fact type that extends the semantics is expressed as a relation between kinds of things.
For example, assume that the concept ‘centrifugal pump’ needs to be added. Then the
following two atomic relations define that concept:
   1. A specialization relation that defines that:
          centrifugal pump is a specialization of pump
   2. A relation that defines that a centrifugal pump by definition uses the centrifugal
      principle:
          centrifugal pump has by definition as aspect centrifugal
These relations build on the definitions of the concepts ‘pump’ and ‘centrifugal’, respectively.
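A minimal sketch of this idea in Python is given below. It assumes a toy knowledge base in which the known concepts are held in a set and the defining facts in a list of (left, relation type, right) tuples. The function `add_concept`, and its rule of rejecting a definition that refers to an undefined concept, are assumptions made for illustration and not part of the Gellish definition.

```python
SPECIALIZATION = "is a specialization of"
DEFINING_ASPECT = "has by definition as aspect"

# Toy knowledge base; 'pump' and 'centrifugal' are assumed to be defined already.
known_concepts = {"pump", "centrifugal"}
facts = []


def add_concept(name, supertype, defining_aspects=()):
    """Define a new concept by relating it to concepts that already exist."""
    missing = [c for c in (supertype, *defining_aspects) if c not in known_concepts]
    if missing:
        raise ValueError(f"undefined concepts: {missing}")
    facts.append((name, SPECIALIZATION, supertype))
    for aspect in defining_aspects:
        facts.append((name, DEFINING_ASPECT, aspect))
    known_concepts.add(name)


add_concept("centrifugal pump", "pump", defining_aspects=["centrifugal"])
for fact in facts:
    print(*fact)
```

Running the sketch prints the two defining relations of the example, after which ‘centrifugal pump’ can itself serve as a supertype for further concepts.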




Interpretation of expressions
In current database technology an expression is interpreted semantically through the
fact that any object is implicitly classified by being an ‘instance’ of an entity whose
semantics are defined.
For example, assume that P1 is an instance of an attribute called ‘name’ of an entity called
‘pump’. This probably means that P1 is the name of a thing that is classified as a pump,
although this meaning comprises two facts that are usually not defined in a computer
interpretable way. It should be noted that if there are no other attributes, this data structure
does not allow the classification of P1 as a centrifugal pump.
In Gellish all semantics are made explicit by the creation of explicit classification relations
between the elements in the expression and the Gellish concepts (classes of objects,
including relations). This replaces the instantiation relations and eliminates the need to
define a data model with entities and attributes, such as the entity ‘pump’ and the attribute
‘name’. This is illustrated in figure 2.


[Figure 2: diagram of the expression elements ‘P-101’ (111), ‘S-1’ (113), ‘pumping S-1’
(112), ‘is performer of pumping S-1’ (11) and ‘is subject in pumping S-1’ (12), each linked
by an ‘is classified as a’ relation (such as 13, 14 and 15) to a concept in the Gellish
ontology (STEPlib, green shaded area): pump (130206), is performer of, liquid stream
(730083), is subject in and pumping (192512).]
  Figure 2, Linking a Gellish expression to Gellish concepts through classification
Figure 2 illustrates the expression “P-101 is pumping S-1” (in dark yellow). The ‘pumping
S-1’ process is an interaction between the fluid S-1 and the pump P-101. The pump has the
role of performer and the liquid has the role of subject in the pumping process. The blue
boxes in the green shaded area represent the Gellish concepts, being instances in the Gellish
knowledge base, STEPlib. The explicit classification relations with the concepts in those
blue boxes provide the semantics for the interpretation of the expression.
In a Gellish Table this becomes:
Left hand    Left hand     Fact   Relation type name   Right hand   Right hand
object UID   object name   UID                         object UID   object name
111          P-101         11     is performer of      112          pumping S-1
113          S-1           12     is subject in        112          pumping S-1
111          P-101         13     is classified as a   130206       pump
112          pumping S-1   14     is classified as a   192512       pumping
113          S-1           15     is classified as a   730083       liquid stream


Such a set of rows in a Gellish Table can be exchanged between Gellish enabled software
packages in any kind of table, such as an MS-Access database table, an Oracle or DB2 table,
an XLS spreadsheet, an XML file (e.g. according to ISO 10303-28) or a file in STEP physical
file format (ISO 10303-21). Further details are described in ref. 1.
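As a simple illustration of such an exchange, the sketch below writes the five rows of the example to a text file format and reads them back. CSV is used here only because the Python standard library supports it directly; it is not one of the formats named above, and the column names and the functions `export_rows` and `import_rows` are hypothetical.

```python
import csv
import io

COLUMNS = [
    "left_uid", "left_name", "fact_uid",
    "relation_type_name", "right_uid", "right_name",
]

ROWS = [
    (111, "P-101", 11, "is performer of", 112, "pumping S-1"),
    (113, "S-1", 12, "is subject in", 112, "pumping S-1"),
    (111, "P-101", 13, "is classified as a", 130206, "pump"),
    (112, "pumping S-1", 14, "is classified as a", 192512, "pumping"),
    (113, "S-1", 15, "is classified as a", 730083, "liquid stream"),
]


def export_rows(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


def import_rows(text):
    """Parse CSV text back into rows, restoring the integer UID columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (int(r["left_uid"]), r["left_name"], int(r["fact_uid"]),
         r["relation_type_name"], int(r["right_uid"]), r["right_name"])
        for r in reader
    ]


text = export_rows(ROWS)
assert import_rows(text) == ROWS  # the round trip preserves every fact
print(text.splitlines()[1])  # 111,P-101,11,is performer of,112,pumping S-1
```

The receiving side needs no knowledge of a pump-specific data model; it only needs the fixed column layout and access to the same concept UIDs.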
Note that the shaded light yellow boxes all have the same name: “is classified as a”.
However, they are different individual classification relations. Each of those relations has a
unique identifier (13, 14 and 15). The name in the shaded box indicates that each is
(implicitly) “conceptualized” to be a classification relation. In other words, each of them is
an “is classified as a” relation.
For a correct interpretation, the Gellish concepts need to be defined in a computer
interpretable way. This is done via specialization/generalization relations, as is illustrated
in figure 3. These specialization relations form one hierarchical network terminating at the
top concept, called ‘anything’. This generic top supports the wide applicability of Gellish,
as any missing concept can be added to Gellish as a subtype of an existing concept.


[Figure 3: diagram of the specialization hierarchy in the Gellish ontology (green area).
The concepts pump, is performer of, liquid stream, is subject in and pumping are subtypes,
via ‘is a specialization of’ relations, of concepts such as physical object, relation and
activity, which in turn are subtypes of ‘anything’. The expression elements ‘P-101’, ‘S-1’,
‘pumping S-1’, ‘performer of pumping S-1’ and ‘subject in pumping S-1’ are linked to these
concepts by ‘is classified as a’ relations.]
            Figure 3, Definition of Gellish concepts in a specialization hierarchy
In practice there are several intermediate levels of specialization between, for example,
‘pump’ and ‘physical object’, and between ‘physical object’ and ‘anything’. Furthermore,
there are classes of physical objects defined as subtypes of ‘physical object’. These can be
extended by specializations, such as standard components (e.g. from ASME, BSI or DIN
standards) and manufacturer catalogue items (e.g. manufacturer models and types).
Figure 3 contains eight facts expressed as eight “is a specialization of” relations, each of
which is a separate relation between classes. As described above for the “is classified as a”
relation, this illustrates that the term ‘is a specialization of’ is not the name of each of those
relations, but the name of the Gellish concept (the class) that is the conceptualization of
those relations.
The knowledge about the meaning of the concepts pump, ‘is performer of’, liquid stream,
‘is subject in’ and pumping is defined in the Gellish ontology STEPlib. Some of that
knowledge is illustrated in the following facts, which include some intermediate facts not
shown in Figure 3 (the UIDs and names are taken from STEPlib, except for the UIDs of the
facts):
Left hand    Left hand         Fact   Relation type name       Right hand   Right hand
object UID   object name       UID                             object UID   object name
130206       pump              16     is a specialization of   730044       physical object
4761         is performer of   17     is a specialization of   4767         is involved in
4761         is performer of   18     requires as role-1 a     640020       performer
730044       physical object   19     can have as role as a    640020       performer
4761         is performer of   20     requires as role-2 a     4773         involver
730083       liquid stream     21     is a specialization of   730045       stream
4760         is subject in     22     is a specialization of   4767         is involved in
192512       pumping           23     is a specialization of   190168       process


This knowledge is inherited from higher concepts in the hierarchy to lower level concepts.
If an individual object is classified to be of such a class, then the knowledge is applicable to
the individual object as a constraint for the specific aspects of the individual object.
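The inheritance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the STEPlib implementation: the `Fact` tuple simply mirrors the six columns of the table, the function names are invented for this sketch, and the classification fact for P-101 (object UID 1, fact UID 101) is hypothetical.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One row of the table above (the six columns shown there)."""
    left_uid: int
    left_name: str
    fact_uid: int
    relation: str
    right_uid: int
    right_name: str


SPECIALIZATION = "is a specialization of"
CLASSIFICATION = "is classified as a"

FACTS = [
    Fact(130206, "pump", 16, SPECIALIZATION, 730044, "physical object"),
    Fact(4761, "is performer of", 17, SPECIALIZATION, 4767, "is involved in"),
    Fact(4761, "is performer of", 18, "requires as role-1 a", 640020, "performer"),
    Fact(730044, "physical object", 19, "can have as role as a", 640020, "performer"),
    Fact(4761, "is performer of", 20, "requires as role-2 a", 4773, "involver"),
    Fact(730083, "liquid stream", 21, SPECIALIZATION, 730045, "stream"),
    Fact(4760, "is subject in", 22, SPECIALIZATION, 4767, "is involved in"),
    Fact(192512, "pumping", 23, SPECIALIZATION, 190168, "process"),
    # Hypothetical fact: object UID 1 and fact UID 101 are made up for this sketch.
    Fact(1, "P-101", 101, CLASSIFICATION, 130206, "pump"),
]


def supertypes(uid, facts):
    """Return uid followed by all its direct and indirect supertypes."""
    found = [uid]
    for current in found:  # the list grows while we walk it
        for fact in facts:
            if (fact.relation == SPECIALIZATION
                    and fact.left_uid == current
                    and fact.right_uid not in found):
                found.append(fact.right_uid)
    return found


def applicable_knowledge(individual_uid, facts):
    """Return the knowledge facts that an individual inherits via its classes."""
    classes = []
    for fact in facts:
        if fact.relation == CLASSIFICATION and fact.left_uid == individual_uid:
            for cls in supertypes(fact.right_uid, facts):
                if cls not in classes:
                    classes.append(cls)
    return [
        fact for fact in facts
        if fact.left_uid in classes
        and fact.relation not in (SPECIALIZATION, CLASSIFICATION)
    ]


for fact in applicable_knowledge(1, FACTS):
    print(fact.left_name, fact.relation, fact.right_name)
```

With these facts, P-101 inherits fact 19 from ‘physical object’ through ‘pump’, so the knowledge that a physical object can have a role as a performer becomes applicable to P-101.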
Experiences and applications
Gellish is applied to express
- information about individual objects,
- knowledge about kinds of objects,
- requirements for data and documents in particular contexts about individual objects and
about kinds of objects.
These three applications are related to each other, as illustrated in Figure 4.
[Diagram: Product / Requirements / Knowledge models.
 Product Model (has / is): Dongting, SGP, Coal gasification facility, U-1300, K-1301 system,
 K-1301, LubOil-100.
 Requirements Model (shall have a / shall be a, in the context of a): SHELLlib, DEP xxx,
 ‘shall comply with’, ‘shall have a’.
 Knowledge Model (can have a / can be a): STEPlib, compressor, luboil system, capacity.
 K-1301 ‘is classified as a’ compressor.
 Copyright: Shell Global Solutions International B.V.]

                                                Figure 4, Three types of Gellish Models
The left hand of Figure 4 represents a Product Model that illustrates a Gellish model of a
process plant (the thick black lines represent composition relations). The relation types in a
product model generally start with ‘is’ or ‘has’. For example, K-1301 system is part of
U-1300 and K-1301 is classified as a compressor.
The right hand Knowledge Model illustrates the content of the STEPlib knowledge base.
The relation types in a knowledge model generally start with ‘can be a’ or ‘can have a’. For
example, a compressor can have a capacity and a lubrication oil system can be part of a
compressor.
The middle part of Figure 4 illustrates a proprietary Requirements Model that expresses
which data has to be present in a particular context. The relation types in a requirements
model generally start with ‘shall be a’ or ‘shall have a’.
For example, we developed requirements models that express that, in the context of
‘handover’ of data from design to operations, a compressor shall have a capacity and shall
be compliant with design guide xx. This is expressed in Gellish as follows:
130069       compressor        24     shall have a              551564       capacity
130069       compressor        25     shall be compliant with   5490386      DEP 31….


When data about a compressor is handed over, this Gellish specification makes it possible to
verify the completeness of that data automatically, with the verification driven by the
requirements model. This is illustrated in Figure 5.



Figure 5, Automated verification of a design against a requirements model
The right hand side of Figure 5 illustrates the content of the SHELLlib knowledge base, a
proprietary extension of STEPlib that also uses Gellish. It illustrates how the knowledge in
STEPlib and SHELLlib is inherited via the specialization hierarchy. Although P-101 is
classified as a centrifugal pump, the requirement that is defined for a pump in general can
automatically be made applicable to P-101 because of the defined inheritance via the
specialization hierarchy.
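A verification of this kind might be sketched as follows. The sketch is a simplification under stated assumptions, not the software used in the project: facts are reduced to (left, relation, right) name triples, the relation name ‘has’ is shorthand, and the object K-1302, the class ‘centrifugal compressor’ and the aspect ‘capacity of K-1301’ are hypothetical. Only the requirement that a compressor shall have a capacity and the classification of K-1301 as a compressor come from the text.

```python
# Requirements model: fact 24 from the text (UIDs left out for brevity).
REQUIREMENTS = [("compressor", "shall have a", "capacity")]

# Knowledge model: a hypothetical subtype, included only to show inheritance.
SPECIALIZATIONS = [
    ("centrifugal compressor", "is a specialization of", "compressor"),
]

# Product model that is handed over. K-1302 and the aspect name are hypothetical.
PRODUCT = [
    ("K-1301", "is classified as a", "compressor"),
    ("K-1302", "is classified as a", "centrifugal compressor"),
    ("capacity of K-1301", "is classified as a", "capacity"),
    ("K-1301", "has", "capacity of K-1301"),
]


def classes_of(obj, product, specializations):
    """Return the direct classes of obj followed by all their supertypes."""
    result = [right for left, rel, right in product
              if left == obj and rel == "is classified as a"]
    for cls in result:  # the list grows while we walk it
        for left, rel, right in specializations:
            if (rel == "is a specialization of"
                    and left == cls and right not in result):
                result.append(right)
    return result


def missing_aspects(obj, product, requirements, specializations):
    """Return the kinds of aspect that are required for obj but not present."""
    kinds = classes_of(obj, product, specializations)
    present = set()
    for left, rel, right in product:
        if left == obj and rel == "has":
            present.update(classes_of(right, product, specializations))
    return [required for cls, rel, required in requirements
            if rel == "shall have a" and cls in kinds and required not in present]


for name in ("K-1301", "K-1302"):
    print(name, missing_aspects(name, PRODUCT, REQUIREMENTS, SPECIALIZATIONS))
```

In this sketch K-1301 passes the check, whereas K-1302 is reported as lacking a capacity: the requirement defined for ‘compressor’ reaches it through the specialization hierarchy even though it is classified by a subtype.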
The specialization hierarchy also enables intelligent queries. Search engines, for example,
can extend a search on a keyword to the subtypes of that keyword. A document that is
recorded to contain information about a line shaft pump can then also be found when
documents about ‘centrifugal pump’ are searched for, and a query on ‘pump’ can also find
P-101, because it is classified as a centrifugal pump.
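A subtype-aware query of this kind could look like the sketch below. It assumes, as the examples in the text imply, that ‘line shaft pump’ is a subtype of ‘centrifugal pump’ and that ‘centrifugal pump’ is a subtype of ‘pump’. The document identifier D-001 and the function names are hypothetical.

```python
# (subtype, supertype) pairs, as implied by the examples in the text.
SPECIALIZATIONS = [
    ("line shaft pump", "centrifugal pump"),
    ("centrifugal pump", "pump"),
]

# Facts about individual objects. The document identifier D-001 is hypothetical.
FACTS = [
    ("P-101", "is classified as a", "centrifugal pump"),
    ("D-001", "contains information about", "line shaft pump"),
]


def subtypes_of(keyword, specializations):
    """Return the keyword together with all its direct and indirect subtypes."""
    found = {keyword}
    changed = True
    while changed:  # repeat until no new subtype turns up
        changed = False
        for sub, sup in specializations:
            if sup in found and sub not in found:
                found.add(sub)
                changed = True
    return found


def search(keyword, facts, specializations, relation=None):
    """Find objects related to the keyword or to any of its subtypes."""
    targets = subtypes_of(keyword, specializations)
    return sorted(
        left for left, rel, right in facts
        if right in targets and (relation is None or rel == relation)
    )


print(search("pump", FACTS, SPECIALIZATIONS))
print(search("centrifugal pump", FACTS, SPECIALIZATIONS,
             relation="contains information about"))
```

The query expands the keyword before matching, so the stored facts never need to mention ‘pump’ explicitly for P-101 or the document to be found.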
An example of a commercial application of Gellish is the Gellish Browser developed by Mi2.
The Browser can read and write data expressed in the Gellish language and is able to
present any knowledge about classes of objects and any data about individual objects.
Because it was expected that an implementation of Gellish would have serious performance
issues, the Browser was loaded with over 60,000 facts that originated from different systems
but were all expressed in a Gellish Table. These facts included the Gellish knowledge base,
extended with a Shell proprietary standards database, data about documents, a materials
catalogue, an equipment list and material balances of the design of a process plant.
Despite this load, the Browser showed excellent performance.




We also customized an implementation of the Eigner PLM product lifecycle management
system and loaded the same data into that system. This system also performed well.
We are currently working on the customization of existing systems so that they can export
data in a Gellish Table. The Browser can then be used to view data from various systems,
and the data can be imported and integrated with other data in the Eigner PLM system.
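The export and integration step can be pictured with a small sketch. It is an assumption-laden illustration only: the official Gellish Table format is defined in reference 1 and is not reproduced here. The sketch uses the six columns of the fact table shown earlier as a stand-in, writes them as comma-separated text, and merges the exports of two hypothetical systems on the fact UID.

```python
import csv
import io

# Stand-in columns: the six columns of the fact table shown earlier,
# not the official Gellish Table definition.
COLUMNS = [
    "Left hand object UID", "Left hand object name", "Fact UID",
    "Relation type name", "Right hand object UID", "Right hand object name",
]


def make_fact(*values):
    """Build one table row as a dictionary keyed by column name."""
    return dict(zip(COLUMNS, values))


FACT_16 = make_fact("130206", "pump", "16",
                    "is a specialization of", "730044", "physical object")
FACT_21 = make_fact("730083", "liquid stream", "21",
                    "is a specialization of", "730045", "stream")
FACT_23 = make_fact("192512", "pumping", "23",
                    "is a specialization of", "190168", "process")

# Two hypothetical source systems whose exports overlap in fact 21.
SYSTEM_A = [FACT_16, FACT_21]
SYSTEM_B = [FACT_21, FACT_23]


def export_facts(facts):
    """Serialize facts as comma-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(facts)
    return buffer.getvalue()


def import_facts(text):
    """Read facts back from the text produced by export_facts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def integrate(*tables):
    """Merge several tables, keeping one row per fact UID."""
    merged = {}
    for table in tables:
        for fact in table:
            merged.setdefault(fact["Fact UID"], fact)
    return list(merged.values())


received_a = import_facts(export_facts(SYSTEM_A))
received_b = import_facts(export_facts(SYSTEM_B))
for fact in integrate(received_a, received_b):
    print(fact["Fact UID"], fact["Left hand object name"])
```

Because every system writes the same columns and refers to the same UIDs, integration reduces to a merge on the fact UID, with no model-to-model conversion between the two sources.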
We intend to use a Gellish Table, among other things, as a data exchange language for the
hand-over of design data between engineering contractors and plant owners, and for data
about catalogue items and items delivered by suppliers.
Further work will explore the use of Gellish for the exchange of messages by intelligent
Agent software, acting as nodes in the Semantic Web. An example is business
communication messages about transactions in E-procurement.
Conclusions
The above illustrates that the current practice of defining data models separately from
reference data and user data is unnecessary. Integration of data model concepts with
reference data and user data in one consistent language can provide a single common
standard language for data storage and exchange that can significantly reduce development
costs and can simplify data communication.
Common use of the small data model of figure 2, together with common use of the Gellish
ontology, makes it possible to express and interpret a very wide scope of types of facts.
This is possible because the explicit classification relations provide interpretation rules for
expressions in which the relation types as well as the object types are defined in Gellish.
It is only required to have the concepts defined in the Gellish knowledge base and to refer
to them as in the basic structure, using the ‘basic semantic axioms’ mentioned above.
The above illustrates that:
   -   A common standard knowledge base of concepts and relations between concepts
       can replace many data models.
   -   The Gellish knowledge base of concepts is more flexible than fixed data models
       and makes it easier to add semantics to the database.
   -   The Gellish knowledge base of concepts provides an application independent
       language with a semantic basis that is equivalent to a very large data model. If
       sufficient concepts of an application domain are present or added, then data
       models for such an application domain can become superfluous.
   -   The Gellish knowledge base, using the inheritance capabilities of the specialization
       hierarchy, provides extendable product models for many types of objects.
   -   The implementations have shown that a Gellish knowledge base can be
       implemented with good performance.
   -   The implementations have shown that neutral format data exchange using a Gellish
       Table is a feasible solution.
As Gellish is in the public domain, proposals for extensions of the Gellish language are
invited.
References
       1. Andries van Renssen, “The Gellish Table and its Formats”. A definition of the
             Gellish Table and its implementation syntax for Gellish messages.
             www.steplib.com.
       2. Andries van Renssen, “Guide on STEPlib”. This guide describes how STEPlib
             is defined and how to extend the Gellish language and knowledge base.
             www.steplib.com.
       3. STEPlib, the Gellish knowledge base. This is a set of Gellish Tables (available in
             Excel and in MS Access). The upper level ontology part is documented in the
             TOPini part. www.steplib.com.
       4. Tim Berners-Lee, James Hendler and Ora Lassila, “The Semantic Web”,
             Scientific American, May 2001,
             http://www.sciam.com/2001/0501issue/0501berners-lee.html.
       5. OWL, Web Ontology Language Overview, http://www.w3.org/TR/owl-features/.
       6. Ian Niles and Adam Pease (2001), “Towards a Standard Upper Ontology”, in:
             Formal Ontology in Information Systems, ISBN 1-58113-377-4.
       7. SUO (2001), The IEEE Standard Upper Ontology website, http://suo.ieee.org.
       8. Lenat, D. (1995), “Cyc: A Large-Scale Investment in Knowledge Infrastructure”,
             Communications of the ACM, 38, no 11 (November 1995).
       9. Wolfgang Degen, Barbara Heller, Heinrich Herre and Barry Smith (2001),
             “GOL: A General Ontological Language”, in: Formal Ontology in
             Information Systems, ISBN 1-58113-377-4.
       10. The Epistle Core Data Model (2001),
             http://www.btinternet.com/~chris.angus/epistle/specifications/ecm/ecm_400.html




Knowledge versus Data Models                14                                 13/04/2010

Weitere ähnliche Inhalte

Was ist angesagt?

A category theoretic model of rdf ontology
A category theoretic model of rdf ontologyA category theoretic model of rdf ontology
A category theoretic model of rdf ontologyIJwest
 
Ontology Engineering for Big Data
Ontology Engineering for Big DataOntology Engineering for Big Data
Ontology Engineering for Big DataKouji Kozaki
 
Semantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: IntroductionSemantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: IntroductionKent State University
 
Using linguistic analysis to translate
Using linguistic analysis to translateUsing linguistic analysis to translate
Using linguistic analysis to translateIJwest
 
The Standardization of Semantic Web Ontology
The Standardization of Semantic Web OntologyThe Standardization of Semantic Web Ontology
The Standardization of Semantic Web OntologyMyungjin Lee
 
Improve information retrieval and e learning using
Improve information retrieval and e learning usingImprove information retrieval and e learning using
Improve information retrieval and e learning usingIJwest
 
2008 Industry Standards for C2 CDM and Framework
2008 Industry Standards for C2 CDM and Framework2008 Industry Standards for C2 CDM and Framework
2008 Industry Standards for C2 CDM and FrameworkBob Marcus
 
Semantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-WorldSemantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-WorldAmit Sheth
 
Swoogle: Showcasing the Significance of Semantic Search
Swoogle: Showcasing the Significance of Semantic SearchSwoogle: Showcasing the Significance of Semantic Search
Swoogle: Showcasing the Significance of Semantic SearchIDES Editor
 
Comparative Study on Graph-based Information Retrieval: the Case of XML Document
Comparative Study on Graph-based Information Retrieval: the Case of XML DocumentComparative Study on Graph-based Information Retrieval: the Case of XML Document
Comparative Study on Graph-based Information Retrieval: the Case of XML DocumentIJAEMSJORNAL
 
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...Amit Sheth
 
Dublin Core Application Profile for Scholarly Works KE
Dublin Core Application Profile for Scholarly Works KEDublin Core Application Profile for Scholarly Works KE
Dublin Core Application Profile for Scholarly Works KEJulie Allinson
 
Dublin Core Application Profile for Scholarly Works Slainte
Dublin Core Application Profile for Scholarly Works SlainteDublin Core Application Profile for Scholarly Works Slainte
Dublin Core Application Profile for Scholarly Works SlainteJulie Allinson
 
A Survey on Heterogeneous Data Exchange using Xml
A Survey on Heterogeneous Data Exchange using XmlA Survey on Heterogeneous Data Exchange using Xml
A Survey on Heterogeneous Data Exchange using XmlIRJET Journal
 
Introduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and TerminologyIntroduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and TerminologySteven Miller
 

Was ist angesagt? (20)

A category theoretic model of rdf ontology
A category theoretic model of rdf ontologyA category theoretic model of rdf ontology
A category theoretic model of rdf ontology
 
Ontology Engineering for Big Data
Ontology Engineering for Big DataOntology Engineering for Big Data
Ontology Engineering for Big Data
 
Semantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: IntroductionSemantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: Introduction
 
Ontology
OntologyOntology
Ontology
 
Using linguistic analysis to translate
Using linguistic analysis to translateUsing linguistic analysis to translate
Using linguistic analysis to translate
 
The Standardization of Semantic Web Ontology
The Standardization of Semantic Web OntologyThe Standardization of Semantic Web Ontology
The Standardization of Semantic Web Ontology
 
Improve information retrieval and e learning using
Improve information retrieval and e learning usingImprove information retrieval and e learning using
Improve information retrieval and e learning using
 
2008 Industry Standards for C2 CDM and Framework
2008 Industry Standards for C2 CDM and Framework2008 Industry Standards for C2 CDM and Framework
2008 Industry Standards for C2 CDM and Framework
 
Applying Semantic Web Technologies to Services of e-learning System
Applying Semantic Web Technologies to Services of e-learning SystemApplying Semantic Web Technologies to Services of e-learning System
Applying Semantic Web Technologies to Services of e-learning System
 
Semantic Web Nature
Semantic Web NatureSemantic Web Nature
Semantic Web Nature
 
Semantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-WorldSemantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-World
 
Swoogle: Showcasing the Significance of Semantic Search
Swoogle: Showcasing the Significance of Semantic SearchSwoogle: Showcasing the Significance of Semantic Search
Swoogle: Showcasing the Significance of Semantic Search
 
Ontology
Ontology Ontology
Ontology
 
Ievobio2010cdaostore
Ievobio2010cdaostoreIevobio2010cdaostore
Ievobio2010cdaostore
 
Comparative Study on Graph-based Information Retrieval: the Case of XML Document
Comparative Study on Graph-based Information Retrieval: the Case of XML DocumentComparative Study on Graph-based Information Retrieval: the Case of XML Document
Comparative Study on Graph-based Information Retrieval: the Case of XML Document
 
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
 
Dublin Core Application Profile for Scholarly Works KE
Dublin Core Application Profile for Scholarly Works KEDublin Core Application Profile for Scholarly Works KE
Dublin Core Application Profile for Scholarly Works KE
 
Dublin Core Application Profile for Scholarly Works Slainte
Dublin Core Application Profile for Scholarly Works SlainteDublin Core Application Profile for Scholarly Works Slainte
Dublin Core Application Profile for Scholarly Works Slainte
 
A Survey on Heterogeneous Data Exchange using Xml
A Survey on Heterogeneous Data Exchange using XmlA Survey on Heterogeneous Data Exchange using Xml
A Survey on Heterogeneous Data Exchange using Xml
 
Introduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and TerminologyIntroduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and Terminology
 

Andere mochten auch

Bolsas De Plastico
Bolsas De PlasticoBolsas De Plastico
Bolsas De PlasticoJose Torres
 
Corruption its definitions and typologies
Corruption its definitions and typologiesCorruption its definitions and typologies
Corruption its definitions and typologiesAlexander Decker
 
Eesti kool - õpi- või riskiühiskonna kool?
Eesti kool - õpi- või riskiühiskonna kool? Eesti kool - õpi- või riskiühiskonna kool?
Eesti kool - õpi- või riskiühiskonna kool? Ene-Silvia Sarv
 
Improving QA on PHP projects - confoo 2011
Improving QA on PHP projects - confoo 2011Improving QA on PHP projects - confoo 2011
Improving QA on PHP projects - confoo 2011Michelangelo van Dam
 
Gabriel garcía márquez
Gabriel garcía márquezGabriel garcía márquez
Gabriel garcía márquezDvendify
 

Andere mochten auch (6)

Bolsas De Plastico
Bolsas De PlasticoBolsas De Plastico
Bolsas De Plastico
 
Corruption its definitions and typologies
Corruption its definitions and typologiesCorruption its definitions and typologies
Corruption its definitions and typologies
 
Db2
Db2Db2
Db2
 
Eesti kool - õpi- või riskiühiskonna kool?
Eesti kool - õpi- või riskiühiskonna kool? Eesti kool - õpi- või riskiühiskonna kool?
Eesti kool - õpi- või riskiühiskonna kool?
 
Improving QA on PHP projects - confoo 2011
Improving QA on PHP projects - confoo 2011Improving QA on PHP projects - confoo 2011
Improving QA on PHP projects - confoo 2011
 
Gabriel garcía márquez
Gabriel garcía márquezGabriel garcía márquez
Gabriel garcía márquez
 

Ähnlich wie Are Data Models Superfluous Nov2003

Proposal of an Ontology Applied to Technical Debt on PL/SQL Development
Proposal of an Ontology Applied to Technical Debt on PL/SQL DevelopmentProposal of an Ontology Applied to Technical Debt on PL/SQL Development
Proposal of an Ontology Applied to Technical Debt on PL/SQL DevelopmentJorge Barreto
 
Comparison of Relational Database and Object Oriented Database
Comparison of Relational Database and Object Oriented DatabaseComparison of Relational Database and Object Oriented Database
Comparison of Relational Database and Object Oriented DatabaseEditor IJMTER
 
In Memory Database Essay
In Memory Database EssayIn Memory Database Essay
In Memory Database EssayTammy Moncrief
 
Oudg cross model datum access
Oudg cross model datum accessOudg cross model datum access
Oudg cross model datum accesscsandit
 
Making the Conceptual Layer Real via HTTP based Linked Data
Making the Conceptual Layer Real via HTTP based Linked DataMaking the Conceptual Layer Real via HTTP based Linked Data
Making the Conceptual Layer Real via HTTP based Linked DataKingsley Uyi Idehen
 
Robust Module based data management system
Robust Module based data management systemRobust Module based data management system
Robust Module based data management systemRahul Roi
 
Space efficient structures for json documents
Space efficient structures for json documentsSpace efficient structures for json documents
Space efficient structures for json documentsIAEME Publication
 
Semantics in Financial Services -David Newman
Semantics in Financial Services -David NewmanSemantics in Financial Services -David Newman
Semantics in Financial Services -David NewmanPeter Berger
 
Closing the Gap: Data Models for Documentary Linguistics
Closing the Gap: Data Models for Documentary LinguisticsClosing the Gap: Data Models for Documentary Linguistics
Closing the Gap: Data Models for Documentary LinguisticsBaden Hughes
 
Healthcare Data Management using Domain Specific Languages for Metadata Manag...
Healthcare Data Management using Domain Specific Languages for Metadata Manag...Healthcare Data Management using Domain Specific Languages for Metadata Manag...
Healthcare Data Management using Domain Specific Languages for Metadata Manag...David Milward
 
An Incremental Method For Meaning Elicitation Of A Domain Ontology
An Incremental Method For Meaning Elicitation Of A Domain OntologyAn Incremental Method For Meaning Elicitation Of A Domain Ontology
An Incremental Method For Meaning Elicitation Of A Domain OntologyAudrey Britton
 
Semantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaSemantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaGiorgia Lodi
 
Database Management System, Lecture-1
Database Management System, Lecture-1Database Management System, Lecture-1
Database Management System, Lecture-1Sonia Mim
 
Mc0077 – advanced database systems
Mc0077 – advanced database systemsMc0077 – advanced database systems
Mc0077 – advanced database systemsRabby Bhatt
 
Semantic Rules Representation in Controlled Natural Language in FluentEditor
Semantic Rules Representation in Controlled Natural Language in FluentEditorSemantic Rules Representation in Controlled Natural Language in FluentEditor
Semantic Rules Representation in Controlled Natural Language in FluentEditorCognitum
 
Semantic technologies at work
Semantic technologies at workSemantic technologies at work
Semantic technologies at workYannis Kalfoglou
 
A Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia ArticlesA Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia Articlesijma
 
An approach for transforming of relational databases to owl ontology
An approach for transforming of relational databases to owl ontologyAn approach for transforming of relational databases to owl ontology
An approach for transforming of relational databases to owl ontologyIJwest
 

Ähnlich wie Are Data Models Superfluous Nov2003 (20)

Proposal of an Ontology Applied to Technical Debt on PL/SQL Development
Proposal of an Ontology Applied to Technical Debt on PL/SQL DevelopmentProposal of an Ontology Applied to Technical Debt on PL/SQL Development
Proposal of an Ontology Applied to Technical Debt on PL/SQL Development
 
Comparison of Relational Database and Object Oriented Database
Comparison of Relational Database and Object Oriented DatabaseComparison of Relational Database and Object Oriented Database
Comparison of Relational Database and Object Oriented Database
 
In Memory Database Essay
In Memory Database EssayIn Memory Database Essay
In Memory Database Essay
 
Oudg cross model datum access
Oudg cross model datum accessOudg cross model datum access
Oudg cross model datum access
 
Making the Conceptual Layer Real via HTTP based Linked Data
Making the Conceptual Layer Real via HTTP based Linked DataMaking the Conceptual Layer Real via HTTP based Linked Data
Making the Conceptual Layer Real via HTTP based Linked Data
 
Robust Module based data management system
Robust Module based data management systemRobust Module based data management system
Robust Module based data management system
 
Space efficient structures for json documents
Space efficient structures for json documentsSpace efficient structures for json documents
Space efficient structures for json documents
 
Semantics in Financial Services -David Newman
Semantics in Financial Services -David NewmanSemantics in Financial Services -David Newman
Semantics in Financial Services -David Newman
 
Closing the Gap: Data Models for Documentary Linguistics
Closing the Gap: Data Models for Documentary LinguisticsClosing the Gap: Data Models for Documentary Linguistics
Closing the Gap: Data Models for Documentary Linguistics
 
Healthcare Data Management using Domain Specific Languages for Metadata Manag...
Healthcare Data Management using Domain Specific Languages for Metadata Manag...Healthcare Data Management using Domain Specific Languages for Metadata Manag...
Healthcare Data Management using Domain Specific Languages for Metadata Manag...
 
An Incremental Method For Meaning Elicitation Of A Domain Ontology
An Incremental Method For Meaning Elicitation Of A Domain OntologyAn Incremental Method For Meaning Elicitation Of A Domain Ontology
An Incremental Method For Meaning Elicitation Of A Domain Ontology
 
Semantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaSemantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenza
 
Database Management System, Lecture-1
Database Management System, Lecture-1Database Management System, Lecture-1
Database Management System, Lecture-1
 
Mc0077 – advanced database systems
Mc0077 – advanced database systemsMc0077 – advanced database systems
Mc0077 – advanced database systems
 
Data models
Data modelsData models
Data models
 
DBMS - Introduction
DBMS - IntroductionDBMS - Introduction
DBMS - Introduction
 
Semantic Rules Representation in Controlled Natural Language in FluentEditor
Semantic Rules Representation in Controlled Natural Language in FluentEditorSemantic Rules Representation in Controlled Natural Language in FluentEditor
Semantic Rules Representation in Controlled Natural Language in FluentEditor
 
Semantic technologies at work
Semantic technologies at workSemantic technologies at work
Semantic technologies at work
 
A Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia ArticlesA Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia Articles
 
An approach for transforming of relational databases to owl ontology
An approach for transforming of relational databases to owl ontologyAn approach for transforming of relational databases to owl ontology
An approach for transforming of relational databases to owl ontology
 

Are Data Models Superfluous Nov2003

  • 1. Gellish A standard data and knowledge representation language and ontology Are Data Models becoming Superfluous? by Ir. Andries van Renssen Shell Global Solutions International Andries.vanRenssen@shell.com Abstract Data storage and data communication lack a common standard universal data model as well as a common data language and knowledge base with a taxonomy of concepts and a grammar for data exchange messages. This article presents a solution to this problem in the form of the new Gellish language and knowledge base, as an extension of the standard data models and ontology of two new ISO standards. The article presents Gellish as a language for neutral data exchange between systems, that can replace data models, and that provides an extendable ontology with standard reference data for customization and harmonization of systems. The definition of Gellish includes the public domain (“open data”) Gellish knowledge base with definitions of a large number of concepts and product models. It illustrates that a single Gellish Table in a database or data exchange file, is sufficient to express a wide range of kinds of facts about classes as well as facts about individual objects. Keywords: knowledge representation, data exchange, language, data models, standards, ontology, semantic web, knowledge base, classification system Table of Content Knowledge versus Data Models 1 13/04/2010
  • 2. Introduction Currently, each software system stores its data using its own data model and communicates with other systems usually using a dedicated interface data structure, which means that it applies a dedicated interface data model. The large variety of data models cause that data exchange between systems is costly because of the required conversion of the data from the semantics of one data model to the other. This demonstrates the urgent need for widely applicable common standard data models. Often systems can be ‘customized’ by adding ‘reference data’ as instances, such as the definition of equipment types, document types, activity types, property types, pick lists, etc. However, reference data are usually different per implementation, even when database structures of different systems are equal, such as is the case with several implementations of the same system. This also holds for different implementations of the same system, such as a CAD, CAE, PDM, PLM, ERP or CRM system. The consequence is that data in those implementations can still not be compared, integrated or exchanged without costly data conversion processes. This illustrates the urgent need for a common dictionary, classification system or taxonomy of reference data, because there is currently no standard user data language. In the current systems there is a separation between the world of data models and the world of instances. Data models are developed by IT specialists (data modelers) who document them using either proprietary tools or using a standard data modeling language, such as EXPRESS (ISO 10303-11) or UML, which languages are especially designed to define data models. Once a data model is defined in such a language, the data model acts as another language in which the reference data as well as the user data has to be expressed. The use of two different languages, one for the model, one for the user data, illustrates the barrier between the two worlds. 
It is as if the English language definition is expressed in Chinese. On top of this, each programmer and each reference data producer is free to define his own terminology using those data definition languages! The result of the current state of the art is that data storage is done in a Babylonian mix of data models and reference data ‘languages’, with the consequence that exchange of data between systems is impossible, except where dedicated bilateral translators are created, not only for each pair of data models, but also for the data content ‘languages’.
The current situation is sketched by Smith and Welty (2001) as follows: “Out of the apparent chaos, some coherence is beginning to emerge. Gradually, computer scientists are beginning to recognize that the provision, once for all, of a common, robust reference ontology – a shared taxonomy of entities – might provide significant advantages over the ad-hoc, case-by-case methods previously used”. Several attempts have been made to develop an ‘upper ontology’, such as SUMO by Niles and Pease (2001), the IEEE Standard Upper Ontology, SUO (2001), the Cyc ontology, Lenat (1995) and GOL, Degen et al (2001). However, none of them integrates the upper level ontology with a lower level ontology of reference data. In other words, they do not integrate a generic data model with reference data and a language for the description of knowledge and of individual objects and processes.
This article presents a solution to the above-mentioned issues in the form of the Gellish language. Gellish satisfies the criteria for proper ontologies as expressed by Degen et al (2001 par 6.1), but is not limited to an upper ontology. It includes and extends concept definitions that also appear in other sources, such as ISO standards and IEC standards, and knowledge stemming from industry standards and proprietary sources. It is extendable just as any natural language.
Its taxonomy and knowledge base use unique identifiers for
  • 3. concepts, thus allowing for synonyms and multiple names in various languages. The latter enables the expression of propositions about facts in one natural language and their automatic translation and presentation in any other natural language. Gellish eliminates the traditional barrier between the data model definitions of classes and the data instances. The Gellish language demonstrates that this barrier is not necessary and that there are clear advantages when class definitions, reference data and user data are expressed in one and the same language.
Standard data models, ontologies and reference data
There are several developments of standard lower level ontologies and reference data libraries, stimulated among others by requirements of the e-commerce ‘market places’ and the developments around The Semantic Web promoted by Berners-Lee et al (2001) and the Web Ontology Language OWL. Examples are the UNSPSC code (http://www.unspsc.org/), Ecl@ss (http://www.eclass.de/), Trade Ranger (http://www.trade-ranger.com/EN/Pages/ContentStandards.asp), etc. These standards have their value mainly in the standardization of terminology, but do not provide a standard language or a standard data model for general use. Their semantic expression power is limited, because they apply only a few relation types and lack integration with a rich upper ontology.
There have also been several attempts to develop standard data models for data exchange or for data storage. Some of them are proprietary, but others are in the public domain. Those standard data models are defined independent of a particular system, and are therefore called ‘neutral’. They are usually developed for a particular application domain instead of being limited to a particular system.
Examples of standard data models are the STEP family of standards in ISO 10303, such as a graphics data model AP203, a data model for the automotive industry (AP214), one for piping systems (AP227), one under development for the defense industry (AP239, PLCS), etc. The integration of all those data models into one overall data model is not yet fully achieved. Although the scopes of these valuable standard data models are wide, they are still limited to particular application areas and do not yet provide a general ‘common language’.
A further step towards a data model with a generic scope was the development and publication of the Epistle Core Data Model (2001), in which development the author of this article participated. From that, two new ISO standards were derived, ISO 15926-2 and its counterpart within the STEP family (AP221). Although these generic data models stem from the process industries, they have the generic nature of an upper level ontology, which makes them applicable in other application domains as well. To become practically applicable in a particular application domain, these generic data models need a standard ‘reference data library’ or lower level ontology, in order to add standard definitions of application domain specific concepts and to specialize the generic data model.
The author coordinated the development of such a standard reference data library, called STEPlib. This is a main source for the common standard library ISO 15926-4. Then it was discovered that the top of the specialization hierarchy of standard data in the library coincided with the entities, attributes and relations in the generic data model. This led to the inclusion of the data model in the library. In other words, the upper level ontology was combined with a lower level ontology.
The insight that information should be contained in relations and not in objects, led to the birth of the Gellish language, which is based on standard relation types, expressed by natural language ‘phrases’.
  • 4. The Gellish language and ontology
Gellish is a public domain standard data and knowledge representation language and ontology that is defined in STEPlib. It does not have the barrier between the user data and the IT data model data. It contains and extends the concepts of the above-mentioned generic data models and integrates them with standard reference data and a knowledge base with product and process models. The ontology also includes the definition of a large number of standard fact types (or relation types) that define the grammar of the Gellish language. It contains the definition of over 20,000 concepts arranged in a specialization hierarchy of classes. These concepts can be interpreted as entity types, attribute types and relationship types, or as a classification system or taxonomy. This makes Gellish equivalent to a very large data model. In addition, STEPlib contains a large number of relations between the concepts. They define the content of the knowledge base of product models and process models.
Gellish is not object oriented, but fact oriented. The basic Gellish object is therefore a fact. Each (atomic) fact is expressed as a relation between (two) objects. For example, fact 1 is expressed by a particular relation between the objects with unique identifiers (UID’s) 100 and 101. This expression (1, 100, 101) illustrates the structure of each basic Gellish expression. Gellish requires that both the objects and the fact are classified explicitly by standard classes, including standard relation types. The standard classes are predefined in the Gellish ontology. In addition to that, objects may have a name. This makes it possible for software to interpret the expression correctly.
Gellish and the above-mentioned ISO standards are both based on the understanding that there appears to exist a limited set of application independent standard relation types that are sufficient to model all kinds of products and processes.
Gellish standardizes these relation types. The relation types also define the role types that the related objects play in the relations with each other. The variety and extendibility of standard relation types define the semantic expression capabilities of Gellish. A large part of the Gellish relation types is defined in the ISO standards and an extended set is defined in the TOP part of the Gellish language definition (STEPlib).
A standard implementation of Gellish is defined as a Gellish Table. In a Gellish Table the basic Gellish expression becomes:

Left hand   Left hand    Fact  Relation  Relation type  Right hand  Right hand
object UID  object name  UID   type UID  name           object UID  object name
100         thing-1      1     2850      is related to  101         thing-2

In a Gellish Table one (atomic) fact is represented by one record, being a relation between two object UID’s, the names of the objects and the classification of the fact. The classification of the objects is done via separate classification facts in additional records.
Some examples of facts from a particular application domain, which illustrate the use of standard Gellish relation types, are:

Left hand   Left hand      Fact  Relation  Relation type name      Right hand  Right hand       Scale
object UID  object name    UID   type UID                          object UID  object name
130091      diesel engine  2     1146      is a specialization of  130108      engine
104         M-1            3     1225      is classified as a      130091      diesel engine
130802      cylinder       4     1146      is a specialization of  730063      artifact
107         C-1            5     1225      is classified as a      130802      cylinder
107         C-1            6     1190      is part of              104         M-1
107         C-1            7     1727      has aspect              108         volume of C-1
108         volume of C-1  8     1225      is classified as a      550140      internal volume
108         volume of C-1  9     2044      is quantified as        922235      1800             cm3
104         M-1            10    4760      is subject of           110         order-1

Note: for human readability, the relation type UID is ignored in the tables below.
The above table illustrates:
• Standard Gellish relation types, which classify the facts and which determine the expression capabilities and semantics of Gellish.
• Examples from the large number of standard object types that are predefined in Gellish. For example: engine, diesel engine, cylinder, artifact, internal volume, 1800 and cm3.
• The way in which new object types can be added, as in facts 2 and 4. Diesel engine and cylinder already exist in Gellish, but if they had not existed, they could have been added in this way.
• It is possible in Gellish to express facts, such as the volume of C-1, without the need for such a fact to be pre-modeled in the data model. Such a fact type can nevertheless be defined in Gellish, after which this particular instance can be verified against that definition. It could also be defined to be obligatory in a particular context, after which the instances can be validated on completeness and compliance.
• One table is suitable to express many kinds of facts.
Note: The table above presents just an example of some of the capabilities of Gellish. For example, Gellish also allows expressing in which language the facts are expressed, whether the objects are real or imaginary, what the communicative intent is, who the author of a proposition is and who the addressee is, etc.
Storage and exchange of data as well as semantics in Gellish
In this paragraph I will describe how knowledge, data and semantics are represented in Gellish.
The generic nature of Gellish allows expressing any complex network of facts. For example it allows expressing that:
- physical objects (of any kind) have properties (of any kind),
- properties have values,
- physical objects have parts,
- physical objects participate in activities or processes in particular roles,
- etc.
But for clarity I will use a specific example, being the fact that:
- a particular pump (‘P-1’) is pumping a particular stream (‘S-1’).
  • 6. In a conventional database it is required to declare some entity types and attribute types that define the semantics in the form of a data model. In case of the example, the data model could for example consist of the entity types ‘pump’, ‘process’ and ‘stream’, each with some attributes. In Gellish, the concepts ‘pump’, ‘process’ and ‘stream’ are not entity types, but they are concepts that are defined via facts that are expressed as relations in a generic knowledge base. The knowledge base has a structure that only ‘knows’ the minimum number of ‘basic semantic axioms’ and contains the definition of a large number of concepts. The minimum set of ‘basic semantic axioms’ comprises the fundamental ontological concepts of Gellish that should be known and understood and which are sufficient for the definition of additional semantic concepts. For the definition of a new concept it is required to define a coherent set of elementary facts, expressed as relations between the new concept and the existing concepts. In other words, each new concept requires the creation of a structure as presented in figure 1.
[Figure 1, Basic semantic concepts. Labels in the figure: kind of thing, anything (something), playing a role, role (of something in relation), requirement of role, relation, object-1, role-1, relation-1, role-2, object-2.]
The minimum set of ‘basic semantic concepts’ that are the axioms of Gellish and whose meaning should be understood is:
- anything
- role
- relation / relations
- plays role
- requires role
- is / is a (is classified as a)
- individual thing / individual things
- kind of thing / kind of things
- single thing / plural thing
The structure of figure 1 holds for facts about classes as well as facts about individual objects (instances) or relations, and for single objects as well as for plural objects. In other words, object-1 and object-2 in figure 1 can be either a single or plural individual object, relation or class.
The lines in the top left corners of the boxes indicate that the structure is a typical instance.
  • 7. Any other ‘atomic fact’ is expressed as such a structure. In other words, any atomic fact is expressed as an ‘atomic relation’ between two or more ‘objects’ and by the classification of the ‘objects’, the ‘roles’ and the ‘relation’. This implies that an atomic fact is expressed by a structure of nine (9) relations, formed by the blue boxes in figure 2 (note that 4 of the 5 boxes appear twice in an atomic fact). For example the fact that impeller O1 is part of centrifugal pump O2 is expressed in Gellish by the following 4 elementary relations:
- O1 plays role R1
- R1 is required by C1
- C1 requires role R2
- R2 is played by O2
These 4 relations relate 5 objects. To interpret them correctly the following 5 additional classification relations are required:
- O1 is classified as an impeller
- R1 is classified as a part
- C1 is classified as a composition relation (“is part of”)
- R2 is classified as a whole
- O2 is classified as a centrifugal pump
In practical implementations it appears that the explicit identification of the roles and their classification can be neglected, because they follow from the classification of the relation and the definition of the relation type. Therefore the above relations are usually summarized in 3 Gellish atomic expressions as follows:
- O1 is classified as an impeller
- O1 is part of O2
- O2 is classified as a centrifugal pump
From this example it can be seen that the 5 kinds of things with which the 5 objects are classified need to be present in or added to the semantics of the Gellish knowledge base in order to ensure that the fact can be interpreted correctly. The awareness that a knowledge base of predefined concepts is required for a correct interpretation of Gellish expressions resulted in the development of the top-down hierarchical definition of the Gellish knowledge base of concepts, including also relation types, as available in STEPlib.
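The expansion of one atomic fact into four elementary relations plus five classification relations, and its summary in three expressions, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Relation` tuple, the function names and the `phrase_of` lookup from a relation kind to its phrase are hypothetical and are not part of the Gellish definition.

```python
from typing import NamedTuple

CLASSIFIED = "is classified as a"


class Relation(NamedTuple):
    left: str
    phrase: str
    right: str


def full_structure(o1, kind1, o2, kind2, relation_kind, role1_kind, role2_kind):
    """Build the nine relations behind one atomic fact (names R1, C1, R2 as in the text)."""
    r1, c1, r2 = "R1", "C1", "R2"
    elementary = [
        Relation(o1, "plays role", r1),
        Relation(r1, "is required by", c1),
        Relation(c1, "requires role", r2),
        Relation(r2, "is played by", o2),
    ]
    classification = [
        Relation(o1, CLASSIFIED, kind1),
        Relation(r1, CLASSIFIED, role1_kind),
        Relation(c1, CLASSIFIED, relation_kind),
        Relation(r2, CLASSIFIED, role2_kind),
        Relation(o2, CLASSIFIED, kind2),
    ]
    return elementary + classification


def summarize(structure, phrase_of):
    """Collapse the nine relations into three expressions; roles are left implicit."""
    kinds = {r.left: r.right for r in structure if r.phrase == CLASSIFIED}
    o1 = next(r.left for r in structure if r.phrase == "plays role")
    o2 = next(r.right for r in structure if r.phrase == "is played by")
    c1 = next(r.right for r in structure if r.phrase == "is required by")
    return [
        Relation(o1, CLASSIFIED, kinds[o1]),
        Relation(o1, phrase_of[kinds[c1]], o2),
        Relation(o2, CLASSIFIED, kinds[o2]),
    ]


structure = full_structure(
    "O1", "impeller", "O2", "centrifugal pump",
    relation_kind="composition relation", role1_kind="part", role2_kind="whole",
)
# Assumed lookup: the relation kind determines the phrase used in the summary.
summary = summarize(structure, {"composition relation": "is part of"})
```

The roles can be dropped in the summary because, as stated above, they follow from the classification of the relation and the definition of the relation type.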
Knowledge representation: relations between classes
Any fact type that extends the semantics is expressed as a relation between kinds of things. For example, assume that the concept ‘centrifugal pump’ needs to be added. Then the following two atomic relations define that concept:
1. A specialization relation that defines that: centrifugal pump is a specialization of pump
2. A relation that defines that a centrifugal pump by definition uses the centrifugal principle: centrifugal pump has by definition as aspect centrifugal.
These relations build respectively on the definition of the concept ‘pump’ and ‘centrifugal’.
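As a minimal sketch of this idea, the following Python fragment adds ‘centrifugal pump’ to a toy knowledge base through the two relations above. The `KnowledgeBase` class and its rule that the right hand concept must already exist are assumptions made for illustration; they are not taken from STEPlib.

```python
class KnowledgeBase:
    """Toy store of concepts and of relations between kinds of things."""

    def __init__(self, concepts):
        self.concepts = set(concepts)
        self.facts = []

    def add_fact(self, left, phrase, right):
        # Assumed rule: a new concept may only be defined in terms of existing ones.
        if right not in self.concepts:
            raise KeyError(f"unknown concept: {right}")
        self.concepts.add(left)
        self.facts.append((left, phrase, right))


kb = KnowledgeBase({"pump", "centrifugal"})
kb.add_fact("centrifugal pump", "is a specialization of", "pump")
kb.add_fact("centrifugal pump", "has by definition as aspect", "centrifugal")
```

No table or schema has to change to introduce the new concept; only two more facts are stored.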
  • 8. Interpretation of expressions
In current database technology the semantic interpretation of an expression is done via the fact that any object is implicitly classified by being an ‘instance’ of an entity of which the semantics are defined. For example, assume that P1 is an instance of an attribute called ‘name’ of an entity called ‘pump’. This probably means that P1 is the name of a thing that is classified as a pump, although this meaning comprises two facts that are usually not defined in a computer interpretable way. It should be noted that if there are no other attributes, this data structure does not allow the classification of P1 as a centrifugal pump.
In Gellish all semantics is made explicit by the creation of explicit classification relations between the elements in the expression and the Gellish concepts (classes of objects, including relations). This replaces the instantiation relations and eliminates the need to define a data model with entities and attributes, such as the entity ‘pump’ and the attribute ‘name’. This is illustrated in figure 3.
[Figure 2, Linking a Gellish expression to Gellish concepts through classification. The green shaded area is the Gellish ontology (STEPlib) with the concepts pump (130206), is performer of, liquid stream (730083), is subject in and pumping (192512). The individual things ‘P-101’ (111), ‘pumping S-1’ (112) and ‘S-1’ (113) and the relations 11 and 12 between them are linked to those concepts by “is classified as a” relations, among which 13, 14 and 15.]
Figure 2 illustrates the expression: “P-101 is pumping S-1” (in dark yellow). The ‘pumping S-1’ process is an interaction between the fluid S-1 and the pump P-101. The pump has the role as performer and the liquid has the role as subject in the pumping process.
The blue boxes in the green shaded area represent the Gellish concepts, being instances in the Gellish knowledge base, STEPlib. The explicit classification relations with the concepts in those blue boxes provide the semantics for the interpretation of the expression. In a Gellish Table this becomes:

Left hand   Left hand    Fact  Relation type name  Right hand  Right hand
object UID  object name  UID                       object UID  object name
111         P-101        11    is performer of     112         pumping S-1
113         S-1          12    is subject in       112         pumping S-1
111         P-101        13    is classified as a  130206      pump
112         pumping S-1  14    is classified as a  192512      pumping
113         S-1          15    is classified as a  730083      liquid stream

Such a set of rows in a Gellish Table can be exchanged between Gellish enabled software packages in any kind of table, such as an MS-Access database table, an Oracle or DB2 table, XLS spreadsheet, an XML file (e.g. according to ISO 10303-28) or in STEP physical file format (ISO 10303-21). Further details are described in ref. 1.
Note that the shaded light yellow boxes all have the same name: “is classified as a”. However, they are different individual classification relations. Each of those relations has a unique identifier (13, 14 and 15). The name in the shaded box indicates that each is (implicitly) “conceptualized” to be a classification relation. In other words, each of them is an “is classified as a” relation.
For a correct interpretation of the Gellish concepts they need to be defined in a computer interpretable way. This is done via specialization/generalization relations as is illustrated in figure 3. These specialization relations form one hierarchical network terminating at the top, called ‘anything’. This generic top supports the wide applicability of Gellish, as any missing concept can be added to Gellish as a subtype of an existing concept.
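The five rows of facts 11 to 15 can be held in one flat table and interpreted through their classification rows. The sketch below assumes a simplified six-column layout and uses CSV as the exchange format; both are illustrative choices and not the Gellish Table syntax defined in ref. 1.

```python
import csv
import io

COLUMNS = [
    "left_uid", "left_name", "fact_uid",
    "relation_type_name", "right_uid", "right_name",
]

ROWS = [
    (111, "P-101", 11, "is performer of", 112, "pumping S-1"),
    (113, "S-1", 12, "is subject in", 112, "pumping S-1"),
    (111, "P-101", 13, "is classified as a", 130206, "pump"),
    (112, "pumping S-1", 14, "is classified as a", 192512, "pumping"),
    (113, "S-1", 15, "is classified as a", 730083, "liquid stream"),
]


def classification_of(rows):
    """Map each classified object name to the name of its class."""
    return {
        left: right
        for (_, left, _, phrase, _, right) in rows
        if phrase == "is classified as a"
    }


def interpret(rows):
    """Render the non-classification facts with the class of each object."""
    kinds = classification_of(rows)
    return [
        f"{left} ({kinds[left]}) {phrase} {right} ({kinds[right]})"
        for (_, left, _, phrase, _, right) in rows
        if phrase != "is classified as a"
    ]


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


sentences = interpret(ROWS)
text = to_csv(ROWS)
```

Because every fact has the same shape, the same reader and writer serve for class definitions, reference data and user data alike.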
[Figure 3, Definition of Gellish concepts in a specialization hierarchy. The green area is the Gellish ontology. Nodes shown include anything, individual thing, physical object, relation and activity, connected by “is a specialization of” relations down to the concepts pump, is performer of, liquid stream, is subject in and pumping, which classify ‘P-101’, ‘S-1’, ‘pumping S-1’ and the two relations between them.]
In practice there are several intermediate levels of specialization between e.g. ‘pump’ and ‘physical object’ and ‘anything’, etc. Furthermore there are classes of physical objects defined as subtypes of ‘physical object’. These can be extended by specializations, such as standard components (e.g. from ASME, BSI or DIN standards) and also specializations such as manufacturer catalogue items (e.g. manufacturer models and types).
Figure 3 contains eight facts expressed as eight “is a specialization of” relations, each of which is a separate relation between classes. Similarly to what is described above about the “is classified as a” relation, this illustrates that the term ‘is a specialization of’ is not the name of each of those relations, but a name of the Gellish concept (the class) that is the conceptualization of those relations.
The knowledge about the meaning of the concepts pump, ‘is performer of’, liquid stream, ‘is subject in’ and pumping is defined in the Gellish ontology STEPlib. Some of that is illustrated in the following facts, which include some intermediate facts not shown in figure 3 (the UID’s and names are taken from STEPlib, except for the UID’s of the facts):

Left hand   Left hand        Fact  Relation type name      Right hand  Right hand
object UID  object name      UID                           object UID  object name
130206      pump             16    is a specialization of  730044      physical object
4761        is performer of  17    is a specialization of  4767        is involved in
4761        is performer of  18    requires as role-1 a    640020      performer
730044      physical object  19    can have as role as a   640020      performer
4761        is performer of  20    requires as role-2 a    4773        involver
730083      liquid stream    21    is a specialization of  730045      stream
4760        is subject in    22    is a specialization of  4767        is involved in
192512      pumping          23    is a specialization of  190168      process

This knowledge is inherited from higher concepts in the hierarchy to lower level concepts. If an individual object is classified to be of such a class, then the knowledge is applicable to the individual object as a constraint for the specific aspects of the individual object.
Experiences and applications
Gellish is applied to express
- information about individual objects,
- knowledge about kinds of objects,
- requirements for data and documents in particular contexts about individual objects and about kinds of objects.
These three applications are related to each other, as is illustrated in Figure 4.
  • 11. [Figure 4, Three types of Gellish Models. Three columns: Product Model (relation types ‘has’ / ‘is’; example: the Dongting coal gasification facility with U-1300, K-1301 system, K-1301 and LubOil-100), Requirements Model (‘shall have a’ / ‘shall be a’, in the context of a ...; SHELLlib, SGP, DEP xxx) and Knowledge Model (‘can have a’ / ‘can be a’; STEPlib, with compressor, luboil system and capacity). Copyright: Shell Global Solutions International B.V.]
The left hand of Figure 4 represents a Product Model that illustrates a Gellish model of a process plant (the thick black lines represent composition relations). The relation types in a product model generally start with ‘is’ or ‘has’. For example, K-1301 system is part of U-1300 and K-1301 is classified as a compressor.
The right hand Knowledge Model illustrates the content of the STEPlib knowledge base. The relation types in a knowledge model generally start with ‘can be a’ or ‘can have a’. For example, a compressor can have a capacity and a lubrication oil system can be part of a compressor.
The middle part of Figure 4 illustrates a proprietary Requirements Model that expresses which data has to be present in a particular context. The relation types in a requirements model generally start with ‘shall be a’ or ‘shall have a’. For example, we developed requirements models that express that, in the context of ‘handover’ of data from design to operations, a compressor shall have a capacity and a compressor shall be compliant with design guide xx. This is expressed in Gellish as follows:

130069  compressor  24  shall have a             551564   capacity
130069  compressor  25  shall be compliant with  5490386  DEP 31….

When data about a compressor is handed over, this Gellish specification makes it possible to verify the completeness of that data automatically, with the verification driven by the requirements model.
This is illustrated in figure 5.
  • 12. Figure 5, Automated verification of a design against a requirements model
The right hand side of figure 5 illustrates the content of the SHELLlib knowledge base, which is a proprietary extension of STEPlib that also uses Gellish. It illustrates how the knowledge in STEPlib and SHELLlib is inherited via the specialization hierarchy. Although P-101 is classified as a centrifugal pump, the requirement that is defined for a pump in general can automatically be made applicable to P-101, because of the defined inheritance via the specialization hierarchy.
The specialization hierarchy also enables intelligent queries. For example, search engines can perform intelligent searches on subtypes of keywords: a document that is recorded to contain information about a line shaft pump can also be found if documents are searched about ‘centrifugal pump’. And a query on ‘pump’ can also find P-101, being classified as centrifugal pump.
An example of a commercial application of Gellish is a Gellish Browser developed by Mi2. The browser can read (and write) data expressed in the Gellish language and is able to present any knowledge about classes of objects and any data about individual objects. It was expected that implementation of Gellish would have serious performance issues. Therefore the Browser was loaded with over 60,000 facts, originating from different systems, but all expressed in a Gellish Table. These facts included the Gellish knowledge base, extended with a Shell proprietary standards database, data about documents, a materials catalogue, an equipment list and material balances of the design of a process plant. The Browser appears to have an excellent performance.
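The inheritance of requirements and the subtype-aware search described earlier on this page can be sketched as follows. The dictionaries are toy data: the specialization and classification facts follow the examples in the text, while the requirement that a pump shall have a capacity is a hypothetical stand-in for the kind of requirement stored in a requirements model.

```python
# Kind -> direct supertype ("is a specialization of").
SPECIALIZATION = {
    "line shaft pump": "centrifugal pump",
    "centrifugal pump": "pump",
    "pump": "physical object",
}

# Individual -> kind ("is classified as a").
CLASSIFICATION = {"P-101": "centrifugal pump"}

# Kind -> required aspects ("shall have a"); hypothetical example requirement.
REQUIREMENTS = {"pump": ["capacity"]}


def supertypes(kind):
    """Return the kind followed by all its supertypes, nearest first."""
    chain = [kind]
    while chain[-1] in SPECIALIZATION:
        chain.append(SPECIALIZATION[chain[-1]])
    return chain


def required_aspects(individual):
    """Collect the requirements inherited via the specialization hierarchy."""
    required = []
    for kind in supertypes(CLASSIFICATION[individual]):
        required.extend(REQUIREMENTS.get(kind, []))
    return required


def missing_aspects(individual, recorded):
    """List the required aspects for which no data has been handed over."""
    return [a for a in required_aspects(individual) if a not in recorded]


def find_individuals(kind):
    """Find individuals classified as the kind or as one of its subtypes."""
    return sorted(
        name for name, k in CLASSIFICATION.items() if kind in supertypes(k)
    )


gaps = missing_aspects("P-101", recorded=set())
hits = find_individuals("pump")
```

Here the requirement is stated once, at the level of ‘pump’, and reaches P-101 only through the hierarchy; the same walk up the hierarchy lets a search on ‘pump’ return an object that was classified as a centrifugal pump.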
  • 13. We also customized an implementation of the Eigner PLM product lifecycle management system and loaded the same data in that system. This system also had a good performance. We are currently working on the customization of existing systems so that they can export data in a Gellish Table. The Browser can then be used to view data from various systems, and data can be imported and integrated with other data in the Eigner PLM system. It is our intention to use a Gellish Table, among others, as a data exchange language for hand-over of design data between engineering contractors and plant owners and for data about catalogue items and items delivered by suppliers. Further work will explore the use of Gellish for the exchange of messages by intelligent Agent software, acting as nodes in the Semantic Web, for example business communication messages about transactions in E-procurement.
Conclusions
The above illustrates that the current practice to define data models separate from reference data and user data is unnecessary. Integration of data model concepts with reference data and user data in one consistent language can provide a single common standard language for data storage and exchange that can significantly reduce development costs and can simplify data communication. A common use of the little data model of figure 2, together with the common use of the Gellish ontology, makes it possible to express and interpret a very wide scope of types of facts. This is possible because the explicit classification relations provide interpretation rules for the expressions for which the relation types as well as the object types are defined in Gellish. It is only required to have the concepts defined in the Gellish knowledge base and to refer to them as in the basic structure using the ‘basic semantic axioms’ mentioned above.
The above illustrates that:
- It is possible that a common standard knowledge base of concepts and relations between concepts can replace many data models.
- The Gellish knowledge base of concepts is more flexible than fixed data models and makes it easier to add semantics to the database.
- The Gellish knowledge base of concepts provides an application independent language with a semantic basis that is equivalent to a very large data model. If sufficient concepts of an application domain are present or added, then data models for such an application domain can become superfluous.
- The Gellish knowledge base, using the inheritance capabilities of the specialization hierarchy, provides extendable product models for many types of objects.
- The implementations have proven that a Gellish knowledge base can be implemented with good performance.
- The implementations have proven that neutral format data exchange using a Gellish Table is a feasible solution.
As Gellish is in the public domain, proposals for extensions of the Gellish language are invited.
References
1. Andries van Renssen, “The Gellish Table and its Formats”. A definition of the Gellish Table and its implementation syntax for Gellish messages. www.steplib.com.
2. Andries van Renssen, “Guide on STEPlib”. This guide describes how STEPlib is defined and how to extend the Gellish language and knowledge base. www.steplib.com.
3. STEPlib, the Gellish knowledge base. This is a set of Gellish Tables (available in Excel and in MS Access). The upper level ontology part is documented in the TOPini part. www.steplib.com.
4. Tim Berners-Lee, James Hendler and Ora Lassila, “The Semantic Web”, Scientific American, May 2001; http://www.sciam.com/2001/0501issue/0501berners-lee.html.
5. OWL, Web Ontology Language Overview. http://www.w3.org/TR/owl-features/
6. Ian Niles and Adam Pease (2001), “Towards a Standard Upper Ontology”, in: Formal Ontology in Information Systems, ISBN 1-58113-377-4.
7. SUO (2001), The IEEE Standard Upper Ontology website, http://suo.ieee.org.
8. Lenat, D. (1995), “Cyc: A Large-Scale Investment in Knowledge Infrastructure”, Communications of the ACM, 38, no 11 (November 1995).
9. Wolfgang Degen, Barbara Heller, Heinrich Herre and Barry Smith (2001), “GOL: A General Ontological Language”, in: Formal Ontology in Information Systems, ISBN 1-58113-377-4.
10. The Epistle Core Data Model (2001), http://www.btinternet.com/~chris.angus/epistle/specifications/ecm/ecm_400.html