A presentation given by Keith May and me at CAA 2004, held in Prato, Italy. The topic was a sub-project that emerged from the English Heritage Revelation project: the Ontological Modelling project. This project examined a range of existing data models, paper forms, databases and other source information and, through discussions with domain specialists, created a representation of the information archaeologists use, based on the CIDOC Conceptual Reference Model (CRM).
To OO or not to OO? Revelations from defining an ontology for an archaeological information system
English Heritage Centre for Archaeology (CfA): Paul Cripps & Keith May
Speaker notes
The Ontological Modelling Project is a CfA project that derives from the assessment stage of the Revelation project.
The Ontological Modelling project developed from the original work on the Revelation project at the English Heritage Centre for Archaeology. Revelation carried out a review of existing systems, and this painted a picture of a fragmented set of CfA teams, each with its own information systems that do not communicate with each other except through a series of ad hoc “routes”. Further Revelation work on practice across the sector suggested that this picture was by no means unique to CfA, and that there would be value in developing models that could better express the relationships between archaeological data and processes at a conceptual level, in addition to more standard data flow diagrams and entity modelling.
We needed to express the existing use of data within CfA in a way that users could understand at a general level, as representing how they went about their work on archaeological projects and how it related to the work of the other specialist teams at CfA. There was also a requirement to model the current state of affairs, but in a way that would show how the data could be better structured in future for sharing and interoperability, not just within CfA but also across the wider organisation in EH and beyond, to the archaeological sector and the public. An ontology, expressing not just the keywords in the data but also the conceptual meanings behind the information held in the various systems, seemed to offer a way to begin. The principal aims in adopting an ontological approach were:
- Shared understanding of information: building bridges between different data sets
- Encapsulating and re-using domain expertise: the ability to share across different specialisms
- Enabling searching by non-domain experts: using Semantic Web compatible approaches
The CIDOC CRM evolved from the world of museum documentation. It has only recently become known in the archaeological world (as the number of papers at this year's conference seems to testify), but there appeared to be a number of advantages to using the CRM for modelling archaeological recording systems:
- First, and possibly the biggest selling point, the modelling approach is based on mapping the knowledge of the domain experts. Rather than a prescriptive set of standard terms that everyone has to use or else be incompatible, there was considerable appeal to archaeologists in an approach that only asks for existing data to be mapped to a more conceptual model for it to be usable.
- Defining conceptual processes that analyse the data without necessarily following simple data processing techniques (e.g. the ability to model phasing and grouping)
- Relating archaeological data to the environmental, geological and agricultural domains
- Event-based modelling of archaeological activities
- Extensibility: the CRM allows local extensions to the model while maintaining compatibility
- Using the CRM for modelling gave the advantages of OO modelling without pre-determining a non-relational implementation
- Using an existing ontology such as the CRM should provide greater standardisation and interoperability with similar data sets
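The first advantage, mapping existing data rather than imposing standard terms, can be sketched in a few lines. This is a minimal Python illustration, not the project's own method: the two legacy schemas (`EXCAVATION_DB`, `FINDS_DB`), their field names and the helper `to_crm` are hypothetical, and each field is mapped only to a core CRM class, without the properties a full mapping would also need.

```python
# Hypothetical field-to-class mappings for two in-house systems.
# Each system keeps its own field names; only the mapping is shared.
EXCAVATION_DB = {
    "context_no": "E41 Appellation",
    "context_type": "E55 Type",
    "site_code": "E41 Appellation",
}
FINDS_DB = {
    "ctx": "E41 Appellation",
    "material": "E55 Type",
}


def to_crm(record, mapping):
    """Re-express a legacy record as (CRM class, value) pairs.

    Fields without an agreed mapping are returned separately instead of
    being dropped, so they can be taken back to the domain experts.
    """
    mapped, unmapped = [], []
    for field, value in record.items():
        crm_class = mapping.get(field)
        if crm_class is None:
            unmapped.append(field)
        else:
            mapped.append((crm_class, value))
    return mapped, unmapped


dig_pairs, dig_left = to_crm(
    {"context_no": "1456", "context_type": "pit fill", "spit": "3"}, EXCAVATION_DB
)
finds_pairs, _ = to_crm({"ctx": "1456", "material": "Samian"}, FINDS_DB)

# Different field names, same concept: both records carry the appellation "1456".
shared = set(dig_pairs) & set(finds_pairs)
print(shared)    # {('E41 Appellation', '1456')}
print(dig_left)  # ['spit']
```

Matching on the appellation string alone is naive; a fuller mapping would also record which entity each name identifies, which is where CRM properties come in.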
At its core the CRM consists of a few high-level concepts. These have been extended by a larger set of specialisations that allow the heritage domain to be described more fully. The core concepts, with some archaeological examples that map to them, are given below (expand on the archaeological examples as time allows):
- E2 Temporal Entities: all things that happen in time. They include events such as the creation of objects, their loss, deposition, discovery, interpretation and conservation.
- E18 Physical Stuff: persistent physical items with a relatively stable form, both man-made and natural. Examples include pottery, coins, pollen and carbonised seeds.
- E28 Conceptual Objects: non-material products of our minds. They include things such as project designs, reports, the mark used by a potter, textbooks, songs and military orders.
- E39 Actors: people, either individually or in groups, who have the potential to perform intentional actions for which they can be held responsible. Individuals include potters, archaeologists and scientists; groups include project teams, archaeological societies, excavation units and English Heritage.
- E52 Time-Spans: temporal extents that have a beginning, an end and a duration. For example, “the duration of the Catterick project” or “the duration of the use of potter X's mark”.
- E53 Places: mathematical extents in space. They are usually relative to the surface of the earth but can be relative to some other fixed body of matter (for example, the bow of a ship is a place). For example, “the total extent of the excavation in 1967”.
- E41 Appellations: all proper names, words, phrases or codes used to identify something. For instance, John Smith, J. Smith and Smith, John are all names; they are distinct from the person they identify. Other examples include context 1456, Lyons, English Heritage and The Portland Vase.
- E55 Types: the classifications used to characterise something. For example, Samian is a type of pottery. This is where all thesauri, word lists and controlled vocabularies fit into the CRM.
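The separation these classes draw between a thing and its names or classifications can be sketched as a small set of typed statements. In this minimal Python sketch the identifiers (such as `actor/1`) and the helper `objects_of` are hypothetical; "P1 is identified by" and "P2 has type" are CRM property labels, but the flat list of (subject, property, object) tuples is only a stand-in for a proper RDF store or database.

```python
from dataclasses import dataclass

# Core classes as listed above (labels follow the CRM version used at the time).
CORE_CLASSES = {
    "E2": "Temporal Entity", "E18": "Physical Stuff", "E28": "Conceptual Object",
    "E39": "Actor", "E41": "Appellation", "E52": "Time-Span",
    "E53": "Place", "E55": "Type",
}


@dataclass(frozen=True)
class Entity:
    ident: str       # hypothetical local identifier
    crm_class: str   # key into CORE_CLASSES
    label: str = ""  # human-readable text, e.g. the name itself for an E41

    def __post_init__(self):
        # Reject anything not typed with one of the core classes.
        if self.crm_class not in CORE_CLASSES:
            raise ValueError(f"unknown CRM class: {self.crm_class}")


# One person (E39) with three distinct names (E41), kept separate from the person.
smith = Entity("actor/1", "E39")
names = [Entity(f"name/{i}", "E41", n)
         for i, n in enumerate(["John Smith", "J. Smith", "Smith, John"])]

# A sherd (E18) classified as Samian (E55), as in the E55 example.
sherd = Entity("find/1", "E18")
samian = Entity("type/samian", "E55", "Samian")

# (subject, property, object) statements using CRM property labels.
statements = [(smith, "P1 is identified by", n) for n in names]
statements.append((sherd, "P2 has type", samian))


def objects_of(subject, prop, stmts):
    """Return everything linked from `subject` by property `prop`."""
    return [o for s, p, o in stmts if s == subject and p == prop]


print([n.label for n in objects_of(smith, "P1 is identified by", statements)])
# ['John Smith', 'J. Smith', 'Smith, John']
print(objects_of(sherd, "P2 has type", statements)[0].label)  # Samian
```

Keeping the three names as separate E41 entities linked to one E39 Actor means all of them can be recorded and searched without being mistaken for three different people.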
Finding a suitable methodology proved less straightforward, as the CRM does not include one. The overall approach was derived from general examples of ontology building. However, we were not building an ontology so much as mapping CfA information to one and defining how we would actually use the CRM. Broadly, the approach was as follows:
1. Acquire domain knowledge: we defined our domain as CfA information systems and interviewed domain experts.
2. Organise the ontological model. This can be seen as two basic operations:
   - identifying the global concepts (classes) that best match the data being created
   - identifying the properties (roles and relationships between the classes)
3. Flesh out the ontological model:
   - drawing the diagrams
   - writing text documentation of the entities/classes and relationships
4. Check the work:
   - re-iterating discussions and checking diagrams with domain experts
   - circulating the model to domain experts
5. Commit the ontological model:
   - final verification by the CRM community
   - broadening usage as appropriate to the wider archaeological community
Partly because we found this area less well documented, we felt that writing this paper would help others looking for methodologies.
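Part of step 4 (checking the work) lends itself to automation: each documented relationship can be checked against the domain and range of the CRM property it uses. The Python sketch below is an illustration, not a tool the project used. The helper names `is_a` and `check` are hypothetical; the class hierarchy is deliberately simplified (apart from the E7 Activity, E5 Event, E4 Period, E2 Temporal Entity chain, intermediate classes are omitted and the other core classes are attached directly to E1 CRM Entity); and the property domains and ranges should be verified against the CRM version in use.

```python
# Simplified superclass links: intermediate CRM classes are omitted except on
# the event branch, and everything ultimately falls under E1 CRM Entity.
PARENT = {
    "E7": "E5", "E5": "E4", "E4": "E2", "E2": "E1",
    "E18": "E1", "E28": "E1", "E39": "E1", "E41": "E1",
    "E52": "E1", "E53": "E1", "E55": "E1",
}

# (domain, range) for a few CRM properties.
PROPERTIES = {
    "P1 is identified by": ("E1", "E41"),
    "P2 has type": ("E1", "E55"),
    "P4 has time-span": ("E2", "E52"),
    "P7 took place at": ("E4", "E53"),
    "P14 carried out by": ("E7", "E39"),
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or inherits from it via PARENT."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def check(mappings):
    """Return problems found in (domain class, property, range class) rows."""
    problems = []
    for dom, prop, rng in mappings:
        if prop not in PROPERTIES:
            problems.append(f"unknown property: {prop}")
            continue
        want_dom, want_rng = PROPERTIES[prop]
        if not is_a(dom, want_dom):
            problems.append(f"{prop}: domain {dom} does not fall under {want_dom}")
        if not is_a(rng, want_rng):
            problems.append(f"{prop}: range {rng} does not fall under {want_rng}")
    return problems


# A draft mapping as it might come out of step 3.
draft = [
    ("E7", "P14 carried out by", "E39"),  # an excavation carried out by a team
    ("E7", "P4 has time-span", "E52"),    # the excavation had a duration
    ("E18", "P7 took place at", "E53"),   # error: a find is not a period/event
]
for problem in check(draft):
    print(problem)
# P7 took place at: domain E18 does not fall under E4
```

A check like this only catches structural slips; whether a class is the right match for the data still has to be settled with the domain experts.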