This paper draws on practical experience of conceptually modelling the single context recording system at English Heritage using CIDOC CRM, and of mapping that model to other 'single context' based systems. It also presents recent work on identifying conceptual commonalities across different archaeological recording methodologies, whether 'single context recording' or otherwise, along with the practical challenges encountered when trying to integrate, or simply search across, data from different archaeological recording systems. In addition, it introduces the work to date on developing http://www.heritagedata.org/ and suggests opportunities for sharing and aligning further archaeological vocabularies using SKOS and Linked Open Data technologies.
CAA 2014 - To Boldly or Bravely Go? Experiences of using Semantic Technologies for Archaeological Resources
1.
by
Keith May @Keith_May
Ceri Binding & Prof Doug Tudhope
Faculty of Advanced Technology
University of South Wales
2. Excavation record data modelling
• CRM-EH focuses on the common ‘core’
concepts of our archaeological processes
• Stratigraphic relationships (e.g. Harris
matrix) crucial for relating individual records
• Only a limited degree of the minute
archaeological detail is mapped to CIDOC CRM
• Different broad categories of contexts
(Deposits, Masonry, Timber, etc) handled by
separate forms but modelled together
• Model already "complex" enough - most
archaeologists find it a little daunting
Details of Context on recording form
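The stratigraphic relationships mentioned above can be treated as a directed graph between context records. The following is a minimal sketch only, not the CRM-EH model itself: the context numbers and the `above` mapping are hypothetical, and it assumes each relationship simply means "lies above (is later than)".

```python
from graphlib import TopologicalSorter

# Hypothetical context numbers (assumption, not from any real site).
# Each key lies stratigraphically above (is later than) the contexts in its set.
above = {
    "101": {"102", "103"},  # e.g. topsoil sealing a pit fill and a wall
    "102": {"104"},         # pit fill within pit cut
    "103": {"105"},
    "104": {"105"},
    "105": set(),           # lowest recorded context
}

def earliest_first(relations):
    """Return the contexts ordered from earliest to latest."""
    # TopologicalSorter treats the mapped values as predecessors, so the
    # contexts lying beneath are emitted before the contexts above them.
    # A circular (invalid) sequence raises graphlib.CycleError.
    return list(TopologicalSorter(relations).static_order())

order = earliest_first(above)
print(order)
```

Ordering the contexts this way is one simple check that a recorded sequence is internally consistent before it is mapped to a richer model.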
3. What about comparing records across different countries?
With thanks to Anja Masur
4. Documentation
• Different excavation methods bring differing documentation
• Comparison of different documentation sheets
Similarities and Differences
10. Catalhoyuk - Hodder's 'Post-Processual' excavation recording
• Units - stratigraphic units, similar to Contexts
• Features - groupings of units or more complex structures, similar to MoLA Groups
11. French - examples, please!
Examples using Single Context Recording methodology?
INRAP, n'est-ce pas?
Other excavation methodologies?
14. ▪Controlled vocabularies online
▪Vocabularies from EH, RCAHMS, RCAHMW
▪Conversion to a common standard format (SKOS)
▪Persistent globally unique identifiers for every concept
▪Made available online as Linked Open Data
▪Also downloadable data files and listings
▪Web services
▪Facilitate concept searching, browsing, suggestion, validation
▪ Tools to use controlled vocabularies
▪Browser-based ‘widget’ user interface controls
▪Search, browse, suggest, select concepts
▪Case studies
▪Legacy data to thesaurus alignment
▪Thesaurus to thesaurus alignment
▪Third party use of project outcomes
15. STELLAR Project Tools - SKOS Template
SKOS = Simple Knowledge Organisation System
Using SKOS - W3C standard for Web-based Terminologies
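As an illustration of what a vocabulary term looks like once expressed in SKOS, the sketch below builds a Turtle description of one concept using only the Python standard library. It is not the STELLAR template itself: the `example.org` URIs, the labels and the function names are all hypothetical, and only the SKOS namespace and property names come from the W3C standard.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

def _literal(text, lang):
    """Quote a label as a language-tagged Turtle literal (minimal escaping)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'

def concept_to_turtle(uri, pref_label, scheme_uri,
                      broader=None, alt_labels=(), lang="en"):
    """Describe one concept using skos:prefLabel, altLabel, broader, inScheme."""
    lines = [f"<{uri}> a skos:Concept ;",
             f"    skos:prefLabel {_literal(pref_label, lang)} ;"]
    for alt in alt_labels:
        lines.append(f"    skos:altLabel {_literal(alt, lang)} ;")
    if broader:
        lines.append(f"    skos:broader <{broader}> ;")
    lines.append(f"    skos:inScheme <{scheme_uri}> .")
    return "\n".join(lines)

def to_document(concept_blocks):
    """Prefix the concept descriptions with the SKOS namespace declaration."""
    return f"@prefix skos: <{SKOS_NS}> .\n\n" + "\n\n".join(concept_blocks)

# Hypothetical URIs and labels, for illustration only.
ttl = concept_to_turtle(
    "http://example.org/concept/post-hole",
    "POST HOLE",
    "http://example.org/scheme/monument-types",
    broader="http://example.org/concept/pit",
    alt_labels=["POSTHOLE"],
)
print(to_document([ttl]))
```

The persistent URI is what lets other datasets refer to the same concept, while the preferred and alternate labels support searching and suggestion.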
17. Vocabulary Widgets – e.g. for OASIS
▪ Scheme list
▪ Scheme details
▪ Top concepts
▪ Composite control
More widget details on HeritageData.org
20. SENESCHAL - Semantic ENrichment Enabling Sustainability of arCHAeological Links
Early adoption (continued)
▪Clwyd-Powys Archaeological Trust (SENESCHAL widgets
embedded into HER application and mobile field
recording app)
22. Typical alignment problems encountered
▪ Simple spelling errors
▪ “POSTHLOLE”, “CESS PITT”, “FURRROWS”, “FLINT SCRAPPER”
▪ Alternate word forms
▪ “BOUNDARY”/“BOUNDARIES”, “GULLEY”/“GULLIES”
▪ Prefixes / suffixes
▪ “RED HILL (POSSIBLE)”, “TRACKWAY (COBBLED)”, “CROFT?”, “CAIRN (POSSIBLE)”,
“PORTAL DOLMEN (RE-ERECTED)”
▪ Nested delimiters
▪ “POTTERY, CERAMIC TILE, IRON OBJECTS, GLASS”
▪ Terms not intended for indexing
▪ “NONE”, “UNIDENTIFIED OBJECT”, “N/A”, “NA”, “INCOHERENT”
▪ Terms that would not be in (any) thesauri
▪ “WOTSITS PACKET”, “CHARLES 2ND COIN”, “ROMAN STRUCTURE POSSIBLY A VILLA”,
“ST GUTHLACS BENEDICTINE PRIORY”, “WORCESTER-BIRMINGHAM CANAL”,
“KUNGLIGA SLOTTET”, “SUB-FOSSIL BEETLES”
▪ More specific phrases
▪ “SIDE WALL OF POT WITH LUG”, “BRICK-LINED INDUSTRIAL WELL OR MINE SHAFT”,
“ALIGNMENT OF PLATFORMS AND STONES”
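Some of the problems listed above (qualifying suffixes, question marks, delimited lists, placeholder values) can be reduced by cleaning values before any matching is attempted. The sketch below is one possible pre-processing step, not the project's actual routine: the function name, the comma delimiter and the list of non-indexing terms are assumptions based on the examples on this slide.

```python
import re

# Placeholder values taken from the examples above; a real list would be longer.
NON_INDEX_TERMS = {"NONE", "N/A", "NA", "UNIDENTIFIED OBJECT", "INCOHERENT"}

def clean_values(raw):
    """Split a raw field value into candidate terms for thesaurus matching."""
    candidates = []
    for part in raw.split(","):                 # assumes comma-delimited lists
        term = re.sub(r"\([^)]*\)", " ", part)  # drop "(POSSIBLE)" style qualifiers
        term = term.replace("?", " ")           # drop uncertainty markers
        term = " ".join(term.split()).upper()   # collapse whitespace, normalise case
        if term and term not in NON_INDEX_TERMS:
            candidates.append(term)
    return candidates

print(clean_values("POTTERY, CERAMIC TILE, IRON OBJECTS, GLASS"))
print(clean_values("PORTAL DOLMEN (RE-ERECTED)"))
print(clean_values("CROFT?"))
print(clean_values("N/A"))
```

Note that stripping qualifiers such as "(POSSIBLE)" discards the recorder's uncertainty, so in practice that information would probably need to be kept in a separate field rather than thrown away.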
23. Data alignment - R&D approach
▪Levenshtein edit distance algorithm
▪ Measures the minimum number of single-character edits
(insertions, deletions, substitutions) required to change one string into another
▪ Accommodates small spelling differences/errors
▪ Bulk alignment process
▪ Compares each value to all terms from specified
thesaurus – obtain best textual match
▪ Similarity threshold introduced to suppress low
scoring matches. Levenshtein algorithm will always
produce a match, even if it is a bad one!
▪ Periods require an additional approach due to mixed
formats (named periods, numeric ranges etc.)
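The bulk alignment idea above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the project's code: the thesaurus terms are a hypothetical sample, the similarity score (one minus edit distance divided by the longer string's length) is one common normalisation, and the 0.7 threshold is an arbitrary value chosen for the example.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a                      # keep the working row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def similarity(a, b):
    """Scale the distance to 0..1 (assumed normalisation, 1.0 = identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def best_match(value, terms, threshold=0.7):
    """Return (term, score) for the closest term, or None if below threshold."""
    if not terms:
        return None
    score, term = max((similarity(value.upper(), t.upper()), t) for t in terms)
    return (term, score) if score >= threshold else None

# Hypothetical sample of thesaurus terms, for illustration only.
thesaurus = ["POST HOLE", "CESS PIT", "FURROW", "FLINT SCRAPER"]
for value in ["POSTHLOLE", "CESS PITT", "FLINT SCRAPPER", "WOTSITS PACKET"]:
    print(value, "->", best_match(value, thesaurus))
```

Without the threshold, "WOTSITS PACKET" would still be assigned whichever term happens to be textually closest, which is the "bad match" case the slide warns about.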
24. Data Alignment R&D Results – Monument Types
Needs some level of human verification by domain experts.
Do we need semantic wiki-style interfaces to enable that?
25. Conclusions and Challenges -
Do you want to share Open Archaeological Data
somewhere on or over the horizon?
Different archaeological recording systems share
common conceptual frameworks and semantic
relationships
By conceptualising common relationships in our
different data sets at a broad (metadata) level and
aligning vocabularies of shared reference terms we can
cross-search data with more semantic accuracy to find
patterns and answers to related research questions
The technologies are being developed in other
domains, but is there a common will to share
archaeological data openly in the interests of
improving research methods?
26. References
Catalin Pavel. "Describing and Interpreting the Past"
Tudhope, May, Binding, Vlachidis. "Connecting
Archaeological Data and Grey Literature via Semantic
Cross Search" - Internet Archaeology Vol 30
Contact:
Keith.May@english-heritage.org.uk
@Keith_May