Presented at Linked Open Data: current practice in libraries and archives (Cataloguing & Indexing Group in Scotland 3rd Linked Open Data Conference), Edinburgh, 18 Nov 2013
SENESCHAL: Semantic ENrichment Enabling Sustainability of arCHAeological Links / Peter McKeague, RCAHMS, on behalf of the SENESCHAL Project team
1. SENESCHAL
SENESCHAL: Semantic ENrichment Enabling
Sustainability of arCHAeological Links
Peter McKeague
(On behalf of project partners)
peter.mckeague@rcahms.gov.uk
www.rcahms.gov.uk
http://canmore.rcahms.gov.uk
2. Outline of talk
Part I
RCAHMS
What we do
What we hold
Classifying
Part II
Drivers for Linked Data
Part III
SENESCHAL
Project Partners
The Project so far
Prospects
3. RCAHMS Mission Statement
• Identifies, surveys and analyses the
historic and built environment of Scotland
• Preserves, cares for and adds to the
information and items in its national
collection
• Promotes understanding, education
and enjoyment through interpretation of
the information it collects and the items
it looks after
11. RCAHMS thesauri: suggest a term
http://orapweb.rcahms.gov.uk/apex/f?p=210:1:
12. Part II: Drivers for Linked Data
We already publish our thesauri as key reference datasets for use by professional
archaeologists in national organisations, in local authority Historic Environment
Records as well as by anyone interested in the historic environment.
BUT
Our vocabularies (and other data) are not visible
The thesaurus architecture limits the potential of the terminology
Terms lack the persistent URIs that would allow our resources to act as hubs for
the Web of Data.
Interoperability
For heritage, the main exponents of Linked Data are from the research community,
and in Scotland primarily from computer scientists
13. Drivers for Linked Open Data
It is Government policy
Open Data White Paper June 2012:
http://data.gov.uk/sites/default/files/Open_data_White_Paper.pdf
Scotland’s Digital Future April 2013:
http://www.scotland.gov.uk/Resource/0042/00421478.pdf
14. Drivers for Linked Open Data
It is Government policy: Open Data White Paper June 2012:
• Public data policy and practice will be clearly driven by the public and businesses who
want to use the data, including what data is released, when and in what form
• Public data will be published in reusable, machine-readable form
• Public data will be released under the same open licence which enables free reuse,
including commercial reuse
• Public data will be published using open standards, and following relevant
recommendations of the World Wide Web Consortium
• Public data from different departments about the same subject will be published in the
same, standard formats and with the same definitions
• Public data underlying the Government’s own website will be published in re-usable form
• Release data quickly, and then work to make sure it is available in open standard
formats, including Linked data forms.
15. ... And a practical use
An online submission form to report fieldwork from contractors to curators
17. Lineage
STAR: Semantic Technologies for Archaeological Resources
2007-2010
AHRC funded project with English Heritage to apply semantic and knowledge-based
technologies to the digital archaeological domain. STAR developed new methods for linking
digital archive databases, vocabularies and the associated grey literature, exploiting the
potential of a high level, core ontology and natural language processing techniques.
http://hypermedia.research.southwales.ac.uk/kos/star/
STELLAR: Semantic Technologies Enhancing Links and Linked data for Archaeological
Resources
2010-2011
AHRC funded project with the ADS and English Heritage. Building on the outcomes of STAR,
STELLAR provided support for non-specialist users to map and extract datasets.
http://hypermedia.research.southwales.ac.uk/kos/stellar/
SENESCHAL: Semantic ENrichment Enabling Sustainability of arCHAeological Links 2013-2014
AHRC funded project with the ADS, English Heritage, RCAHMS, RCAHMW and Wessex
Archaeology.
http://hypermedia.research.southwales.ac.uk/kos/SENESCHAL/
and
http://www.heritagedata.org
18. The SENESCHAL Project
seneschal n. Historical
The steward or major-domo of a medieval great house
12-month AHRC-funded project
March 2013 - February 2014
Deliverables
Controlled vocabularies online
Linked data (SKOS)
Downloadable files
Web services
term suggestion, term validation, legacy data alignment
Tools to align data with controlled vocabularies
Browser-based ‘widget’ controls
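How the term-suggestion service and widget controls will work is not specified here. As a minimal sketch, assuming suggestions are simple case-insensitive prefix matches against a sorted, upper-case list of preferred terms, a lookup could use binary search. The function name and sample terms are hypothetical.

```python
from bisect import bisect_left


def suggest_terms(sorted_terms: list[str], prefix: str, limit: int = 10) -> list[str]:
    """Return up to `limit` terms beginning with `prefix`, ignoring case.

    `sorted_terms` must already be upper-case and sorted, following the
    convention that preferred terms are written in upper case.
    """
    key = prefix.upper()
    matches: list[str] = []
    # Binary search finds the first term >= key; any matches follow it.
    for i in range(bisect_left(sorted_terms, key), len(sorted_terms)):
        term = sorted_terms[i]
        if len(matches) >= limit or not term.startswith(key):
            break
        matches.append(term)
    return matches


# Hypothetical sample of preferred terms
TERMS = sorted(["BROCH", "BURIAL CAIRN", "CAIRN", "CENOTAPH",
                "CROFT", "CRUCK FRAMED BYRE", "TENEMENT"])

print(suggest_terms(TERMS, "cr"))  # ['CROFT', 'CRUCK FRAMED BYRE']
```

A widget would call something like this on each keystroke, so the sorted list lets each lookup stay cheap even for a large thesaurus.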
19. Interoperability
“The terminology of a subject is the key to
interoperability” (John F. Sowa)
Interoperability requires more than just a common
data model
Data compatibility occurs on two levels, semantic
and syntactic. Ontologies / data structures deal
with the semantic but not necessarily the syntactic.
“The CRM relies on existing syntactic interoperability
and is concerned only with adding semantic
interoperability” (CIDOC CRM documentation)
20. You say potato, I say tomato…
Multiple datasets, multiple
organisations, multiple languages
Unification of data structures is
possible, BUT…
Incompatible terminology hinders
cross search and prevents greater
interoperability
Applications attempting to reuse
data must all individually sort out
the same old problems
E.g. Get all the iron age post holes…
Feature                 Period
Post-hole               IRON AGE
Posthole                |ron age
POST HOLE               Iron age?
POSTHLOLE               EARLY IRON AGE
POST HOLE (POSSIBLE)    250 BC
POSTHOLES               C 500-200 B.C.
Solution: data cleansing and controlled vocabularies?
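A first cleansing pass over values like those in the table might normalise case, spacing and hyphenation and strip uncertainty qualifiers before comparing them. The sketch below uses a hypothetical function name and handles only the Feature column. Period strings such as "250 BC" or "C 500-200 B.C." would need mapping to period concepts instead. "POSTHLOLE" survives unchanged, so spelling errors still need fuzzy matching or manual review.

```python
import re

# Trailing qualifiers seen "decorating" terms, e.g. "(POSSIBLE)" or "?"
QUALIFIER = re.compile(r"\s*(\(POSSIBLE\)|\?)\s*$")


def normalise_feature(raw: str) -> tuple[str, bool]:
    """Reduce a free-text feature value to a comparison key.

    Returns (key, uncertain). The key is upper-cased, has spaces and
    hyphens removed and loses a single trailing plural "S"; `uncertain`
    records whether a qualifier was stripped.
    """
    text = raw.strip().upper()
    uncertain = QUALIFIER.search(text) is not None
    text = QUALIFIER.sub("", text)
    key = re.sub(r"[\s-]+", "", text)
    # Naive singularisation: POSTHOLES -> POSTHOLE, but leave e.g. CROSS alone
    if key.endswith("S") and not key.endswith("SS"):
        key = key[:-1]
    return key, uncertain


for value in ["Post-hole", "Posthole", "POST HOLE", "POSTHLOLE",
              "POST HOLE (POSSIBLE)", "POSTHOLES"]:
    print(f"{value!r:24} -> {normalise_feature(value)}")
```

Keeping the uncertainty flag separate from the key means "(POSSIBLE)" is recorded rather than lost, while the term itself can still be aligned.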
21. Typical interoperability issues encountered
Simple spelling errors
“POSTHLOLE”, “CESS PITT”, “FURRROWS”, “FLINT SCRAPPER”
Alternate word forms
“BOUNDARY”/“BOUNDARIES”, “GULLEY”/“GULLIES”
Prefixes / suffixes
“RED HILL (POSSIBLE)”, “TRACKWAY (COBBLED)”, “CROFT?”, “CAIRN
(POSSIBLE)”, “PORTAL DOLMEN (RE-ERECTED)”
Nested delimiters
“POTTERY, CERAMIC TILE, IRON OBJECTS, GLASS”
Terms not intended for indexing
“NONE”, “UNIDENTIFIED OBJECT”, “N/A”, “NA”, “INCOHERENT”
Terms that would not be in (any) thesauri
“WOTSITS PACKET”, “CHARLES 2ND COIN”, “ROMAN STRUCTURE POSSIBLY A
VILLA”, “ST GUTHLACS BENEDICTINE PRIORY”, “WORCESTER-BIRMINGHAM
CANAL”, “KUNGLIGA SLOTTET”, “SUB-FOSSIL BEETLES”
More specific phrases
“SIDE WALL OF POT WITH LUG”, “BRICK-LINED INDUSTRIAL WELL OR MINE
SHAFT”, “ALIGNMENT OF PLATFORMS AND STONES”
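Several of these issue types can be flagged mechanically before any alignment is attempted. The sketch below is a hypothetical triage step that splits nested delimiters on commas, flags the non-indexing values listed above, and marks everything outside a vocabulary for review. Splitting on commas assumes no legitimate term contains a comma, which may not hold for every thesaurus.

```python
# Values from the list above that should never be indexed
NON_INDEX = {"NONE", "UNIDENTIFIED OBJECT", "N/A", "NA", "INCOHERENT"}


def triage(raw: str, vocabulary: set[str]) -> list[tuple[str, str]]:
    """Split a raw field on commas and label each part for follow-up."""
    results = []
    for part in raw.split(","):
        term = part.strip().upper()
        if not term:
            continue  # skip empty parts left by stray commas
        if term in NON_INDEX:
            results.append((term, "not for indexing"))
        elif term in vocabulary:
            results.append((term, "controlled"))
        else:
            results.append((term, "needs review"))
    return results


# Hypothetical vocabulary fragment
VOCAB = {"POTTERY", "CERAMIC TILE", "GLASS"}
print(triage("POTTERY, CERAMIC TILE, IRON OBJECTS, GLASS", VOCAB))
```

Only the "needs review" items then need human attention or the heavier alignment techniques.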
22. Solutions - SENESCHAL
Controlled vocabularies (again)
Commonly agreed concepts, terminology and identifiers
Existing / new thesauri – community contributions?
Openness and availability
Licensing, web services, downloads, data formats
Alignment of existing data
Data cleansing tools
Alignment techniques
Alignment of new data
Interactive embedded data entry tools
Validation at point of data entry
Rather than trying to solve this vocabulary problem, help to
prevent it from happening in the first place
23. Vocabularies online as (SKOS) Linked Data
Vocabularies from English Heritage
Monument Types Thesaurus
Objects Thesaurus
Event Types Thesaurus
Maritime Craft Thesaurus
RCHME Cultural Periods List / MIDAS Archaeological Periods List
Vocabularies from RCAHMS
Monument Thesaurus (Scotland)
Multilingual - includes Scottish Gaelic translations!
Objects (Scotland)
Maritime Craft (Scotland)
Vocabularies from RCAHMW
Monument Thesaurus (Wales)
Event (Wales)
Period (Wales)
Moving from term based towards concept based indexing
Start to create links between concepts… between vocabularies… between datasets…
between sites… between countries
Cross searching of (multilingual) cultural heritage resources
25. Data licensing and attribution using CC REL
Attribution back to original data providers
[Diagram: each skos:ConceptScheme and each skos:Concept carries cc:license (URI), cc:attributionURL (URI), cc:attributionName ([literal value]), dct:creator (URI) and dc:source (URI).]
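As a rough illustration of the properties in the diagram, the sketch below writes one resource's licensing and attribution as Turtle. The namespaces are the standard CC REL, Dublin Core and SKOS ones. The concept URI is the Scottish TENEMENT example used later in this talk. The OGL version URI, creator and source values are assumptions chosen for the example, not the project's published data.

```python
PREFIXES = """@prefix cc:   <http://creativecommons.org/ns#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
"""


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def attribution_turtle(subject: str, rdf_type: str, licence: str,
                       attribution_url: str, attribution_name: str,
                       creator: str, source: str) -> str:
    """Describe one SKOS resource with the CC REL and Dublin Core
    properties from the diagram."""
    return PREFIXES + (
        f"\n<{subject}> a {rdf_type} ;\n"
        f"    cc:license <{licence}> ;\n"
        f"    cc:attributionURL <{attribution_url}> ;\n"
        f"    cc:attributionName {turtle_literal(attribution_name)} ;\n"
        f"    dct:creator <{creator}> ;\n"
        f"    dc:source <{source}> .\n"
    )


print(attribution_turtle(
    subject="http://purl.org/heritagedata/schemes/1/concepts/467",
    rdf_type="skos:Concept",
    # Assumed OGL version URI; check which licence URI the project uses
    licence="http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/",
    attribution_url="http://www.rcahms.gov.uk",
    attribution_name="RCAHMS",
    creator="http://www.rcahms.gov.uk",                     # hypothetical
    source="http://orapweb.rcahms.gov.uk/apex/f?p=210:1:",  # hypothetical
))
```

Repeating these triples on every concept, not only on the scheme, means attribution travels with a concept even when it is retrieved on its own.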
27. Linked Data API (preliminary)
The project will implement a Linked Data (restful) API
The base URI may be http://www.heritagedata.org/ or http://purl.org/xxx/..
Seneschal is a sub-project within the wider scope of ‘heritagedata.org’ – so:
http://www.heritagedata.org/seneschal - wiki/blog for project details, and
<base uri>/schemes/123 (e.g.) for actual data API – see below…
Proposed REST API:
/schemes – return list of all SKOS concept schemes held
/schemes/search - (with parameters) – search for schemes
/schemes/{id} – return details of specified SKOS concept scheme (current version)
/schemes/{id}.html, .n3, .rdf, .json – return different serializations of that data,
obtained either by content negotiation or by direct request including extension
/schemes/{id}/concepts – return list of ALL SKOS concepts in specified scheme
/schemes/{id}/concepts/search – search for concepts in the specified scheme
/concepts – return list of all SKOS concepts in ALL schemes
/concepts/search - (with parameters) – search for concepts in any scheme
/concepts/{id} – return details of specified SKOS concept (current version)
/concepts/{id}.html, .n3, .rdf, .json – return different serializations of the data,
obtained either by content negotiation or by direct request including extension
/concepts/{id}/schemes - return list of all schemes referencing the specified concept
36. Versioning (preliminary)
/schemes/{id} – returns current version of the specified scheme
/schemes/{id}/versions – returns all versions of the specified
scheme
/schemes/{id}/versions/{id} – returns specified version of the
specified scheme
/concepts/{id} – returns current version of the specified concept
/concepts/{id}/versions – returns all versions of the specified
concept
/concepts/{id}/versions/{id} – returns specified version of the
specified concept
[Diagram: the current scheme data:schemes/123 (skos:ConceptScheme) is linked by dct:hasVersion, with the inverse dct:isVersionOf, to each of its versions, e.g. data:schemes/123/versions/20111005 and data:schemes/123/versions/2013020301, each itself a skos:ConceptScheme.]
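The version links in the diagram could be generated as below. This is a sketch only. The expansion of the `data:` prefix is a hypothetical placeholder, because the actual base URI was still under discussion. Each version gets both `dct:hasVersion` and `dct:isVersionOf` so that it can be navigated in either direction.

```python
DCT = "http://purl.org/dc/terms/"
# Hypothetical expansion of the "data:" prefix used in the diagram
DATA = "http://example.org/data/"


def version_triples(scheme: str, version_ids: list[str]) -> list[tuple[str, str, str]]:
    """Link a current scheme to each of its versions in both directions."""
    current = DATA + scheme
    triples = []
    for vid in version_ids:
        version = f"{current}/versions/{vid}"
        triples.append((current, DCT + "hasVersion", version))
        triples.append((version, DCT + "isVersionOf", current))
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(version_triples("schemes/123", ["20111005", "2013020301"])))
```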
38. A question of jurisdiction
TENEMENT (Scotland)
http://purl.org/heritagedata/schemes/1/concepts/467
A large building containing a number of rooms or flats,
access to which is usually gained via a common stairway.
TENEMENT (England)
http://purl.org/heritagedata/schemes/eh_tmt2/concepts/68997
A parcel of land.
TENEMENT (Wales)
http://purl.org/heritagedata/schemes/10/concepts/68997
TENEMENT BLOCK (England)
http://purl.org/heritagedata/schemes/eh_tmt2/concepts/71489
Use for speculatively built 19th century "model dwellings",
rather than those built by a philanthropic society.
TENEMENT BLOCK (Wales)
http://purl.org/heritagedata/schemes/10/concepts/71489
TENEMENT HOUSE (England)
http://purl.org/heritagedata/schemes/eh_tmt2/concepts/71476
SC674834
289 Allison Street, Glasgow: TENEMENT
http://canmore.rcahms.gov.uk/en/site/148111/
Originally built as a family house. Converted into flats during the
19th or 20th century.
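The point of this slide can be put in code. The label "TENEMENT" alone is ambiguous, whereas each URI identifies one concept in one scheme. The sketch below indexes the URIs from the slide by label. The jurisdiction field and function name are illustrative. A real record would store the URI, not the label.

```python
from collections import defaultdict

# (URI, preferred label, jurisdiction) as listed on the slide
CONCEPTS = [
    ("http://purl.org/heritagedata/schemes/1/concepts/467", "TENEMENT", "Scotland"),
    ("http://purl.org/heritagedata/schemes/eh_tmt2/concepts/68997", "TENEMENT", "England"),
    ("http://purl.org/heritagedata/schemes/10/concepts/68997", "TENEMENT", "Wales"),
    ("http://purl.org/heritagedata/schemes/eh_tmt2/concepts/71489", "TENEMENT BLOCK", "England"),
    ("http://purl.org/heritagedata/schemes/10/concepts/71489", "TENEMENT BLOCK", "Wales"),
    ("http://purl.org/heritagedata/schemes/eh_tmt2/concepts/71476", "TENEMENT HOUSE", "England"),
]


def build_label_index(concepts):
    """Group concept URIs by label; one label may map to several concepts."""
    index = defaultdict(list)
    for uri, label, jurisdiction in concepts:
        index[label].append((jurisdiction, uri))
    return index


index = build_label_index(CONCEPTS)
for jurisdiction, uri in index["TENEMENT"]:
    print(jurisdiction, uri)
```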
39. A question of jurisdiction
SC683414
Cruck Framed Byre, Latheron, Caithness
http://canmore.rcahms.gov.uk/en/site/86630/
A Cruck House in Wick, Worcestershire
Cruck cottage in Wick
Philip Halling
http://creativecommons.org/licenses/by-sa/2.0/
40. A bheil Gàidhlig agaibh? (Do you speak Gaelic?)
DP151933
The Cenotaph, George Square, Glasgow: http://canmore.rcahms.gov.uk/en/site/143264/
43. Challenges for RCAHMS
Controlled vocabularies online
Integration of project deliverables into RCAHMS processes
Managing candidate terms
Publishing additional vocabularies
Jurisdiction
- a single British thesaurus for cultural heritage?
Adding images
Moving the goalposts
44. Summary
Controlled vocabularies online
Linked data (SKOS)
Downloadable files
Linking out
Mapping between the different thesauri
Web services
term suggestion, term validation, legacy data alignment
Tools to align data with controlled vocabularies
Browser-based ‘widget’ controls
http://www.heritagedata.org/blog/work-in-the-pipeline/
I am giving this presentation on behalf of the SENESCHAL Project, led by the University of South Wales, who provide the wizardry: the knowledge and skills to transform key reference datasets for heritage in Britain. This presentation is given from the perspective of a vocabulary owner who wants to get involved in publishing Linked Data but lacks the skills to do so. In the talk I will introduce some of the work of the Royal Commission: what we do, what we hold, and how we use vocabularies to classify and index some of our records. I will then look at the drivers for engaging with Linked Data before turning to SENESCHAL: who is involved, where the project is at, and what is still to be done.
Our monuments inventory table works with the relevant industry standards: MIDAS Heritage, which is informed by the CIDOC CRM, ISO 21127 (2005). CIDOC is the documentation committee of the International Council of Museums and the Council of Europe Cultural Heritage Committee.
Information is entered via controlled fields in the internal staff database, eliminating errors.
Web services from the Oracle database are consumed by the web development team, who design and publish Canmore, our online searchable database. Information from individual classifications is concatenated into a single string.
Thesaurus terms are key to searches on Canmore. There is also a link to our online thesauri,
where users can select a thesaurus and search by hierarchy or by text. Preferred terms are shown in upper case and non-preferred terms in lower case, with a redirection to the relevant preferred term. Terms do not have unique identifiers.
The online form also allows users to suggest candidate terms which are validated locally at RCAHMS.
We already publish our thesauri (and data) online through Canmore, where information can be accessed by professional archaeologists in local and central government and by anyone interested in the subject. So why do more? The thesauri are not visible. The existing architecture limits the potential of the terminology. The terms lack persistent URIs that would allow our resources to act as hubs for the Web of Data. The drivers are external, primarily from the research community, and in Scotland this means computer scientists rather than heritage professionals. There is little perceived value, or demonstrable gain, for end users who are used to accessing data from a range of disparate sources.
This is explicit in the UK Government's Open Data White Paper of 2012, with drivers for open data also set out in Scottish Government documentation relating to Scotland's Digital Future.
The key point here is that public data from different departments about the same subject will be published in the same, standard formats and with the same definitions.
OASIS: as part of a planning condition in both Scotland and England, most commercial archaeological fieldwork is reported through OASIS, an online form developed and maintained by the Archaeology Data Service on behalf of EH and RCAHMS. Currently users are able to enter terms for monuments and objects as free text, though there is a link to the RCAHMS online thesaurus for Scotland. The types of activity reported are controlled by check boxes but do not map to the event terms used by RCAHMS and EH. So there is a clear requirement for vocabulary alignment.
The AHRC-funded SENESCHAL project was set up to address the need for persistent URIs for key vocabularies, to explore vocabulary alignment and to develop RESTful services for the data. The project is led by Professor Doug Tudhope with Ceri Binding at the Hypermedia Unit, School of Computing and Mathematics, University of South Wales, who are providing the specialist knowledge required to transform and align the datasets provided by EH, RCAHMS and RCAHMW, as well as the ADS, who, in addition to developing and hosting the OASIS form, have extensive online digital archives.
Through the STAR and STELLAR projects, in partnership with EH, the University of South Wales has an extensive background working with heritage information. While the STAR and STELLAR research objectives were met, the projects encountered a lack of vocabulary control (with unique identifiers) that limited the full potential of the resulting Linked Data. Hence the current Knowledge Transfer project has widened out to include both Scotland and Wales.
Like all good projects, it has an acronym: SENESCHAL, the steward of a medieval great house. The project deliverables are to provide controlled vocabularies online, as Linked Data (SKOS) and as downloadable files; this part was completed in July 2013. The project also intends to develop RESTful web services (term suggestion, term validation and legacy data alignment) and tools to align data with controlled vocabularies through browser-based ‘widget’ controls.
In both the STAR project and the more recent STELLAR project we observed in legacy archaeological datasets a tendency to allow free-text data entry (leading to simple anomalies) and to “decorate” controlled terms with additional text (so the terms are no longer controlled!). Minor differences in spelling or punctuation can hinder the successful alignment of vocabularies, affecting wider interoperability and building up problems for the future. This is not dictated or handled by the CRM itself but is left to the implementation. Each issue is a barrier to full interoperability: data can conform to an ontology but still lack interoperability. Quoted source: http://www.jfsowa.com/talks/cnl4ss.pdf. The quotation refers to syntactic interoperability; a common data model is still needed for greater semantic interoperability.
Issues encountered in archaeological metadata range from simple spelling errors to a conscious attempt to create additional structure or description within free-text fields. There are multiple datasets, multiple organisations, and multiple languages and dialects. Unification of data structures is possible, BUT incompatible terminology hinders cross-searching and prevents greater interoperability. Applications attempting to reuse data must all individually sort out the same old problems. These are addressed through data cleansing and controlled vocabularies, though there are tensions between descriptive and controlled vocabularies...
The following is a breakdown of various issues as empirically observed in some existing (archaeological) metadata sets intended to conform to controlled vocabularies: simple spelling errors, alternate word forms, prefixes and suffixes, nested delimiters, terms not intended for indexing, terms that would not be included in any thesaurus, and more specific phrases.
The SKOS data model is used to manage the concepts and their relationships, and CC REL to manage data licensing and attribution. RCAHMS data is licensed under the OGL (Open Government Licence); everything else is under a CC0 licence.
Copies of the original vocabularies are transformed into SKOS RDF through templates developed as part of STELLAR. The resulting vocabularies are added to the SENESCHAL data store, which serves the data through web services and a SPARQL endpoint.
The project will implement a Linked Data (RESTful) API. It has now established a base URI and proposes a number of REST API paths.
So, just over halfway through the project, what has been achieved? The project has established the domain heritagedata.org, with information about the project and links to the data providers and to the vocabularies.
The site provides a list of all the contributed vocabularies; we can select the RCAHMS Monument Thesaurus.
The site provides a record for the Monument Thesaurus concept scheme and lists the top-level terms in that scheme. Note the licensing property (repeated at every level) and the coverage, which references the Ordnance Survey URI for Scotland (top level only). Across all schemes a concept can also be downloaded in one of a series of formats: N3, Turtle, JSON and XML. We select the broad term Domestic.
This record defines the broad term and lists its narrower terms; we select Broch.
This is the concept level record for ‘Broch’
Users may also search for a term within a chosen thesaurus, with the results providing a list of matches, the URI for each returned term and its scope note.
There is also a SPARQL endpoint.
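The SPARQL endpoint can be queried directly. A query for concepts by preferred label might look like this. It is a sketch only: the endpoint URL is a placeholder, it assumes concepts carry `skos:prefLabel` and `skos:inScheme` as in the SKOS model used by the project, and it only builds the GET request URL defined by the SPARQL 1.1 Protocol without sending it. A client would normally also send an Accept header such as `application/sparql-results+json`.

```python
from urllib.parse import urlencode

# Hypothetical endpoint URL; substitute the project's published endpoint
ENDPOINT = "http://example.org/sparql"


def label_query(label: str) -> str:
    """Build a SPARQL query for concepts whose preferred label matches
    `label`, ignoring case, together with the scheme each belongs to."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept ?scheme WHERE {\n"
        "  ?concept skos:prefLabel ?label ;\n"
        "           skos:inScheme ?scheme .\n"
        f'  FILTER (LCASE(STR(?label)) = LCASE("{escaped}"))\n'
        "}"
    )


def request_url(query: str) -> str:
    """Encode the query as the `query` parameter of a GET request."""
    return ENDPOINT + "?" + urlencode({"query": query})


print(request_url(label_query("Broch")))
```

Run against all schemes at once, a query like this would return every concept labelled "TENEMENT" along with its scheme, which is exactly the jurisdiction question discussed below.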
SENESCHAL is also exploring how to manage versioning within the schema, though this requires more thought, as it could get complicated: we need to think it through a bit more before committing to it!
As you will see, we have each published thesauri for monuments; England and Scotland have object and maritime thesauri, and England and Wales use period thesauri. England has events and archaeological sciences vocabularies; RCAHMS also has similar lists but has not declared them as part of SENESCHAL. For monuments this is bound up with issues of jurisdiction, whereas the terms in the events thesauri are equally applicable in England and Scotland, so there seems little point in reinventing the wheel. So should there be a single thesaurus of British cultural heritage?
Jurisdiction: there are differences in the way we use monument terms that go beyond regionality. The Scottish tenement is a very different concept from the English and Welsh concept of a parcel of land, and the Scottish tenement does not equate to the architectural terms containing "tenement" in England or Wales.
And again with crucks, where a component of the architecture defines the monument type. Scotland has cruck-framed buildings, barns and byres, whereas England has cruck house. Both refer to techniques of construction, yet a cruck-framed building in Scotland is a very different concept from its English neighbour.
Moving to a SKOS-based approach of defining the concept rather than the term allows us to address issues of language and regionality. This is the cenotaph in George Square, Glasgow.
SENESCHAL has benefited from a Bòrd na Gàidhlig-sponsored project, led by Historic Scotland, to translate the existing terms in the Scottish Monument Thesaurus into Gaelic. To date we have added about 100 of these terms to the thesaurus, so the concept for Cenotaph includes both preferred and variant terms expressed in Gaelic.
And this can be expressed as a graph. Concept 205, CENOTAPH, has preferred terms in English and Gaelic and also manages alternate versions of the Gaelic. Concept 205 is a narrower term of Concept 203, Commemorative Monument, and a related term of 206, Commemorative Cairn; 210, War Memorial; and 1727, Tomb. They are all part of the concept scheme Scottish Monuments Thesaurus.
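The graph just described could be written as SKOS triples roughly as follows. This is a sketch with two assumptions. First, the concept URIs follow the pattern of the TENEMENT example, with the Scottish Monument Thesaurus as scheme 1. Second, the Gaelic preferred and alternative labels are omitted because the terms are not given here. They would be further `skos:prefLabel` and `skos:altLabel` triples tagged `@gd`.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Assumed: the Scottish monument thesaurus is scheme 1, as in the TENEMENT URI
SCHEME = "http://purl.org/heritagedata/schemes/1"


def concept(n: int) -> str:
    """N-Triples form of the (assumed) URI for concept number n."""
    return f"<{SCHEME}/concepts/{n}>"


def cenotaph_graph() -> list[str]:
    """N-Triples for concept 205 as described above (English label only)."""
    c = concept(205)
    lines = [
        f'{c} <{SKOS}prefLabel> "CENOTAPH"@en .',
        f"{c} <{SKOS}broader> {concept(203)} .",  # 205 is narrower than 203
        f"{c} <{SKOS}inScheme> <{SCHEME}> .",
    ]
    for related in (206, 210, 1727):
        lines.append(f"{c} <{SKOS}related> {concept(related)} .")
    return lines


print("\n".join(cenotaph_graph()))
```

In SKOS, `skos:related` is symmetric and `skos:broader` is the inverse of `skos:narrower`, so a full export would normally also state the reverse links from 203, 206, 210 and 1727.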
Integration of project deliverables into RCAHMS processes. Managing candidate terms: we have a working system! Issues over jurisdiction: is a single thesaurus desirable? That depends on the subject matter! Moving the goalposts: we have a successful website, so what is the benefit of Linked Data? And I still don't know (or need to know?) what to do with the SPARQL endpoint... or Turtle...