Invited speaker talk given at the 'Meeting on Semantic Web and Archives, Libraries and Museums' event, Fundación Ramón Areces, Madrid, Spain. 10th April 2014.
http://www.fundacionareces.es/fundacionareces/cargarAplicacionAgendaEventos.do?verPrograma=1&idTipoEvento=1&identificador=1634&nivelAgenda=2
“Il n’y a pas de hors-texte” - Challenges for Archival Linked Data
1. Meeting on Semantic Web and Archives, Libraries and Museums
Fundación Ramón Areces, Madrid, Spain. 10th April 2014
Adrian Stevenson
Senior Technical Innovations Coordinator
Mimas, University of Manchester, UK
@adrianstevenson
“Il n’y a pas de hors-texte” –
Challenges for Archival Linked Data
2. “Il n’y a pas de hors-texte”
‘Of Grammatology’
Jacques Derrida, 1967
7. Deconstruction / Context
• Archives Hub data in ‘Encoded Archival
Description’ EAD XML form
• Need to think about:
– knowing what we want to say about our ‘things’
– data modelling
– defining relationships
– selecting vocabularies
– deciding on identifiers – HTTP URIs
– creating RDF/XML
– linking to external resources
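The steps listed on this slide (choosing identifiers, selecting vocabularies, stating relationships, linking out) can be sketched in a few lines. This is a minimal illustration, not the Hub's actual pipeline: the `http://example.org/id/` base URI and the `mint_uri` and `triple` helpers are hypothetical, FOAF is used only as one possible vocabulary, and the output is N-Triples rather than RDF/XML to keep the example short.

```python
from urllib.parse import quote

BASE = "http://example.org/id/"  # hypothetical base URI, not the Hub's real namespace
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def mint_uri(kind, label):
    """Build an HTTP URI from a resource type and a human-readable label."""
    slug = quote(label.strip().lower().replace(" ", "-"))
    return f"{BASE}{kind}/{slug}"


def triple(s, p, o, literal=False):
    """Serialise one statement as a single N-Triples line."""
    if literal:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{o}>"
    return f"<{s}> <{p}> {obj} ."


# One 'thing' (a person), three statements about it.
person = mint_uri("person", "Beatrice Webb")
statements = [
    triple(person, RDF_TYPE, FOAF + "Person"),
    triple(person, FOAF + "name", "Beatrice Webb", literal=True),
    # Link to an external resource (DBpedia resource URI form).
    triple(person, FOAF + "knows", "http://dbpedia.org/resource/George_Bernard_Shaw"),
]
print("\n".join(statements))
```

Even at this size the modelling decisions are visible: what counts as a 'thing', which vocabulary term expresses each relationship, and what the identifier looks like.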
11. Visualisation Prototype
Using Timemap
– Google Maps and Simile
– http://code.google.com/p/timemap/
Early stages with this
Will give location and ‘extent’ of archive.
Will link through to Archives Hub
14. Linking Lives
• Linking Lives is a project to create an end-user
interface based on Linked Data
• A biographical interface, providing information
about individuals that is taken from a variety
of sources
• Aim is to place archival descriptions within a
much broader context
15. Martha Beatrice Webb
Place of birth:
Gloucester, England
Place of death:
Liphook, Hampshire, England
Life dates: 1858-1943
Epithet: social reformer and
historian
Family name: Webb
Image
from: Beatrice Webb letters
Beatrice Webb (1858 - 1943). Fabian Socialist, social
reformer, writer, historian, diarist. Wife, collaborator and assistant of
Sidney Webb, later Lord Passfield. Together they contributed to the
radical ideology first of the Liberal Party and later of the Labour
Party.
from: Beatrice Webb, A summer holiday in Scotland, 1884.
Beatrice Webb (1858-1943), nee Potter, social reformer and diarist.
Married to Sidney Webb, pioneers of social science. She was involved
in many spheres of political and social activity including the Labour
Party, Fabianism, social observation, investigations into
poverty, development of socialism, the foundation of the National
Health Service and post war welfare state, the London School of
Biographical Notes
Works
Our Partnership
My Apprenticeship
The case for the factory acts
Beatrice Webb’s diaries; edited by Margaret Cole
The Diary
Knows
http://dbpedia.org/page/George_Bernard_Shaw
http://dbpedia.org/page/Sidney_Webb,_1st_Baron_Passfield
18. Why?
• Telling stories
• Placing archives in a global information space
• External data forms part of the user interface
– moving away from the silo approach
• Dynamic links to other content
• Extensible
• An exemplar – shows what can be done
19. Some Challenges / Lessons Learnt
• Steep learning curve
• Difficult data, URI persistence
• Linking data not straightforward
• Keeping data up to date
• How sustainable are the data sources?
• Can you track the provenance of data
sources?
• Are data licensing issues covered?
20. Data Modelling
• Steep learning curve
–RDF terminology “confusing”
–Lack of archival examples
• Complexity
–Archival description is hierarchical and
multi-level
–RDF may be at odds with ISAD(G)
21. Hub data inconsistencies
• Winston Leonard Churchill
• Sir Winston Leonard Spencer Churchill
• Churchill, Sir, Winston Leonard Spencer, 1874-1965, knight, prime minister and historian
• Churchill, Winston Leonard, 1874-1965, prime minister
• Churchill, Sir Winston, 1874-1965, knight, statesman and historian
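One way to see the problem is to try reducing the variants on this slide to a common matching key. The sketch below is an assumption-laden illustration, not the Hub's method: `name_key` and the `TITLES` set are hypothetical, and the key (surname plus first forename) is deliberately crude.

```python
import re

TITLES = {"sir", "dame", "lord", "lady"}  # illustrative, not exhaustive


def name_key(raw):
    """Reduce a name string to a crude (surname, first forename) key."""
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    if len(parts) > 1:
        # Inverted index form: "Surname, Forenames, dates, epithet"
        surname = parts[0]
        forename_words = []
        for part in parts[1:]:
            if re.match(r"\d", part):  # stop once the life dates start
                break
            forename_words.extend(part.split())
    else:
        # Direct form: "Title Forenames Surname"
        words = parts[0].split()
        surname, forename_words = words[-1], words[:-1]
    forenames = [w for w in forename_words if w.lower() not in TITLES]
    first = forenames[0].lower() if forenames else ""
    return surname.lower(), first


variants = [
    "Winston Leonard Churchill",
    "Sir Winston Leonard Spencer Churchill",
    "Churchill, Sir, Winston Leonard Spencer, 1874-1965, knight, prime minister and historian",
    "Churchill, Winston Leonard, 1874-1965, prime minister",
    "Churchill, Sir Winston, 1874-1965, knight, statesman and historian",
]
keys = {name_key(v) for v in variants}
print(keys)
```

All five variants collapse to one key here, but the same key would also merge any two different people who share a surname and first forename, so life dates or an external identifier would still be needed to disambiguate.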
25. Thoughts on What Next?
• We still need more convincing use / business
cases
– Clear articulation of what researchers actually gain
by bringing diverse data together
• We still need more and better tools
– But this depends on use cases
• Cultural heritage not working together enough
– better collaboration on things like name URIs
• Coordinated consistent approach for vocabs
27. This presentation is available under Creative
Commons Non Commercial-Share Alike:
http://creativecommons.org/licenses/by-nc/2.0/uk/
Editor's notes
Generally considered to be a more accurate translation. Got me thinking about archival context. Got me thinking about the process of creating linked data as somewhat like deconstruction: breaking down what we have, thinking about things, then reconstructing. This process possibly problematizes the notion of archival context: the RDF model problematizes ISAD(G), archival context and document-centric ways of thinking.
The Hub is an aggregation of archival descriptions from archive repositories across the UK. The core data comes from the Archives Hub, the UK aggregator of archival descriptions, and forms the basis of the linked data. Approx. 500,000 component-level descriptions.
Talk through the page a bit. 1,495,168 statements (triples) currently in the LD subset.
‘Every story has a beginning’
Mock-up of the Linking Lives interface, showing the way data is brought together.
External data is key to linked data. We link to VIAF and through that to DBpedia. We are looking at linking to the BNB.
Current unfinished version of the interface.
Data modelling can be hard and takes time. Vocabularies can be hard. Transforming data is hard. XSLT is hard. Not many tools. Worth the investment?
Steep learning curve:
- RDF and Linked Data modelling terminology
- Lack of archive domain examples, though you now have LOCAH!
- Certain level of expertise needed
Dirty data:
- ‘Joe Bloggs and others’ rather than just a name, or access points that do not have rules or a source associated with them
- Extent data highly variable
Complexity:
- “Lower level” units are interpreted in the context of the higher levels of description
- Arguably “incomplete” without the contextual data
Relations are asserted, e.g. member-of/component-of, but there is no requirement or expectation that data consumers will follow the links describing the relations.
From Pete’s blog post:
“In a post on the Archives Hub blog, Jane emphasised the value of the “Linked Data” approach in making things mentioned in our data into “first-class citizens”. One consequence of the multi-level approach in archival description practice is a strong sense of the importance of “context”, and that the descriptions of the “lower level” units should be read and interpreted in the context of the higher levels of description (perhaps even that they are in some sense “incomplete” without that “contextual” data). In contrast, the “Linked Data” approach typically involves exposing “bounded descriptions” of individual resources. Now, certainly, yes, those “bounded descriptions” include assertions of relationships with other resources (including the sort of part-whole/member-of/component-of relationships present here), and those links can be followed by consumers to obtain further information on the other resources – however, there is no requirement or expectation that consumers will do so. So, there is arguably a (perhaps unavoidable) element of tension between the strongly “contextual” emphasis of EAD and ISAD(G) and the “bounded descriptions” of “Linked Data”. Rather than seeing that as an insurmountable hurdle, however, I think it provides an area that the project can usefully explore and evaluate.”
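The tension between “bounded descriptions” and archival context can be made concrete with a toy example. Everything below is hypothetical (the `ex:` identifiers, the titles, and both helper functions); it only illustrates that a consumer may read a lower-level unit on its own, or may choose to follow the member-of links upwards.

```python
# Toy triples: (subject, predicate, object). All identifiers are hypothetical.
TRIPLES = [
    ("ex:item1", "ex:title", "Letter"),
    ("ex:item1", "ex:memberOf", "ex:series1"),
    ("ex:series1", "ex:title", "Correspondence"),
    ("ex:series1", "ex:memberOf", "ex:collection1"),
    ("ex:collection1", "ex:title", "Example collection"),
]


def bounded_description(subject):
    """Return only the statements made directly about one resource."""
    return [t for t in TRIPLES if t[0] == subject]


def with_context(subject):
    """Follow memberOf links upwards, collecting each ancestor's description."""
    collected, seen = [], set()
    current = subject
    while current is not None and current not in seen:
        seen.add(current)
        description = bounded_description(current)
        collected.extend(description)
        parents = [o for _, p, o in description if p == "ex:memberOf"]
        current = parents[0] if parents else None
    return collected


print(bounded_description("ex:item1"))  # the item alone, with a link to its parent
print(with_context("ex:item1"))         # the item plus series and collection
```

Nothing in the data forces a consumer to call the second function rather than the first, which is the point made in the blog post quoted above.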
Names are often entered into the Hub in different ways, despite the use of Rules.
One of the challenges of doing Linked Data is the plethora of vocabularies. It is hard to decide what we should use. Daniel Suara highlighted this.
But matching strings is not easy, e.g. matching subjects in the Hub with subjects in LCSH.
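A rough sketch of why string matching is awkward, using Python's standard `difflib`. The headings list is illustrative only and has not been checked against the live LCSH data; `normalise`, `match_subject` and the 0.8 cutoff are assumptions made for the example.

```python
import difflib

# Illustrative headings only; not checked against the live LCSH data.
HEADINGS = ["Labor movement", "Socialism", "Poverty"]


def normalise(term):
    """Lower-case, trim, and drop trailing full stops before comparing."""
    return term.strip().rstrip(".").lower()


def match_subject(local_term, headings=HEADINGS, cutoff=0.8):
    """Return the closest heading, or None when nothing clears the cutoff."""
    lookup = {normalise(h): h for h in headings}
    hits = difflib.get_close_matches(
        normalise(local_term), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


print(match_subject("Socialism."))       # punctuation difference only
print(match_subject("Labour movement"))  # British vs American spelling
print(match_subject("Archives"))         # nothing similar in the list
```

A similarity score only says that two strings look alike, not that they name the same concept, so candidate matches of this kind would still need review before being published as links.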
Quotes from the Linking Lives evaluation:
Researchers want a clearer idea of what is covered, and they don’t always understand the results they see or why they get certain results in response to their searches. I can’t help thinking that, bearing this in mind, bringing diverse sources together may make it more difficult for users to understand and interpret results.
“they remained cautious about the principle of bringing sources together”
On serendipitous searching, there was a feeling that it could potentially be useful but also that it could actually distract the researcher from what is relevant.
“I think at PhD level there’s a kind of artistry to how you make your way through… I’ve certainly never come across a search engine that can do the same or be as complex as your own thinking patterns.”
Whilst it could be said that it is not important for users to understand how data is pulled together under the hood, our research suggested that potential users, particularly advanced researchers, do indeed have an interest in how and why this information has been gathered together in a particular way.