2. Libraries in the semantic web
Marcia Zeng
@ ANU Library – 26 June 2015
3. Opportunities and challenges
• How can we do more with what we have?
• How can we do more with less?
• How can we use the LOD system?
4. • Turning the end point into a starting point
– FRBR +
• Obtain/find/identify/select/EXPLORE
• http://www.agris.fao.org
• http://www.numismatics.org/ocre
• Turn ‘text’ into ‘data’
5. Turn text into data
Big text → Big data
“oil without refining is of no use”
7. Kent State University – research teams
• Team 1: Linked Open Data LAM research group
• http://lod-lam.slis.kent.edu
• Metadata
• Fact mining
• Knowledge Organisation systems (KOS)
8. Kent State University – research teams
• Team 2: Smart Big Data – how can innovation history be interpreted by/via data?
10. A paradigm shift in how cultural heritage materials can be
• Searched
• Mined
• Displayed
• Taught
• Analysed using digital technologies
→ new expectations of memory institutions
11. Content
• From ‘web of documents’ → ‘web of data’
• From ‘linking strings’ → ‘linking things’
Results
• From ‘on the web’ → ‘of the web’
Approaches / methods
• From ‘machine-readable’ → ‘machine understandable’
• From ‘machine-readable’ → ‘machine processable’
15. LODLAM
• Building GLAM directories
• LD for non-LD people
• Use cases for bibliographic data as LOD
• Vendor engagement – pt.1
• Linking people
• Data quality
• Disambiguation
• Vendor engagement – pt.2 - The Manifesto
16. Building GLAM directories
• We want to be found
• Data needs to be accurate
• Data should be single-sourced
• Data should be open
• Schema.org offers a solution:
– Consumed by search engines
– Consumable by others
www.schema.org
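A minimal sketch of what a schema.org directory entry could look like, serialised as JSON-LD (here built with Python for illustration). The `@type` and property names are real schema.org terms; the institution details are invented placeholders.

```python
import json

# Minimal schema.org "Library" markup for a GLAM directory entry.
# Property names come from schema.org; the details are placeholders.
library = {
    "@context": "https://schema.org",
    "@type": "Library",
    "name": "Example National Library",    # placeholder name
    "url": "https://library.example.org",  # placeholder URL
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Canberra",
        "addressCountry": "AU",
    },
    "openingHours": "Mo-Fr 09:00-17:00",
}

# Embedded in a page as <script type="application/ld+json">, this makes
# the directory entry consumable by search engines and by others.
jsonld = json.dumps(library, indent=2)
print(jsonld)
```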
18. Vendor engagement – pt. 1 & 2
• Vendors want use cases for implementing LOD
• How we present use cases to vendors, and business cases internally
• Incentives for vendors to align with our business needs
• The manifesto
19. The Manifesto
Given that libraries, museums and archives are often heavily dependent on their vendors (yada yada preamble)....
VENDORS should…
• Use and encourage established open standards.
• Prefer open-source components as parts of their workflow, and provide a list of their own dependencies for the evaluation of the offering.
• Provide solutions that are modular and scalable, not monolithic. Allow “pluggable” components for specialty functionality (for example, OCR, entity extraction, etc.).
• Document these components in a way that explains to the customer how they fit together.
• Allow integration of systems through RESTful, open, unlicensed, non-rate-limited APIs.
• Not erect barriers to full and complete access to the institution’s own data.
• Cultivate their communities of users, listen to them, and encourage them to talk to each other and pool their resources.
• Safeguard against their own instability (through mechanisms such as code escrow, code transparency, etc.).
• Not be adversarial to integrating with systems supported by other vendors.
• Not give preferential treatment to vendors integrating with their own products.
• Support experimentation by permitting custom code to run on development copies of the software.
THE CUSTOMER should…
• Prefer vendors who are incentivizing open data formats and data sharing.
• Be clear about their objectives, and try to be consistent about the language they use.
• Not over-specify requirements. Concentrate on describing what you want accomplished, not how to do it. Be open to innovation. (Hint: ask open, leading questions in your request for proposal.)
• Be a good participant in the user community.
• Be aware and respectful of the fact that some licenses are “sticky” and do not play well with some commercial models.
BOTH PARTIES should…
• Have a data exit strategy in mind when they enter into a commercial relationship
– It should be as easy as possible to get data out of the system in a non-proprietary format at the end of a vendor engagement (or at any time)
– The customer owns the data and it should not be encumbered by additional license agreements.
• Concentrate on the smallest possible number of open standards.
Monika Szunejko attended two linked data events in June 2015. This is a report to the National Library of Australia Linked Data Working Group (8 July 2015) on learnings and observations from those events.
Thinking about linked data beyond bibliographic data – to other datasets the National Library holds.
Thinking about linked data as a solution to workflows/processes/problems – rather than an endpoint in itself.
Libraries in the semantic web / Marcia Zeng
Opportunities and challenges
2nd generation of the web: the semantic web
Big Data
Participatory culture
Opportunities and challenges
1. 2nd generation of the web: the semantic web
search engines – mature linked data technologies, non-traditional databases
Schema.org – used to say what KINDS of links connect things
Web of things, not web of links
2. Big Data
Govt. funding opportunities
Blooming of ‘data analytics’ profession
What is our role here?
3. Participatory culture
Social media
Engaging end-users in the workflow
Two types of data we can do further work on, based on the library’s metadata services:
Turning the end point into a starting point (the library catalogue is no longer the end point)
Links based on entities to mash-up data [and mesh-up data]
Using FRBR model PLUS other functions:
Obtain / find / identify / select / EXPLORE
Search/filter/analysis/display
Dynamically generated results – more vivid results
MARC format – there are many hidden access points that can bring in much richer information and knowledge through library data – e.g., 500 fields
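A toy sketch of the “hidden access points” idea (not a real MARC parser – production workflows would use a library such as pymarc): treat a record as (tag, value) pairs and surface the 5xx note fields as extra searchable text. The record content is invented.

```python
# Invented record as (tag, value) pairs; real MARC has indicators
# and subfields, omitted here for brevity.
record = [
    ("245", "Expedition diaries"),                      # title
    ("500", "Includes hand-drawn maps of the route"),   # general note
    ("520", "Summary: Antarctic expedition journals"),  # summary note
]

def note_access_points(fields):
    """Return the text of all 5xx (note) fields in a record."""
    return [value for tag, value in fields if tag.startswith("5")]

print(note_access_points(record))
```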
Turn ‘text’ into ‘data’
‘Big text’ – the text version of big data
Where does it exist?
In special collections, archives, oral histories, etc.
How do we do this?
Fact mining, analytics
What is needed/how?
Use tools to ‘mine’ the text
Why?
PDF files or images
Contents are not linked anywhere
Indexes are not used in the collection’s search – they do not reside ‘outside’ the document
Tools to make this happen: e.g., Open Calais…
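As a crude stand-in for an entity-extraction service of the Open Calais kind, the sketch below pulls capitalised multi-word phrases out of unstructured text as candidate named entities. Real tools use trained NLP models, not a regex; the transcript text is invented.

```python
import re

def candidate_entities(text):
    # Two or more consecutive capitalised words, e.g. "Kent State".
    # A toy heuristic, nothing like a real entity extractor.
    return re.findall(r"\b(?:[A-Z][a-z]+\s+){1,}[A-Z][a-z]+\b", text)

transcript = ("The oral history was recorded at Kent State University "
              "and later deposited with the National Library.")
print(candidate_entities(transcript))
# -> ['Kent State University', 'National Library']
```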
Metadata - Connecting LAMs
Fact mining – digging into unstructured and semistructured data (e.g., finding aids, transcripts)
Knowledge Organisation systems (KOS) – including thesauri, classification, pick lists, name authorities, ontologies
Liquid Crystal Institute (LCI) –
People relationships/networks
Prove innovation history by using data
Goodyear – track patents
A network framework of cultural history / M. Schich et al.
Science 345 (6196), 558–562 (2014)
Using biographical dates/pathways of famous people in our culture - to see formations of culture
Using 2 elements of authority data:
Person/born/date
Person/died/date
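A hedged sketch of the Schich et al. idea using only the two authority elements above: with just birth and death dates per person, you can already ask period questions of the data. The people and dates below are invented placeholders.

```python
# Invented authority-style records: (name, born, died).
people = [
    ("Person A", 1770, 1827),
    ("Person B", 1809, 1882),
    ("Person C", 1815, 1852),
]

def alive_in(year, records):
    """Names of people whose lifespan covers the given year."""
    return [name for name, born, died in records if born <= year <= died]

print(alive_in(1820, people))  # -> ['Person A', 'Person B', 'Person C']
```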
Embracing the new and changing concepts of the semantic web in LAMs
Hints from implementers:
Don’t make your users figure it out
Don’t just publish data as SPARQL endpoints → create queries
e.g., the Getty vocabularies’ LOD – create the questions
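One way to follow the “create the queries” advice is to publish ready-made, shareable query URLs rather than a bare endpoint. The sketch below builds such a URL; the endpoint is the Getty vocabularies’ published SPARQL address, but the query itself is a simple illustrative label lookup of my own, not an official Getty example.

```python
from urllib.parse import urlencode

ENDPOINT = "http://vocab.getty.edu/sparql"  # Getty vocabularies endpoint

# An illustrative query: first ten English labels in the dataset.
query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subject ?label WHERE {
  ?subject rdfs:label ?label .
  FILTER (lang(?label) = "en")
} LIMIT 10
"""

# A shareable URL that runs the query for the user.
url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
print(url)
```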
I attended these events, and will focus on GLAM directories and vendor engagement.
LODLAM – opened up possibilities for using other data – not just bib data (in fact, bib data was not the primary data source under consideration at the summit).
Also – another outcome from LODLAM Summit was the formation of an interest group to pursue an archives extension to schema.org – just as there already is a bibliographic data extension to schema.org
Possibilities for NLA – releasing directory and events information as LOD.
What about other datasets – that we do not already think of as datasets – e.g., preservation planning documents, registry information, etc.
Directory information could be released as linked open data.
What about releasing NLA events as linked open data? The information already resides in a format that could be exported as RDFa using schema.org.
Use case: push our information and events out to the web. The public then do not have to come to our website to learn about us – the web will bring them to us.
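A sketch of what an event exposed as RDFa with schema.org terms could look like. The attributes (`vocab`, `typeof`, `property`) are standard RDFa Lite; the event details are invented placeholders, not a real NLA event.

```html
<!-- Invented example event marked up as RDFa Lite + schema.org -->
<div vocab="https://schema.org/" typeof="Event">
  <h2 property="name">Example exhibition talk</h2>
  <time property="startDate" datetime="2015-08-01T18:00">1 August 2015, 6pm</time>
  <span property="location" typeof="Place">
    <span property="name">National Library of Australia</span>
  </span>
</div>
```

Markup like this lets search engines pick up the event directly, which is exactly the “the web will bring them to us” use case.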
The system constrains the data model
LOD can be part of the solution to system interoperability problems. LOD brings a new portability to data that may help us solve local problems.
As conceived at the LODLAM Summit 2015
Linked Data for Professional Educators (LD4PE). Partner with University of Washington iSchool. 2011-12, 2014-16, ongoing
– to create a linked data training environment.
The project is mapping all the competencies professionals need in order to learn linked data, offering training in those competencies, and also plans to link to data sets they can experiment with.
The aim is for trainers to create their own training package from the offerings in the technology platform and deliver training that meets the needs of the group they are training.
The goal is also to output the platform as linked data.