Unlocking the value: metadata and linked data at the British Library / Alan Danskin (British Library)
Unlocking the Value
metadata strategy and linked data
at the British Library
Alan Danskin
Collection Metadata Standards Manager
CIGS Metadata and Linked Data Seminar, Edinburgh 12/9/2016
4. www.bl.uk 4
British Library Metadata Services
The British Library Act records
our role as “national centre for…
bibliographical & other information services”
BL Metadata Services:
• Originally offered priced services & evolved through many technologies
• Began to offer open data in 2010 & Linked Open Data in 2011
• Collection Metadata Strategy published in 2015
Legacy Metadata Challenges
[Chart: % of records containing Language of Content, 2013 and 2014, for the Foundation catalogues, the Integrated Catalogue and Annual Production]
Discovery
Research
Bibliometrics
Management Information
Visualisation
Collection Management
Collection Metadata - Uses
Internal External
Direct
To record:
Resource description & availability for discovery & access
Collection inventory
Preservation requirements
Licensed content rights
Legal deposit & purchased claims
Representation of BL holdings in shared catalogues
Re-use in 3rd party commercial services
Derived cataloguing by other libraries
Identification of candidate material for collaborative digitisation
Indirect
Preparing exhibitions
Responding to FOI requests
Website organisation
Content identification for collaborative initiatives
Management information
Confirmation of UK publication status
Identification of last resort copy for collection disposal initiatives
Open data contribution
Communications
• Strategy published externally
• Established & communicated best practice via new CM Wiki
• Implemented:
• Centralised support mailbox
• Horizon scanning function
• Workshop on 2016/17 plans
Vision
“Our vision is that by 2020 the Library’s collection metadata assets will be comprehensive, coherent, authoritative and sustainable, enabling their full value to be unlocked for improved content management, greater collaboration and wider use of the collection.”
Objectives
Drive efficiencies in the creation, management and exploitation of collection metadata to support delivery of the Library’s strategic priorities and programmes
Improve the Library’s return on investment in its collection metadata assets by ensuring their long term value is maintained for future activities
Open up more of the Library’s collection metadata to improve access to Library content and promote wider re-use
Collection Metadata Strategy
2015-18 Implementation Roadmap
[Roadmap diagram, 2015 to 2020. Workstreams: Metadata Management; Standards; Preservation, Maintenance & Enhancement; Open Metadata; Discovery & Delivery; Licensing & Rights Management; Process Efficiencies; Communications; Technical Infrastructure. Milestones shown on the diagram:]
• Metadata assets register available
• ISNI integration into key metadata
• Collection metadata available via global cross domain delivery channels
• Assessment and prioritisation of enhancement requirements
• Automated E-book metadata enhancement process implemented
• Establish horizon scanning function for metadata
• Undertake options appraisal for linked metadata platform replacement
• Aleph v22 implementation
• Metadata strategy workshop
• Development of new metadata visualisation tools
• Redevelopment of AMED
• Service metadata
• Undertake investigation of DRM metadata solutions
• Convergence on agreed metadata standards portfolio
• Standardise solutions for derived collection metadata
• Internal communication of strategic priorities
• Completion of website redevelopment
• Large scale batch enhancement solution implemented
• Harmonisation of existing metadata licensing practice
• IAMS data release
• Create metadata based collection analysis tools
• Undertake review of national library metadata systems options
• Non standard systems migration completed
• Assessment and prioritisation of metadata systems migration requirements
• All collection metadata held in centralised master metadata repositories
• SAMI Metadata assessment
• Persistent Identifier infrastructure for metadata and related content
• Implementation of comprehensive DRM metadata solution
• Promotion of agreed standards portfolio
• Preparatory work for ILS replacement
• Ongoing rationalisation of legacy systems & migration to supported master metadata repositories
• Improved user access options available
• Implementation of PSI directives for metadata sets
• Complex rights & license metadata solution available
• New staff intranet and wiki resources on collection metadata
• Optimised metadata management processes
• Efficient international standards engagement process
• Undertake review of new metadata creation & processing options
• Establish annual collection metadata audit processes
• Collaborate with international partners on core metadata standards
• Ongoing infrastructure development
• Migration to supported standards
• BNB Linked data service improvements
• Printed music metadata release
• New open researcher format metadata capability established
• Fully representative range of metadata sets available
• Hidden collection metadata assets exposed for re-use
• Centralised metadata license storage implemented
• Undertake audits of metadata assets & licenses
• Governance & internal support functions established
• Investigate deriving options for sound recordings
• Migration complete
• Collection Metadata best practice resource available for staff
• New linked data platform implemented
• JISC national shared metadata service platform options appraisal
• External publication of strategy
• External user support functions centralised
• Preparatory work for Google digitisation phase 2
• Implementation of new external sources metadata
• Linked data exploitation options implemented
• Undertake review of metadata standards & develop engagement plan
• Investigate BL on Demand requirements
• Comprehensive open metadata service offering
• Unified, standardised metadata management infrastructure
The British National Bibliography
3.7m entries for UK books & journals on all subjects in all languages, 1950 to date
Reusable publication dataset, not a unique institutional catalogue
Permissive License – CC0
Includes: People, Places, Dates, Subjects
Consistent - over 60 years
Prepublication data
Retender platform
Open Metadata
Linked Open Data
Linked Data Analytics Project
• Who is using our data?
• Which data?
• How to optimise publication?
Visitors - Government
Visitors - Academia
Visitors - Libraries
745K ISNIs added to Linked Data BNB
Open Metadata
Increasing Access, Reuse & Relevance
[Chart: Downloads, 2014/15 and 2015/16]
177 countries use services
New open ‘researcher format’ option
Created open metadata sets for printed music, manuscripts & archives
Events – CILIP, Cabinet Office Data Science…
Research collaborations
1550+ users
Breaking down silos
Standards convergence
Metadata enhancement
New synergies
System replacement
Conclusions
Lots to be done
Linked data: a potential solution, not an objective
Influencing and delivering corporate strategic objectives
Increasing Demand for our help!
Links: strategy
Unlocking the value: the British Library’s Collection Metadata Strategy, 2015-2018
http://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2015-2018.pdf
Collection Metadata Strategy Roadmap, 2015-2018
http://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-roadmap-2015-2018.pdf
Living knowledge: the British Library 2015-2023
http://www.bl.uk/projects/living-knowledge-the-british-library-2015-2023
Apologies if the colour is a bit blinding. Metadata is one of those things that people take for granted until it goes wrong, so we wanted something eye-catching. And the future is bright for metadata. My purpose today is not to talk about linked data; there are many speakers who can do so much more knowledgeably than I can. It is really to reflect on the place of linked data in the Library’s metadata strategy.
First: a quick introduction to the Library
The British Library is the U.K.’s national library and also a legal deposit library. The Library currently has around 1,500 staff operating on three sites: Stockton-on-Tees, Boston Spa and London. The main cataloguing department is based in Boston Spa, but there are many specialist cataloguers in London.
The Library was created in 1973 as the result of an Act of Parliament to amalgamate many existing institutions. Under the Act, the Library is “the national centre for bibliographical & other information services”.
From an external perspective, the Library has charged for many of its bibliographic services, but in recent years, in line with government policy, barriers to access have been removed. We began to offer open data in 2010 and Linked Open Data from 2011.
In 2015 we published our first Collection Metadata Strategy. I’ll come to what the strategy actually is a bit later, but first: why did we decide that we needed a strategy?
“Why” was the hardest question to answer. In fact, it needed a separate document to justify the need for a strategy.
The Library faces new challenges, in particular those posed by the extension of legal deposit to non-print media in 2013. This means that publishers can be switched from deposit of printed books and serials to deposit of e-books and e-journals. The rate of transition is under the Library’s control, which enables us to develop our infrastructure. However, we are already seeing interesting challenges. From a very small subset of publishers, between May 2015 and January 2016, we received 50,000 e-books, equivalent to about 50% of our annual intake of printed books. Closer analysis of what has been deposited shows that we are receiving materials which are in effect re-issues of printed books and, because of the international scope of publishing, materials which have no UK imprint but are distributed in the UK. For much of this additional material we may already have records for the print manifestation, which we should be able to reuse and adapt.
There are also long-standing challenges. We still have printed catalogues. Around 2 million items can only be found by consulting a printed catalogue in the reading room. There is a smaller but unquantified number of catalogues and finding aids on electronic media that need to be migrated to strategic repositories. There are some collections that remain uncatalogued.
A pervasive problem is that a lot of legacy metadata, particularly retrospectively converted metadata, falls far short of current standards.
Here is legacy metadata: this record was retrospectively converted from the British Museum catalogue. As the cataloguers among you will recognise, there are some omissions. Where is the language of content? Where is the country of publication? Where is the subject data? This information wasn’t recorded in the original catalogue. Taking language as an example, the orange line shows that for current processing 100% of published resources are coded for language of content. However, if we look at the foundation or legacy catalogues, this drops to under 30%, which limits what the metadata can do for services, for research and collection management, and for visualising the collection. How can you build a linguistic picture of the collection if the language is not explicitly recorded? This metadata is not capable of answering the questions posed by users and staff.
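A coverage figure like the one in the chart can be computed by checking each record for a language code. The sketch below is a minimal Python illustration, not the Library's method. It assumes records are exposed as MARC 21 control fields, with the language code in positions 35-37 of the 008 field, and treats blanks or the fill character as "not coded"; legacy records may use other formats, and the function names and sample records are hypothetical.

```python
from typing import Optional

# Assumption: "not coded" means three blanks or three MARC fill characters.
UNCODED = {"   ", "|||"}


def language_code(record: dict) -> Optional[str]:
    """Return the language code held in 008/35-37, or None if not coded."""
    code = record.get("008", "")[35:38]
    if len(code) < 3 or code in UNCODED:
        return None
    return code


def percent_coded(records: list) -> float:
    """Percentage of records that carry a language of content code."""
    if not records:
        return 0.0
    coded = sum(1 for rec in records if language_code(rec) is not None)
    return 100.0 * coded / len(records)


# Toy example (hypothetical records, not British Library data):
sample = [
    {"008": " " * 35 + "eng" + "  "},  # coded as English
    {"008": " " * 40},                 # blank language positions
    {},                                # no 008 field at all
]
print(f"{percent_coded(sample):.1f}% coded")  # prints "33.3% coded"
```

Run per source (foundation catalogues, integrated catalogue, annual production), the same count would yield the kind of comparison the chart shows.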
Metadata is increasing in importance, but it was recognised that within the organisation responsibility for metadata was divided and the boundaries were unclear. Addressing the challenges would be impossible unless we could also break down silos.
In effect each content stream has its own input format and its own system. Each content stream is therefore a distinct silo, usually with its own workflow and specialist staff.
The persistence of silos at the discovery layer prevents cross-searching of the whole collection, and the complexity they create at every level is a barrier to efficiency.
In 2004 we implemented Aleph, which removed significant long-standing silos based around printed collections. This has created much more uniform metadata and more efficient processes, but many content streams and processes were out of scope, and so silos have been perpetuated.
To some extent these silos reflect historic divisions between different components of the Library and responsibility has been distributed between different services and departments.
How did we do it?
It helps to scare people a bit at first – look at all the stuff we couldn’t do without metadata. But you also have to offer solutions and hope!
The following diagram gives an overview of the range of external stakeholders
and internal stakeholders who are increasingly dependent on metadata to deliver services and the Library’s strategic objectives.
The fundamental idea behind the evolving strategy was to treat Collection Metadata as a strategic asset of the Library, on a par with the collection, the staff and the estate. We deliberately limited the scope to metadata about the collection. Collection metadata identifies the attributes and relationships of collection resources, their location and availability, and the status and rights associated with them.
Like the collection or any other asset, collection metadata needs careful stewardship over time, and this in turn needs clear leadership and adequate resourcing.
Like any other asset, investment should be rewarded by improvements in efficiency, better services and more value for money. An important contention is that the metadata should be made to work much harder.
What’s in the strategy?
The vision provides a view of where we want to be in 2020, which aligns closely with the Library’s Strategy, Living Knowledge and its core programmes.
And a plan…
If you search the document for the phrase “linked data”, you won’t find it. Nor will you find an explicit mention of RDF. There are lots of references to opening but none to linking. We see opening data as a strategic objective, whereas linked data is a technology whose potential we can exploit.
We will continue to develop the BNB Linked data offering.
We have been doing some work on linked data. We’ve added URIs for ISNIs to the LOD BNB. We’ve been working with our linked data service provider TSO to improve our understanding of demand for the linked data service and we’ve also worked with Fujitsu Ireland to analyse the statistics in more depth to discover who is using the service and how. A lesson is that although the linked data offering is being used, it is very difficult to analyse by whom and for what reason. It is clear that for many researchers who may be interested in our data, linked data poses a significant technical barrier.
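ISNI URIs in a linked data service are only useful if the underlying identifiers are well formed. The sketch below is a minimal Python illustration, not a description of how the BNB enrichment was done. It assumes ISNIs arrive as 16-character strings, possibly with spaces or hyphens, and checks the ISO 7064 MOD 11-2 check character that ISNI uses. The `http://isni.org/isni/` URI pattern is an assumption based on ISNI's published URIs, and the function names are hypothetical. The sample value is ORCID's documented example identifier, which shares ISNI's 16-character format and check scheme.

```python
# Assumption: ISNI URIs take the form http://isni.org/isni/<16 characters>.
ISNI_URI_BASE = "http://isni.org/isni/"
DIGITS = "0123456789"


def normalise_isni(raw: str) -> str:
    """Remove spaces and hyphens and upper-case a trailing 'x'."""
    return raw.replace(" ", "").replace("-", "").upper()


def mod11_2_check_char(base15: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_isni(raw: str) -> bool:
    """True if the string is 15 digits plus a correct check character."""
    isni = normalise_isni(raw)
    if len(isni) != 16 or not all(c in DIGITS for c in isni[:15]):
        return False
    return isni[15] == mod11_2_check_char(isni[:15])


def isni_uri(raw: str) -> str:
    """Build a URI for a valid ISNI, refusing malformed input."""
    if not is_valid_isni(raw):
        raise ValueError(f"not a valid ISNI: {raw!r}")
    return ISNI_URI_BASE + normalise_isni(raw)


print(isni_uri("0000 0002 1825 0097"))
# prints http://isni.org/isni/0000000218250097
```

Rejecting bad check characters before publication keeps malformed identifiers out of the dataset rather than leaving consumers to discover them.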
So linked data is just one of the channels through which we will publish our metadata.
Our free Z39.50 service remains very heavily used and continues to grow. We have more than 1,500 users in 177 countries, many in singleton posts or small libraries, who reuse our bibliographic records in their own institutions. More recently we have begun to make the data available to researchers as CSV, which allows them to work with desktop tools such as Excel or OpenRefine. Datasets are available from our downloads page. New datasets are added to complement exhibitions, such as Shakespeare or Punk, or to mark landmark anniversaries, such as the Great Fire of London. These include archival and manuscript resources as well as published books, maps and scores, and they are certainly being used. We work with government and professional bodies to promote these offerings. We also create bespoke datasets to meet specific needs and for collaborations with students and even our artist in residence.
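Because the researcher-format datasets are plain CSV, they can be analysed with a few lines of standard-library code as well as with desktop tools. The sketch below is a minimal Python illustration that counts the values in one column. The column name "Languages" and the toy rows are hypothetical; the real headers of any downloaded dataset should be checked first.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def tally_column(lines: Iterable[str], column: str) -> Counter:
    """Count non-empty values in one column of a CSV that has a header row."""
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found")
    counts = Counter()
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:  # skip rows where the column is blank
            counts[value] += 1
    return counts


# Toy example with a hypothetical column name and values:
toy = io.StringIO("Title,Languages\nA,English\nB,French\nC,English\nD,\n")
print(tally_column(toy, "Languages").most_common())
# prints [('English', 2), ('French', 1)]
```

For a real download, open the file with `newline=""` as the csv module recommends, for example `with open(path, newline="", encoding="utf-8") as f: tally_column(f, column)`, substituting the dataset's actual column header.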
To realise the strategic vision and deliver the objectives we have set, we have to break down the silos. The strategy provides a framework within which we can encourage the convergence of standards where appropriate and improve our metadata to make it more actionable.
New synergies will emerge offering the potential for better services and greater efficiency.
There is an opportunity arising from the end of life of our major systems, Aleph, Symphony and IAMS, which will all need to be replaced within the next few years.
We can influence but not determine the future of the library’s major systems, which all need to be replaced.
We have raised the awareness of collection metadata within the institution. We are also better informed of new requirements and receive earlier notice which allows us to influence as well as respond.
It is clear that metadata is central to the delivery of many corporate strategic objectives. This has increased demand, but also substantiates bids for additional resource.
Convergence creates new synergies. The vision to bring our metadata together informs decisions about system architecture.