Chasing the Fifth Star - Open Data at the National Library of NZ
1. CHASING
THE FIFTH STAR
Open data at the National Library
Michael Lascarides
User Experience Lead, DigitalNZ
@mlascarides
Kia ora, I am Michael Lascarides. I’m the User Experience Lead at the National Library of New Zealand, where I work as part of the DigitalNZ team.
We make web, mobile, and data interfaces for all kinds of folks to use, from professional and academic researchers to the generally curious, as well
as our own staff. I’d like to share a little bit about a few of the ways our institution creates, uses, and shares our collections data so that folks like you
can turn it into something wonderful.
2. @mlascarides on twitter
FYI: There are a lot of links in this talk, but I’m going to move pretty quickly past most of them. If you want to download a copy
of this talk, I’ve posted it to Twitter, where you can also ask me anything.
3. You can also request pictures of my new puppy. (Sorry.)
4. OUR LIBRARY
In case you’re unsure or have forgotten just what a National Library does, here’s a quick overview.
5. By Pear285 (Own work) [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
The National Library is New Zealand’s legal deposit library. The Act of Parliament that created us charges us with a mission to “enrich the cultural and economic life of
New Zealand and its interchanges with other nations”. We have, roughly speaking, three main collections: the General Collections (encompassing the legal deposit
services), the Schools Collection (including the largest collection of children’s books in the southern hemisphere), and the collections of the Alexander Turnbull Library,
predominantly comprised of unpublished materials such as manuscripts and photographs.
6. 836K
unpublished
1.4M
published
30M
items
We currently have over 800 thousand items in the catalogue of unpublished materials, 1.4 million in the published, and over 30 million items searchable on our web site,
which are mostly individual digitised newspaper articles from our Papers Past service.
7. We’ve got books, maps, photographs, recorded music, music scores, newspapers, periodicals, manuscripts, letters, paintings, artefacts, manuals, and more.
8. HOW WE ORGANISE OUR DIGITAL STUFF
DIGITAL PRESERVATION (NDHA)
PUBLISHED
CATALOGUE
UNPUBLISHED
CATALOGUE
(TIAKI)
DATABASES
&
INDEXES
FULL-TEXT
DIGITAL
OBJECTS
METADATA SERVICE (DIGITALNZ API)
OTHER
INSTITUTIONS
DIGITALNZ.ORG
* GREATLY SIMPLIFIED
PHYSICAL COLLECTIONS
SUBSCRIPTION
SERVICES
NATLIB.GOVT.NZ
PAPERS PAST
This is a very rough diagram of what our digital world looks like. On the bottom, there’s the actual collections, in physical and digital forms, with layers of catalogues and
databases just above in blue. We deliver materials to the world through three main web sites, in red at the top: The National Library site, Papers Past and DigitalNZ. In
between is our metadata service, the DigitalNZ API, which is the secret sauce we use to create ties within our collections and to those in other institutions. We’ll look
more closely at these in a moment.
9. GOOGLE “NATLIB STRATEGY 2030”
We recently created and published our new guiding strategy, which looks ahead to the year 2030. The basic strategy fits on a single slide…
https://natlib.govt.nz/about-us/strategy-and-policy/strategic-directions
10. New Zealanders will…
…trust that their documentary heritage and
TAONGA are collected, preserved and accessible
…easily access, share and use New Zealand’s
KNOWLEDGE resources
…have the LITERACY skills to achieve social,
educational and employment success and be
inspired
…to innovate and create new knowledge.
…and I think it’s a pretty good framework for this talk in front of this particular audience. We are going to preserve the nation’s documentary heritage, build a knowledge
network around it, and ensure that the country has the literacy (including digital literacy!) to make full use of it.
https://natlib.govt.nz/about-us/strategy-and-policy/strategic-directions
11. ❤🔬
(WE LOVE RESEARCHERS)
All of our collections are utterly meaningless if people don’t use them. So we’re always keen to have as many people as possible exploring, interrogating, and reusing our
collections. There is a lot of collaboration and co-creation implied in that strategy, so let’s be on this journey together.
12. OPEN DATA AT NLNZ
One of the ways we try to encourage co-creation and collaboration is to be as open as possible.
13. As a government agency, we strive to release our data under open licenses
https://www.ict.govt.nz/guidance-and-resources/open-government/new-zealand-government-open-access-and-licensing-nzgoal-framework/
14. and employ the best practices we can when sharing that data.
https://www.data.govt.nz/toolkit/open-data-in-new-zealand/open-data-nz/
15. But beyond the basic governmental requirements to be openly available, we aspire to be as interoperable and interconnected as possible. The five-star scale promoted
by Tim Berners-Lee is still the standard in this regard.
http://5stardata.info/en/
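For anyone who has not met the five-star scale before, here is a minimal Python sketch of how the stars stack up. The criteria follow the scheme described at 5stardata.info; the `Dataset` class, its field names, and the two example data sets are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    open_licence: bool         # 1 star: on the web under an open licence
    machine_readable: bool     # 2 stars: structured data, not a scanned image
    open_format: bool          # 3 stars: non-proprietary format (CSV, XML, JSON)
    uses_uris: bool            # 4 stars: things are identified by URIs
    links_to_other_data: bool  # 5 stars: links out to other people's data


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: each star requires all the earlier ones."""
    criteria = [
        ds.open_licence,
        ds.machine_readable,
        ds.open_format,
        ds.uses_uris,
        ds.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# Hypothetical examples, not official ratings of any real data set.
csv_export = Dataset("CSV export", True, True, True, False, False)
json_api = Dataset("JSON API with permanent URLs", True, True, True, True, False)

print(csv_export.name, star_rating(csv_export))
print(json_api.name, star_rating(json_api))
```

The scale is cumulative, so the loop stops at the first unmet criterion: a data set with lovely URIs but no open licence still scores zero.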
18. #lodlam
That elusive 5th star is the “Linked” in “Linked Open Data”. If you’re new to these concepts and you find them interesting, a great hashtaggable term to follow on your
fave social media is LODLAM, which is Linked Open Data for Libraries and Museums. Doing so will connect you to a lovely, smart and interesting community of people,
some of whom are in this room.
20. OPEN DATA SETS AVAILABLE
Data sets | Format | ⭐?
PublicationsNZ, IndexNZ, Te Puna Web Directory, Māori subject headings | CSV, MARC | ⭐⭐⭐
Turnbull Library unpublished collections metadata, Iwi/Hapu Names list | XML | ⭐⭐⭐
DigitalNZ Metadata, Papers Past Metadata, Turnbull Library Metadata | API (JSON) | ⭐⭐⭐⭐
A lot of our open data sets are collections data, and we’re doing all right on the 5-star scale, with mostly threes and fours. But we have a couple of collections that run
quite a bit deeper.
22. Papers Past is the site where we deliver our full-text digitised materials.
23. We started with newspapers, and there are over 4 million New Zealand newspaper pages from 1839 to 1949. They’re scanned and automatically transcribed via optical
character recognition, so they are full-text searchable.
24. It has been expanded to include more than a million pages of magazines,
28. We’ve had researchers mine Papers Past for everything from linguistic analysis training data, to tracking the history of political propaganda, to using old weather reports
to chart historical climate change. (This is a 1912 article about man-made climate change, by the way.) And if you’re more maths-y or computer science-y, there are
opportunities to help us improve machine transcriptions, extract entities like names and places from texts, and a whole lot more.
29. WANT BULK DATA?
Just ask!
Four million articles
from 73 titles
available up to 1878
So, the Papers Past web site is an amazing resource for researchers in its own right. But we often get requests from people who want copies of our raw data. Providing it
had previously been very tricky due to the complexity of copyright; you’d be amazed how many newspaper companies from the 19th century are still around. But we’ve
cleared all the hurdles to release the raw data for newspapers up to 1878. It’s a small part of the collection, but even this small part of Papers Past includes four million
articles.
31. digitalnz.org
DigitalNZ is our service that collects the metadata from cultural heritage organisations in New Zealand, and those worldwide that have New Zealand-related content.
33. We harvest the metadata for over 30 million items from more than 200 institutions, map it to a standard format, and make it all discoverable from a single search.
While we use this data to power the DigitalNZ web site, which is our web front end to the aggregated collection, the real star of the show is the DigitalNZ API, our
machine-readable metadata service.
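To make "map it to a standard format" a little more concrete, here is a rough Python sketch of the general idea. It is not how our harvester is actually written: the partner names, the source field names, and the common field names are all hypothetical.

```python
def to_common_record(raw: dict, partner: str, field_map: dict) -> dict:
    """Copy a partner's fields into common field names, skipping empty values."""
    record = {"content_partner": partner}
    for common_field, source_field in field_map.items():
        value = raw.get(source_field)
        if isinstance(value, str):
            value = value.strip()
        if value:
            record[common_field] = value
    return record


# Two hypothetical partners describing the same kinds of things differently.
MUSEUM_MAP = {"title": "object_name", "creator": "maker", "date": "year_made"}
ARCHIVE_MAP = {"title": "Title", "creator": "Author", "date": "Date"}

museum_raw = {"object_name": " Carved waka huia ", "maker": "", "year_made": "1850"}
archive_raw = {"Title": "Letter to the editor", "Author": "A. Writer", "Date": "1901"}

records = [
    to_common_record(museum_raw, "Hypothetical Museum", MUSEUM_MAP),
    to_common_record(archive_raw, "Hypothetical Archive", ARCHIVE_MAP),
]

for record in records:
    print(record)
```

Once every record shares the same field names, a single search across all partners becomes a matter of indexing one shape of data instead of two hundred.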
34. Anyone who is interested can get a developer key from our website and start hacking with our data. You can build your own products, or automate your research. And of
course, most of the National Library’s own collections are available through the service.
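If you would like a feel for what "start hacking" might look like, here is a small Python sketch using only the standard library. Treat the details as assumptions to check against the current API documentation: the endpoint URL, the `api_key`, `text`, and `per_page` parameter names, and the `search` / `results` / `title` shape of the response are written from memory rather than taken from this talk. `YOUR_API_KEY` is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; confirm against the DigitalNZ API documentation.
API_ROOT = "https://api.digitalnz.org/v3/records.json"


def build_search_url(api_key: str, text: str, per_page: int = 5) -> str:
    """Build a search URL; the parameter names here are assumptions."""
    params = {"api_key": api_key, "text": text, "per_page": per_page}
    return f"{API_ROOT}?{urlencode(params)}"


def titles_from_response(payload: dict) -> list:
    """Pull titles out of a parsed response, assuming a search/results layout."""
    results = payload.get("search", {}).get("results", [])
    return [item.get("title", "(untitled)") for item in results]


def search_titles(api_key: str, text: str) -> list:
    """Run the search over the network and return the matching titles."""
    with urlopen(build_search_url(api_key, text)) as response:
        return titles_from_response(json.load(response))


# Offline demonstration with a made-up payload in the assumed shape.
sample_payload = {"search": {"results": [{"title": "Kiwi feeding"}, {"id": 2}]}}
print(build_search_url("YOUR_API_KEY", "kiwi bird"))
print(titles_from_response(sample_payload))
```

Keeping the URL building and the response parsing separate from the network call means you can test both without a key, then call `search_titles("YOUR_API_KEY", "kiwi bird")` once you have one.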
35. We’ve recently introduced a feature called Stories, which lets you assemble items from across the DigitalNZ content partners’ collections and weave them together with
your own narratives. Or, if you’re feeling less inspired, you can just use a story as a way to organise your research.
36. The leading edge of our work with DigitalNZ is getting us really close to that fifth star.
37. Concepts API
Moving towards 5 Star
Linked Open Data
We’ve recently introduced the Concepts API, which allows you to interrogate our collections for items related to specific places or specific people, rather than just
keyword searches.
Overview: https://digitalnz.org/blog/posts/introducing-the-digitalnz-concepts-api
Documentation: http://digitalnz.github.io/supplejack/api_usage/concepts-api.html
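The difference between a keyword search and a concept search is easier to see with a toy example. The Python sketch below does not call the Concepts API at all; it works on a tiny in-memory list, and the item titles and concept identifiers are made up for illustration.

```python
# Made-up items; the concept identifiers are hypothetical, not real DigitalNZ IDs.
ITEMS = [
    {"title": "Wellington Harbour from Mount Victoria", "concepts": ["place/wellington"]},
    {"title": "Portrait of the Duke of Wellington", "concepts": ["person/duke-of-wellington"]},
    {"title": "View of Lambton Quay", "concepts": ["place/wellington"]},
]


def keyword_search(items: list, term: str) -> list:
    """Match on the words in the title, whatever they happen to refer to."""
    term = term.lower()
    return [item["title"] for item in items if term in item["title"].lower()]


def concept_search(items: list, concept_id: str) -> list:
    """Match on what the item is about, whatever words the title uses."""
    return [item["title"] for item in items if concept_id in item["concepts"]]


print(keyword_search(ITEMS, "wellington"))
print(concept_search(ITEMS, "place/wellington"))
```

The keyword search returns the Duke and misses Lambton Quay; the concept search does the opposite, which is usually what someone researching the city actually wants.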
38. https://digitalnz.org/concepts/4062
For the first time, you can see concepts in action on our recently redesigned website as the Explore Places feature, where we offer a permanent link to each Place
concept along with all of the items in our collections that we determine to be related to that place.
39. http://digitalnz.github.io/supplejack/
It’s also worth noting that we have freely released the software that powers DigitalNZ as an open source project, so if you’ve got a big metadata harvesting job of your
own, you can benefit from our 10 years of blood, sweat, and tears.
40. WHAT ARE OUR
INTERESTING PROBLEMS?
I’d like to close with a few of the problems we are working on, which should give you a sense of what we’re thinking about and where we’re headed, but just possibly
also spark some ideas for collaboration with some of you in the future.
41. How do we connect our
stuff to other people’s
stuff?
(aka ⭐⭐⭐⭐⭐)
Understanding the tools. Liaison with other institutions. Doing the work. Going from concept to production.
42. How do we scale up?
Fighting technical debt and scaling issues. Brewster Kahle’s incitement to digitize everything in NZ.
43. How do we get people
involved?
More content partners. Promoting re-use. Promoting our open source tools. General marketing. Making tools easy. Educating people in digital literacy. Breaking down
barriers.
44. How do we know what cool
things people are
making with our stuff?
Measuring the impact we have on New Zealand and the world is a HARD. PROBLEM. If you build something with our stuff, it is immensely useful to us (and immensely
persuasive to the folks who allot our funding) if you let us know about it.