The document discusses the changing landscape for libraries and catalog data as more information is available online. It notes that library patrons now prefer searching online over using physical library resources. The way catalog data is consumed has also changed, with users preferring keyword searches over structured browsing.
The document argues that library catalog data needs to be opened up and made available in formats more suitable for online discovery and reuse by both patrons and external developers. It provides examples of how data like ISBNs could be exposed as structured JSON to enable new uses. Finally, it suggests libraries should take a pragmatic approach and focus on making their data flexible and reusable, rather than adhering strictly to legacy standards.
5. +
Competition …
No longer the single authority for content and description
Commercial, social and academic discovery mechanisms
Explosion of digital content
Illusion of ‘all on the web’
6. +
Fit for purpose?
Studies into Google Generation / ‘Generation Y’ (1)
Cambridge Arcadia IRIS report 2009 (2)
Preference for search engine over catalogue
Online over in-building
Trust tutors and peers over Librarian
Still respect the library ‘brand’
1) “The Google generation: the information behaviour of the researcher of the future”, Aslib Proceedings, V60, issue 4, doi:10.1108/00012530810887953
2) Arcadia IRIS Project report - http://arcadiaproject.lib.cam.ac.uk/docs/Report_IRIS_final.pdf
7. +
Improve catalogues
Keyword based discovery services
New ways to exploit old data
Relevancy ranking
Rich faceting
Greater linking
Search is the new browse
Repositories and archives
Is the OPAC dead?
8. +
Different but the same?
Catalogue data is now:
Consumed as keywords (not left-anchored access points)
Faceted (not browsed)
Supplemented
Transformed
Merged
Amalgamated
9. +
Prepare for the future …
‘Use case you’ve not yet thought of’
‘Consumer as producer’
‘Pro-Am’
‘Free from silo’
Developers as well as readers
Preference for data over text
10. +
[Diagram: library data and its consumers: our local catalogues, research group website, Wikipedia, web start-ups, national / international aggregations, Joe Public, search engines, booksellers, other libraries, teenage software developer / hacker]
11. +
Libraries have a lot to offer
Bibliographic data linked to many aspects of successful teaching and research
Citation lists – measure output
Shared bibliography – core of research group work
Reading lists – backbone of undergraduate teaching
High quality data needed for re-use
Not all possible whilst data resides in the library ‘silo’
12. +
'Open metadata creates the opportunity for
enhancing impact through the release of
descriptive data about library, archival and
museum resources. It allows such data to be
made freely available and innovatively
reused to serve researchers, teachers,
students, service providers and the wider
community in the UK and internationally.'
http://discovery.ac.uk
14. +
But …
Is Marc21 the right format for developers (or libraries?)
Is it easy to convert into something more palatable?
15. +
What can we do with an ISBN?
Build Union catalogues
Find existing or alternative records (copy catalogue)
Find related works (xISBN, thingISBN)
Match and mash with resources on the web:
Images
Reviews
Citations and references
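Matching across catalogues and web resources needs a single key, and the same book may be recorded under an ISBN-10 in one source and an ISBN-13 in another. A minimal sketch in plain Perl of one way to get a common key; the function name isbn13_key is hypothetical, the input is assumed to be already stripped of hyphens and qualifiers, and a 13-digit input is passed through without checking its check digit:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a cleaned ISBN-10 or ISBN-13 string into an
# ISBN-13, so records from different sources can share one match key.
sub isbn13_key {
    my ($isbn) = @_;
    return unless defined $isbn;
    return $isbn if $isbn =~ /^[0-9]{13}$/;        # already 13 digits: pass through
    return unless $isbn =~ /^([0-9]{9})[0-9X]$/;   # ISBN-10: keep the first 9 digits
    my $core = "978$1";

    # ISBN-13 check digit: weights alternate 1,3,1,3... over the 12 digits.
    my @digits = split //, $core;
    my $sum = 0;
    for my $i (0 .. $#digits) {
        $sum += $digits[$i] * ($i % 2 ? 3 : 1);
    }
    my $check = (10 - $sum % 10) % 10;
    return $core . $check;
}
```

Anything that is neither 13 digits nor a well-formed ISBN-10 comes back undefined, so the caller can decide whether to skip the record or flag it.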
16. +
020 - ISBN
What catalogue record users want:    What data consumers want:
– Accuracy                           – Accuracy
– Contextualization                  – Contextualization
– Access point                       – Access point
– Something legible to read          – Reusability
                                     – Granularity
17. +
So …
Take ISBN from an 020$a
my $isbn = $record->field('020')->as_string("a");
0123456789(pbk)
(pbk) ?
Is it the same as (.pbk) I noticed earlier?
I’m a developer – I can solve this …
Regex /^[0-9]+$/ - just gets numbers …
Oh hang on, don’t some ISBNs end in X?
And all that information on hardback /paperback is lost …
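One way out of the dead ends above is to keep both halves of the subfield rather than throw one away. A minimal sketch in plain Perl, assuming the 020$a value has already been pulled out as a string; the function name parse_020a and the returned keys are hypothetical, and real catalogue data will contain qualifier forms this pattern does not anticipate:

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw 020$a string into the ISBN itself and
# whatever qualifier (e.g. "(pbk)", "(pbk.)") trails it.
sub parse_020a {
    my ($raw) = @_;
    return unless defined $raw;

    # ISBN: starts with a digit, may contain hyphens, may end in X.
    # Everything after it is treated as the qualifier.
    if ($raw =~ /^\s*([0-9][0-9-]*[0-9Xx])\s*(.*?)\s*$/) {
        my ($isbn, $qualifier) = ($1, $2);
        $isbn =~ tr/-//d;    # drop hyphens
        $isbn = uc $isbn;    # normalise a lower-case x
        # strip brackets, dots and colons wrapped around the qualifier
        $qualifier =~ s/^[\s(:.]+|[\s).:]+$//g;
        return { isbn => $isbn, qualifier => $qualifier };
    }
    return;    # nothing that looks like an ISBN
}
```

Called on the example above, parse_020a('0123456789(pbk)') gives an isbn of 0123456789 and a qualifier of pbk, so the hardback / paperback information survives as its own field instead of being lost.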
21. +
# Fragment: assumes $record is a MARC::Record and %exportRecord is declared earlier.
# Small helper, since Perl has no built-in trim; undef becomes ''.
sub trim { my ($s) = @_; $s = '' unless defined $s; $s =~ s/^\s+|\s+$//g; return $s; }

my @exportAuthors = ();
my @authors = $record->field('100');
if (@authors) {
    foreach my $eachAuthor (@authors) {
        my %exportAuthor = ();
        my $authorFull = trim($eachAuthor->subfield('a'));
        $exportAuthor{'name'} = $authorFull;
        my @parsed_author = split(/,/, $authorFull);
        $exportAuthor{'lastname'}  = trim($parsed_author[0]);
        $exportAuthor{'firstname'} = trim($parsed_author[1]);
        my $dates = $eachAuthor->subfield('d');
        # The glorious 100$d disassembled ...
        if ($dates) {
            # first of all, get rid of ca. and fl. which aren't real birth or death dates
            if ($dates =~ /\b(?:fl|ca)\./) {
                # do nothing
            }
            # otherwise, if date contains a hyphen, assume range
            # (this also copes with unterminated dates such as "1950-")
            elsif ($dates =~ /-/) {
                my @range = split(/-/, $dates);
                $exportAuthor{'birthDate'} = trim($range[0]);
                if ($range[1]) {
                    $exportAuthor{'deathDate'} = trim($range[1]);
                }
            }
            # No hyphen - assume single date - look for definitive birth event with a 'b.' ...
            elsif ($dates =~ /\bb\.\s*(.*)/) {
                $exportAuthor{'birthDate'} = trim($1);
            }
            # - look for definitive death event with a 'd.' ...
            elsif ($dates =~ /\bd\.\s*(.*)/) {
                $exportAuthor{'deathDate'} = trim($1);
            }
            # Final assumption for authors with a single recorded date and no hyphen: assume it's a birth date?
            else {
                $exportAuthor{'birthDate'} = trim($dates);
            }
        }
        # Assemble author object (push a reference so each author stays one hash)
        push(@exportAuthors, \%exportAuthor);
    }
    # Add list of authors to export object (an array reference, not the element count)
    $exportRecord{'author'} = \@exportAuthors;
}
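Once each author is a hash and the authors sit in an array, handing the result to a developer as JSON is the easy part. A minimal sketch using the core JSON::PP module; the record below is made-up sample data in the shape the author loop above is aiming for, and the key names are illustrative assumptions rather than any agreed schema:

```perl
use strict;
use warnings;
use JSON::PP;

# Made-up sample record, for illustration only.
my %exportRecord = (
    isbn   => '0123456789',
    author => [
        {
            name      => 'Smith, John',
            lastname  => 'Smith',
            firstname => 'John',
            birthDate => '1950',
        },
    ],
);

# canonical() sorts the keys so output is stable between runs;
# pretty() indents it for human readers.
my $json = JSON::PP->new->canonical->pretty->encode(\%exportRecord);
print $json;
```

The point of the design is granularity: a consumer can ask for the last name or the birth date directly, without re-parsing a display string.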
22. +
How is this being solved?
Fix it at the source:
RDA
Marc transition initiative
Other initiatives – BL, OCLC linked data releases
ONIX
MODS
23. +
Pragmatism: the end of big standards
Adoption of one new standard (or several) for its own sake is pointless
Fit in around changing needs of libraries and systems
Data needs to be flexible and re-purposable
No standard to ‘rule them all’ in the post Marc21 world
I’m trying to frame the next 40 minutes or so as a narrative
When attempting to guess where we are going, it helps if we take a step back.
1) To simplify things (a little), librarians and cataloguers used to have full control of their data and the way it was used (consumed):
- We created it (or paid others to do so for us)
- Our readers consumed it, in our libraries, served via ledgers, card indexes and OPACs
- We had / have policies and standards (AACR2, Marc21), procedures (LOC authority control), organisations (RLUK, OCLC) and technology (Z39.50, OPACs)
Library still in its bubble. Alternative discovery mechanisms and academic data and content sources suddenly existed alongside our sealed environment: all very heavily branded, very slick, constantly evolving. Some we pay for, some we contribute to, some we view as inferior competition, but they exist. All are legitimate means to discover bibliographic material of interest to the researcher or the scholar, and they act as a direct alternative to our traditional model. All have their own data environments, standards, procedures and protocols, not necessarily ours. In light of this I argue that we could no longer maintain the closed ecosystem. To argue otherwise has become a fallacy, even in the mighty libraries of Oxford and Cambridge with world-class special collections.
2) We slowly lost our place as the single prime authority:
- for data: commercial, social and academic discovery mechanisms gave our users other sources of information to turn to
- and eventually for content
We also had to cope with a growth in digital content as publishing shifted to digital. (It took a while, as journals came first and they were only a small part of our business; analytical cataloguing was not standard practice.) This is resulting in massive changes in metadata and discovery usage …
In the new environment come new users, termed Generation Y. Generation Y, it is argued, have grown up and worked outside of our bubble all along, and are used to a very different mode of consumption for data and resources. They were born between 1984 and 1990, but I would argue the concept can be stretched further, way back, probably to anyone who has studied science since the mid to late 1990s …
Cambridge Arcadia report 2009:
- Preference for search engine over catalogue
- Online over in-building
- Trust peers over librarian
- Still respect the library ‘brand’
All of this has led to a direct and open questioning of the purpose of the academic library, never mind the public one.
Keyword based discovery services. Rich faceting. Greater linking. New ways to exploit old data. OPAC is dead? It is in your case, and I’m quite jealous … All possible due to richness of data: our authority-controlled catalogue records generally work quite well in faceted environments, and we gain a competitive edge over folk whose data is not in such good shape. Catalogues are easier to pick up, easier to teach and provide a more cohesive experience, even if they don’t always work in the way we as librarians would like. Our data is still in use; it is valuable and relevant, partly as a result of these changes in interface. And I know this because when you launched Solo a couple of years ago, some of your undergrads became our postgrads and told us what they thought of our interfaces.
Catalogue data now goes through several processes. The record you create is not always the record readers will see. The way it is searched and accessed has changed. Yet we still build it with the same rules and container formats as we did 20 years ago.
Gets us so far. Need to move forward. One way to prepare is to open up. We need to share and open up our raw data and make it easier for others to re-use. I would argue each of these groups has as much right to our raw data as we do, and each would have different use cases for it. And by and large, in the field of online services, I’m talking about software developers, but in many areas … Allow others to innovate on our data on our behalf, think of those use cases and explore them.
And there is demand. This slide is based on the ideas of a certain Cambridge academic. Bibliographic data is linked to many aspects of teaching and research:
- Citation lists: measure output
- Shared bibliography: core of research group work
- Reading lists: backbone of undergraduate teaching
Quality of data: in terms of consistency, accuracy and form, we are much easier to handle than museums and archives. All exists already, but not in an open, linked capacity that can be tied quickly and easily into other institutional and external services.
This is recognised nationally by the JISC, who earlier this year launched the Discovery initiative. Oxford Text Archive contributed a project, we did one with catalogue data, and they are funding some very exciting work …