Presentation given to visitors from the University of Sunderland on the 10th of February, 2014 about BL Labs at the British Library in the Panizzi Room.
1. British Library Labs
http://labs.bl.uk
British Library Labs and Digital Humanities
Monday 10th February, 1600 - 1700
University of Sunderland
Panizzi, British Library, NW1 2DB
London
Mr Mahendra Mahey
British Library Labs Project Manager
Scholarship and Collections, Digital Scholarship
2. Overview
• Background, People, Details, Plan
• Content used with Labs
• Research methods
• The competition and engaging with Labs
http://labs.bl.uk #bl_labs 2
3. The project in a nutshell…
Encouraging scholars and
developers to do research
and development with and
across British Library
collections and data (+other)
6. Background
• The Andrew Mellon Foundation
• 2 year initial project
7. Labs details…what
• No digitisation involved, just digitized and born digital Library content
• Some content online
• Other in digital form but not online yet
– e.g. too big, needs work, technical challenges, license restrictions
(e.g. onsite access etc.)
• Examine and analyse the content, especially entire collections (i.e.
cross collection research)
• Do research, publish
• Make things, e.g. tools, services, apps etc…
• Transforming processes, services and tools for scholars / developers
using Library digital collections
8. Labs details…how
• Competitions, events and various activities
• Creating an environment where scholars / developers can
work intensively with the Library’s digital collections (winners will
be resident), but not only…
• Encourage researchers / developers generally to do
interesting things with BL digital content (+other) with and
across collections (a data driven approach)
• Labs is more than the competition!
• Ideas can be pursued by talking to Library staff, scholars /
developers interested in conducting research / making
things, e.g. at meetings, events etc., or through business opportunities
9. Labs Competition
• At least 2 Competitions, winners will work ‘in residence’ where
possible
• Review and feedback to examine approach, at the moment
‘Data Driven Research’, i.e. here is our data, come and do stuff
with it!
• Focus particularly on cross collection research, research at
scale
• Other research and development encouraged too!
• Help develop tools and services to support digital scholarship
• Any suggestions for next competition? When to visit?
10. How Labs works…
[Diagram. Labels: BL Labs; Events, Competition, Contact; ideas; BL Digital Collection / Data; Other Digital Collection; Software; Publications; Tools and services to support Digital Scholarship]
11. The plan in time…
• Launch Event – 25th March 2013 – draft details of competition and feedback, launched end
of April
• Virtual event 17 May (video of Hangout available), more virtual events?
• Hack Event 28/29 May London
• AHRC research network - 'the infinite archive', Open University, University of Nottingham,
University of Warwick
• Winners announced on 6 July 2013, York (Digital Heritage Conference)
• Best two ideas work in residence and showcase their work on the 11th November 2013,
when the next competition will be launched (deadline end of March 2014, work on entry
May to end of October 2014, Nov/Dec showcase and final event)
• Other ideas, look at supporting in other ways e.g. through Labs, other Library departments,
Business opportunities etc.
• Case studies produced around Nov/Dec for first iteration 2013 and second iteration 2014
12. BL Labs Services
• Developed for scholars / developers wanting to use digital
Library collections for research and development
• Application Programming Interfaces (APIs) for data /
collections
• Powerful interface for researchers and developers for
conducting innovative and transformative projects
• We are currently doing an audit of what web services the BL
has.
• Led by the Technical Lead
13. Labs Hack Days…
• Bringing researchers, developers, curators and anyone
interested in collections together at events; we want to do
more!
• Brainstorming ideas – ideas lab
• Scoping research, ideas, solving problems and developing
prototypes
Brainstorm ideas and group
Consider and choose
Work into the night and show
what has been done
14. Case studies…
• Research generated from the competitions and general
activity of Labs
• Inform the Library / Other libraries around the world about
the issues, challenges, solutions and benefits generated
when using a Labs approach
15. Labs Content
• Work with curators to identify those digital collections that
are suitable for Labs
• Focus on those that are copyright cleared at the moment
• Others considered in light of challenges, i.e. in scope for
Labs work
• Engage researchers/developers with these materials
through meetings, road-shows, hack days, promotions
(including competitions and events)
• Have list of 100s of digital collections
• Need a filter
16. Where do you start?
17. British Library Digital Collections
• Most content unique!
• Copyright cleared for research
and non-commercial use?
• Curated?
• Collection Level
Metadata available?
[Diagram. Labels: Digital and online; Available on site; Available only in Reading Rooms; Digital but not online – various storage devices (available only onsite at the moment: Hack Events, In residence)]
19. Types of content
• Datasets
• Books / Text
• Images / Music
• Maps
• Sounds
• Multimedia
20. British National Bibliographic Data
• bnb.data.bl.uk (part of data.bl.uk)
• 2.8 Million individual records
• Available as Linked Open Data, Basic
RDF/XML and MARC21.
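As a rough illustration of working with records supplied as basic RDF/XML, the sketch below pulls a title and date out of each description using only the Python standard library. The sample document, the example.org URIs and the choice of Dublin Core terms (dct:title, dct:issued) are assumptions made for this example; the actual BNB data model may use different properties.

```python
import xml.etree.ElementTree as ET

# Standard RDF and Dublin Core namespaces. Whether BNB records use
# exactly these properties is an assumption for this sketch.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
}

# Invented sample data, not real BNB records.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/resource/1">
    <dct:title>A Tour in Wales</dct:title>
    <dct:issued>1810</dct:issued>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/resource/2">
    <dct:title>Poems</dct:title>
  </rdf:Description>
</rdf:RDF>"""


def extract_records(rdf_xml):
    """Return one dict per rdf:Description with its URI, title and date."""
    root = ET.fromstring(rdf_xml)
    records = []
    for desc in root.findall("rdf:Description", NS):
        records.append({
            "uri": desc.get("{%s}about" % NS["rdf"]),
            # findtext returns None when the property is missing
            "title": desc.findtext("dct:title", default=None, namespaces=NS),
            "issued": desc.findtext("dct:issued", default=None, namespaces=NS),
        })
    return records


if __name__ == "__main__":
    for record in extract_records(SAMPLE):
        print(record)
```

For anything beyond simple flat records, a dedicated RDF library would be a more robust choice than plain XML parsing.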
21. UK Web Archive Data
• data.webarchive.org.uk/opendata
• 32TB subset of the
Internet Archive’s web
collection relating to the
UK.
• Collecting freely since e-legal
deposit
• Comparing events across
media types?
22. 19th Century Digitised Books
• 68,000 digitised volumes and their
accompanying JP2, PDF, metadata
and OCR text files
• Many rare or inaccessible books
published between 1789 and 1870,
covering a wide range of subject
areas including philosophy, history,
poetry and literature, travel
• Representative materials here:
britishlibrary19c.tumblr.com
• Text mining? Text is 29GB
23. International Dunhuang Project
• IDP is an international collaboration
• Aims to make images of all manuscripts,
paintings, textiles and artefacts
from Dunhuang and
archaeological sites of the Eastern
Silk Road freely available on the
Internet and to encourage their
use through educational and
research programmes
• http://idp.bl.uk/
• Time-lining the Silk Road?
24. Book ordering data…
• Every day thousands of items are ordered up from the
library stacks and delivered to researchers in our reading
rooms. We can provide daily anonymised reports of these
titles including shelfmark information and reading room
location
• Visualising what readers are reading?
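A first step towards visualising what readers are reading could be a simple tally of a daily report. The sketch below assumes the report arrives as CSV with title, shelfmark and reading_room columns; the column names, rows and shelfmarks are invented for illustration, as the slides do not describe the real report format.

```python
import csv
import io
from collections import Counter

# Invented sample rows; the real report layout is an assumption.
SAMPLE_REPORT = """date,title,shelfmark,reading_room
2014-02-10,A Tour in Wales,10368.e.1,Humanities 1
2014-02-10,Poems,11645.b.2,Rare Books
2014-02-10,A Tour in Wales,10368.e.1,Humanities 1
2014-02-10,Atlas of Asia,Maps 5.c.3,Maps
"""


def count_orders(report_csv, column):
    """Count how many ordered items share each value of the given column."""
    reader = csv.DictReader(io.StringIO(report_csv))
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    # Most requested reading rooms first
    for room, n in count_orders(SAMPLE_REPORT, "reading_room").most_common():
        print(room, n)
```

The resulting counts can be passed to any charting tool; grouping by title instead of reading_room gives a most-ordered list.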
Anonymised reader data…
• Anonymised information about our readers
• Big buckets
• Social trends?
25. Bringing Text Mining to the Library
We have negotiated text mining rights for many (50%) of
our electronic journals
A project to get the tools to readers?
26. Environment and Nature Sounds
• Thousands of recordings from the Sound Archive's unrivalled
natural sounds collection are available for free download as
MP3s to staff and students of UK higher and further education
institutions
• http://sounds.bl.uk/Environment/
• Adding sounds to poetry?
27. Resonance FM
• London Community Arts Radio Show
• http://resonancefm.com/
• 10 year sound archive!
• Speech to text?
28. Example Research Methods
• Corpus Analysis tools
• Visualisations
• Topic Models
• Location based searching
• Geotagging
• Annotation
• APIs for datasets e.g. Metadata, Images
• Crowdsourcing / Human Computation
• Natural Language Processing
• Transcribing
29. Examples from Launch event
http://labs.bl.uk/Launch+Event
30. Ideas from first competition
• Text mining in the reading rooms
• Curatorial – funded through other stream
• Visualising large collections of sound at a glance (thumbnails)
• Using sheet music – combined with AHRC proposal being submitted
now
• Working with a radio archive – possibly funded through another stream
– semantic media
• Serious news
31. What’s happening at the moment?
• Working with competition entrants
• Working on first year project deliverables
• Planning for next competition and dissemination
32. Dan Norton
• Mixing the Library: The Disc Jockey and the Digital Collection
• Dan Norton is a PhD Researcher, University of Dundee and is Artist in Residence at
Hangar, Centre for Art and Research, Barcelona.
• Building an interface for interacting in digital collections developed from the DJ's
interaction with information. His project uses selecting and mixing as creative behaviours
for exploring, learning, and authoring with digital collections.
• The prototype will demonstrate the interface requirements necessary for collecting,
enriching (organizing, annotating), and mixing information from digital libraries; for building
aesthetic, experimental, or logical links between resources; and for developing ad hoc
visualizations, or publishing annotated data.
• Working on a functioning prototype to collect URLs for different media types (e.g. text, video,
sound and images), shuffle their order, compare two digital objects and
annotate them in real time
33. Pieter Francois
• The Sample Generator for Digitised Texts
• Pieter Francois is a Postdoctoral Researcher at the University of Oxford.
• The ‘Sample Generator for Digitized Texts’ is a relatively simple piece of software which
connects one or more major catalogues or bibliographies with one or more collections of
digitized texts through the metadata.
• The main aim is to tell the story of over a million nineteenth-century books through a
structured sampling of 68,000 books; the focus is on travel routes/accounts
• Creating a demonstrator which searches across 1.8 million records and, where possible, finds
highly significant digital samples for further research from the books we have digitised so
far
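One way to picture a structured sample is to group catalogue records by decade, keep only those with a digitised copy, and draw the same number from each group. The sketch below is a generic stratified sample under those assumptions and is not the Sample Generator itself; the record fields (id, year), the grouping by decade and the function name are all hypothetical.

```python
import random
from collections import defaultdict


def structured_sample(catalogue, digitised_ids, per_decade, seed=0):
    """Draw up to per_decade digitised records from each decade.

    catalogue: iterable of dicts with hypothetical 'id' and 'year' keys
    digitised_ids: set of ids that also exist in the digitised collection
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_decade = defaultdict(list)
    for record in catalogue:
        # The metadata link: a catalogue record counts only if it is digitised
        if record["id"] in digitised_ids:
            by_decade[record["year"] // 10 * 10].append(record)
    sample = {}
    for decade in sorted(by_decade):
        pool = by_decade[decade]
        sample[decade] = rng.sample(pool, min(per_decade, len(pool)))
    return sample


if __name__ == "__main__":
    # Invented catalogue: one record per year, every second one digitised
    catalogue = [{"id": i, "year": 1800 + i} for i in range(30)]
    digitised = {i for i in range(30) if i % 2 == 0}
    for decade, records in structured_sample(catalogue, digitised, 2).items():
        print(decade, [r["id"] for r in records])
```

Sampling per group, rather than across the whole catalogue at once, stops decades with many digitised books from dominating the result.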
34. Images from the 19th Century books
From late 1600s
From 1700s
600,000 small images so far
(estimated)!
Work ongoing with
BL Labs Technical Lead
35. The Mechanical Curator!
• Just launched, lots of interest!!
• Randomly selected small illustrations and ornamentations,
posted on the hour.
• Rediscovered artwork from the pages of 17th, 18th and 19th
Century books.
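The idea of randomly picking a small illustration and posting on the hour can be sketched as follows. This is not the Mechanical Curator's actual code: the image records, the size threshold used to define "small" and the function names are assumptions for illustration, and the posting step itself is left out.

```python
import random
from datetime import datetime, timedelta

# Assumed cut-off for a "small" image, in pixels (hypothetical value)
MAX_SIDE = 400


def next_full_hour(now):
    """Return the next time on the hour strictly after now."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


def pick_illustration(images, rng):
    """Randomly choose one image whose sides are both within MAX_SIDE."""
    small = [img for img in images
             if img["width"] <= MAX_SIDE and img["height"] <= MAX_SIDE]
    return rng.choice(small) if small else None


if __name__ == "__main__":
    # Invented image metadata
    images = [
        {"id": "a", "width": 120, "height": 90},
        {"id": "b", "width": 2000, "height": 1500},
        {"id": "c", "width": 300, "height": 400},
    ]
    now = datetime.now()
    print("next post at", next_full_hour(now))
    print("would post", pick_illustration(images, random.Random()))
```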
36. Distribution of the use of DDC in BnB
Looking at the metadata for the books: they have only limited
subject classification metadata
Work by Ben O’Steen, BL Labs Technical Lead
38. Engaging with Labs
• ENGAGE
• Submit your name, contact details and let’s speak!
• AHRC Big Data Call, a number of requests to use BL
Collections / Data if successful
• Next competition? Launch 11 November!
• Just talk to us, work with our collections!
• Ideas labs, workshops and hack events
39. What next?
Speak to me: 0207 412 7324
Email me: mahendra.mahey@bl.uk or labs@bl.uk
Labs Website: http://labs.bl.uk/
Twitter: @BL_Labs
Hash Tag: #bl_labs
Jiscmail: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=BL-LABS
Blog: http://britishlibrary.typepad.co.uk/digital-scholarship/