British Library Labs Presentation at Ed Tech Hackathon 2013 - hackathoncentral.com
1. British Library Labs
http://labs.bl.uk
Saturday 26th October 2013, 1000 – 1100 (15 min slot)
Ed Tech Hackathon 2013 (Apps for Learning and Teaching)
hackathoncentral.com
Central Working Canteen, Google Campus
4–5 Bonhill Street, London, EC2A 4BX, UK
Mr Mahendra Mahey
British Library Labs Project Manager
2. British Library Labs is about…
Encouraging scholars and developers to do research and development with and across British Library digital collections and data
#bl_labs
3. What activities does Labs do?
• Encourage researchers / developers to do interesting things with BL digital content (and other content) with and across collections (a data-driven approach)
• Competitions and events (hack events)
• Creating an environment where scholars / developers can work intensively with the Library’s digital collections (winners will be resident)
• Work with your ideas
• Help develop tools and services to support digital scholarship
• Case studies for the library and research communities
4. How Labs works…
[Diagram. Labels: idea (repeated); BL Labs; Competitions; Events; Contact; BL Digital Collection / Data; Other Digital Collection; Software; Publications; Tools and services to support Digital Scholarship]
6. Some data / digital collections
Datasets, Books / Text, Images / Music, Maps, Sounds, Multimedia
http://labs.bl.uk/Digital+Collections
• Resonance FM: 10 year Community Arts Radio Show
• Text-mining of electronic journals
• Book ordering and anonymised reader data
• UK Web Archive Data
• 19th Century Books
• Environmental Sounds
• British National Bibliography
7. Example Research Methods
• Corpus analysis tools
• Visualisations
• Location based searching
• Geotagging
• Annotation
• APIs for datasets e.g. Metadata, Images
• Crowdsourcing / Human Computation
• Natural Language Processing
• Transcribing
8. Ideas from first competition
• Text mining tool in the reading rooms
• Curatorial…repackaging metadata for teaching and learning in a CMS e.g. Drupal
• Visualising large collections of sound at a glance (thumbnails)
• Using sheet music and OMR software
• Working to re-use a radio archive
• The winners are…
9. Mixing the Library: The Disc Jockey & the Digital Collection
Dan Norton completed a PhD at the University of Dundee and is an Artist in Residence at Hangar, Centre for Art and Research, Barcelona.
His idea is to build a ‘mixing’ interface for interacting with BL digital collections and beyond, developed from the DJ's model of interaction with information.
[Figure: Dan Norton’s prototype ‘mixing’ interface. Labels:]
• Collection ‘stalks’ made of ‘items’. Each ‘item’ is a URL.
• The order of the ‘items’ can be ‘shuffled’ and sent to the ‘left’ or ‘right’ channels
• Selected ‘left’ channel ‘item’
• Selected ‘right’ channel ‘item’
• Preview ‘item’
• ‘Play back’ of ‘items’ (Blue) and annotations (Yellow)
• Annotation
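The interaction model on this slide (stalks of URL items that can be shuffled and sent to a left or right channel) could be sketched roughly as below. This is only an illustration of the data model: the class names `Stalk` and `Mixer`, their methods and the example.org URLs are all hypothetical and are not taken from Dan Norton's prototype.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Stalk:
    """A collection 'stalk': an ordered list of items, each item a URL."""
    name: str
    items: List[str]

    def shuffle(self, rng: random.Random) -> None:
        # Reorder the items in place.
        rng.shuffle(self.items)


@dataclass
class Mixer:
    """Holds the item currently selected on each channel (hypothetical model)."""
    channels: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"left": None, "right": None}
    )

    def send(self, stalk: Stalk, index: int, channel: str) -> None:
        # Send one item from a stalk to the 'left' or 'right' channel.
        if channel not in self.channels:
            raise ValueError(f"unknown channel: {channel}")
        self.channels[channel] = stalk.items[index]


# Placeholder URLs, not real collection items.
sounds = Stalk("sounds", [f"http://example.org/sound/{i}" for i in range(5)])
maps = Stalk("maps", [f"http://example.org/map/{i}" for i in range(5)])

sounds.shuffle(random.Random(2013))
mixer = Mixer()
mixer.send(sounds, 0, "left")
mixer.send(maps, 2, "right")
```

Treating each item as a plain URL string keeps the sketch independent of any particular collection or media type.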
10. The Sample Generator for Digitised Texts
Pieter Francois is a Postdoctoral Researcher at the University of Oxford.
The ‘Sample Generator’ connects one or more major catalogues or collections of digitized texts through metadata.
[Figure: British Library Labs Sample Generator, in three numbered panels]
Panel 1: search form
• Search terms: Travel Routes
• Synonyms: Account, Tour, Adventure, Visit, Journey, Expedition, Excursion, Trip, Holiday, Guide, Plan, Route
• From: 1888 To: 1899
• Digitally available content only
• Distribution of items in catalogue
• Sample Size: 8
• Generate: generates a randomised unbiased sample
Panel 2: generated sample
• Generated sample URL (unique & citable after creation)
• Terms used: ‘Travel Routes’ from ‘1888-1899’, sample size ‘8’. Set created on ‘16/10/2013’ by ‘Pieter Francois’
• Work 1 to Work 8
• Researcher carries out research on works in the sample generated. Here it is used as the basis for generating travel routes, as shown in panel 3.
Panel 3: travel route extracted from ‘Work 1’ for further research
In this example, the ‘Sample Generator’ searches across 1.8 million bibliographic records from the 19th Century for items about ‘Travel Routes’ and where possible (digitised items permitting) provides unbiased digital ‘samples’ for further research.
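The sampling step described on this slide might look something like the sketch below: filter catalogue records by search terms and date range, optionally keep only digitised items, then draw a simple random sample. The record fields (`title`, `year`, `digitised`), the function name `generate_sample` and the toy catalogue are assumptions for illustration; the actual Sample Generator may work quite differently.

```python
import random
from typing import Dict, List, Optional, Sequence


def generate_sample(
    records: Sequence[Dict],
    terms: Sequence[str],
    year_from: int,
    year_to: int,
    size: int,
    digitised_only: bool = True,
    seed: Optional[int] = None,
) -> List[Dict]:
    """Draw a simple random sample (without replacement) of matching records."""
    lowered = [t.lower() for t in terms]
    matches = [
        r for r in records
        if year_from <= r["year"] <= year_to
        and (r["digitised"] or not digitised_only)
        and any(t in r["title"].lower() for t in lowered)
    ]
    # Every matching record has the same chance of selection.
    rng = random.Random(seed)
    return rng.sample(matches, min(size, len(matches)))


# Toy catalogue with made-up records (hypothetical data).
catalogue = [
    {"title": f"A Tour of Region {i}", "year": 1880 + i, "digitised": i % 2 == 0}
    for i in range(25)
]
synonyms = ["Account", "Tour", "Adventure", "Visit", "Journey", "Expedition"]
sample = generate_sample(catalogue, synonyms, 1888, 1899, size=3, seed=1)
```

Passing a fixed `seed` makes a sample reproducible, which is one possible way to support a sample that stays citable after creation.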
11. Next Competition
• Starts 11 November 2013 and ends around April 2014
• Submit an idea and engage with us during this period to develop it into a good one
• First prize: £3000 and a residency (expenses paid); we will work with you to make your shiny thing between May and October 2014
• Work with us and our content anyway at Data / Hack events:
• 12 December 2013, 13 January 2014, 12 February 2014, 10 March 2014
12. Data / Items brought
• British National Bibliography in RDF Triples
• Digitised books from the 17th, 18th, 19th and 20th centuries
• Image metadata
• 10 x USB Sticks
• 1 x 500 GB hard drive
13. British National Bibliographic Data
• http://bnb.data.bl.uk (part of data.bl.uk) – download here, SPARQL endpoint
• 2.8 million individual records
• Available as Linked Open Data, Basic RDF/XML and MARC 21
• On USB stick and hard drive
• Augmenting author records – London Review of Books
• Combining with other data sources?
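As a rough sketch of how the SPARQL endpoint mentioned above could be queried, the snippet below builds a GET request URL in the style of the standard SPARQL protocol. The endpoint path `/sparql` and the use of `dct:title` for book titles are assumptions, not details given on the slide; check the documentation at http://bnb.data.bl.uk for the actual address and vocabulary.

```python
from urllib.parse import urlencode

# Assumed endpoint path; the slide only gives the site address.
SPARQL_ENDPOINT = "http://bnb.data.bl.uk/sparql"

# Assumed vocabulary: Dublin Core terms for titles.
QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?book ?title
WHERE { ?book dct:title ?title . }
LIMIT 10
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(SPARQL_ENDPOINT, QUERY)
print(url)
```

The resulting URL could then be fetched with any HTTP client, typically with an `Accept` header asking for a JSON or XML result format.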
14. 19th Century Digitised Books
• 65,000 digitised volumes. Many rare or inaccessible books published in the 17th, 18th, 19th and 20th centuries, including philosophy, history, poetry and literature, travel
• 25 million pages (OCR text available, 75% accuracy) on hard drive as .txt and .json, with metadata as .xml (50 GB); metadata also as .tsv on USB stick. Items are identified by unique numbers
• 420,000 images / illustrations available on Flickr (around 70% and counting) http://goo.gl/OrCKZz (use their API) and on hard drive (100 GB – 20 mins? – illustrations and covers)
• See Mechanical Curator on Tumblr - http://goo.gl/uvE5Yw
• For images: jigsaw, crowdsourcing metadata, image recognition (machine learning)
• For text: dirty data, a cleaning-up exercise with an educational purpose?
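As one possible starting point for the text clean-up exercise, the sketch below applies two very simple repairs to raw OCR text: rejoining words hyphenated across line breaks and collapsing runs of whitespace. The function name `clean_ocr_text` is hypothetical, and these rules are only a first pass; they will not fix misrecognised characters, and they also merge genuine compound words that happen to break at a line end.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Apply two simple, lossy repairs to raw OCR text."""
    # Rejoin words split by a hyphen at the end of a line: "jour-\nney" -> "journey".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse all runs of whitespace (including line breaks) to single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


page = "A jour-\nney   to the\nNorth "
print(clean_ocr_text(page))
```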
15. Image Metadata
• .CSV files on USB stick and hard drive
• Contains links to images
• Re-purpose metadata and images?
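A minimal sketch of pulling the image links out of one of the .CSV metadata files is shown below. The column names (`title`, `image_url`) and the sample rows are assumptions for illustration; the real files may use different headers.

```python
import csv
import io
from typing import List, TextIO


def image_links(csv_file: TextIO, url_column: str = "image_url") -> List[str]:
    """Return the non-empty values of the link column from a CSV file."""
    reader = csv.DictReader(csv_file)
    return [row[url_column] for row in reader if row.get(url_column)]


# Made-up sample data standing in for a metadata file on the USB stick.
sample_csv = io.StringIO(
    "title,image_url\n"
    "Plate 1,http://example.org/images/1.jpg\n"
    "Plate 2,\n"
    "Plate 3,http://example.org/images/3.jpg\n"
)
links = image_links(sample_csv)
```

For a file on disk, the same function can be called with `open(path, newline="", encoding="utf-8")` in place of the `StringIO` object.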
16. What next?
Speak to me: 0207 412 7324
Email me: mahendra.mahey@bl.uk or labs@bl.uk
Labs Website: http://labs.bl.uk/
Twitter: @BL_Labs
Hashtag: #bl_labs
Jiscmail: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=BL-LABS
Blog: http://britishlibrary.typepad.co.uk/digital-scholarship/