1
@BL_Labs @BL_DigiSchol @LeedsHRI @UniversityLeeds
1230 - 1300, Tuesday, 5 June 2018,
Treasures of Brotherton Gallery,
Parkinson Building, University of Leeds,
Woodhouse Lane,
Leeds, LS2 9JT
What is BL Labs?
How have we engaged researchers, artists, entrepreneurs and educators in using our digital
collections?
mahendra.mahey@bl.uk
Mahendra Mahey, Manager of British Library Labs (BL Labs)
2
http://www.bl.uk/projects/british-library-labs
Funded by the Andrew W. Mellon Foundation & British Library
Running since March 2013
Core Team
• Adam Farquhar (Principal Investigator)
• Mahendra Mahey (Manager) (Full Time)
• Ben O’Steen (Technical Lead) (Full Time)
• Eleanor Cooper (Project Officer) (0.5)
3
Funded by the Andrew W. Mellon Foundation
Mahendra Mahey
Experiment with our
Digital Collections
4
Challenges Labs Addresses
• Money spent on digitising / capturing digital – return on
investment, how is it being used and what value and impact
it is having, especially when opening collections for all.
• What digital collections are there that can be used openly
and onsite and how do we tell people?
• How do we explore the feel / shape of collections at scale?
• How do we find, explore, augment discovery in often
‘messy’ cultural heritage data without public APIs?
• How do we discover, celebrate old culture & remix to create
new culture?
5
The British Library
Inside the British Library
Space for 1200 readers, around 500,000 visitors per year
Building 37 uses low oxygen and robots
Reading room and delivery to London
Many items stored at Document Supply and Storage centre 48 hours away
Stockton-on-Tees
Authors’ right to payment each time their books
are borrowed from public libraries.
St Pancras, London, UK
Many books are stored 4 storeys below the building
UK Legal Deposit Library – Reference only
Founded in 1973 though origins stem back to British Museum Library 1753
Boston-Spa
6
Collections – not just books!
> 180*million items
> 0.8* m serial titles
> 8* m stamps
> 14* m books
> 6* m sound recordings
> 4* m maps
> 1.6* m musical scores
> 0.3* m manuscripts
> 60* m patents
King’s Library *Estimates
7
Living Knowledge Vision (2015 – 2023)
Custodianship Research Business
Culture Learning International
To make our intellectual heritage accessible to everyone,
for research, inspiration and enjoyment and be the most open, creative
and innovative institution of its kind by 2023 (50 year anniversary).
Document:http://goo.gl/h41wW7 Speech:https://goo.gl/Py9uHK
Roly Keating (Chief Executive Officer of the British Library)
8
Wider engagement…not just
Digital Humanities / Scholarship Researchers
Researchers
https://goo.gl/WutNyi Artists
http://goo.gl/nNKhQ2
Librarians
Curators
https://goo.gl/9NWZUW
Software Developers
https://goo.gl/7QQ5Tf
Archivists
https://goo.gl/x7b4tg
Educators
https://goo.gl/qh01Mi
Working and Communicating
Inspirational
examples
Experiences
Challenges
Lessons Learned
Entrepreneurs
https://goo.gl/Fx8RG7
9
Digital research methods
Digital Scholarship
Visualisations
Application Programming Interfaces (APIs)
for datasets e.g. Metadata, Images, etc
Transcribing
Annotation
Location based searching & Geo-tagging
Corpus analysis, Text Mining &
Natural Language Processing
Crowdsourcing
Human Computation
In 20 years’ time?
10
What about Digital?
Born Digital Digitised
11
Knowledge Quarter London
80 knowledge organisations (as of 14/04/18) within 1 mile radius of
Kings Cross, http://www.knowledgequarter.london
http://www.turing.ac.uk (Headquartered at the British Library)
UK Web Archive and e-legal deposit (2013)
http://www.webarchive.org.uk/ukwa/
Born digital
12
#bldigital
3 %* digitised
* estimate
Digitisation
Partnerships
Commercial & Other Organisations
Amount
increasing rapidly
e.g. Heritage Made Digital
Bias in digitisation
http://goo.gl/bR9UJL
Sample Generator
13
Playbills, Books, Newspapers
(includes Optical Character Recognition (OCR))
Digital collections and Datasets
British National
Bibliography
http://bnb.data.bl.uk
http://sounds.bl.uk
http://dml.city.ac.uk/
Music (Recordings & Sheet) & Sounds
http://goo.gl/frSMJt
Broadcast News (TV and Radio)
http://goo.gl/cwThHw
http://goo.gl/pBkisZ
http://goo.gl/E8aRyQ
Usage data
EtHOS
Web Archive
Images, Manuscripts & Maps
http://www.qdl.qa/
Qatar Digital Library
http://idp.bl.uk/
International
Dunhuang
Project
Maps
http://www.bl.uk/maps/
Hebrew Manuscripts
http://goo.gl/4sbCp9
Flickr &
Wikimedia Commons
https://goo.gl/LZRmaZ
14
Finding Open Cultural Heritage Datasets
http://labs.bl.uk/Digital+Collections
Collection Guides (199 as of 17/04/2018)
https://www.bl.uk/collection-guides/
Datasets about our collections
Bibliographic datasets relating to our published and archival
holdings
Datasets for content mining
Content suitable for use in text and data mining research
Datasets for image analysis
Image collections suitable for large-scale image-analysis-
based research
Datasets from UK Web Archive
Data and API services available for accessing UK Web
Archive
Digital mapping
Geospatial data, cartographic applications, digital aerial
photography and scanned historic map materials
https://data.bl.uk
Download collections as zips, no API
Each dataset has a Digital Object Identifier (DOI)
can be referenced for research
Not all discoverable via
search engines!
15
How are we
doing this?
16
Competition
Awards
Projects
Tell us your ideas of what to do
with our digital content (2013-16)
Show us what you have already done with
our digital content in research, artistic,
commercial and learning and teaching
categories
Talk to us about working on
collaborative projects
Engagement
• Roadshows
• Events
• Meetings
• Conversations
New!
Digital Research
Support
17
Digital Research Support
Application Process
• Complete online form - https://goo.gl/Kgaq8d
• Entries reviewed and selected at the beginning of the month
• Up to 5 days support provided
• Technical, curatorial and legal advice
• Scope, Costs, Time, Risks
• Any other relevant issues?
18
• The Library has to go out to meet researchers, regularly and
cyclically to tell them what we have and learn what they
want to do
• Debunk ‘myths’ about the Library
• Show / tell researchers about the reality of our data
• Researchers’ ideas always change once they explore the
data!
https://goo.gl/esqpRb
Lots of two-way communication!
BL Labs runs annual ‘Roadshows’ around the UK and the World
19
Have you got X?
https://upload.wikimedia.org/wikipedia/commons/5/50/Real_wuerzburg.jpg
Looking for Physical Content in the British Library
20
Have you got X digitised / in digital form?
http://www.yorkmix.com/wp-content/uploads/2014/04/mr-simms-sweet-shoppe-york.jpg
Looking for Digitised / Digital Content in the BL
21
•Digitisation costs money, time, resources
•705 Digitisation projects / collections
(as of 15/05/2018)
From the UK Web (born digital)
to small amounts of digitised manuscripts (digitised)
So little digitised…why?
© £
22
Openly Licensed Digital Content?
15% Openly
Licensed
Around 80%*
available online
Working through to make more open through
Access and Re-use committee which meets once a month…
Though some collections will always only be available onsite due to
various reasons including legal, ethical etc.
Breakdown by collection*
Manuscripts 59%
Books 9%
Maps and Views 7%
Newspapers 3%
Archives and Records 3%
Paintings, Prints and Drawings 2%
*Based on number of digitisation projects (705 as of 15/05/18)
Largest proportion of funding
Public / Private Partnership
15 %* Openly Licensed – most online
85 %* Available onsite only at the moment
*Estimates
23
The Story of the Digital Collection…
Digital
Collection
Curator
Who paid for the digitisation?
Who did the digitisation?
Technology used
Born digital?
Published
Unpublished
Where is it?
Can it still be accessed?
Generates income
Reputational risk in using?
Legalities /
Ethics / Morality
Politics when digitised
Personalities involved
Surprises (e.g. gaps)
Descriptive information
Old format not supported
What media was the
digitisation done from?
Is there any background documentation?
No Descriptive information
Inconsistent descriptive information
Still there?
Good to know the background ‘story’ of a Digital Collection
if you want to use it for research and make conclusions…
24
Open Content vs Onsite Only Access
• Access easier for openly licensed content
• More challenging for on-site, in-copyright, non-print legal
deposit, data protected, old content media & contemporary
material (post 1877)
https://goo.gl/Y5zCXg
©
25
How do we give access to
onsite-only
Digital Collections
(85% of our Digital Collections)?
26
only in
Reading
Rooms due
to ©
only on
site due to
© or
ethical etc
not online /
available –
various storage
devices,
personal data
online
and open
British Library
online
behind
paywall
Challenges of access to Digital Collections
Labs Residency Model
27
https://goo.gl/qpCLlk
https://goo.gl/wMTS3Z
• Dialogue typically:
– you are ‘lucky’ & we have the digital content
/ data relevant to your research
– we don’t have exactly what you’re looking for,
but is there anything of interest? Let’s talk…
– engagement can be hard work and it’s
constantly required to maintain interest in our
digital collections!
• We also tend to attract researchers with ‘fuzzier’
research boundaries and possibly open to more
interdisciplinary / collaborative research
• Artists find this dialogue easier…
What engagement does the BL have with
researchers wanting to use our digital content?
28
Our Audience and You
Audience
research &
Digital
interests
Digital
collections
you have
This is where Labs works
It starts with a conversation!
Only a small amount of content is digitised!
Might not be the treasure expected at the end of a digital journey!
29
Interactions with BL Labs “researcher”
wanting to work with our data
Submit idea
for support
30
What did people
actually do?
Examples from Text and Images
Over 200 examples (including sound, video) from
Competition and Awards:
http://labs.bl.uk/Ideas+for+Labs
http://labs.bl.uk/Other+Uses+of+Collections
31
Example Pattern of Research
1, 2, 3
1. Find / identify new things in messy stuff
2. Unlock hidden history / data
3. Celebrate new discoveries
32
Finding / identifying invisible / well hidden
things in ‘messy’ historical data
https://goo.gl/mcpa8B
Not the British Library!
Example Pattern of Research 1
33
Messiness in historical data
• 'Begun in Kiryu, Japan, finished in France'
• 'Bali? Java? Mexico?'
• Variations on USA:
– U.S.
– U.S.A
– U.S.A.
– USA
– United States of America
– USA ?
– United States (case)
• Inconsistency in uncertainty
– U.S.A. or England
– U.S.A./England ?
– England & U.S.A.
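The variants listed above can be brought under one label with a small lookup table. The following is a minimal sketch in Python, not a method from the talk: the `CANONICAL` table and the `normalise_place` function are hypothetical names, and the separators handled ("/", "&", "or", "?") are only those visible in the examples on this slide.

```python
import re

# Hypothetical lookup table: keys are lower-cased variants with punctuation removed.
CANONICAL = {
    "us": "United States",
    "usa": "United States",
    "united states": "United States",
    "united states of america": "United States",
    "england": "England",
}


def normalise_place(raw):
    """Return (list of place labels in original order, uncertainty flag)."""
    # Treat a question mark or the word "or" as a sign of uncertainty.
    uncertain = "?" in raw or " or " in raw.lower()
    # Split on the separators seen in the examples: "/", "&" and "or".
    parts = re.split(r"/|&|\bor\b", raw, flags=re.IGNORECASE)
    places = []
    for part in parts:
        # Build a lookup key: lower case, letters and single spaces only.
        key = re.sub(r"[^a-z ]", "", part.lower())
        key = re.sub(r"\s+", " ", key).strip()
        if key:
            # Unknown values are kept (lightly trimmed) for a human to review.
            places.append(CANONICAL.get(key, part.strip(" ?")))
    return places, uncertain


for value in ["U.S.", "USA ?", "U.S.A. or England", "England & U.S.A."]:
    print(value, "->", normalise_place(value))
```

Values that the table does not know, such as 'Bali? Java? Mexico?', pass through unchanged apart from trimming, so nothing is silently discarded. Tools such as Open Refine offer clustering for the same purpose without writing code.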
34
Open Refine
http://openrefine.org/
35
Characterising / learning the shape
of your data
http://blogs.bl.uk/digital-scholarship/2013/09/data-exploration-through-visualisation.html
36
http://dirtdirectory.org
37
#digitalhumanities
dancohen/lists/digitalhumanities
@ProfHacker
@Dhnow
@BL_DigiSchol
And more links to resources here: http://scottbot.net/teaching-yourself-to-code-in-dh/
38
Unearthing / unlocking
hidden histories & data
to stimulate new research
https://goo.gl/vJ291F
It’s an
18th Century Poem!
Example Pattern of Research 2
39
Celebrating hidden histories / data
creatively through events, art &
performance
https://goo.gl/Ql0Bwz
Re-enacting, re-discovering history
Example Pattern of Research 3
40
Experiments with Text
41
https://goo.gl/oUNj5N
https://goo.gl/ImAUv4
Finding things in ‘messy’
Optical Character Recognised (OCR) text
Mrs Folly
• Clean up some manually
• Get human ‘ground truth’
• Write computer code (sometimes
it’s machine learning) to find
things reliably in it ‘automatically’
• Try code on messy content
• Tweak if necessary
• Digital ‘lasso’ around content
• Human sift through
Mrs Folly
An example pattern of research
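The steps in this pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used in any Labs project: the OCR lines and the ground truth are invented, the function names are hypothetical, and fuzzy matching from the standard library stands in for whatever technique (including machine learning) a real project would choose.

```python
from difflib import SequenceMatcher

# Invented OCR lines; index 4 is garbled so badly that the sketch misses it.
OCR_LINES = [
    "Mrs Folly will sing a favourite song",
    "The part of Mrs Fo1ly by a young lady",
    "Doors open at six o'clock precisely",
    "Mr Foley, the celebrated comedian",
    "Mis. Fo11y in the farce",
]
# Human 'ground truth': the lines a person says really mention Mrs Folly.
GROUND_TRUTH = [0, 1, 4]


def best_window_ratio(line, target):
    """Highest similarity between target and any same-length window of line."""
    line, target = line.lower(), target.lower()
    n = len(target)
    if len(line) <= n:
        return SequenceMatcher(None, line, target).ratio()
    return max(
        SequenceMatcher(None, line[i:i + n], target).ratio()
        for i in range(len(line) - n + 1)
    )


def lasso(lines, target, threshold=0.85):
    """Indices of lines that approximately contain the target phrase."""
    return [i for i, line in enumerate(lines)
            if best_window_ratio(line, target) >= threshold]


def precision_recall(found, truth):
    """Compare the automatic results with the human ground truth."""
    found, truth = set(found), set(truth)
    hits = len(found & truth)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


found = lasso(OCR_LINES, "Mrs Folly")
print(found, precision_recall(found, GROUND_TRUTH))
```

Lowering the threshold is the 'tweak' step: it widens the lasso and recovers more garbled lines, but it also lets in near misses such as "Mr Foley", which is why a human sift still follows.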
42
Legalities of Machine Learning /
Text and Data mining
https://goo.gl/toq4Bo
Legalities of Machine Learning / Text and Data
mining still up for discussion…Often misunderstood
Is it the same as humans reading and looking for
patterns…just a bit quicker?
43
Victorian Meme Machine (2014)
https://goo.gl/HMqDt3
Bob Nicholson
http://victorianhumour.tumblr.com/
Bob Nicholson interviewed on
BBC Radio 4 Making History Programme:
http://goo.gl/fmV9ep
And telling jokes to the public:
http://goo.gl/xIDRhz
Bob obtained further funding from his university
Looking for more collaborations
https://www.youtube.com/watch?v=-GRgj7Q5OM0
Rob Walker, Victorian Mother-in-law Jokes
Victorian Comedy Night, 7 Nov 2016
Learnt about access paths
to digital collections
44
Katrina Navickas (2015)
Political Meetings Mapper
http://politicalmeetingsmapper.co.uk
https://goo.gl/Qq78Oa
Labs Symposium 2015
https://goo.gl/BSA3be
Interview 2015
The Chartist Newspaper
http://goo.gl/vOLSnH
Chartist Monster Meeting
Chartists Walking Tour and
Re-enactment London
Learnt that domain knowledge
reduces noise
45
Black Abolitionist Performances & their
Presence in Britain (2016) – Hannah-Rose Murray
Frederick
Douglass
Ellen
Craft
Josiah
Henson
Ida B
Wells
A Performance by
Joe Williams &
Martelle Edinborough
http://frederickdouglassinbritain.com/
Started to implement
Machine Learning Techniques
46
Data-mining verse in 18th Century newspapers
BL Labs Project 16-17, Jennifer Batt
https://goo.gl/5Akthd
Slides courtesy Jennifer Batt
Started to refine
Machine Learning Techniques
47
Psychiatrist’s Journey
into 19th Century Newspapers (2016)
• Dr Surendra P Singh, Consultant Psychiatrist
• To identify weekly, monthly, yearly and
longitudinal trends in suicide reporting in
terms of gender, status, sites, locations and
health in OCR text of 19th Century
Newspapers
• Used ‘R’ Open Source Stats
Package to collect ‘Suicide’ corpus
• Looking for collaborators to work on this
dataset
Use off-the-shelf tools
and remote access pathways
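The general shape of this kind of study (collect a keyword corpus, then count it by time period) can be sketched as follows. The project itself used R; Python is used here only for illustration. The article records, the pattern and the function names are all invented for the sketch and are not taken from the project.

```python
import re
from collections import Counter
from datetime import date

# Invented sample records; real input would be OCR text from the newspapers.
ARTICLES = [
    {"date": date(1845, 3, 1), "text": "An inquest was held on a case of suicide at the docks."},
    {"date": date(1845, 7, 12), "text": "The harvest this year is expected to be plentiful."},
    {"date": date(1846, 1, 5), "text": "Supposed SUICIDE of a tradesman in the city."},
    {"date": date(1846, 9, 30), "text": "Verdict: suicidc while of unsound mind."},
]

# Tolerates one plausible OCR confusion (final 'e' read as 'c'); illustrative only.
PATTERN = re.compile(r"\bsuicid[ec]s?\b", re.IGNORECASE)


def build_corpus(articles, pattern=PATTERN):
    """Keep only the articles whose text matches the keyword pattern."""
    return [a for a in articles if pattern.search(a["text"])]


def counts_by(corpus, key):
    """Count corpus articles per period, where key maps a date to a period."""
    return Counter(key(a["date"]) for a in corpus)


corpus = build_corpus(ARTICLES)
yearly = counts_by(corpus, lambda d: d.year)
monthly = counts_by(corpus, lambda d: (d.year, d.month))
weekly = counts_by(corpus, lambda d: tuple(d.isocalendar())[:2])  # (ISO year, ISO week)

print(len(corpus), "articles in corpus")
print(sorted(yearly.items()))
```

Breakdowns by gender, status or location would need those attributes to be extracted from the text first, which is where the messiness of OCR makes manual checking necessary.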
48
Virtual Infrastructure for OCR text
OCR text ‘scraped’ from
digitised newspapers
and put in internal cloud
Jupyter notebook
Write python code and results
in web browser
http://jupyter.org
Access available for researchers ‘in residence’
49
Experiments with Images
50
65,000 digitised 19th Century books
Image: Artwork by Alicia Martin 2007 / 2008
Paid for by:
For a full list:
https://goo.gl/HqPQMS
Subjects include:
Philosophy
Poetry
History
Literature
1789 - 1876
51
30 August 2012
52
002819694
Unique number
53
54
55
OCR XML generated by ABBYY FineReader
Optical Character Recognition
56
Images from books captured too!
57
Worked better for female faces than men’s
Press
http://mechanicalcurator.tumblr.com
Posts image every 30 minutes
http://www.flickr.com/photos/britishlibrary/
1,020,418 images
need tagging!
Creative uses of images
Face recognition
Algorithms based on photos
Mechanical Curator
with an algorithmic brain
(Circles, Squares and Slanty etc)
http://goo.gl/qPPgxX
Wikimedia
Flickr Commons
Individual URL & API
Snipping out images
from 65,000 Digitised Books*
>800,000,000* views
>17,000,000* tags
https://goo.gl/FgZ4HM
Work @ BL by Ben O’Steen, Labs
and Digital Research Team
*Matt Prior - http://goo.gl/j29Tnx
Since Dec 2013
Tumblr
*Estimates
>More demand to see
physical items
58
Tagging, Tagging, Tagging…
59
Tagging a million images
Iterative Crowdsourcing
http://goo.gl/j6fxac
Cardiff University’s
Lost Visions Project
http://www.metadatagames.org/
Metadata Games
James Heald
Mario Klingemann
Chico 45
Use computational methods
Human Tagger
Top British Library Flickr Commons Taggers
18 hard core taggers
How to reward this ‘small group’ and keep it motivated?
Average for ‘crowd’ is 1 tag per person
What kind of ‘task’ can this ‘crowd’ do?
Mobile games for ‘Ships’, ‘Covers’ and ‘Portraits’ Interface for tagging
60
Adam Crymble (2015)
Crowdsource Arcade
http://goo.gl/LBfJ4W
http://goo.gl/OH9pOZ
https://goo.gl/7z0j8p
30 mins talk
Labs Symposium (2015)
https://goo.gl/SSRsdd
5 min interview (2015)
http://goo.gl/0APpE8
Game Jam
Using Arcade Games
to help Tag images
‘Art Treachery’ and ‘Tag Attack’
61
Special Jury’s Prize (2015)
James Heald – Wikimedia and Map work
https://goo.gl/WYZCB2
http://goo.gl/HNQq5e
https://goo.gl/VPgffL
https://commons.wikimedia.org/
https://goo.gl/djtm1b
Labs Symposium (2015)
Geotagging maps
50,000 Maps
Found in Flickr 1 million
Human & Computational Tagging
& Community engagement
Geo-referencing work
https://www.bl.uk/georeferencer
62
SherlockNet: Competition Winner 2016
Karen Wang, Luda Zhao and Brian Do
Using Convolutional Neural Networks to Automatically Tag and Caption
the British Library Flickr Commons 1 million Image Collection
12 categories
>15.5 million tags added
>100,000 captions
bit.ly/sherlocknet
Pooled surrounding
OCR text on page
from similar images
Used Microsoft COCO (photographs) &
British Museum Prints and Drawings
collections as training sets.
Tags Captions
63
http://goo.gl/dM8ieA
Mario Klingemann (2015)
Code Artist / Curator
http://goo.gl/bNxGZZ
Kris Hoffman (2016)
Animation for Fashion Week 2016
https://goo.gl/QilqqT
Jiayi Chong 2016 - Animation tool
https://www.facebook.com/RealmlandStory/
Paul Rand Pierce 2016
Graphic Novel on Facebook
Tragic Looking Women
44 Men who Look 44
(Notice the direction faces)
A Hat on the Ground
Spells trouble
Artistic / Creative Works
https://www.youtube.com/watch?v=Q3SBxO34Zlc
David Normal 2014 and 2015
Collages/Paintings & Lightboxes
64
Hey there Young Sailor!
Ling Low 2016 – Hey there Young Sailor
https://www.youtube.com/watch?v=bcOP1E5bRE0
VIMEO.COM/SWEETANDLOWFILMS
@SWEETNLOWFILMS ON INSTAGRAM
@SWEETNLOWLING ON TWITTER
The Impatient Sisters
Play to fade!
65
Imaginary Cities – BL Labs Project /
Exhibition 16-18 (Michael Takeo Magruder)
An artistic exploration seeking to create provocative fictional cityscapes for the Information Age
from the British Library’s digital collection of historic urban maps
66
Alanna Hilton
British Fashion Colleges Council and
Teatum Jones
67
Careful of making conclusions based on
‘black box’ software & techniques (e.g.
sentiment analysis, algorithms), learn the
assumptions behind them first!
Lessons Learned & Challenges…
Beware of ‘Black Box’ software…
68
Breaking Black Boxes – Melodee Beals
69
Huge appetite to use digital content & data
for anyone’s ideas!
(e.g. Flickr Commons stats).
Lessons Learned & Challenges…
Huge demand for open digital content…
https://goo.gl/yQ5s4U
70
Labs mindset…
1. Labs tries to start a conversation, generate positive energy,
encourages fun/play/experimentation and tries to support ideas.
2. Start with small experiments; a use can be really simple, but it is OK to
think big!
3. Fail faster (don’t be afraid) and persevere.
4. Reject perfectionism! Good enough is sometimes…good enough!
5. Services that allow useful exploration of cultural heritage data are
rare!
6. Exploring data is difficult to do with large datasets and often requires
specific skills and capabilities that many of our users don’t have –
training or collaborations?
7. Celebrate the uses of digital collections, tell the world!
8. Success is sometimes all about the right people, place & right time…
https://goo.gl/noASfl
71
Explore or Imagine Our Data!
• CSV of Metadata
https://data.bl.uk/digbks/dig19cbooks-mdata-csv.csv
• 19th Century Books - Book Metadata - 01/09/2013.
https://data.bl.uk/digbks/db21.html
• Digitised Books - Flickr Tag History - Dec 2013 to March 2016.
TSV
https://data.bl.uk/digbks/db15.html
• Digitised Hebrew Manuscripts - Metadata
https://data.bl.uk/hebrewmanuscripts/heb1.html
• Digitised Hebrew Manuscripts: Or 2210 - Or 2364
https://data.bl.uk/hebrewmanuscripts/heb8.html
• Theatrical playbills from Britain and Ireland (OCR text only)
https://data.bl.uk/playbills/pb2.html
• Portraits of actors, views of theatres and playbills (covering
1750 - 1821 in a single volume)
https://data.bl.uk/singlesheet/por1.html
• Volumes of Lysons Collectanea (Amusements), comprising
broadsides, cuttings and advertisements on amusements, 1660-1840.
https://data.bl.uk/singlesheet/ad1.html
https://data.bl.uk
• Have a look at the data.
• Data Quality
• Issues
Or an idea you have thought of
what to do with the data!
http://labs.bl.uk/Ideas+for+Labs
Smaller datasets
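One way to 'have a look at the data' and its quality is to profile each column of a metadata CSV: how many cells are filled and which values are most common. The sketch below is a minimal example in Python. The column names and rows in `SAMPLE` are invented for illustration and are not the real columns of the files listed above; `profile` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Invented sample; a real file downloaded from data.bl.uk will have other columns.
SAMPLE = """identifier,title,place,date
000000001,A Tour in Wales,London,1850
000000002,Poems,,1861
000000003,History of Leeds,Leeds,[1870?]
000000004,Travels,London,
"""


def profile(rows):
    """Per column: count of non-empty cells, total rows, three most common values."""
    rows = list(rows)
    report = {}
    for column in (rows[0].keys() if rows else []):
        # Skip missing or blank cells so gaps show up in the 'filled' count.
        values = [r[column].strip() for r in rows if r[column] and r[column].strip()]
        report[column] = {
            "filled": len(values),
            "total": len(rows),
            "top": Counter(values).most_common(3),
        }
    return report


report = profile(csv.DictReader(io.StringIO(SAMPLE)))
for column, stats in report.items():
    print(column, stats["filled"], "of", stats["total"], "filled; most common:", stats["top"][:1])
```

To try this on a downloaded file, pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample. The file's text encoding is not stated here and may need to be specified. Gaps and odd values such as '[1870?]' are exactly the kind of issue worth noting before drawing conclusions.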

Stella Wisdom's Slides for Doctoral Open Day – Art & Design plus Media, Cultu...
 
BL Labs Presentation to the British Library Development Team
BL Labs Presentation to the British Library Development TeamBL Labs Presentation to the British Library Development Team
BL Labs Presentation to the British Library Development Team
 
Reinventing the future to save the past v2.0 12.02.2015
Reinventing the future to save the past v2.0 12.02.2015Reinventing the future to save the past v2.0 12.02.2015
Reinventing the future to save the past v2.0 12.02.2015
 
Bl labs edinburgh_dh_snet_020415-mmahey
Bl labs edinburgh_dh_snet_020415-mmaheyBl labs edinburgh_dh_snet_020415-mmahey
Bl labs edinburgh_dh_snet_020415-mmahey
 

Mehr von labsbl

A hands-on data exploration & challenge to become a derived data-set author o...
A hands-on data exploration & challenge to become a derived data-set author o...A hands-on data exploration & challenge to become a derived data-set author o...
A hands-on data exploration & challenge to become a derived data-set author o...
labsbl
 
Bl labs ou-dh-collaboration
Bl labs ou-dh-collaborationBl labs ou-dh-collaboration
Bl labs ou-dh-collaboration
labsbl
 

Mehr von labsbl (17)

7th BL Labs Symposium (2019): 13_Closing comments
7th BL Labs Symposium (2019): 13_Closing comments7th BL Labs Symposium (2019): 13_Closing comments
7th BL Labs Symposium (2019): 13_Closing comments
 
7th BL Labs Symposium (2019): 12_Digital Research team projects update
7th BL Labs Symposium (2019): 12_Digital Research team projects update7th BL Labs Symposium (2019): 12_Digital Research team projects update
7th BL Labs Symposium (2019): 12_Digital Research team projects update
 
7th BL Labs Symposium (2019): 11_The Artistic Award
7th BL Labs Symposium (2019): 11_The Artistic Award7th BL Labs Symposium (2019): 11_The Artistic Award
7th BL Labs Symposium (2019): 11_The Artistic Award
 
7th BL Labs Symposium (2019): 10_British Library Staff Award
7th BL Labs Symposium (2019): 10_British Library Staff Award7th BL Labs Symposium (2019): 10_British Library Staff Award
7th BL Labs Symposium (2019): 10_British Library Staff Award
 
7th BL Labs Symposium (2019): 09_Community commendation
7th BL Labs Symposium (2019): 09_Community commendation7th BL Labs Symposium (2019): 09_Community commendation
7th BL Labs Symposium (2019): 09_Community commendation
 
7th BL Labs Symposium (2019): 08_An update on the ‘Living with machines’ project
7th BL Labs Symposium (2019): 08_An update on the ‘Living with machines’ project7th BL Labs Symposium (2019): 08_An update on the ‘Living with machines’ project
7th BL Labs Symposium (2019): 08_An update on the ‘Living with machines’ project
 
7th BL Labs Symposium (2019): 06_An overview of digital preservation at the B...
7th BL Labs Symposium (2019): 06_An overview of digital preservation at the B...7th BL Labs Symposium (2019): 06_An overview of digital preservation at the B...
7th BL Labs Symposium (2019): 06_An overview of digital preservation at the B...
 
7th BL Labs Symposium (2019): 05_The Research Award
7th BL Labs Symposium (2019): 05_The Research Award7th BL Labs Symposium (2019): 05_The Research Award
7th BL Labs Symposium (2019): 05_The Research Award
 
7th BL Labs Symposium (2019): 04_The story of the GLAM Labs community and how...
7th BL Labs Symposium (2019): 04_The story of the GLAM Labs community and how...7th BL Labs Symposium (2019): 04_The story of the GLAM Labs community and how...
7th BL Labs Symposium (2019): 04_The story of the GLAM Labs community and how...
 
7th BL Labs Symposium (2019): 03_BL Labs update
7th BL Labs Symposium (2019): 03_BL Labs update7th BL Labs Symposium (2019): 03_BL Labs update
7th BL Labs Symposium (2019): 03_BL Labs update
 
7th BL Labs Symposium (2019): 01_Welcome and Introduction
7th BL Labs Symposium (2019): 01_Welcome and Introduction7th BL Labs Symposium (2019): 01_Welcome and Introduction
7th BL Labs Symposium (2019): 01_Welcome and Introduction
 
7th BL Labs Symposium (2019): 07_The Teaching & Learning Award
7th BL Labs Symposium (2019): 07_The Teaching & Learning Award7th BL Labs Symposium (2019): 07_The Teaching & Learning Award
7th BL Labs Symposium (2019): 07_The Teaching & Learning Award
 
Introduction to BL Labs and Reading 35,000 Books: The UCD Contagion Project ...
Introduction to BL Labs and Reading 35,000 Books: The UCD Contagion  Project ...Introduction to BL Labs and Reading 35,000 Books: The UCD Contagion  Project ...
Introduction to BL Labs and Reading 35,000 Books: The UCD Contagion Project ...
 
A hands-on data exploration & challenge to become a derived data-set author o...
A hands-on data exploration & challenge to become a derived data-set author o...A hands-on data exploration & challenge to become a derived data-set author o...
A hands-on data exploration & challenge to become a derived data-set author o...
 
Experiences and lessons learned through British Library Labs How have we eng...
Experiences and lessons learned through British Library Labs  How have we eng...Experiences and lessons learned through British Library Labs  How have we eng...
Experiences and lessons learned through British Library Labs How have we eng...
 
What is BL Labs?
What is BL Labs?What is BL Labs?
What is BL Labs?
 
Bl labs ou-dh-collaboration
Bl labs ou-dh-collaborationBl labs ou-dh-collaboration
Bl labs ou-dh-collaboration
 

Kürzlich hochgeladen

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
fonyou31
 

Kürzlich hochgeladen (20)

Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 

British Library Labs Leeds Roadshow 2018

  • 1. 1 @BL_Labs @BL_DigiSchol @LeedsHRI @UniversityLeeds 1230 - 1300, Tuesday, 5 June 2018, Treasures of Brotherton Gallery, Parkinson Building, University of Leeds, Woodhouse Lane, Leeds, LS2 9JT What is BL Labs? How have we engaged researchers, artists, entrepreneurs and educators in using our digital collections? mahendra.mahey@bl.uk Mahendra Mahey, Manager of British Library Labs (BL Labs)
  • 2. http://www.bl.uk/projects/british-library-labs Funded by the Andrew W. Mellon Foundation & British Library Running since March 2013 Core Team • Adam Farquhar (Principal Investigator) • Mahendra Mahey (Manager) (Full Time) • Ben O’Steen (Technical Lead) (Full Time) • Eleanor Cooper (Project Officer) (0.5)
  • 3. http://www.bl.uk/projects/british-library-labs Funded by the Andrew W. Mellon Foundation Mahendra Mahey Experiment with our Digital Collections Running since March 2013 Core Team • Adam Farquhar (Principal Investigator) • Mahendra Mahey (Manager) (Full Time) • Ben O’Steen (Technical Lead) (Full Time) • Eleanor Cooper (Project Officer) (0.5)
  • 4. Challenges Labs Addresses • Money spent on digitising / capturing digital – return on investment, how is it being used and what value and impact it is having, especially when opening collections for all. • What digital collections are there that can be used openly and onsite and how do we tell people? • How do we explore the feel / shape of collections at scale? • How do we find, explore, augment discovery in often ‘messy’ cultural heritage data without public APIs? • How do we discover, celebrate old culture & remix to create new culture?
  • 5. The British Library Inside the British Library Space for 1200 readers, around 500,000 visitors per year Building 37 uses low oxygen and robots Reading room and delivery to London Many items stored at Document Supply and Storage centre 48 hours away Stockton-on-Tees Author right to payment each time their books are borrowed from public libraries. St Pancras, London, UK Many books are stored 4 stories below the building UK Legal Deposit Library – Reference only Founded in 1973 though origins stem back to British Museum Library 1753 Boston-Spa
  • 6. Collections – not just books! > 180*million items > 0.8* m serial titles > 8* m stamps > 14* m books > 6* m sound recordings > 4* m maps > 1.6* m musical scores > 0.3* m manuscripts > 60* m patents King’s Library *Estimates
  • 7. Living Knowledge Vision (2015 – 2023) Custodianship Research Business Culture Learning International To make our intellectual heritage accessible to everyone, for research, inspiration and enjoyment and be the most open, creative and innovative institution of its kind by 2023 (50th anniversary). Document: http://goo.gl/h41wW7 Speech: https://goo.gl/Py9uHK Roly Keating (Chief Executive Officer of the British Library)
  • 8. Wider engagement…not just Digital Humanities / Scholarship Researchers Researchers https://goo.gl/WutNyi Artists http://goo.gl/nNKhQ2 Librarians Curators https://goo.gl/9NWZUW Software Developers https://goo.gl/7QQ5Tf Archivists https://goo.gl/x7b4tg Educators https://goo.gl/qh01Mi Working and Communicating Inspirational examples Experiences Challenges Lessons Learned Entrepreneurs https://goo.gl/Fx8RG7
  • 9. Digital research methods Digital Scholarship Visualisations Application Programming Interfaces (APIs) for datasets e.g. Metadata, Images, etc. Transcribing Annotation Location based searching & Geo-tagging Corpus analysis, Text Mining & Natural Language Processing Crowdsourcing Human Computation In 20 years’ time?
  • 10. What about Digital? Born Digital Digitised
  • 11. Knowledge Quarter London 80 knowledge organisations (as of 14/04/18) within 1 mile radius of Kings Cross, http://www.knowledgequarter.london http://www.turing.ac.uk (Headquartered at the British Library) UK Web Archive and e-legal deposit (2013) http://www.webarchive.org.uk/ukwa/ Born digital
  • 12. #bldigital 3 %* digitised * estimate Digitisation Partnerships Commercial & Other Organisations Amount increasing rapidly e.g. Heritage Made Digital Bias in digitisation http://goo.gl/bR9UJL Sample Generator
  • 13. Playbills, Books, Newspapers (includes Optical Character Recognition (OCR)) Digital collections and Datasets British National Bibliography http://bnb.data.bl.uk http://sounds.bl.uk http://dml.city.ac.uk/ Music (Recordings & Sheet) & Sounds http://goo.gl/frSMJt Broadcast News (TV and Radio) http://goo.gl/cwThHw http://goo.gl/pBkisZ http://goo.gl/E8aRyQ Usage data EThOS Web Archive Images, Manuscripts & Maps http://www.qdl.qa/ Qatar Digital Library http://idp.bl.uk/ International Dunhuang Project Maps http://www.bl.uk/maps/ Hebrew Manuscripts http://goo.gl/4sbCp9 Flickr & Wikimedia Commons https://goo.gl/LZRmaZ
  • 14. Finding Open Cultural Heritage Datasets http://labs.bl.uk/Digital+Collections Collection Guides (199 as of 17/04/2018) https://www.bl.uk/collection-guides/ Datasets about our collections Bibliographic datasets relating to our published and archival holdings Datasets for content mining Content suitable for use in text and data mining research Datasets for image analysis Image collections suitable for large-scale image-analysis-based research Datasets from UK Web Archive Data and API services available for accessing UK Web Archive Digital mapping Geospatial data, cartographic applications, digital aerial photography and scanned historic map materials https://data.bl.uk Download collections as zips, no API Each dataset has a Digital Object Identifier (DOI) that can be referenced in research Not all discoverable via search engines!
  • 15. How are we doing this?
  • 16. Competition Awards Projects Tell us your ideas of what to do with our digital content (2013-16) Show us what you have already done with our digital content in research, artistic, commercial and learning and teaching categories Talk to us about working on collaborative projects Tell us your ideas of what to do with our digital content Engagement • Roadshows • Events • Meetings • Conversations New! Digital Research Support
  • 17. Digital Research Support Application Process • Complete online form - https://goo.gl/Kgaq8d • Entries reviewed and selected at the beginning of the month • Up to 5 days support provided • Technical, curatorial and legal advice • Scope, Costs, Time, Risks • Any other relevant issues?
  • 18. • The Library has to go out to meet researchers, regularly and cyclically, to tell them what we have and learn what they want to do • Debunk ‘myths’ about the Library • Show / tell researchers about the reality of our data • Researchers’ ideas always change once they explore the data! https://goo.gl/esqpRb Lots of two-way communication! BL Labs runs annual ‘Roadshows’ around the UK and the World
  • 19. Have you got X? https://upload.wikimedia.org/wikipedia/commons/5/50/Real_wuerzburg.jpg Looking for Physical Content in the British Library
  • 20. Have you got X digitised / in digital form? http://www.yorkmix.com/wp-content/uploads/2014/04/mr-simms-sweet-shoppe-york.jpg Looking for Digitised / Digital Content in the BL
  • 21. • Digitisation costs money, time, resources • 705 Digitisation projects / collections (as of 15/05/2018) From the UK Web (born digital) to small amounts of digitised manuscripts (digitised) So little digitised…why? © £
  • 22. Openly Licensed Digital Content? 15% Openly Licensed Around 80%* available online Working through to make more open through Access and Re-use committee which meets once a month… Though some collections will always only be available onsite due to various reasons including legal, ethical etc. Breakdown by collection* Manuscripts 59% Books 9% Maps and Views 7% Newspapers 3% Archives and Records 3% Paintings, Prints and Drawings 2% *Based on number of digitisation projects (705 as of 15/05/18) Largest proportion of funding Public / Private Partnership 15 %* Openly Licensed – most online 85 %* Available onsite only at the moment *Estimates
  • 23. The Story of the Digital Collection… Digital Collection Curator Who paid for the digitisation? Who did the digitisation? Technology used Born digital? Published Unpublished Where is it? Can it still be accessed? Generates income Reputational risk in using? Legalities / Ethics / Morality Politics when digitised Personalities involved Surprises (e.g. gaps) Descriptive information Old format not supported What media was the digitisation done from? Is there any background documentation? No Descriptive information Inconsistent descriptive information Still there? Good to know the background ‘story’ of a Digital Collection if you want to use it for research and make conclusions…
  • 24. Open Content vs Onsite Only Access • Access easier for openly licensed content • More challenging for on-site, in-copyright, non-print legal deposit, data protected, old content media & contemporary material (post 1877) https://goo.gl/Y5zCXg ©
  • 25. How do we give access to onsite-only Digital Collections (85% of our Digital Collections)?
  • 26. only in Reading Rooms due to © only on site due to © or ethical etc not online / available – various storage devices, personal data online and open British Library online behind paywall Challenges of access to Digital Collections Labs Residency Model
  • 27. https://goo.gl/qpCLlk https://goo.gl/wMTS3Z • Dialogue typically: – you are ‘lucky’ & we have the digital content / data relevant to your research – we don’t have exactly what you’re looking for, but is there anything of interest? Let’s talk… – engagement can be hard work and it’s constantly required to maintain interest in our digital collections! • We also tend to attract researchers with ‘fuzzier’ research boundaries and possibly open to more interdisciplinary / collaborative research • Artists find this dialogue easier… What engagement does the BL have with researchers wanting to use our digital content?
  • 28. Our Audience and You Audience research & Digital interests Digital collections you have This is where Labs works It starts with a conversation! Only a small amount of content is digitised! Might not be the treasure expected at the end of a digital journey!
  • 29. Interactions with BL Labs “researcher” wanting to work with our data Submit idea for support
  • 30. What did people actually do? Examples from Text and Images Over 200 examples (including sound, video) from Competition and Awards: http://labs.bl.uk/Ideas+for+Labs http://labs.bl.uk/Other+Uses+of+Collections
  • 31. Example Pattern of Research 1, 2, 3 1. Find / identify new things in messy stuff 2. Unlock hidden history / data 3. Celebrate new discoveries
  • 32. Finding / identifying invisible / well hidden things in ‘messy’ historical data https://goo.gl/mcpa8B Not the British Library! Example Pattern of Research 1
  • 33. Messiness in historical data • 'Begun in Kiryu, Japan, finished in France' • 'Bali? Java? Mexico?' • Variations on USA: – U.S. – U.S.A – U.S.A. – USA – United States of America – USA ? – United States (case) • Inconsistency in uncertainty – U.S.A. or England – U.S.A./England ? – England & U.S.A.
  • 34. OpenRefine http://openrefine.org/
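The "Variations on USA" problem from slide 33 can be illustrated with a small sketch of key-collision clustering, the general idea behind the fingerprint method that OpenRefine offers. This is a simplified, hypothetical re-implementation for illustration only, not OpenRefine's own code; the function names `fingerprint` and `cluster` are assumptions.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalised key: trim, lowercase, drop ASCII punctuation,
    then sort the unique whitespace-separated tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group the original strings by their fingerprint key."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return dict(groups)


# The place-name variants listed on slide 33.
variants = [
    "U.S.", "U.S.A", "U.S.A.", "USA",
    "United States of America", "USA ?", "United States",
]
clusters = cluster(variants)
for key, members in clusters.items():
    print(repr(key), members)
```

Punctuation-only differences collapse into a single group, but "U.S.", "United States" and "United States of America" stay separate. Merging those needs a hand-built mapping, which is where a curator's or researcher's domain knowledge comes in.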
  • 35. Characterising / learning the shape of your data http://blogs.bl.uk/digital-scholarship/2013/09/data-exploration-through-visualisation.html
  • 36. http://dirtdirectory.org
  • 37. #digitalhumanities dancohen/lists/digitalhumanities @ProfHacker @Dhnow @BL_DigiSchol And more links to resources here: http://scottbot.net/teaching-yourself-to-code-in-dh/
  • 38. Unearthing / unlocking hidden histories & data to stimulate new research https://goo.gl/vJ291F It’s an 18th Century Poem! Example Pattern of Research 2
  • 39. Celebrating hidden histories / data creatively through events, art & performance https://goo.gl/Ql0Bwz Re-enacting, re-discovering history Example Pattern of Research 3
  • 40. Experiments with Text
  • 41. https://goo.gl/oUNj5N https://goo.gl/ImAUv4 Finding things in ‘messy’ Optical Character Recognised (OCR) text Mrs Folly • Clean up some manually • Get human ‘ground truth’ • Write computer code (sometimes it’s machine learning) to find things reliably in it ‘automatically’ • Try code on messy content • Tweak if necessary • Digital ‘lasso’ around content • Human sift through Mrs Folly An example pattern of research
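The pattern on slide 41 (ground truth, code, try, tweak) can be sketched as scoring a text finder against a hand-labelled sample. Everything below is an illustrative assumption: the OCR lines are invented, not taken from any British Library collection, and the regular expressions stand in for whatever code or machine-learning model a project actually uses.

```python
import re

# Invented OCR lines with typical character confusions.
ocr_lines = [
    "A public rneeting will be held at the Town Hall",  # 'm' misread as 'rn'
    "The price of corn rose again this week",
    "Meeting of the working men on the moor",
    "A great meet1ng was held on Monday",               # 'i' misread as '1'
    "Ships arrived at Liverpool from Boston",
]
# Human 'ground truth': indexes of lines that really mention a meeting.
ground_truth = {0, 2, 3}

# First attempt, then a tweaked pattern that tolerates the OCR confusions.
strict = re.compile(r"meeting", re.IGNORECASE)
tolerant = re.compile(r"(m|rn)eet[i1l]ng", re.IGNORECASE)


def evaluate(pattern, lines, truth):
    """Return (precision, recall) of a pattern against labelled lines."""
    found = {i for i, line in enumerate(lines) if pattern.search(line)}
    true_positives = len(found & truth)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


print("strict  ", evaluate(strict, ocr_lines, ground_truth))
print("tolerant", evaluate(tolerant, ocr_lines, ground_truth))
```

On this toy sample the strict pattern misses the two lines with OCR errors. The tolerant pattern is the digital 'lasso': it catches more, and a human still sifts through what it returns.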
  • 42. Legalities of Machine Learning / Text and Data mining https://goo.gl/toq4Bo Legalities of Machine Learning / Text and Data mining still up for discussion…Often misunderstood Is it the same as humans reading and looking for patterns…just a bit quicker?
  • 43. Victorian Meme Machine (2014) https://goo.gl/HMqDt3 Bob Nicholson http://victorianhumour.tumblr.com/ Bob Nicholson interviewed on BBC Radio 4 Making History Programme: http://goo.gl/fmV9ep And telling jokes to the public: http://goo.gl/xIDRhz Bob obtained further funding from his university Looking for more collaborations https://www.youtube.com/watch?v=-GRgj7Q5OM0 Rob Walker, Victorian Mother-in-law Jokes Victorian Comedy Night, 7 Nov 2016 Learnt about access paths to digital collections
  • 44. Katrina Navickas (2015) Political Meetings Mapper http://politicalmeetingsmapper.co.uk https://goo.gl/Qq78Oa Labs Symposium 2015 https://goo.gl/BSA3be Interview 2015 The Chartist Newspaper http://goo.gl/vOLSnH Chartist Monster Meeting Chartists Walking Tour and Re-enactment London Learnt that domain knowledge reduces noise
  • 45. Black Abolitionist Performances & their Presence in Britain (2016) – Hannah-Rose Murray Frederick Douglass Ellen Craft Josiah Henson Ida B Wells A Performance by Joe Williams & Martelle Edinborough http://frederickdouglassinbritain.com/ Started to implement Machine Learning Techniques
  • 46. Data-mining verse in 18th Century newspapers BL Labs Project 16-17, Jennifer Batt https://goo.gl/5Akthd Slides courtesy Jennifer Batt Started to refine Machine Learning Techniques
  • 47. Psychiatrist’s Journey into 19th Century Newspapers (2016) • Dr Surendra P Singh, Consultant Psychiatrist • To identify weekly, monthly, yearly and longitudinal trends in suicide reporting in terms of gender, status, sites, locations and health in OCR text of 19th Century Newspapers • Used ‘R’ Open Source Stats Package to collect ‘Suicide’ corpus • Looking for collaborators to work on this dataset Use off-the-shelf tools and remote access pathways
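The trend counting described on slide 47 can be sketched as grouping keyword matches by publication date. The project itself used R; the Python below is only an illustrative equivalent, the sample records are invented, and the function name `count_by` is an assumption.

```python
from collections import Counter
from datetime import date

# Invented (publication date, OCR text) records; not real newspaper data.
articles = [
    (date(1845, 1, 4), "Coroner's inquest returned a verdict of suicide"),
    (date(1845, 1, 18), "Market prices for wheat and barley"),
    (date(1845, 3, 2), "A case of SUICIDE reported at the inquest"),
    (date(1846, 7, 11), "Verdict of suicide recorded by the jury"),
]


def count_by(records, keyword, period):
    """Count records whose text contains `keyword` (case-insensitive),
    grouped by the period key derived from the publication date."""
    counts = Counter()
    for published, text in records:
        if keyword.lower() in text.lower():
            counts[period(published)] += 1
    return counts


yearly = count_by(articles, "suicide", lambda d: d.year)
monthly = count_by(articles, "suicide", lambda d: (d.year, d.month))
print(dict(yearly))
print(dict(monthly))
```

A weekly series follows the same shape with a key such as `lambda d: tuple(d.isocalendar()[:2])`. Plain substring matching will miss OCR-damaged spellings, so real counts would be lower bounds.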
  • 48. Virtual Infrastructure for OCR text OCR text ‘scraped’ from digitised newspapers and put in internal cloud Jupyter notebook Write python code and results in web browser http://jupyter.org Access available for researchers ‘in residence’
  • 49. Experiments with Images
  • 50. 65,000 digitised 19th Century books Image: Artwork by Alicia Martin 2007 / 2008 Paid for by: For a full list: https://goo.gl/HqPQMS Subjects include: Philosophy Poetry History Literature 1789 - 1876
  • 51. 30 August 2012
  • 52. 002819694 Unique number
  • 55. OCR XML Generated by ABBYY FineReader Optical Character Recognition
  • 56. Images from books captured too!
  • 57. Worked better for female faces than men’s Press http://mechanicalcurator.tumblr.com Posts image every 30 minutes http://www.flickr.com/photos/britishlibrary/ 1,020,418 images need tagging! Creative uses of images Face recognition Algorithms based on photos Mechanical Curator with an algorithmic brain (Circles, Squares and Slanty etc) http://goo.gl/qPPgxX Wikimedia Flickr Commons Individual URL & API Snipping out images from 65,000 Digitised Books* >800,000,000* views >17,000,000* tags https://goo.gl/FgZ4HM Work @ BL by Ben O’Steen, Labs and Digital Research Team *Matt Prior - http://goo.gl/j29Tnx Since Dec 2013 Tumblr *Estimates >More demand to see physical items
  • 58. Tagging, Tagging, Tagging…
  • 59. Tagging a million images Iterative Crowdsourcing http://goo.gl/j6fxac Cardiff University’s Lost Visions Project http://www.metadatagames.org/ Metadata Games James Heald Mario Klingemann Chico 45 Use computational methods Human Tagger Top British Library Flickr Commons Taggers 18 hard core taggers How to reward this small group and keep it motivated? Average for ‘crowd’ is 1 tag per person What kind of ‘task’ can this ‘crowd’ do? Mobile games for ‘Ships’, ‘Covers’ and ‘Portraits’ Interface for tagging
  • 60. Adam Crymble (2015) Crowdsource Arcade http://goo.gl/LBfJ4W http://goo.gl/OH9pOZ https://goo.gl/7z0j8p 30 mins talk Labs Symposium (2015) https://goo.gl/SSRsdd 5 min interview (2015) http://goo.gl/0APpE8 Game Jam Using Arcade Games to help Tag images ‘Art Treachery’ and ‘Tag Attack’
  • 61. 61 @BL_Labs @BL_DigiSchol @LeedsHRI @UniversityLeeds Special Jury’s Prize (2015) James Heald – Wikimedia and Map work https://goo.gl/WYZCB2 http://goo.gl/HNQq5e https://goo.gl/VPgffL https://commons.wikimedia.org/ https://goo.gl/djtm1b Labs Symposium (2015) Geotagging maps 50,000 Maps Found in Flickr 1 million Human & Computational Tagging & Community engagement Geo-referencing work https://www.bl.uk/georeferencer
  • 62. 62 @BL_Labs @BL_DigiSchol @LeedsHRI @UniversityLeeds SherlockNet: Competition Winner 2016 Karen Wang, Luda Zhao and Brian Do Using Convolutional Neural Networks to Automatically Tag and Caption the British Library Flickr Commons 1 million Image Collection 12 categories >15.5 million tags added >100,000 captions bit.ly/sherlocknet Pooled surrounding OCR text on page from similar images Used Microsoft COCO (photographs) & British Museum Prints and Drawings collections as training sets. Tags Captions
  • 63. 63 @BL_Labs @BL_DigiSchol @LeedsHRI @UniversityLeeds http://goo.gl/dM8ieA Mario Klingemann (2015) Code Artist / Curator http://goo.gl/bNxGZZ Kris Hoffman (2016) Animation for Fashion Week 2016 https://goo.gl/QilqqT Jiayi Chong 2016 - Animation tool https://www.facebook.com/RealmlandStory/ Paul Rand Pierce 2016 Graphic Novel on Facebook Tragic Looking Women 44 Men who Look 44 (Notice the direction faces) A Hat on the Ground Spells trouble Artistic / Creative Works https://www.youtube.com/watch?v=Q3SBxO34Zlc David Normal 2014 and 2015 Collages/Paintings & Lightboxes
  • 64. 64 @BL_Labs @BL_DigiSchol @LeedsHRI @UniversityLeeds Hey there Young Sailor! Ling Low 2016 – Hey there Young Sailor https://www.youtube.com/watch?v=bcOP1E5bRE0 VIMEO.COM/SWEETANDLOWFILMS @SWEETNLOWFILMS ON INSTAGRAM @SWEETNLOWLING ON TWITTER The Impatient Sisters Play to fade!
  • 65. 65 @BL_Labs @BL_DigiSchol @LeedsHRI @UniversityLeeds Imaginary Cities – BL Labs Project / Exhibition 16-18 (Michael Takeo Magruder) An artistic exploration seeking to create provocative fictional cityscapes for the Information Age from the British Library’s digital collection of historic urban maps
  • 66. 66 @BL_Labs @BL_DigiSchol @LeedsHRI @UniversityLeeds Alanna Hilton British Fashion Colleges Council and Teatum Jones
  • 67. 67 @BL_Labs @BL_DigiSchol @LeedsHRI @UniversityLeeds Careful of making conclusions based on ‘black box’ software & techniques (e.g. sentiment analysis, algorithms), learn the assumptions behind them first! Lessons Learned & Challenges… Beware of ‘Black Box’ software…
  • 68. 68 @BL_Labs @BL_DigiSchol @LeedsHRI @UniversityLeeds Breaking Black Boxes – Melodee Beals
  • 69. 69 @BL_Labs @BL_DigiSchol @LeedsHRI @UniversityLeeds Huge appetite to use digital content & data for anyone’s ideas! (e.g. Flickr Commons stats). Lessons Learned & Challenges… Huge demand for open digital content… https://goo.gl/yQ5s4U
  • 70. 70 @BL_Labs @BL_DigiSchol @LeedsHRI @UniversityLeeds Labs mindset… 1. Labs tries to start a conversation, generate positive energy, encourages fun/play/experimentation and tries to support ideas. 2. Start with small experiments, use can be really simple, but OK to think big! 3. Fail faster (don’t be afraid) and persevere. 4. Reject perfectionism! Good enough is sometimes…good enough! 5. Services that allow useful exploration of cultural heritage data are rare! 6. Exploring data is difficult to do with large datasets and often requires specific skills and capabilities that many of our users don’t have – training or collaborations? 7. Celebrate the uses of digital collections, tell the world! 8. Success is sometimes all about the right people, place & right time… https://goo.gl/noASfl
  • 71. 71 @BL_Labs @BL_DigiSchol @LeedsHRI @UniversityLeeds Explore or Imagine Our Data! • CSV of Metadata https://data.bl.uk/digbks/dig19cbooks-mdata-csv.csv • 19th Century Books - Book Metadata - 01/09/2013. https://data.bl.uk/digbks/db21.html • Digitised Books - Flickr Tag History - Dec 2013 to March 2016. TSV https://data.bl.uk/digbks/db15.html • Digitised Hebrew Manuscripts - Metadata https://data.bl.uk/hebrewmanuscripts/heb1.html • Digitised Hebrew Manuscripts: Or 2210 - Or 2364 https://data.bl.uk/hebrewmanuscripts/heb8.html • Theatrical playbills from Britain and Ireland (OCR text only) https://data.bl.uk/playbills/pb2.html • Portraits of actors, views of theatres and playbills (covering 1750 - 1821 in a single volume) https://data.bl.uk/singlesheet/por1.html • Volumes of Lysons Collectanea (Amusements), comprising broadsides, cuttings, advertisements on amusements, 1660 - 1840. https://data.bl.uk/singlesheet/ad1.html https://data.bl.uk • Have a look at the data. • Data Quality • Issues Or an idea you have thought of what to do with the data! http://labs.bl.uk/Ideas+for+Labs Smaller datasets

Editor's Notes

  1. 90 seconds (270 words) I manage a project at the British Library called British Library Labs or ‘BL Labs’ for short. It’s made up of a team of 4 people and we also work occasionally with our Digital Research and Digital Scholarship colleagues. The project’s been running for over 4 years and is kindly supported by the Andrew W. Mellon Foundation and the BL. <CLICK> I am going to take you on a journey so that you learn about our experiences of working with the BL’s digital collections. I will identify issues, challenges, problems and solutions we have encountered and look at the impact our work is having. I will show you how and why we have engaged with a range of people using our data, highlighting their work and findings, and present some of the lessons we have learned and examine the wider impact of the project on the Library and other organisations. <CLICK> A link to download my presentation appears at the bottom of each slide and, for those of you using social media, I have also included some relevant tags if you would like to tweet.
  2. 3 seconds (10 words) BL Labs encourages researchers, artists, entrepreneurs, educators and anyone else <CLICK>
  3. 73 seconds (220 words) BL Labs is based at the British Library in London, which was founded in 1973, though its origins stem back to the British Museum in 1753. It’s probably the largest research library in the world. The St Pancras site, which you can see, opened in 1997 and stores many of the frequently requested items around the building, including 4 storeys below the ground floor. Much of the collection has been built up through legal deposit, where a copy of every UK and Ireland publication must be given to us. <CLICK> The St Pancras building can seat 1,200 researchers across 5 reading rooms where our readers can access collection items. We get around half a million visitors per year. <CLICK> Medium and long term requested items are held at Boston Spa, 300km away from London, in huge ‘factory like’ storage facilities. For example, building 37, as pictured, is a low oxygen warehouse, using robots to retrieve items. Boston Spa also has a reading room and it takes 48 hours for requested items to get to London or vice versa. In total, the library has over 700 km of shelving across both sites, growing by 12 km every year. <CLICK> Finally, we also manage the Public Lending Right at Stockton-on-Tees, around 400km from London, which is the author’s right to payment each time an in-copyright book is borrowed from a public library.
  4. 85 seconds The picture you can see is inside the main building in London; it’s the King’s Library – King George the Third’s personal library! Sometimes known as the ‘stack’, I walk past this every day and I sometimes forget that the collections the British Library has are truly staggering! We currently estimate them to exceed <click>150 million items, representing every age of written civilisation and every known language. Our archives now contain the earliest surviving printed book in the world, the Diamond Sutra, written in Chinese and dating from 868 AD…. So some big numbers… Over …<click>14 million books <click>60 million patents <click>8 million stamps <click>4 million maps <click>3 million sound recordings <click>1.6 million music scores <click>over 0.3 million manuscripts <click>0.8 million serials titles (which are of course made up of many, many volumes/editions); this is where a lot of our content is, just in case you thought the numbers didn’t add up!
  5. 42 seconds (128 words) The Library focuses most of its work and collaborations through its 8-year Living Knowledge vision. Initiated in 2015 and running to 2023, the 50th anniversary of the creation of the Library, our vision is to make our intellectual heritage accessible to everyone, for research, inspiration and enjoyment, and to be the most open, creative and innovative institution of its kind by 2023. The Library’s two core purposes are to build, curate and preserve the UK national collection of published, written and digital content and to support and stimulate research of all kinds. <CLICK> We also support businesses, helping them to innovate and grow, engage everyone with memorable cultural experiences, inspire young people and learners of all ages, and work with international partners around the world to advance knowledge and mutual understanding.
  6. 23 seconds (71 words) Though the project focusses on working and communicating with Digital Humanities and Digital Scholarship researchers, we have also engaged with amazing Artists, Librarians, Curators, Educators, Entrepreneurs, Archivists, Software Developers and other innovators. Hopefully, I will show you <CLICK> some inspirational examples of work they have done which has used our digital collections. <CLICK> I will also reflect on our experiences, challenges and lessons we have learned working with some amazing and pioneering people.
  7. 75 seconds (225 words) Here are the kinds of digital research methods our digital scholars are using. <CLICK> For example, searching for items based on time and location can reveal very interesting patterns, e.g. when and where works were published. Geotagging digitised objects, putting them in space, can add new dimensions to the kinds of research questions we might want to ask. <CLICK> Corpus analysis of text in language and text mining are methods which can find patterns in text through computational analysis. <CLICK> Tasks that require humans to use technology to complete a task that computers would find hard fall under the area of Crowdsourcing and Human Computation. <CLICK> Annotation involves augmenting an item with additional information, usually text. <CLICK> Similarly, transcribing can be the conversion of speech into text through human or computing power, to then be used for further analysis. <CLICK> Providing Application Programming Interfaces or APIs to data can be a very powerful way of giving computational access to datasets, used by software developers to build software applications, for example. <CLICK> Many researchers want to see the patterns that are emerging in large amounts of data and are now using a number of very powerful tools to visualise them. <CLICK> What is clear is that digital methods are much more than searching for an individual item in a catalogue, and Libraries, publishers, service and content providers have to change to support that.
  8. 6 seconds (20 words) BL Labs focuses on getting people to experiment with its digital collections, things that are already <CLICK> born digital<CLICK> or digitised.
  9. 36 seconds (110 Words) <CLICK> In 2013, legal deposit was extended to cover non-print material; consequently we have been collecting UK websites through the UK Web Archive, e-books, e-journals, CDs, DVDs etc. As a result, terabytes and billions of items are being archived at the BL every year. <CLICK> We are the headquarters of the Alan Turing Institute for Data Science and BL Labs is an active research partner. <CLICK> We are also part of the Knowledge Quarter London Hub, comprising 80 world class knowledge based organisations situated within a 1km radius of the BL, sharing ideas, best practice, meetings and events, e.g. <CLICK> companies such as Google and our sister organisation the British Museum, to name a few.
  10. 24 seconds (72 words) The BL are world renowned experts in digitising materials from our physical holdings. One common misconception that many people have is that much if not all of our collections are digitised. So, the actual proportion of our collections that are digitised surprises many<CLICK> The figure is around 3% of our physical collections.<CLICK> Much of our digitisation activity happens through partnerships with commercial, philanthropic, charitable and foundation partners<CLICK> What is for certain, is the amount we are digitising is increasing rapidly. Our new programme called Heritage Made Digital for example prioritises those collections for digitisation where there is a clear researcher demand.<CLICK> One important thing we have learned is that researchers need to take heed when doing research based on our digital collections, as they are rarely complete, having gaps and not necessarily being representative of our physical collections.
  11. 76 seconds (228 words) So let’s have a very brief overview of our digital collections, datasets and derived data. <CLICK> We have thousands of playbills from theatres, cuttings from magazines, books and millions of newspaper pages digitised, including their Optical Character Recognition (OCR) text. <CLICK> We have been using external platforms to host our digital collections because this is often a more effective way to make them more visible on the internet, such as Flickr and Wikimedia Commons. We have of course been helping develop the Qatar Digital Library, making digitised manuscripts from the Middle East available to all. The International Dunhuang Project makes digitised manuscripts from China available. The Polonsky Foundation is helping us make Hebrew Manuscripts accessible and we have thousands of geo-referenced historic maps as well as an online crowdsourcing geo-referencer tool. <CLICK> We are making millions of library data records available from UK and Irish National Library catalogues through our British National Bibliography service. <CLICK> We can provide usage data from our readers. EThOS holds all UK PhDs, either born digital or some digitised, and, as previously mentioned, we have the UK Web Archive. <CLICK> We have been recording English language TV news broadcasts since 2010 and archiving historic and current UK radio programmes. <CLICK> We have derived data from the Digital Music Lab project, which analysed world and traditional music to look for similarities across countries, as well as digitised sheet music and digitised environmental sounds, music and oral history.
  12. 56 seconds (169 words) Despite our digital collections being a small fraction of our physical holdings and over 85% only being available onsite, here are some ways you can find out about our openly licensed cultural heritage collections. <CLICK> First, on the Labs website we have created a guide pointing to over 100 digital collections. Then, as of today, curators have created nearly 200 collection guides by subject, each one having a section on what is available digitally onsite and online, if relevant. <CLICK> As part of the Labs project and overall data strategy for the Library we have created a data service, ‘data.bl.uk’, where users can download over 100 datasets. Importantly, it provides the ability to download entire collections instead of single items. Each collection is treated as a dataset with its own citable Digital Object Identifier (DOI) for replicable research purposes. The site also includes derived data from experiments that have been carried out on our digital collections. <CLICK> Please note that not all of these datasets are discoverable on all search engines.
  13. 33 seconds (99 words) Given these challenges, the Library has to do lots of external engagement, to tell people what we have. Every year we have a roadshow around the UK and sometimes we get to go to other places in the world, such as Qatar, thank you Milena. <CLICK> We do this partly to ‘de-bunk’ the myths about the Library. <CLICK> And to show / tell researchers about the reality of our data. <CLICK> What we have learned is that researchers’ project ideas of what they want to do with our digital collections always change once they explore and see the reality of our data.
  14. 28 seconds (85 words) This is what I imagine it feels like for a researcher looking for our physical collections. <CLICK> Everything is on an industrial scale and it can feel overwhelming. It isn’t always straightforward to find our items, as there are many that are not on our digital library catalogue, e.g. still on card catalogues, and some items are in the secret and very secure parts of the Library where you would need very special permission because, for example, the items are extremely valuable and fragile.
  15. 36 seconds (109 words) Our digital offering is perhaps like this. <CLICK> Imagine entering a boutique sweet shop. We have some lovely things to tempt you, but it’s much smaller than the hypermarket you just visited. The shop keeper tells you there are some things behind the back door in a giant warehouse. However, you will need special access to enter that space. She also states that there are rooms in that warehouse where even she isn’t allowed to look. She isn’t even allowed to share the full list of stock because there are items on there she may never be able to see, because they were meant to be secret.
  16. 26 seconds (66 words) So why is so ‘little’ digitised? Simply put, it costs money, time and resources to digitise physical materials to a professional standard. However, even though our digitised collections are a small fraction of our physical, combined with our born digital collections they still represent an impressive and sometimes un-imaginable amount of data.<CLICK> Currently we have 702 collections, ranging from the UK Web which includes billions of websites, to a collection of 130 digitised Chinese scroll maps. Some items on this list are confidential and require due diligence/risk management before we can tell the world about them.
  17. 35 seconds (106 words) Further analysis of our digital collections reveals that only 15% (that’s 105 collections) are openly licensed, of which four fifths are available online. <CLICK> 85% of our digital collections are only available onsite. Each month, more collections are being made available under an open access license, through our ‘Access and Re-use’ committee, but this takes time, especially for collections that were digitised before 2012, when we didn’t have such a group. <CLICK> Here’s a breakdown of our digital collections by type. <CLICK> Our digital collections include born digital, e-acquisitions, and of course the results of many digitisation projects funded by public/private partnerships, some of which are still in progress.
  18. 41 seconds (125 words) Our work in Labs has taught us that it always pays for researchers to know the back ‘story’ of a digital collection, especially if they want to use it for research and analysis. <CLICK> There are too many things to consider right now, but a few highlights are ‘are there gaps in the collection?’ and ‘can they still be accessed?’, but perhaps most important of all is whether the curator or a human being who knows about the collection is still around and could be asked about it. Our experience has told us that so much will probably be in their head that isn’t written down, information that could be vital, important and useful to know about before carrying out research or re-use.
  19. 11 seconds (24 words) Giving access to our openly licensed digitised materials is obviously much easier than<CLICK> Digital collections that are only available onsite such as those that are still within copyright to name one of many reasons.<CLICK>
  20. 9 seconds (28 words) So, how do we give access to onsite-only Digital Collections at the British Library (that’s the 85% of our data)? Well, there are further challenges in doing this.
  21. 55 seconds (167 words) <CLICK> Sometimes digital content is only available onsite due to license restrictions, or even only on a specific computer in a reading room! Technically of course, there are actually very few reasons why digital content can’t be online, though it might be too big or it hasn’t been transferred from the original media it was stored on, such as CD, minidisc or vinyl, for example. <CLICK> Sometimes, access is provided through a paywall. Finally, <CLICK> some content is in the happy sunny place: online, open and freely available to all of humanity. The real reasons why there are challenges to accessing digital content are of course human. They require different approaches from the Library and may often involve an honest, open dialogue and negotiation with the publishers who gave us the content in the first place. The Labs project has tried to address this problem by creating a ‘residency model’, where researchers are security cleared and use hot desks in staff areas or trailing areas in the reading rooms <CLICK> to work intensively with a digital collection on-site, so as not to infringe access conditions.
  22. 49 seconds (148 words) So what kind of conversations do we have with researchers who may want to use our digital collections and data? <CLICK> The dialogue typically can be: ‘Ah, you are ‘lucky’ & we have the exact digital content / data relevant to your research’; informally we call these our ‘lucky dip researchers’. <CLICK> Or the conversation might go like this… ‘Ah, we don’t exactly have what you are looking for, but here is what we do have, is there anything of interest that you like? Let’s talk…’ <CLICK> We have learned that engagement can be hard work. But it’s constantly required to maintain interest in our digital collections because they aren’t all instantly discoverable on search engines. <CLICK> We also tend to attract researchers with ‘fuzzier’ and ‘flexible’ research boundaries and those who are possibly open to more interdisciplinary / collaborative research. <CLICK> Finally, we have found that artists find this dialogue easier.
  23. 12 seconds (37 words). Put another way, we are trying to match our audiences’ research needs and digital interests <CLICK> with the digital collections we have. <CLICK> It is at this intersection where Labs works best and it usually starts with a conversation.
  24. 24 seconds (72 words) Let’s look a little further at the types of interactions we have with our researchers. We have summarised these phases as ‘Exploration’ where people often ‘rethink’ their ideas of what they want to do with the data, ‘Query-Focused’ where they often have to iterate to come up with a realistic proposal of what they want to do and a ‘Wrap-up’ phase to end their project with us, if it is relevant.
  25. Examples from the Cooper Hewitt collection. I spent 3/5 of my time at the Cooper Hewitt just trying to get the data clean enough to vaguely represent the collection. The problem is that computers think U.S., U. S. , U.S.A., U. S. A. , United States, United States of America are six different places. Fields also contain things like internal notes about potential duplicates, unexpected extra information - notes on what type of location, etc. Lots of inconsistencies - uncertainty and date ranges expressed in different ways. More common GLAM issues - What year is 'early 18th century'? What do you do with '1836 (probably)'?
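The two clean-up problems described in note 25 (variant spellings of one place, and dates that carry uncertainty) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used at Cooper Hewitt: the lookup table, the function names `normalise_country` and `parse_date`, and the choice to read "early" as the first third of a century are all hypothetical.

```python
import re

# Hypothetical lookup: letters-only, lowercased variants mapped to one label.
COUNTRY_VARIANTS = {
    "us": "United States",
    "usa": "United States",
    "unitedstates": "United States",
    "unitedstatesofamerica": "United States",
}

def normalise_country(raw: str) -> str:
    """Strip everything except letters, then look up a canonical name."""
    key = re.sub(r"[^a-z]", "", raw.lower())      # "U. S. A. " -> "usa"
    return COUNTRY_VARIANTS.get(key, raw.strip())  # unknown values pass through

# Assumed reading of vague qualifiers as year offsets within a century.
CENTURY_PARTS = {"early": (0, 33), "mid": (34, 66), "late": (67, 99), "": (0, 99)}

def parse_date(raw: str) -> dict:
    """Return an earliest/latest year range plus an explicit uncertainty flag."""
    text = raw.strip()
    uncertain = bool(re.search(r"\?|probably|circa", text, re.IGNORECASE))
    century = re.search(
        r"(early|mid|late)?\s*(\d{1,2})(?:st|nd|rd|th) century", text, re.IGNORECASE
    )
    if century:
        start = (int(century.group(2)) - 1) * 100
        low, high = CENTURY_PARTS[(century.group(1) or "").lower()]
        return {"earliest": start + low, "latest": start + high, "uncertain": uncertain}
    year = re.search(r"\b(\d{4})\b", text)
    if year:
        value = int(year.group(1))
        return {"earliest": value, "latest": value, "uncertain": uncertain}
    return {"earliest": None, "latest": None, "uncertain": uncertain}
```

Keeping the uncertainty as its own field, rather than deleting "(probably)", means the cleaned data can still be filtered or caveated later.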
  26. OpenRefine is an amazing tool, and I wouldn't have gotten anywhere at Cooper Hewitt without it. It will suggest ways to make the data more consistent. You can then export the data and keep working on it in other tools, or put it back into OpenRefine. Because Refine runs locally it can be used for sensitive data you mightn't put online. One issue is that GLAMs tend to use question marks to record uncertainty in attribution, but Refine strips out all punctuation, so you have to be careful about preserving it (if that's what you want). Takes in TSV, CSV, *SV, Excel (.xls and .xlsx), JSON, XML, RDF as XML, and Google Data documents. Useful advice: http://freeyourmetadata.org/cleanup/
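To see why the question marks mentioned in note 26 can get lost, here is a rough Python approximation of key-collision clustering. It is a simplified sketch and not OpenRefine's own code: `fingerprint`, `split_uncertainty` and `cluster` are hypothetical helper names, and the sample values in the test are invented. The idea is to move the "?" into a separate flag before punctuation is stripped for matching.

```python
import string
from collections import defaultdict

# Translation table that deletes ASCII punctuation, including "?".
_NO_PUNCT = str.maketrans("", "", string.punctuation)

def fingerprint(value: str) -> str:
    """Simplified clustering key: lowercase, no punctuation, sorted unique tokens."""
    cleaned = value.strip().lower().translate(_NO_PUNCT)
    return " ".join(sorted(set(cleaned.split())))

def split_uncertainty(value: str):
    """Separate a trailing or embedded '?' from the value it qualifies."""
    return value.replace("?", "").strip(), "?" in value

def cluster(values):
    """Group variants by fingerprint while keeping each record's uncertainty flag."""
    groups = defaultdict(list)
    for original in values:
        cleaned, uncertain = split_uncertainty(original)
        groups[fingerprint(cleaned)].append(
            {"original": original, "uncertain": uncertain}
        )
    return dict(groups)
```

With this arrangement the variants still collapse into one cluster, but each record remembers whether its attribution was marked as doubtful.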
  27. 39 Seconds (117 words) We have been learning that characterising our data is a really valuable way for researchers to begin to understand what we have. Though this is pretty resource intensive, we have carried out some simple experiments. <CLICK> Here, you can see that an analysis of our catalogue data reveals the use of different versions of the Dewey Decimal System across the years.<CLICK> Secondly, in the left column you can see what looks like random data/noise. However, when grouped, we can see the dark blue visualisation indicates there is some similarity in the data, in this case it was subtitles from digitised TV broadcasts.<CLICK> We know this is something we should do more of, if we had more resources.
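One inexpensive way to start characterising a column of catalogue data, in the spirit of note 27, is to reduce every value to its "shape" and count the shapes. The sketch below is illustrative only and is not the analysis shown on the slide; the function names `shape` and `profile` and the sample classmarks in the usage note are hypothetical.

```python
import re
from collections import Counter

def shape(value: str) -> str:
    """Replace every letter with 'A' and every digit with '9', keeping punctuation."""
    lettered = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"\d", "9", lettered)

def profile(values) -> Counter:
    """Count how often each value shape occurs in a column."""
    return Counter(shape(v.strip()) for v in values)
```

For example, `profile(["942.07", "823.8", "821.7", "n/a"])` separates three-digit classmarks with one or two decimals from placeholder text, which hints at mixed conventions worth asking a curator about.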
  28. 21 Seconds (65 Words) Katrina Navickas was particularly interested in the <Click>Chartist Movement, a group campaigning for the vote for working people. <Click>They were the biggest popular movement for democracy in 19th century British history, as this early picture of a huge monster meeting at Kennington Common shows. <Click>She wanted to use a combination of manual and computational methods to explore our Digitised Newspapers to find out when and where they met and plot the meetings on a map, <Click>hopefully unearthing new history.
  29. Posts small illustrations taken almost at random from the digitised book corpus to a Tumblr blog. This experiment with undirected engagement was a by-product of work to uncover the hidden wealth of illustrations within the digitised pages.
  30. 27 Seconds (82 Words) Adam Crymble <Click>wanted to harness the power of playing fun games on arcade machines to help with crowdsourcing the tagging of un-described images. He particularly wanted to engage a younger audience in crowdsourcing. <Click>On the right you can see a replica 1980s arcade machine we built and <Click>on the bottom left some tagging games that were developed through a ‘Games Jam’ for the machine. <Click>Let’s take a closer look at two of the games…<Click>
  31. 18 Seconds (56 Words) Indexing the BL 1 million & Mapping the Maps was led by James Heald in collaboration with others. <Click>They produced an index of 1 million 'Mechanical Curator collection' images on <Click>Wikimedia Commons from a collection of largely un-described images. <Click>This gave rise to finding 50,000 maps within the collection, partially through a map-tag-a-thon. <Click>These are now being geo-referenced. <Click>
  32. 15 seconds (47 Words) Start a conversation, generate positive energy, be nice, have fun and try to support ideas.<CLICK> Start with small experiments, but think big! <CLICK> Fail faster (don’t be afraid) and persevere. <CLICK> Reject perfectionism! Good enough is sometimes…good enough! <CLICK> Celebrate the uses of digital collections, tell the world!