1
@BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk
http://www.bl.uk/projects/british-library-labs
Funded by the Andrew W. Mellon Foundation
Mahendra Mahey
Experiment with our
Digital Collections
Mahendra Mahey
Manager of BL Labs
Working with the British Library’s Digital Collections & Data:
Insights from British Library Labs & a new role for Libraries
10:00 to 12:45, 26 March 2018
BL Labs Roadshow 2018
CityLIS, London
UK.
Running since March 2013
Core Team
• Adam Farquhar (PI)
• Mahendra Mahey
• Ben O’Steen
• Eleanor Cooper (0.5)
2
The British Library
UK Legal Deposit Library – Reference only
Founded in 1973, though its origins go back to the British Museum Library (1753)
St Pancras, London, UK
• Inside the British Library: space for 1200 readers, around 500,000 visitors per year
• Many books are stored 4 storeys below the building
Boston Spa
• Many items stored at the Document Supply and Storage centre, 48 hours away
• Building 37 uses low oxygen and robots
• Reading room and delivery to London
Stockton-on-Tees
• Authors’ right to payment each time their books are borrowed from public libraries
3
Collections – not just books!
> 180 million* items
> 0.8 m* serial titles
> 8 m* stamps
> 14 m* books
> 6 m* sound recordings
> 4 m* maps
> 1.6 m* musical scores
> 0.3 m* manuscripts
> 60 m* patents
King’s Library
*Estimates
4
Living Knowledge Vision (2015 – 2023)
Custodianship Research Business
Culture Learning International
To make our intellectual heritage accessible to everyone,
for research, inspiration and enjoyment and be the most open, creative
and innovative institution of its kind by 2023 (50 year anniversary).
Document:http://goo.gl/h41wW7 Speech:https://goo.gl/Py9uHK
Roly Keating (Chief Executive Officer of the British Library)
5
Wider…not just Digital Humanities
Researchers
https://goo.gl/WutNyi
Artists
http://goo.gl/nNKhQ2
Librarians
Curators
https://goo.gl/9NWZUW
Software Developers
https://goo.gl/7QQ5Tf
Archivists
https://goo.gl/x7b4tg
Educators
https://goo.gl/qh01Mi
6
• An area of scholarly activity, born from humanities computing, at the
intersection of computing/digital technologies and the
humanities.
• The field both employs technology in the pursuit of humanities
research, and subjects technology to humanistic questioning and
interrogation.
• DH is collaborative, crossdisciplinary, and computationally
engaged research, teaching, and publishing.
https://en.wikipedia.org/wiki/Digital_humanities
Defining digital humanities (DH)
7
Digital research methods
Digital Scholarship
Visualisations
Application Programming Interfaces (APIs)
for datasets e.g. Metadata, Images, etc
Transcribing
Annotation
Location based searching & Geo-tagging
Corpus analysis, Text Mining &
Natural Language Processing
Crowdsourcing
Human Computation
In 20 years’ time?
8
What about Digital?
Born Digital Digitised
9
Knowledge Quarter London
80 knowledge organisations (as of 26/03/18) within 1 mile radius of
Kings Cross, http://www.knowledgequarter.london
http://www.turing.ac.uk (Headquartered at the British Library)
UK Web Archive and e-legal deposit (2013)
http://www.webarchive.org.uk/ukwa/
Born digital
Data all around us at
Kings Cross!
10
All our physical
items are digitised
right?
11
#bldigital
1-2 %* digitised
* estimate
Digitisation
Partnerships
Commercial & Other Organisations
Amount
increasing rapidly
Bias in digitisation
http://goo.gl/bR9UJL
Sample Generator
12
Playbills, Books, Newspapers
(includes Optical Character Recognition (OCR))
Digital collections and Datasets
British National
Bibliography
http://bnb.data.bl.uk
http://sounds.bl.uk
http://dml.city.ac.uk/
Music (Recordings & Sheet) & Sounds
http://goo.gl/frSMJt
Broadcast News (TV and Radio)
http://goo.gl/cwThHw
http://goo.gl/pBkisZ
http://goo.gl/E8aRyQ
Usage data
EThOS
Web Archive
Images, Manuscripts & Maps
http://www.qdl.qa/
Qatar Digital Library
http://idp.bl.uk/
International
Dunhuang
Project
Maps
http://www.bl.uk/maps/
Hebrew Manuscripts
http://goo.gl/4sbCp9
Flickr &
Wikimedia Commons
https://goo.gl/LZRmaZ
13
Finding Open Cultural Heritage Datasets
Collection Guides (199 as of 26/03/2018)
https://www.bl.uk/collection-guides/
Datasets about our collections
Bibliographic datasets relating to our published and
archival holdings
Datasets for content mining
Content suitable for use in text and data mining
research
Datasets for image analysis
Image collections suitable for large-scale image-
analysis-based research
Datasets from UK Web Archive
Data and API services available for accessing UK Web
Archive
Digital mapping
Geospatial data, cartographic applications, digital aerial
photography and scanned historic map materials
https://data.bl.uk
Download collections as zips, no API
Each dataset has a Digital Object Identifier (DOI)
can be referenced for research
Not all discoverable via
search engines!
14
How are we
doing this?
15
Competition
Awards
Projects
Tell us your ideas of what to do with our digital content
Show us what you have already done with our digital
content in research, artistic, commercial and learning and
teaching categories
Talk to us about working on collaborative projects
16
• The Library has to go out to meet researchers, regularly and
cyclically to tell them what we have and learn what they
want to do
• Debunk ‘myths’ about the Library
• Show researchers the reality of our data
• Their ideas always change once they see the data!
https://goo.gl/esqpRb
Lots of two-way communication!
BL Labs runs annual ‘Roadshows’
around the UK
17
Have you got X?
https://upload.wikimedia.org/wikipedia/commons/5/50/Real_wuerzburg.jpg
Looking for Physical Content in the British Library
18
Have you got X digitised / in digital form?
http://www.yorkmix.com/wp-content/uploads/2014/04/mr-simms-sweet-shoppe-york.jpg
Looking for Digitised / Digital Content in the BL
19
So little digitised…why?
• Digitisation costs time and resources, and access can depend on restrictions imposed by funders, or on legal, ethical and practical constraints…
• Still…700 digitisation projects (as of 26/03/2018)
But not all can be found through search engines, or even online!
© £
20
https://goo.gl/qpCLlk
https://goo.gl/wMTS3Z
What engagement does the BL have with researchers wanting to use our digital content?
• Dialogue typically:
– you are ‘lucky’ & we have the digital content / data relevant to your research
– we don’t have exactly what you’re looking for, but is there anything of interest? Let’s talk…
– engagement is hard work and it’s constantly required to maintain interest in our digital collections!
• Artists find this dialogue easier…
• We also tend to attract researchers with ‘fuzzier’ research boundaries who may be more open to interdisciplinary / collaborative research
21
Overview of interactions with the
“researcher”
www.bl.uk
Phase 1: Exploration
• Exploration phase allows a researcher to:
• understand the data in an open-ended fashion,
• discover potential tools to work with the data,
• gain awareness of their capabilities and limitations,
• develop a firmer research query and
• gauge the costs, risks and time needed.
• Outputs of the exploration are not intended to be shareable,
beyond personal experience and key features (data size, formats, tool
successes, etc).
22
Phase 2: Query-Focussed
• “Query-Focussed”, the familiar project but due to phase 1:
• A firmer and more informed query by the researcher
• Suitable datasets already lined up
• A good idea of the initial toolset and capabilities (human and computer)
required
• Project output is outlined, and relevant reuse applications are begun.
• Clear agreements on what happens at the end of the project – data
deletion, virtual machine deletion/archiving/etc.
• Project may iterate on initial ideas, depending on researcher’s cost/risk
appetite
23
Phase 2: Query-Focussed
• Wrap-up
• Work (code, notes) exported and given to researcher
• All derivative data is licensed or retained based on reuse agreements
(Access & Reuse board, etc)
• Provisions made for the project are wound down, as agreed (derivative
data deleted after a grace period, etc)
24
25
Openly Licensed Digital Content?
15% Openly
Licensed
Around 80%*
available online
Working through to make more open…
Though some collections will always only be available onsite due to
various reasons including legal, ethical etc
Breakdown by collection*
Manuscripts 59%
Books 9%
Maps and Views 7%
Newspapers 3%
Archives and Records 3%
Paintings, Prints and Drawings 2%
*Based on number of digitisation projects (700 as of 26/03/18)
Largest proportion of funding
Public / Private Partnership
15 %* Openly Licensed – most online
85 %* Available onsite only at the moment
*Estimates
26
The Story of the Digital Collection…
Digital
Collection
Curator
Who paid for the digitisation?
Who did the digitisation?
Technology used
Born digital?
Published
Unpublished
Where is it?
Can it still be accessed?
Generates income
Reputational risk in using?
Legalities
Politics when digitised
Personalities involved
Surprises (e.g. gaps)
Descriptive information
Old format not supported
What media was the
digitisation done from?
Is there any background documentation?
No Descriptive information
Inconsistent descriptive information
Still there?
Good to know the background ‘Story’ of a Digital Collection
if you want to use it for research and draw conclusions…
27
Open Content vs Onsite Only Access
• Access easier for openly licensed content
• More challenging for on-site, in-copyright, non-print legal
deposit, data protected, old content media & contemporary
material (post 1877)
https://goo.gl/Y5zCXg
©
28
Challenges of access to Digital Collections
British Library digital content is:
• online and open
• online behind paywall
• only in Reading Rooms due to ©
• only on site due to © or ethical etc
• not online / available – various storage devices, personal data
29
How do we give access to
onsite-only
Digital Collections
(85% of our Digital Collections)?
30
Challenges of access to Digital Collections
British Library digital content: READING ROOM · ON SITE · NOT ONLINE · OPEN · £
Labs Residency Model
31
Accessing digital collections onsite
OPEN
£
• Have to be ‘onsite’ (interpretations vary)
• Need to be ‘security cleared’ ‘trusted’ for some collections
– Hence ‘Researcher in Residence Model’
• Permission required (depending on ‘story’ of collection)
• Content could be on various media formats
(not always online)
• 5 - 20 % re-use of material for non commercial research for
some collections, depends on agreements in place
• We are learning ‘pathways’ so that this becomes ‘everyday’ to
provide onsite access to some digital collections in the future
32
Why are we doing this?
33
Why are we doing this? (1)
Working closely with and listening
to those who want to use our digital
collections and data for their work
https://goo.gl/esqpRb
34
We can learn how we are and should be supporting them and
this therefore shapes the problems we work on, such as:
https://goo.gl/esqpRb
Why are we doing this? (2)
• Access to digital collections / data?
• Advice, guidance, technical
support, training
• Services, Tools and Processes?
• Many more reasons…
35
Where are the gaps between what users want & what we can
give?
How do we build the bridges to overcome the gaps?
Why are we doing this? (3)
https://goo.gl/6CwCeE
36
How do we help users ‘navigate’ their way through the
‘maze’ of the Library to what they want to do?
Sometimes requires understanding the culture of the organisation
https://goo.gl/62JnQT
Why are we doing this? (4)
37
What did people
actually do?
Examples from Text and Images
Over 100 examples (including sound, video):
http://labs.bl.uk/Ideas+for+Labs
http://labs.bl.uk/Other+Uses+of+Collections
38
Example Pattern of Research
1, 2, 3
1. Find / identify new things in messy stuff
2. Unlock hidden history / data
3. Celebrate new discoveries
39
Finding / identifying invisible / well hidden
things in ‘messy’ historical data
https://goo.gl/mcpa8B
Not the British Library!
Example Pattern of Research 1
40
Messiness in historical data
• 'Begun in Kiryu, Japan, finished in France'
• 'Bali? Java? Mexico?'
• Variations on USA:
– U.S.
– U.S.A
– U.S.A.
– USA
– United States of America
– USA ?
– United States (case)
• Inconsistency in uncertainty
– U.S.A. or England
– U.S.A./England ?
– England & U.S.A.
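Variants like those listed above can be collapsed with a small normalisation step, before or alongside a tool such as OpenRefine. The sketch below is illustrative only: the function name `normalise_country`, the lookup table and the choice to return a separate "uncertain" flag are assumptions, not a description of how any British Library dataset was actually cleaned.

```python
import re

# Hypothetical lookup: canonical label for each known variant, keyed by a
# simplified form (lower case, letters only).
CANONICAL = {
    "us": "United States of America",
    "usa": "United States of America",
    "unitedstates": "United States of America",
    "unitedstatesofamerica": "United States of America",
}


def normalise_country(raw):
    """Return (canonical_name_or_None, uncertain_flag) for one raw value."""
    uncertain = "?" in raw
    # Drop the uncertainty marker and bracketed notes such as "(case)".
    cleaned = re.sub(r"\(.*?\)", "", raw.replace("?", ""))
    key = re.sub(r"[^a-z]", "", cleaned.lower())
    return CANONICAL.get(key), uncertain


for value in ["U.S.", "U.S.A", "USA ?", "United States (case)", "Bali? Java? Mexico?"]:
    print(repr(value), "->", normalise_country(value))
```

Values that do not match a known variant (for example 'U.S.A. or England') come back as `None`, so they can be set aside for a human decision rather than silently guessed.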
41
Open Refine
42
Things to consider: Data + Tools
• Cultural heritage records contain uncertainty and fuzziness (e.g. date ranges, multiple values, uncertain or unavailable information). Curators and staff at institutions often have unique expertise in deciphering these anomalies, so ask them! ([1960] vs. 1960 can have a big impact depending on what you’re doing.)
• Optical Character Recognition in particular is an imperfect art. Consider how bad it is, how this might affect your findings, and what needs doing to mitigate it.
• Keeping data clean, organised, open and well described will not only make your life easier, but also enable its widespread re-use beyond your project and increase its future impact. (Datasets you’ve created in the course of your research projects could even be used to enhance national collections!)
• Decisions always need to be made while normalising information for visualisation. Documenting them is important for your research but also for future re-use!
• Is your aim enquiry or presentation? This will have an impact on the tools and data cleaning choices you make.
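To make the "[1960] vs. 1960" point concrete, here is a minimal sketch of a date parser that keeps the bracket information instead of discarding it. It assumes that square brackets mark a date supplied or inferred by a cataloguer (a common cataloguing convention, but worth confirming with the curator of a given collection). The name `parse_year` and the returned tuple shape are hypothetical.

```python
import re

# Matches a four-digit year, optionally wrapped in square brackets and
# optionally followed by "?", e.g. "1960", "[1960]", "[1960?]".
YEAR_PATTERN = re.compile(r"^\s*(\[)?(\d{4})(\?)?(\])?\s*$")


def parse_year(raw):
    """Return (year, inferred, doubtful), or None if the value needs a human."""
    match = YEAR_PATTERN.match(raw)
    if match is None:
        return None
    opening, year, question, closing = match.groups()
    if bool(opening) != bool(closing):
        return None  # unbalanced bracket: do not guess
    return int(year), bool(opening), bool(question)


for value in ["1960", "[1960]", "[1960?]", "1960-65"]:
    print(repr(value), "->", parse_year(value))
```

Keeping `inferred` and `doubtful` as separate fields is one way of documenting a normalisation decision in the data itself, so that a later visualisation can choose whether to include or exclude such records.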
43
44
#digitalhumanities
dancohen/lists/digitalhumanities
@ProfHacker
@Dhnow
@BL_DigiSchol
And more links to resources here: http://scottbot.net/teaching-yourself-to-code-in-dh/
45
Unearthing / unlocking
hidden histories & data
to stimulate new research
https://goo.gl/vJ291F
It’s an
18th Century Poem!
Example Pattern of Research 2
46
Celebrating hidden histories / data
creatively through events, art &
performance
https://goo.gl/Ql0Bwz
Re-enacting, re-discovering history
Example Pattern of Research 3
47
Experiments with Text
48
https://goo.gl/oUNj5N
https://goo.gl/ImAUv4
Finding things in ‘messy’
Optical Character Recognised (OCR) text
Mrs Folly
• Clean up some manually
• Get human ‘ground truth’
• Write computer code (sometimes
it’s machine learning) to find
things reliably in it ‘automatically’
• Try code on messy content
• Tweak if necessary
• Digital ‘lasso’ around content
• Human sift through
Mrs Folly
An example pattern of research
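One way to judge whether the code needs tweaking is to score what it finds against the human 'ground truth'. The sketch below is an assumption about how such a comparison might be done, not the method used in the projects shown here; the page identifiers are made up for illustration.

```python
def precision_recall(found, ground_truth):
    """Compare machine-found item ids with human-verified ones."""
    found, ground_truth = set(found), set(ground_truth)
    true_positives = len(found & ground_truth)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical ids: pages a human marked as relevant vs. pages the code flagged.
human = {"page-3", "page-7", "page-9", "page-12"}
machine = {"page-3", "page-7", "page-8"}
precision, recall = precision_recall(machine, human)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision means the digital 'lasso' is too wide and the human has more to sift through; low recall means relevant items are being missed and the code probably needs another tweak.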
49
Looking through a rubbish bin?
https://goo.gl/UeEvqs
Good stuff! Some Rubbish
50
Let’s use Text and Data Mining
Machine Learning / Reading!
51
Smell of soup & Machine Learning
Thanks to Memo Akten (@memotv on twitter) for the inspiration!
https://goo.gl/toq4Bo
Nasreddin, 13th Century Turkish Sufi
http://web2.uvcs.uvic.ca/elc/studyzone/330/reading/smell1.htm
52
Machine Learning / Reading
Analogies to how humans read / learn
Machines acquire ‘knowledge’ / data, use that
knowledge / data to make sense / identify patterns
https://goo.gl/k68fTf
https://goo.gl/gXmVQL Can you see the bird?
53
Need to stress still requires computational
& human effort…
Smart data needs smart people!
https://goo.gl/gDQEAz
Labs doing this on a case by case basis
so methods can vary
Machine Learning / Reading still
requires ‘Human Effort’!
54
Legalities of Machine Learning /
Text and Data mining
https://goo.gl/toq4Bo
Legalities of Machine Learning / Text and Data
mining still up for discussion…Often misunderstood
Is it the same as humans reading and looking for
patterns…just a bit quicker?
55
http://victorianhumour.tumblr.com
Victorian Meme Machine (2014)
https://goo.gl/HMqDt3
Bob Nicholson
http://victorianhumour.tumblr.com/
Bob Nicholson interviewed on
BBC Radio 4 Making History Programme:
http://goo.gl/fmV9ep
And telling jokes to the public:
http://goo.gl/xIDRhz
Bob obtained further funding from his university
Looking for more collaborations
https://www.youtube.com/watch?v=-GRgj7Q5OM0
Rob Walker, Victorian Mother-in-law Jokes
Victorian Comedy Night, 7 Nov 2016
Learnt about access paths
to digital collections
56
Play from 1m 5 seconds to 1m 31 seconds
57
Katrina Navickas (2015)
Political Meetings Mapper
http://politicalmeetingsmapper.co.uk
https://goo.gl/Qq78Oa
Labs Symposium 2015
https://goo.gl/BSA3be
Interview 2015
The Chartist Newspaper
http://goo.gl/vOLSnH
Chartist Monster Meeting
Chartists Walking Tour and
Re-enactment London
Learnt that domain knowledge
reduces noise
58
Play for 36 seconds
59
Black Abolitionist Performances & their
Presence in Britain (2016) – Hannah-Rose Murray
Frederick
Douglass
Ellen
Craft
Josiah
Henson
Ida B
Wells
A Performance by
Joe Williams &
Martelle Edinborough
http://frederickdouglassinbritain.com/
Started to implement
Machine Learning Techniques
60
Data-mining verse in 18th Century newspapers
BL Labs Project 16-17, Jennifer Batt
https://goo.gl/5Akthd
Slides courtesy Jennifer Batt
61
What thoj' among ourrelves, with too much Heat, or t
W: fweutimes.wongle, wvhen we Ihould debate, W –
(A confequential Ill which Freedom drawvs, fl t
A bad Efficf, but from a noble Caufe) t
We can with univeifal Zcal advance, to
To cutb the faithlefs Arrogancccof V rance. hi
Dublin Journal, 10-14 September, 1745 Slides courtesy Jennifer Batt
62
Verse: 81% lines begin with
initial capital
Prose: 52% lines begin with
initial capital
Westminster Journal 3 March 1745
Slides courtesy Jennifer Batt
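The capitalisation figures above suggest a simple feature that code can compute even on messy OCR text. The sketch below is only an illustration of that idea, not the project's actual method. The function names and the 0.7 threshold are assumptions, and the sample lines are the OCR excerpt from the Dublin Journal slide.

```python
def initial_capital_ratio(lines):
    """Fraction of lines whose first alphabetic character is upper case."""
    considered = 0
    capitalised = 0
    for line in lines:
        first_letter = next((ch for ch in line if ch.isalpha()), None)
        if first_letter is None:
            continue  # skip blank lines and lines with no letters
        considered += 1
        if first_letter.isupper():
            capitalised += 1
    return capitalised / considered if considered else 0.0


def looks_like_verse(lines, threshold=0.7):
    # The 0.7 threshold is an illustrative assumption, chosen only because it
    # falls between the 52% (prose) and 81% (verse) figures quoted above.
    return initial_capital_ratio(lines) >= threshold


ocr_lines = [
    "What thoj' among ourrelves, with too much Heat, or t",
    "W: fweutimes.wongle, wvhen we Ihould debate, W –",
    "(A confequential Ill which Freedom drawvs, fl t",
    "A bad Efficf, but from a noble Caufe) t",
    "We can with univeifal Zcal advance, to",
    "To cutb the faithlefs Arrogancccof V rance. hi",
]
print(initial_capital_ratio(ocr_lines), looks_like_verse(ocr_lines))
```

A single feature like this will produce false positives (lists, headlines, advertisements), which is why a human sift of the results remains part of the pattern.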
Started to refine
Machine Learning Techniques
Jennifer Batt @ the BL on World Poetry Day
‘40,000’ things found…
Possibly using Gale Primary
Sources interface to see if we
can sift this data
63
Use of Overproof
OCR Correction?
Re-OCR with
ABBYY FineReader?
https://www.abbyy.com/en-gb/
http://overproof.projectcomputing.com/
RE-OCR
Cleaning up OCR Text – significant improvement
(depending on original image quality)
64
Virtual Infrastructure for OCR text
OCR text ‘scraped’ from
digitised newspapers
and put in internal cloud
Jupyter notebook
Write Python code and see results
in a web browser
http://jupyter.org
Access available for researchers ‘in residence’
https://www.docker.com/
http://dhbox.org/
65
Experiments with Images
66
65,000 digitised 19th Century books
Image: Artwork by Alicia Martin 2007 / 2008
Paid for by:
For a full list:
https://goo.gl/HqPQMS
Subjects include:
Philosophy
Poetry
History
Literature
1789 - 1876
67
30 August 2012
68
002819694
Unique number
69
70
71
OCR XML generated by ABBYY FineReader
Optical Character Recognition
72
Images from books captured too!
73
We did some of our own
experiments…do as we tell others!
Experiment with our
Digital Collections
@BL_Labs
74
Ben O’Steen of @BL_Labs after Hack Event, August 2013
75
76
77
78
79
80
81
82
1,020,418
images
needed identifying!
83
One major problem!
•We know about the books these images come
from but we know nothing about the actual
images!
•How will we identify them?
•How will we find them later?
•How can we do that with 1 million images?
•Try a few experiments!
84
Running face recognition
on the images
Face Recognition Algorithm
Trained on Photographs
Late August 2013
85
Face Recognition
Algorithms worked
better for female
faces than men’s
86
The Mechanical Curator
Snipped image posted
almost randomly
every hour…
on a Tumblr blog
One of our early followers was…
Ben O’Steen, 30 September 2013
Has a slight ‘mood’…
once image published,
tries to find 8 similar images
e.g. ‘slanty’, ‘circular’ etc.
& then gets ‘bored’
follow…
@MechCuratorBot
87
Where to put the 1 Million images?
• Internal platform? Would take too long!
• BL Labs is a time limited project, needed evidence of impact
and good example to show the Library what we need, fast!
• External platform? Put our content where the ‘light’ is!
• Wikipedia / Wikimedia – no! Not enough metadata for the
images.
• Another platform that meets our needs…
• Flickr Commons
88
Flickr Commons (100 + GLAMs as of 08/12/17)
89
90
British Library Flickr Commons
Why Flickr Commons?
• Free!
• Each image has its own unique web address, easy to share
• Can Tag images
• Has Application Programming Interface (API)
Late August 2013
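As a rough illustration of what having an API makes possible, the sketch below builds a request URL for Flickr's `flickr.photos.search` method, filtered by account and tag. It only constructs the URL and makes no network call. The function name `build_search_url` is hypothetical, and the API key and account NSID are placeholders to be replaced with real values.

```python
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_search_url(api_key, user_id, tag, per_page=10):
    """Build a flickr.photos.search request URL (no network call is made)."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,   # your own key, obtained from Flickr
        "user_id": user_id,   # NSID of the account to search
        "tags": tag,
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


# Placeholder values: replace with a real key and the account's NSID.
print(build_search_url("YOUR_API_KEY", "ACCOUNT_NSID", "map"))
```

Fetching that URL with a valid key should return a JSON list of matching photos, which is the kind of access that lets tags added by people or by code be queried and reused at scale.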
91
Opportunities
– increasing traffic to Library services
You can purchase
a ‘High Res’ Copy
View in the
Library Item Viewer
Download .pdf
All illustrations
in book
Other illustrations in books
Published in same year
View the item in
the Library Catalogue
Tags auto generated
User generated
Tag
Grouping for image
92
Worked better for female faces than men’s
Press
http://mechanicalcurator.tumblr.com
Posts image every 30 minutes
http://www.flickr.com/photos/britishlibrary/
1,020,418 images
need tagging!
Creative uses of images
Face recognition
Algorithms based on photos
Mechanical Curator
with an algorithmic brain
(Circles, Squares and Slanty etc)
http://goo.gl/qPPgxX
Wikimedia
Flickr Commons
Individual URL & API
Snipping out images
from 65,000 Digitised Books*
>800,000,000* views
>17,000,000* tags
https://goo.gl/FgZ4HM
Work @ BL by Ben O’Steen, Labs
and Digital Research Team
*Matt Prior - http://goo.gl/j29Tnx
Since Dec 2013
Tumblr
*Estimates
>More demand to see
physical items
93
Tagging, Tagging, Tagging…
94
Tagging a million images
Iterative Crowdsourcing
http://goo.gl/j6fxac
Cardiff University’s
Lost Visions Project
http://www.metadatagames.org/
Metadata Games
James Heald
Mario Klingemann
Chico 45
Use computational methods
Human Tagger
Top British Library Flickr Commons Taggers
18 hard core taggers
How to reward this ‘small group’ and keep it motivated?
Average for ‘crowd’ is 1 tag per person
What kind of ‘task’ can this ‘crowd’ do?
Mobile games for ‘Ships’, ‘Covers’ and ‘Portraits’ Interface for tagging
95
Adam Crymble (2015)
Crowdsource Arcade
http://goo.gl/LBfJ4W
http://goo.gl/OH9pOZ
https://goo.gl/7z0j8p
30 mins talk
Labs Symposium (2015)
https://goo.gl/SSRsdd
5 min interview (2015)
http://goo.gl/0APpE8
Game Jam
Using Arcade Games
to help Tag images
‘Art Treachery’ and ‘Tag Attack’
96
Play for 42 seconds
97
Special Jury’s Prize (2015)
James Heald – Wikimedia and Map work
https://goo.gl/WYZCB2
http://goo.gl/HNQq5e
https://goo.gl/VPgffL
https://commons.wikimedia.org/
https://goo.gl/djtm1b
Labs Symposium (2015)
Geotagging maps
50,000 Maps
Found in Flickr 1 million
Human & Computational Tagging
& Community engagement
Geo-referencing work
https://www.bl.uk/georeferencer
98
SherlockNet: Competition Winner 2016
Karen Wang, Luda Zhao and Brian Do
Using Convolutional Neural Networks to Automatically Tag and Caption
the British Library Flickr Commons 1 million Image Collection
12 categories
>15.5 million tags added
>100,000 captions
bit.ly/sherlocknet
Pooled surrounding
OCR text on page
from similar images
Used Microsoft COCO (photographs) &
British Museum Prints and Drawings
collections as training sets.
Tags Captions
99
Artistic / Creative Works
http://goo.gl/dM8ieA
Mario Klingemann (2015)
Code Artist / Curator
https://www.youtube.com/watch?v=Q3SBxO34Zlc
David Normal 2014 and 2015
Collages/Paintings & Lightboxes
http://goo.gl/bNxGZZ
Kris Hoffman (2016)
Animation for Fashion Week 2016
https://goo.gl/QilqqT
Jiayi Chong 2016 - Animation tool
https://www.facebook.com/RealmlandStory/
Paul Rand Pierce 2016
Graphic Novel on Facebook
Tragic Looking Women
44 Men who Look 44
(Notice the direction faces)
A Hat on the Ground
Spells trouble
100
Imaginary Cities – BL Labs Project 16-18
Michael Takeo Magruder
An artistic exploration seeking to create provocative fictional cityscapes for the Information Age
from the British Library’s digital collection of historic urban maps
101
British Fashion Colleges Council and
Teatum Jones
Alanna Hilton
102
Lessons Learned & Challenges…
It all starts from a conversation
• Start with a conversation: our data isn’t highly visible on search engines (yet!) & isn’t easy to find. Need to create and embrace serendipity & opportunities for use by talking!
• Sometimes need to have several conversations with several stakeholders & tap into their tacit knowledge, which isn’t always written down, to progress ideas.
• Misunderstandings often arise because of jargon & different meanings of words.
https://goo.gl/XaHYT9
?
Audience
research &
Digital
interests
Digital
collections
we have
This is where Labs works
103
Expectations change when researchers
actually see the data, systems &
experience the ‘culture’ of the organisation.
https://goo.gl/ytmWnu
Lessons Learned & Challenges…
Expectations change when researchers see the data
104
Opening & using digital collections occasionally
requires a need to let go of the emotional &
psychological connection to them
Lessons Learned & Challenges…
Letting ‘go’ of the connection to collections…
https://goo.gl/OYAsmK
105
Embrace dirty data, it may never be perfect!
Cleaning data is hard work & resource consuming!
Lessons Learned & Challenges…
Embrace ‘dirty data’…use it or clean it!
https://goo.gl/mcpa8B
Not the British Library!
106
Be careful of drawing conclusions based on
‘black box’ software & techniques (e.g.
sentiment analysis); learn the assumptions
behind them first!
Lessons Learned & Challenges…
Beware of ‘Black Box’ software…
107
We tend to work with researchers who can be
‘flexible’ with their research questions & are
willing to embrace challenges, collaborate, the
pioneers!
Lessons Learned & Challenges…
Work with ‘flexible’ collaborative researchers…
https://goo.gl/wMTS3Z
108
Many researchers have the domain knowledge
but lack the technical / digital skills to use Digital
Research methods.
Should they be teamed up with people who want
to solve such problems, or be trained themselves?
Lessons Learned & Challenges…
Digital skills training needed for Humanities researchers…
https://goo.gl/i5GVfI
https://goo.gl/kwcK8J
109
Huge appetite to use digital content & data
for anyone’s ideas!
(e.g. Flickr Commons stats).
Lessons Learned & Challenges…
Huge demand for open digital content…
https://goo.gl/yQ5s4U
110
Labs mindset…
1. Start a conversation, generate positive energy,
be kind, have fun and try to support ideas
2. Start with small experiments, but think big.
3. Fail faster (don’t be afraid) and persevere.
4. Reject perfectionism! Good enough is
sometimes…good enough!
5. Celebrate the uses of digital collections
https://goo.gl/noASfl
111
https://goo.gl/SUOO0J
The Magic of Openness!
• If digitised / digital collections are not
used, what is the point of digitising /
keeping them (i.e. apart from
preservation)?
• Opening up our digital collections offers
new ways for the Library’s content to be
re-discovered, remixed, re-imagined and
‘re-energised’
• Generates plenty of examples to inspire
use by others!
112
Hey there Young Sailor!
Ling Low 2016 – Hey there Young Sailor
https://www.youtube.com/watch?v=bcOP1E5bRE0
VIMEO.COM/SWEETANDLOWFILMS
@SWEETNLOWFILMS ON INSTAGRAM
@SWEETNLOWLING ON TWITTER
The Impatient Sisters
Play to fade!
113
Explore or Imagine Our Data!
• CSV of Metadata
https://data.bl.uk/digbks/dig19cbooks-mdata-csv.csv
• 19th Century Books - Book Metadata - 01/09/2013.
https://data.bl.uk/digbks/db21.html
• Digitised Books - Flickr Tag History - Dec 2013 to March 2016.
TSV
https://data.bl.uk/digbks/db15.html
• Digitised Hebrew Manuscripts - Metadata
https://data.bl.uk/hebrewmanuscripts/heb1.html
• Digitised Hebrew Manuscripts: Or 2210 - Or 2364
https://data.bl.uk/hebrewmanuscripts/heb8.html
• Theatrical playbills from Britain and Ireland (OCR text only)
https://data.bl.uk/playbills/pb2.html
• Portraits of actors, views of theatres and playbills (covering
1750 - 1821 in a single volume)
https://data.bl.uk/singlesheet/por1.html
• Volumes of Lysons Collectanea (Amusements), comprising
broadsides, cuttings, advertisements on amusements, 1660-1840.
https://data.bl.uk/singlesheet/ad1.html
https://data.bl.uk
• Have a look at the data.
• Data Quality
• Issues
Or share an idea you have had for
what to do with the data!
http://labs.bl.uk/Ideas+for+Labs
Smaller datasets
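One hedged way to ‘have a look at the data’ for quality issues is to count empty cells per column in a metadata CSV. The sketch below uses only the Python standard library and a small inline sample; the column names and rows are made up for illustration and are not the real headers of the data.bl.uk files. To try it on a downloaded file, pass an opened file object instead of the sample.

```python
import csv
import io
from collections import Counter


def missing_value_counts(csv_file) -> Counter:
    """Count empty (or whitespace-only) cells per column in a CSV file-like object."""
    reader = csv.DictReader(csv_file)
    missing = Counter({name: 0 for name in reader.fieldnames or []})
    for row in reader:
        for name, value in row.items():
            # name is None when a row has more cells than the header; skip those.
            if name is not None and not (value or "").strip():
                missing[name] += 1
    return missing


# Hypothetical sample: invented column names and rows, for illustration only.
SAMPLE = """identifier,title,date,place
0001,A Tour in Wales,1804,London
0002,Poems,,
0003,,18--?,Edinburgh
"""

counts = missing_value_counts(io.StringIO(SAMPLE))
for column, n in counts.items():
    print(f"{column}: {n} empty")
```

A count like this only finds blanks. Values such as the uncertain date ‘18--?’ in the sample are present but still ‘dirty’, so they need a separate look.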
114
The Future of BL Labs
• Continue to engage with researchers
• Learn what they want to do
• Collect evidence of demand
• Develop a Business Model and Support
process to make Labs ‘Business as Usual’ at
the British Library
• Help to create a pathway to developing
a ‘Digital Research Suite’ at the
British Library by 2019
http://www.library.pitt.edu/digital-scholarship-services
https://goo.gl/W4TjGt
• Many are being ‘inspired’ by our
model…
115
Business Model Canvas
https://www.youtube.com/watch?v=QoAOzMTLP5s
116
Business Model Canvas – Digital Research Lab – British Library
• Key Partners
• Key Activities
• Key Resources
• Value Propositions
• Customer Relationships
• Channels
• Customer Segments
• Cost Structure
• Revenue Streams
http://www.businessmodelgeneration.com
117
Alejandro: Master’s student in Comparative
Literature (Researcher Technical)
• Alejandro wants to speak to somebody about working on a project which will build and analyse a corpus from digitised
literary works and newspaper articles first published from 1950 to the present day. He is particularly interested in how
the humorous prose of PG Wodehouse has permeated literary works.
• Alejandro is confident in using digital research methods and tools to help with his research as his undergraduate degree
was in Computer Science. He is a talented programmer, having contributed to a number of Open Source projects and
winning a number of software development competitions at university.
• Research Question/Enquiry: How has the humorous prose of PG Wodehouse permeated modern literary works?
• Services Required:
– Access to the relevant digital collections and data for his research.
– Advice on the potential legal restrictions on carrying out his work.
– Ability to create and store the corpus and use computational tools to analyse it.
– Share the results under an open access license.
– Download the final results of his research, and cite the derived datasets on data.bl.uk.
118
Rosslyn: PhD student in Gender Studies
(Researcher Non-Technical)
• Rosslyn is a gender historian developing research questions around how gender-specific terms are used in writing and how they
may change over time. She thinks she would like to use the UK Web Archive as one of her corpora for this and has a vague idea
that she may need to employ digital research methods after reading an article about natural language processing.
• Rosslyn has never used these methods before and has basic digital skills (e.g. MS Office). Rosslyn would like to sit down with
someone to discuss her idea and get guidance about moving this project forward. She is happy to collaborate with other
researchers.
• Research question/Enquiry: How do gender-specific terms change over time in the UK Web Archive?
• Services Required:
– Access to expert information, advice and training on the appropriate digital research methods to use for this project.
– Case studies and connections / collaborations with other researchers who can help her understand their approaches and
experiences using BL digital collections and data for similar projects.
– Getting a clear understanding of the UK Web Archive, how it can be used, its limitations / challenges and whether it is the
appropriate corpus to carry out her research.
– Ability to create and store the corpus and use computational tools to analyse it.
– Share the results under an open access license.
– Download the final results of her research, and cite the derived datasets on data.bl.uk.
119
Discussion
• Have a look at the Personas
• Write your own
• We will discuss your needs
• Choose one for Value Proposition
120
Value Proposition Canvas
Customer Segment
Value Map
121
Customer Segment
• Customer Jobs
– What are customers trying to get done in their
work? (Important – Insignificant)
• Pains
– What are the bad outcomes, risks, obstacles related to
customer jobs? (Extreme – Moderate)
• Gains
– What outcomes do customers want to achieve or concrete
benefits they are seeking? (Essential – Nice to have)
122
Value Map
• Pain Relievers
– How do products and services alleviate customer pains?
• Gain Creators
– How do products and services create customer gains?
• Products and Services
– List all the products and services the value proposition is built
around

British Library Labs - CityLIS

  • 1. 1 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk http://www.bl.uk/projects/british-library-labs Funded by the Andrew W. Mellon Foundation Mahendra Mahey Experiment with our Digital Collections Mahendra Mahey Manager of BL Labs Working with the British Library’s Digital Collections & Data: Insights from British Library Labs & a new role for Libraries 10:00 to 12:45, 26 March 2018 BL Labs Roadshow 2018 CityLIS, London UK. Running since March 2013 Core Team • Adam Farquhar (PI) • Mahendra Mahey • Ben O’Steen • Eleanor Cooper (0.5)
  • 2. 2 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk The British Library Inside the British Library Space for 1200 readers, around 500,000 visitors per year Building 37 uses low oxygen and robots Reading room and delivery to London Many items stored at Document Supply and Storage centre 48 hours away Stockton-on-Tees Author right to payment each time their books are borrowed from public libraries. St Pancras, London, UK Many books are stored 4 stories below the building UK Legal Deposit Library – Reference only Founded in 1973 though origins stem back to British Museum Library 1753 Boston-Spa
  • 3. 3 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Collections – not just books! > 180*million items > 0.8* m serial titles > 8* m stamps > 14* m books > 6* m sound recordings > 4* m maps > 1.6* m musical scores > 0.3* m manuscripts > 60* m patents King’s Library *Estimates
  • 4. 4 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Living Knowledge Vision (2015 – 2023) Custodianship Research Business Culture Learning International To make our intellectual heritage accessible to everyone, for research, inspiration and enjoyment and be the most open, creative and innovative institution of its kind by 2023 (50 year anniversary). Document:http://goo.gl/h41wW7 Speech:https://goo.gl/Py9uHK Roly Keating (Chief Executive Officer of the British Library) To make our intellectual heritage accessible to everyone, for research, inspiration and enjoyment and be the most open, creative and innovative institution of its kind by 2023 (50 year anniversary).
  • 5. 5 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Wider…not just Digital Humanities Researchers Researchers https://goo.gl/WutNyi Artists http://goo.gl/nNKhQ2 Librarians Curators https://goo.gl/9NWZUW Software Developers https://goo.gl/7QQ5Tf Archivists https://goo.gl/x7b4tg Educators https://goo.gl/qh01Mi
  • 6. 6 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk • An area of scholarly activity, born from humanities computing, at the intersection of computing/digital technologies and the humanities. • The field both employs technology in the pursuit of humanities research, and subjects technology to humanistic questioning and interrogation. • DH is collaborative, crossdisciplinary, and computationally engaged research, teaching, and publishing. https://en.wikipedia.org/wiki/Digital_humanities Defining digital humanities (DH)
  • 7. 7 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Digital research methods Digital Scholarship Visualisations Application Programming Interfaces (APIs) for datasets e.g. Metadata, Images, etc Transcribing Annotation Location based searching & Geo-tagging Corpus analysis, Text Mining & Natural Language Processing Crowdsourcing Human Computation In 20 years time?
  • 8. 8 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk What about Digital? Born Digital Digitised
  • 9. 9 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk / Knowledge Quarter London 80 knowledge organisations (as of 26/03/18) within 1 mile radius of Kings Cross, http://www.knowledgequarter.london http://www.turing.ac.uk (Headquartered at the British Library) UK Web Archive and e-legal deposit (2013) http://www.webarchive.org.uk/ukwa/ Born digital Data all around us at Kings Cross! Born digital Data all around us at Kings Cross! Born digital Data all around us at Kings Cross!
  • 10. 10 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk All our physical items are digitised right?
  • 11. 11 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk #bldigital 1-2 %* digitised * estimate Digitisation Partnerships Commercial & Other Organisations Amount increasing rapidly Bias in digitisation http://goo.gl/bR9UJL Sample Generator
  • 12. 12 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Playbills, Books, Newspapers (includes Optical Character Recognition (OCR)) Digital collections and Datasets British National Bibliography http://bnb.data.bl.uk http://sounds.bl.ukhttp://dml.city.ac.uk/ Music (Recordings & Sheet) & Sounds http://goo.gl/frSMJt Broadcast News (TV and Radio) http://goo.gl/cwThHw http://goo.gl/pBkisZhttp://goo.gl/E8aRyQ Usage data EtHOS Web ArchiveImages, Manuscripts & Maps http://www.qdl.qa/ Qatar Digital Library http://idp.bl.uk/ International Dunhuang Project Maps http://www.bl.uk/maps/ Hebrew Manuscripts http://goo.gl/4sbCp9 Flickr & Wikimedia Commons https://goo.gl/LZRmaZ
  • 13. 13 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Finding Open Cultural Heritage Datasets Collection Guides (199 as of 26/03/2018) https://www.bl.uk/collection-guides/ Datasets about our collections Bibliographic datasets relating to our published and archival holdings Datasets for content mining Content suitable for use in text and data mining research Datasets for image analysis Image collections suitable for large-scale image- analysis-based research Datasets from UK Web Archive Data and API services available for accessing UK Web Archive Digital mapping Geospatial data, cartographic applications, digital aerial photography and scanned historic map materials https://data.bl.uk Download collections as zips, no API Each dataset has a Digital Object Identifier (DOI) can be referenced for research Not all discoverable via search engines!
  • 14. 14 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk How are we doing this?
  • 15. 15 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Competition Awards Projects Tell us your ideas of what to do with our digital content Show us what you have already done with our digital content in research, artistic, commercial and learning and teaching categories Talk to us about working on collaborative projects
  • 16. 16 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk • The Library has to go out to meet researchers, regularly and cyclically to tell them what we have and learn what they want to do • Debunk ‘myths’ about the Library • Show researchers the reality of our data • Their ideas always change once they see the data! https://goo.gl/esqpRb Lots of two-way communication! BL Labs runs annual ‘Roadshows’ around the UK
  • 17. 17 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Have you got X? https://upload.wikimedia.org/wikipedia/commons/5/50/Real_wuerzburg.jpg Looking for Physical Content in the British Library
  • 18. 18 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Have you got X digitised / in digital form? http://www.yorkmix.com/wp-content/uploads/2014/04/mr-simms-sweet-shoppe-york.jpg Looking for Digitised / Digital Content in the BL
  • 19. 19 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk •Digitisation costs time, resources & access can depend on restrictions imposed by funders, legal, ethical, practical etc. … •Still…700 Digitisation projects (as of 26/03/2018) But not all found through search engines or even online! So little digitised…why? © £ 
  • 20. 20 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk https://goo.gl/qpCLlk https://goo.gl/wMTS3Z • Dialogue typically: – you are ‘lucky’ & we have the digital content / data relevant to your research – we don’t have exactly what you’re looking for, but is there anything of interest? Let’s talk… – engagement is hard work and it’s constantly required to maintain interest in our digital collections! • Artists find this dialogue easier… • We also tend to attract researchers with ‘fuzzier’ research boundaries who are possibly open to more interdisciplinary / collaborative research What engagement does the BL have with researchers wanting to use our digital content?
  • 21. 21 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Overview of interactions with the “researcher”
  • 22. www.bl.uk Phase 1: Exploration • Exploration phase allows a researcher to: • understand the data in an open-ended fashion, • discover potential tools to work with the data, • gain awareness of their capabilities and limitations, • develop a firmer research query and • gauge the costs, risks and time needed. • Outputs of the exploration are not intended to be shareable, beyond personal experience and key features (data size, formats, tool successes, etc). 22
  • 23. www.bl.uk Phase 2: Query-Focussed • “Query-Focussed”, the familiar project but due to phase 1: • A firmer and more informed query by the researcher • Suitable datasets already lined up • A good idea of the initial toolset and capabilities (human and computer) required • Project output is outlined, and relevant reuse applications are begun. • Clear agreements on what happens at the end of the project – data deletion, virtual machine deletion/archiving/etc. • Project may iterate on initial ideas, depending on researcher’s cost/risk appetite 23
  • 24. www.bl.uk Phase 2: Query-Focussed • Wrap-up • Work (code, notes) exported and given to researcher • All derivative data is licenced or retained based on reuse agreements (Access & Reuse board, etc) • Provisions made for the project are wound-down, as agreed (derivative data deleted after a grace period, etc) 24
  • 25. 25 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Openly Licensed Digital Content? 15% Openly Licensed Around 80%* available online Working through to make more open… Though some collections will always only be available onsite due to various reasons including legal, ethical etc Breakdown by collection* Manuscripts 59% Books 9% Maps and Views 7% Newspapers 3% Archives and Records 3% Paintings, Prints and Drawings 2% *Based on number of digitisation projects (700 as of 26/03/18) Largest proportion of funding Public / Private Partnership 15 %* Openly Licensed – most online 85 %* Available onsite only at the moment *Estimates
  • 26. 26 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk The Story of the Digital Collection… Digital Collection Curator Who paid for the digitisation? Who did the digitisation? Technology used Born digital? Published Unpublished Where is it? Can it still be accessed? Generates income Reputational risk in using? Legalities Politics when digitised Personalities involved Surprises (e.g. gaps) Descriptive information Old format not supported What media was the digitisation done from? Is there any background documentation? No Descriptive information Inconsistent descriptive information Still there? Good to know the background ‘Story’ of a Digital Collection’ if you want to use it for research and make conclusions…
  • 27. 27 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Open Content vs Onsite Only Access • Access easier for openly licensed content • More challenging for on-site, in-copyright, non-print legal deposit, data protected, old content media & contemporary material (post 1877) https://goo.gl/Y5zCXg ©
  • 28. 28 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk only in Reading Rooms due to © only on site due to © or ethical etc not online / available – various storage devices, personal data online and open British Library online behind paywall Challenges of access to Digital Collections
  • 29. 29 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk How do we give access to onsite-only Digital Collections (85% of our Digital Collections)?
  • 30. 30 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk READING ROOM ON SITE NOT ONLINE OPEN British Library £ Labs Residency Model Challenges of access to Digital Collections
  • 31. 31 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Accessing digital collections onsite OPEN £ • Have to be ‘onsite’ (interpretations vary) • Need to be ‘security cleared’ ‘trusted’ for some collections – Hence ‘Researcher in Residence Model’ • Permission required (depending on ‘story’ of collection) • Content could be on various media formats (not always online) • 5 - 20 % re-use of material for non commercial research for some collections, depends on agreements in place • We are learning ‘pathways’ so that this becomes ‘everyday’ to provide onsite access to some digital collections in the future
  • 32. 32 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Why are we doing this?
  • 33. 33 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Why are we doing this? (1) Working closely with and listening to those who want to use our digital collections and data for their work https://goo.gl/esqpRb
  • 34. 34 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk We can learn how we are and should be supporting them and this therefore shapes the problems we work on, such as: https://goo.gl/esqpRb Why are we doing this? (2) • Access to digital collections / data? • Advice, guidance, technical support, training • Services, Tools and Processes? • Many more reasons…
  • 35. 35 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Where are the gaps between what users want & what we can give? How do we build the bridges to overcome the gaps? Why are we doing this? (3) https://goo.gl/6CwCeE
  • 36. 36 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk How do we help users ‘navigate’ their way through the ‘maze’ of the Library to what they want to do? Sometimes requires understanding the culture of the organisation https://goo.gl/62JnQT Why are we doing this? (4)
  • 37. 37 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk What did people actually do? Examples from Text and Images Over 100 examples (including sound, video): http://labs.bl.uk/Ideas+for+Labs http://labs.bl.uk/Other+Uses+of+Collections
  • 38. 38 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Example Pattern of Research 1, 2, 3 1. Find / identify new things in messy stuff 2. Unlock hidden history / data 3. Celebrate new discoveries
  • 39. 39 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Finding / identifying invisible / well hidden things in ‘messy’ historical data https://goo.gl/mcpa8B Not the British Library! Example Pattern of Research 1
  • 40. 40 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Messiness in historical data • 'Begun in Kiryu, Japan, finished in France' • 'Bali? Java? Mexico?' • Variations on USA: – U.S. – U.S.A – U.S.A. – USA – United States of America – USA ? – United States (case) • Inconsistency in uncertainty – U.S.A. or England – U.S.A./England ? – England & U.S.A.
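The variations listed on this slide can be reconciled with a small normalisation step. The sketch below is only an illustration, not the method used by BL Labs or by OpenRefine: the lookup table `CANONICAL`, the function name `normalise_place` and the rule that "?" or "or" signals uncertainty are all assumptions made for the example.

```python
import re

# Hypothetical lookup table: cleaned-up variant -> canonical label.
CANONICAL = {
    "us": "United States",
    "usa": "United States",
    "united states": "United States",
    "united states of america": "United States",
    "england": "England",
}


def normalise_place(raw):
    """Split a free-text place field into canonical names plus an uncertainty flag."""
    # Assumption: a question mark or the word "or" signals uncertainty.
    uncertain = "?" in raw or re.search(r"\bor\b", raw, flags=re.IGNORECASE) is not None
    # Split on the separators seen in the slide's examples: "/", "&" and "or".
    parts = re.split(r"/|&|\bor\b", raw.replace("?", ""), flags=re.IGNORECASE)
    places = []
    for part in parts:
        # Drop full stops, collapse whitespace and lower-case before the lookup.
        key = " ".join(part.replace(".", "").split()).lower()
        if key:
            # Unknown values are kept as written rather than guessed at.
            places.append(CANONICAL.get(key, part.strip()))
    return {"places": places, "uncertain": uncertain}


for value in ["U.S.", "U.S.A", "USA ?", "U.S.A. or England", "England & U.S.A."]:
    print(value, "->", normalise_place(value))
```

Keeping the uncertainty flag separate from the cleaned value means the normalisation decision stays documented and can be revisited later.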
  • 41. 41 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Open Refine
  • 42. 42 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk • Cultural heritage records contain uncertainty and fuzziness (e.g. date ranges, multiple values, uncertain or unavailable information). Curators and staff at institutions often have unique expertise in deciphering these anomalies, so ask them! ([1960] vs. 1960 can have a big impact depending on what you’re doing) • Optical Character Recognition in particular is an imperfect art: consider how bad it is, how this might affect your findings, and what needs doing to mitigate it. • Keeping data clean, organised, open and described well will not only make your life easier, but also enable its widespread re-use beyond your own project and increase its future impact. (Datasets you’ve created in the course of your research projects could even be used to enhance national collections!) • Decisions always need to be made while normalising information for visualisation. Documenting them is important for your research but also for future re-use! • Is your aim enquiry or presentation? All of this will have an impact on the tools and data cleaning choices you make. Things to consider: Data + Tools
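The "[1960] vs. 1960" point can be made concrete with a short sketch. It assumes, as is common in library cataloguing, that square brackets mark a date supplied by the cataloguer rather than printed on the item; the function name `parse_catalogue_date` and the returned tuple layout are hypothetical.

```python
import re


def parse_catalogue_date(raw):
    """Return (earliest_year, latest_year, inferred), or None if no year is found."""
    text = raw.strip()
    # Assumption: brackets or a question mark mean the date was inferred.
    inferred = text.startswith("[") or "?" in text
    years = [int(y) for y in re.findall(r"\d{4}", text)]
    if not years:
        return None
    # A single year gives a zero-width range; "1960-1965" gives a real range.
    return (min(years), max(years), inferred)


for value in ["1960", "[1960]", "1960-1965", "[1850?]", "n.d."]:
    print(repr(value), "->", parse_catalogue_date(value))
```

Carrying the `inferred` flag through to any chart or count lets a reader see how much of a trend rests on supplied dates.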
  • 44. 44 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk #digitalhumanities dancohen/lists/digitalhumanities @ProfHacker @Dhnow @BL_DigiSchol And more links to resources here: http://scottbot.net/teaching-yourself-to-code-in-dh/
  • 45. 45 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Unearthing / unlocking hidden histories & data to stimulate new research https://goo.gl/vJ291F It’s an 18th Century Poem! Example Pattern of Research 2
  • 46. 46 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Celebrating hidden histories / data creatively through events, art & performance https://goo.gl/Ql0Bwz Re-enacting, re-discovering history Example Pattern of Research 3
  • 47. 47 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Experiments with Text
  • 48. 48 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk https://goo.gl/oUNj5N https://goo.gl/ImAUv4 Finding things in ‘messy’ Optical Character Recognised (OCR) text Mrs Folly • Clean up some manually • Get human ‘ground truth’ • Write computer code (sometimes it’s machine learning) to find things reliably in it ‘automatically’ • Try code on messy content • Tweak if necessary • Digital ‘lasso’ around content • Human sift through Mrs Folly An example pattern of research
  • 49. 49 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Looking through a rubbish bin? https://goo.gl/UeEvqs Good stuff! Some Rubbish
  • 50. 50 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Let’s use Text and Data Mining Machine Learning / Reading!
  • 51. 51 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Smell of soup & Machine Learning Thanks to Memo Akten (@memotv on twitter) for the inspiration! https://goo.gl/toq4Bo Nasreddin, 13th Century Turkish Sufi http://web2.uvcs.uvic.ca/elc/studyzone/330/reading/smell1.htm
  • 52. 52 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Machine Learning / Reading Analogies to how humans read / learn Machines acquire ‘knowledge’ / data, use that knowledge / data to make sense / identify patterns https://goo.gl/k68fTf https://goo.gl/gXmVQL Can you see the bird?
  • 53. 53 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Need to stress still requires computational & human effort… Smart data needs smart people! https://goo.gl/gDQEAz Labs doing this on a case by case basis so methods can vary Machine Learning / Reading still requires ‘Human Effort’!
  • 54. 54 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Legalities of Machine Learning / Text and Data mining https://goo.gl/toq4Bo Legalities of Machine Learning / Text and Data mining still up for discussion…Often misunderstood Is it the same as humans reading and looking for patterns…just a bit quicker?
  • 55. 55 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk http://victorianhumour.tumblr.com Victorian Meme Machine (2014) https://goo.gl/HMqDt3 Bob Nicholson http://victorianhumour.tumblr.com/ Bob Nicholson interviewed on BBC Radio 4 Making History Programme: http://goo.gl/fmV9ep And telling jokes to the public: http://goo.gl/xIDRhz Bob obtained further funding from his university Looking for more collaborations https://www.youtube.com/watch?v=-GRgj7Q5OM0 Rob Walker, Victorian Mother-in-law Jokes Victorian Comedy Night, 7 Nov 2016 Learnt about access paths to digital collections
  • 56. 56 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Play from 1m 5 seconds to 1m 31 seconds
  • 57. 57 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Katrina Navickas (2015) Political Meetings Mapper http://politicalmeetingsmapper.co.uk https://goo.gl/Qq78Oa Labs Symposium 2015 https://goo.gl/BSA3be Interview 2015 The Chartist Newspaper http://goo.gl/vOLSnH Chartist Monster Meeting Chartists Walking Tour and Re-enactment London Learnt that domain knowledge reduces noise
  • 58. 58 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Play for 36 seconds
  • 59. 59 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Black Abolitionist Performances & their Presence in Britain (2016) – Hannah-Rose Murray Frederick Douglass Ellen Craft Josiah Henson Ida B Wells A Performance by Joe Williams & Martelle Edinborough http://frederickdouglassinbritain.com/ Started to implement Machine Learning Techniques
  • 60. 60 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Data-mining verse in 18th Century newspapers BL Labs Project 16-17, Jennifer Batt https://goo.gl/5Akthd Slides courtesy Jennifer Batt
  • 61. 61 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk What thoj' among ourrelves, with too much Heat, or t W: fweutimes.wongle, wvhen we Ihould debate, W – (A confequential Ill which Freedom drawvs, fl t A bad Efficf, but from a noble Caufe) t We can with univeifal Zcal advance, to To cutb the faithlefs Arrogancccof V rance. hi Dublin Journal, 10-14 September, 1745 Slides courtesy Jennifer Batt
  • 62. 62 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Verse: 81% lines begin with initial capital Prose: 52% lines begin with initial capital Westminster Journal 3 March 1745 Slides courtesy Jennifer Batt Started to refine Machine Learning Techniques Jennifer Batt @ the BL on World Poetry Day ‘40,000’ things found… Possibly using Gale Primary Sources interface to see if we can sift this data
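The contrast reported on this slide (81% of verse lines against 52% of prose lines beginning with a capital) suggests a simple feature for sifting OCR text. The following is a minimal sketch of that one feature only, not the project's actual classifier: the function names and the 0.7 threshold are assumptions chosen for illustration, and real OCR text would need more signals than this.

```python
def initial_capital_ratio(text):
    """Fraction of non-empty lines whose first character is an upper-case letter."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    capitals = sum(1 for line in lines if line[0].isupper())
    return capitals / len(lines)


def looks_like_verse(text, threshold=0.7):
    """Flag a block as possible verse. The threshold is a hypothetical cut-off."""
    return initial_capital_ratio(text) >= threshold


verse = """What tho' among ourselves, with too much Heat,
We sometimes wrangle, when we should debate,
A bad Effect, but from a noble Cause"""

prose = """the meeting was held on Monday and
Many persons attended, though the
weather was poor and the roads
were bad."""

print(initial_capital_ratio(verse), looks_like_verse(verse))
print(initial_capital_ratio(prose), looks_like_verse(prose))
```

A loose threshold acts as the digital "lasso": it pulls in candidate blocks, which a human then sifts.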
  • 63. 63 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Use of Overproof OCR Correction? Re-OCR with ABBYY FineReader? https://www.abbyy.com/en-gb/ http://overproof.projectcomputing.com/ RE-OCR Cleaning up OCR Text – significant improvement (depending on original image quality)
  • 64. 64 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Virtual Infrastructure for OCR text OCR text ‘scraped’ from digitised newspapers and put in internal cloud Jupyter notebook Write python code and results in web browser http://jupyter.org Access available for researchers ‘in residence’ https://www.docker.com/ http://dhbox.org/
  • 65. 65 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Experiments with Images
  • 66. 66 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk 65,000 digitised 19th Century books Image: Artwork by Alicia Martin 2007 / 2008 Paid for by: For a full list: https://goo.gl/HqPQMS Subjects include: Philosophy Poetry History Literature 1789 - 1876
  • 67. 67 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk 30 August 2012
  • 68. 68 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk 002819694 Unique number
  • 71. 71 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk OCR XML Generated by ABBYY FineReader Optical Character Recognition
  • 72. 72 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Images from books captured too!
  • 73. 73 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk We did some of our own experiments…do as we tell others! Experiment with our Digital Collections@BL_Labs
  • 74. 74 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Ben O’Steen of @BL_Labs after Hack Event, August 2013
  • 75. 75 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Ben O’Steen of @BL_Labs after Hack Event, August 2013
  • 76. 76 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Ben O’Steen of @BL_Labs after Hack Event, August 2013
  • 82. 82 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk 1,020,418 images needed identifying!
  • 83. 83 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk One major problem! •We know about the books these images come from but we know nothing about the actual images! •How will we identify them? •How will we find them later? •How can we do that with 1 million images? •Try a few experiments!
  • 84. 84 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Running face recognition on the images Face Recognition Algorithm Trained on Photographs  Late August 2013
  • 85. 85 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Face Recognition Algorithms worked better for female faces than men’s
  • 86. 86 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk The Mechanical Curator Snipped image posted almost randomly every hour… on a Tumblr blog One of our early followers was… Ben O’Steen, 30 September 2013 Has a slight ‘mood’… once image published, tries to find 8 similar images e.g. ‘slanty’, ‘circular’ etc. & then gets ‘bored’ follow… @MechCuratorBot
  • 87. 87 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Where to put the 1 Million images? • Internal platform? Would take too long! • BL Labs is a time limited project, needed evidence of impact and good example to show the Library what we need, fast! • External platform? Put our content where the ‘light’ is! • Wikipedia / Wikimedia –no! Not enough metadata for the images. • Another platform that meets our needs… • Flickr Commons
  • 88. 88 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Flickr Commons (100 + GLAMs as of 08/12/17)
  • 90. 90 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk British Library Flickr Commons Why Flickr Commons? • Free! • Each image has its own unique web address, easy to share • Can Tag images • Has Application Programming Interface (API) Late August 2013
  • 91. 91 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Opportunities – increasing traffic to Library services You can purchase a ‘High Res’ Copy View in the Library Item Viewer Download .pdf All illustrations in book Other illustrations in books Published in same year View the item in the Library Catalogue Tags auto generated User generated Tag Grouping for image
  • 92. 92 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Worked better for female faces than men’s Press http://mechanicalcurator.tumblr.com Posts image every 30 minutes http://www.flickr.com/photos/britishlibrary/ 1,020,418 images need tagging! Creative uses of images Face recognition Algorithms based on photos Mechanical Curator with an algorithmic brain (Circles, Squares and Slanty etc) http://goo.gl/qPPgxX Wikimedia Flickr Commons Individual URL & API Snipping out images from 65,000 Digitised Books* >800,000,000* views >17,000,000* tags https://goo.gl/FgZ4HM Work @ BL by Ben O’Steen, Labs and Digital Research Team*Matt Prior - http://goo.gl/j29Tnx Since Dec 2013 Tumblr *Estimates >More demand to see physical items
  • 93. 93 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Tagging, Tagging, Tagging…
  • 94. 94 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Tagging a million images Iterative Crowdsourcing http://goo.gl/j6fxac Cardiff University’s Lost Visions Project http://www.metadatagames.org/ Metadata Games James Heald Mario Klingemann Chico 45 Use computational methods Human Tagger Top British Library Flickr Commons Taggers 18 hard core taggers How to reward and keep motivated this ‘small group? Average for ‘crowd’ is 1 tag per person What kind of ‘task’ can this ‘crowd’ do? Mobile games for ‘Ships’, ‘Covers’ and ‘Portraits’ Interface for tagging
  • 95. 95 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Adam Crymble (2015) Crowdsource Arcade http://goo.gl/LBfJ4W http://goo.gl/OH9pOZ https://goo.gl/7z0j8p 30 mins talk Labs Symposium (2015) https://goo.gl/SSRsdd 5 min interview (2015) http://goo.gl/0APpE8 Game Jam Using Arcade Games to help Tag images ‘Art Treachery’ and ‘Tag Attack’
  • 96. 96 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Play for 42 seconds
  • 97. 97 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Special Jury’s Prize (2015) James Heald – Wikimedia and Map work https://goo.gl/WYZCB2 http://goo.gl/HNQq5e https://goo.gl/VPgffL https://commons.wikimedia.org/ https://goo.gl/djtm1b Labs Symposium (2015)Geotagging maps 50,000 Maps Found in Flickr 1 million Human & Computational Tagging & Community engagement Geo-referencing work https://www.bl.uk/georeferencer
  • 98. 98 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk SherlockNet: Competition Winner 2016 Karen Wang, Luda Zhao and Brian Do Using Convolutional Neural Networks to Automatically Tag and Caption the British Library Flickr Commons 1 million Image Collection 12 categories >15.5 million tags added >100,000 captions bit.ly/sherlocknet Pooled surrounding OCR text on page from similar images Used Microsoft COCO (photographs) & British Museum Prints and Drawings collections as training sets. Tags Captions
  • 99. 99 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Artistic / Creative Works http://goo.gl/dM8ieA Mario Klingemann (2015) Code Artist / Curator https://www.youtube.com/watch?v=Q3SBxO34Zlc David Normal 2014 and 2015 Collages/Paintings & Lightboxes http://goo.gl/bNxGZZ Kris Hoffman (2016) Animation for Fashion Week 2016 https://goo.gl/QilqqT Jiayi Chong 2016 - Animation tool https://www.facebook.com/RealmlandStory/ Paul Rand Pierce 2016 Graphic Novel on Facebook Tragic Looking Women 44 Men who Look 44 (Notice the direction faces) A Hat on the Ground Spells trouble
  • 100. 100 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Imaginary Cities – BL Labs Project 16-18 Michael Takeo Magruder An artistic exploration seeking to create provocative fictional cityscapes for the Information Age from the British Library’s digital collection of historic urban maps
  • 101. 101 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk British Fashion Colleges Council and Teatum Jones Alanna Hilton
  • 102. 102 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Lessons Learned & Challenges… It all starts from a conversation • Start with a conversation: our data isn’t highly visible on search engines (yet!) & isn’t easy to find. Need to create and embrace serendipity & opportunities for use by talking! • Need to have several conversations with several stakeholders & tap into their tacit knowledge, which isn’t always written down, to progress ideas. • Misunderstandings often arise because of jargon & different meanings of words. https://goo.gl/XaHYT9 ? Audience research & Digital interests Digital collections we have This is where Labs works
  • 103. 103 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Expectations change when researchers actually see the data, systems & experience the ‘culture’ of the organisation. https://goo.gl/ytmWnu Lessons Learned & Challenges… Expectations change when researchers see the data
  • 104. 104 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Opening & using digital collections occasionally requires a need to let go of the emotional & psychological connection to them Lessons Learned & Challenges… Letting ‘go’ of the connection to collections… https://goo.gl/OYAsmK
  • 105. 105 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Embrace dirty data, it may never be perfect! Cleaning data is hard work & resource consuming! Lessons Learned & Challenges… Embrace ‘dirty data’…use it or clean it! https://goo.gl/mcpa8B Not the British Library!
  • 106. 106 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Careful of making conclusions based on ‘black box’ software & techniques (e.g. sentiment analysis), learn the assumptions behind them first! Lessons Learned & Challenges… Beware of ‘Black Box’ software…
  • 107. 107 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk We tend to work with researchers who can be ‘flexible’ with their research questions & are willing to embrace challenges, collaborate, the pioneers! Lessons Learned & Challenges… Work with ‘flexible’ collaborative researchers… https://goo.gl/wMTS3Z
  • 108. 108 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Many researchers have the domain knowledge but lack technical / digital skills to use Digital Research methods. Should they be teamed up with those that want to solve problems or get trained? Lessons Learned & Challenges… Digital skills training needed for Humanities researchers… https://goo.gl/i5GVfI https://goo.gl/kwcK8J
  • 109. 109 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Huge appetite to use digital content & data for anyone’s ideas! (e.g. Flickr Commons stats). Lessons Learned & Challenges… Huge demand for open digital content… https://goo.gl/yQ5s4U
  • 110. 110 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Labs mindset… 1. Start a conversation, generate positive energy, be kind, have fun and try to support ideas 2. Start with small experiments, but think big. 3. Fail faster (don’t be afraid) and persevere. 4. Reject perfectionism! Good enough is sometimes…good enough! 5. Celebrate the uses of digital collections https://goo.gl/noASfl
  • 111. 111 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk https://goo.gl/SUOO0J The Magic of Openness! • If digitised / digital collections are not used, what is the point of digitising / keeping them (i.e. apart from preservation)? • Opening up our digital collections offers new ways for the Library’s content to be re-discovered, remixed, re-imagined and ‘re-energised’ • Generates plenty of examples to inspire use by others!
  • 112. 112 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Hey there Young Sailor! Ling Low 2016 – Hey there Young Sailor https://www.youtube.com/watch?v=bcOP1E5bRE0VIMEO.COM/SWEETANDLOWFILMS @SWEETNLOWFILMS ON INSTAGRAM @SWEETNLOWLING ON TWITTER The Impatient Sisters Play to fade!
  • 113. 113 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Explore or Imagine Our Data! • CSV of Metadata https://data.bl.uk/digbks/dig19cbooks-mdata-csv.csv • 19th Century Books - Book Metadata - 01/09/2013. https://data.bl.uk/digbks/db21.html • Digitised Books - Flickr Tag History - Dec 2013 to March 2016. TSV https://data.bl.uk/digbks/db15.html • Digitised Hebrew Manuscripts - Metadata https://data.bl.uk/hebrewmanuscripts/heb1.html • Digitised Hebrew Manuscripts: Or 2210 - Or 2364 https://data.bl.uk/hebrewmanuscripts/heb8.html • Theatrical playbills from Britain and Ireland (OCR text only) https://data.bl.uk/playbills/pb2.html • Portraits of actors, views of theatres and playbills (covering 1750 - 1821 in a single volume) https://data.bl.uk/singlesheet/por1.html • Volumes of Lysons Collectanea (Amusements), comprising broadsides, cuttings, advertisements on amusements.1660- 1840. https://data.bl.uk/singlesheet/ad1.html https://data.bl.uk • Have a look at the data. • Data Quality • Issues Or an idea you have thought of what to do with the data! http://labs.bl.uk/Ideas+for+Labs Smaller datasets
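A first look at data quality in a downloaded metadata CSV can start by counting empty cells per column. The sketch below runs on an invented inline sample, because the real column names of the data.bl.uk files are not given in the slides: the headers `identifier`, `title`, `date` and `place`, the sample rows and the function name `missing_value_report` are all assumptions for illustration.

```python
import csv
import io
from collections import Counter

# Invented sample rows standing in for a downloaded metadata file.
SAMPLE = """identifier,title,date,place
id-001,An example title,1876,London
id-002,Another title,[1850?],
id-003,,1789,Edinburgh
"""


def missing_value_report(csv_text):
    """Return (row_count, {column: number_of_empty_cells})."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = Counter()
    rows = 0
    for row in reader:
        rows += 1
        for column, value in row.items():
            # Treat absent or whitespace-only cells as missing.
            if value is None or not value.strip():
                missing[column] += 1
    return rows, dict(missing)


print(missing_value_report(SAMPLE))
```

For a real download, the same function could be fed the file's text after reading it from disk, once the actual headers have been checked.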
  • 114. 114 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk The Future of BL Labs • Continue to engage with researchers • Learn what they want to do • Collect evidence of demand • Develop Business Model and Support process to make ‘Business as Usual’ at the British Library • Help to create pathway to developing a ‘Digital Research Suite’ at the British Library by 2019 http://www.library.pitt.edu/digital-scholarship-services https://goo.gl/W4TjGt • Many are being ‘inspired’ by our model…
  • 115. 115 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Business Model Canvas https://www.youtube.com/watch?v=QoAOzMTLP5s
  • 116. 116 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Business Model Canvas – Digital Research Lab – British Library Key Partners Key Activities Value Propositions Customer Relationships Customer Segments Key Resources Channels Cost Structure Revenue Streams http://www.businessmodelgeneration.com
  • 117. 117 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Alejandro: Master’s student in Comparative Literature (Researcher Technical) • Alejandro wants to speak to somebody about working on a project which will build and analyse a corpus from digitised literary works and newspaper articles first published from 1950 to the present day. He is particularly interested in how humorous prose of PG Wodehouse has permeated literary works. • Alejandro is confident in using digital research methods and tools to help with his research as his undergraduate degree was in Computer Science. He is a talented programmer, having contributed to a number of Open Source projects and winning a number of software development competitions at university. • Research Question/Enquiry: How has the humorous prose of PG Wodehouse permeated modern literary works? • Services Required: • Access to the relevant digital collections and data for his research. • Advice on the potential legal restrictions on carrying out his work. • Ability to create and store the corpus and use computational tools to analyse it. • Share the results under an open access license • Download the final results of his research, and cite the derived datasets on data.bl.uk.
  • 118. 118 @BL_Labs #citylis @BL_DigiSchol labs@bl.uk http://labs.bl.uk Rosslyn: PhD student in Gender Studies (Researcher Non-Technical) • Rosslyn is a gender historian developing research questions around how gender-specific terms are used in writing and how they may change over time. She thinks she would like to use the UK Web Archive as one of her corpora for this and has a vague idea that she may need to employ digital research methods after reading an article about natural language processing. • Rosslyn has never used these methods before and has basic digital skills (e.g. MS Office). Rosslyn would like to sit down with someone to discuss her idea and get guidance about moving this project forward. She is happy to collaborate with other researchers. • Research question/Enquiry: How do gender-specific terms change over time in the UK Web Archive? • Services Required: • Access to expert information, advice and training on the appropriate digital research methods to use for this project. • Case studies and connections / collaborations with other researchers who can help her understand their approaches and experiences using BL digital collections and data for similar projects. • Getting a clear understanding of the UK Web Archive, how it can be used, its limitations / challenges and whether it is the appropriate corpus to carry out her research. • Ability to create and store the corpus and use computational tools to analyse it. • Share the results under an open access license. • Download the final results of her research, and cite the derived datasets on data.bl.uk.
  • 119. Discussion • Have a look at the Personas • Write your own • We will discuss your needs • Choose one for Value Proposition
  • 120. Value Proposition Canvas • Customer Segment • Value Map
  • 121. Customer Segment • Customer Jobs – What are customers trying to get done in their work? (Important – Insignificant) • Pains – What are the bad outcomes, risks and obstacles related to customer jobs? (Extreme – Moderate) • Gains – What outcomes do customers want to achieve, or what concrete benefits are they seeking? (Essential – Nice to have)
  • 122. Value Map • Pain Relievers – How do products and services alleviate customer pains? • Gain Creators – How do products and services create customer gains? • Products and Services – List all the products and services the value proposition is built around

Editor's notes

  1. 140 seconds The British Library is the national library of the UK and one of the largest research libraries in the world. The Library moved to a new purpose-built building in 1997 <click> the largest of its kind built in the UK in the 20th century. Many frequently used items are stored 5 stories below the main building at St Pancras in London and many might not know that part of the building is meant to look like a ship on a journey to discovery! <click>. <click to switch off> The building can seat 1,200 researchers at any one time across 5 reading rooms. <click>Medium and long term requested items are held at Boston Spa in Yorkshire in a low oxygen warehouse, using robots to retrieve items. In total, the library has 625 km of shelving, growing by 12 km every year. Whilst we acquire items through purchase or gifts, much of the collection has been built up through legal deposit. That is, by law, a copy of every UK and Ireland print publication must be given to the British Library by its publishers. Around 3 million items are added per year. In 2013, legal deposit was extended to cover non-print material, so by law we also take in digitally published items, which means regular mass crawls of the entire UK web domain as well as ebooks, ejournals etc.
  2. 85 seconds The picture you can see is inside the main building in London, it’s the King’s Library – King George the Third’s personal library! Sometimes known as the ‘stack’, I walk past this every day and I sometimes forget that the collections the British Library has are truly staggering! We currently estimate them to exceed <click>150 million items, representing every age of written civilisation and every known language. Our archives now contain the earliest surviving printed book in the world, the Diamond Sutra, written in Chinese and dating from 868 AD…. So some big numbers… Over …<click>14 million books <click>60 million patents <click>8 million stamps <click>4 million maps <click>3 million sound recordings <click>1.6 million music scores <click>over 0.3 million manuscripts <click>0.8 million serial titles (which are of course made up of many many volumes/editions), this is where a lot of our content is, just in case you thought the numbers didn’t add up!
  3. https://goo.gl/WutNyi http://goo.gl/nNKhQ2 https://goo.gl/9NWZUW https://goo.gl/7QQ5Tf https://goo.gl/x7b4tg https://upload.wikimedia.org/wikipedia/commons/a/a2/Interactive_whiteboard_at_CeBIT_2007.jpg
  4. Get clearer annotation image and transcription (perhaps TILT)
  5. 6 Seconds (20 Words) So <Click> ‘how’ do we try and engage those who might be interested in the BL’s digital collections and data? <Click>
  6. 17 Seconds (53 Words) <Click>The British Library is one of the largest libraries in the world <Click> with an estimated 180 million physical items, of which only a small proportion has been digitised. <Click>We estimate this is around 1-2%, but no one really knows exactly how much. However, more and more items are being stored as ‘born’ digital, such as the UK Web Archive<Click>
  7. Have a balance of: Multimedia (broadcast news and radio; sounds, Save our Sounds); Books and newspapers; Images; BNB; Qatar Digital Library; Hebrew manuscripts
  8. <click>The British Library faces many challenges of access to our Digital collections! <click> Sometimes digital content is only available onsite due to license restrictions, <click>or even only on a specific computer in a reading room! Technically there are very few reasons why digital content can’t be online <click> though it might be too big or hasn’t been transferred from other digital storage media. <click>Sometimes access is through a paywall. Finally, <click>some content is in the happy sunny place, online, open and freely available. The real reasons why there are challenges to accessing digital content are of course human. They require different approaches from the Library and may often involve an honest, open dialogue and negotiation with the publishers. The Labs project has tried to address this problem by creating a ‘residency model’ for researchers to work intensively with a digital collection on-site, so as not to infringe access conditions. I will say more about this later.
  9. <click>The British Library faces many challenges of access to our Digital collections! <click> Sometimes digital content is only available onsite due to license restrictions, <click>or even only on a specific computer in a reading room! Technically there are very few reasons why digital content can’t be online <click> though it might be too big or hasn’t been transferred from other digital storage media. <click>Sometimes access is through a paywall. Finally, <click>some content is in the happy sunny place, online, open and freely available. The real reasons why there are challenges to accessing digital content are of course human. They require different approaches from the Library and may often involve an honest, open dialogue and negotiation with the publishers. The Labs project has tried to address this problem by creating a ‘residency model’ for researchers to work intensively with a digital collection on-site, so as not to infringe access conditions. I will say more about this later.
  10. Examples from the Cooper Hewitt collection. I spent 3/5 of my time at the Cooper Hewitt just trying to get the data clean enough to vaguely represent the collection. The problem is that computers think U.S., U. S. , U.S.A., U. S. A. , United States, United States of America are six different places. Fields also contain things like internal notes about potential duplicates, unexpected extra information - notes on what type of location, etc. Lots of inconsistencies - uncertainty and date ranges expressed in different ways. More common GLAM issues - What year is 'early 18th century'? What do you do with '1836 (probably)'?
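The place-name problem in the note above can be sketched in a few lines of Python. This is only an illustration, not the clean-up that was actually done at the Cooper Hewitt: the alias table `COUNTRY_ALIASES` and the function `normalise_country` are hypothetical names invented here, and the choice of "United States" as the canonical label is an assumption.

```python
import re

# Hypothetical alias table (an assumption for this sketch): each key is a
# cleaned-up spelling, each value the canonical label chosen for it.
COUNTRY_ALIASES = {
    "us": "United States",
    "usa": "United States",
    "united states": "United States",
    "united states of america": "United States",
}


def normalise_country(raw: str) -> str:
    """Map spelling variants of a place name onto one canonical label."""
    # Lowercase, drop full stops, and squeeze runs of whitespace.
    key = raw.strip().lower().replace(".", "")
    key = re.sub(r"\s+", " ", key).strip()
    # Collapse spaced initials such as "u s a" into "usa".
    if all(len(token) == 1 for token in key.split()):
        key = key.replace(" ", "")
    # Unknown values come back unchanged so nothing is silently lost.
    return COUNTRY_ALIASES.get(key, raw.strip())


variants = ["U.S.", "U. S. ", "U.S.A.", "U. S. A. ",
            "United States", "United States of America"]
print({normalise_country(v) for v in variants})
```

Values that are not in the table are returned as they came in, which keeps internal notes and other unexpected field contents visible for manual review instead of guessing at them.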
  11. Open Refine is an amazing tool, and I wouldn't have gotten anywhere at Cooper Hewitt without it. It will suggest ways to make the data more consistent. You can then export the data and keep working on it in other tools, or put it into Open Refine. Because Refine runs locally it can be used for sensitive data you mightn't put online. One issue is that GLAMs tend to use question marks to record uncertainty in attribution, but Refine strips out all punctuation, so you have to be careful about preserving it (if that's what you want). Takes in TSV, CSV, *SV, Excel (.xls and .xlsx), JSON, XML, RDF as XML, and Google Data documents. http://freeyourmetadata.org/cleanup/ useful advice
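The question-mark caveat above can be illustrated with a rough Python sketch of key-based clustering. It is loosely modelled on the "fingerprint" key-collision method described in the OpenRefine documentation (lowercase, remove punctuation, sort and de-duplicate tokens) and is not OpenRefine's own code. The function names and the sample values are invented for illustration, and it is an assumption that this is the behaviour the note refers to.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a clustering key, approximating a fingerprint-style method."""
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", value)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.strip().lower()
    # Remove punctuation: this is the step where a trailing "?" disappears.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sort and de-duplicate tokens so word order no longer matters.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a key; keep only groups with 2+ members."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: members for key, members in groups.items() if len(members) > 1}


def split_uncertainty(value: str):
    """Move a trailing '?' into a separate flag before any merging."""
    stripped = value.strip()
    return stripped.rstrip("? ").strip(), stripped.endswith("?")


# Invented sample values, not taken from any real collection.
print(cluster(["Smith, John?", "John Smith", "Jane Doe"]))
print(split_uncertainty("Smith, John?"))
```

Recording the uncertainty in its own column first, as `split_uncertainty` does, is one way to accept a suggested merge without losing the fact that an attribution was doubtful.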
  12. 21 Seconds (65 Words) Katrina Navickas was particularly interested in the <Click>Chartist Movement, a group campaigning for the vote for working people. <Click>They were the biggest popular movement for democracy in 19th century British history, as this early picture of a huge monster meeting at Kennington Common shows. <Click>She wanted to use a combination of manual and computational methods to explore our Digitised Newspapers to find out when and where they met and plot the meetings on a map, <Click>hopefully unearthing new history.
  13. We were given 970 files from a selection of 19th century newspaper titles from the BL corpus to correct using the overProof post-OCR correction software. The best way to measure the improvement made by the correction process is to compare the OCR'ed text and the automatically corrected text with a perfect correction made by a human (known as the "ground truth"). Hannah-Rose's 5 small human-corrected samples are shown as green dots. These are not only smaller than the other files, but their raw error rate is much lower at 13.3%. OverProof was measured as reducing this to 5.4%, a removal of almost 60% of errors. The red dotted line indicates the correction "break-even" point: the further under the line, the better the quality of the document after correction. In the graph below, the grey line shows the distribution of files across error rates before correction and the green line after correction.
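The comparison against ground truth described above can be sketched as follows. This is a minimal illustration, not overProof's own evaluation code: the note does not say whether the error rate was counted per word or per character, so the word-level rate used here is an assumption, and all function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def word_error_rate(candidate: str, ground_truth: str) -> float:
    """Word-level edits needed to reach the ground truth, per truth word."""
    truth_words = ground_truth.split()
    if not truth_words:
        raise ValueError("ground truth is empty")
    return edit_distance(candidate.split(), truth_words) / len(truth_words)


def relative_reduction(before: float, after: float) -> float:
    """Share of the original errors removed by correction."""
    return (before - after) / before


# Figures quoted in the note: 13.3% raw error rate, 5.4% after correction.
print(f"{relative_reduction(13.3, 5.4):.1%} of errors removed")
```

With the quoted figures, (13.3 - 5.4) / 13.3 is roughly 0.594, which matches the "almost 60%" in the note.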
  14. Watch out for the gunner and skunk, as they will make an appearance again!
  15. 50 seconds Here is the anatomy of a Flickr record. Importantly, we have created links to many of the Library’s services <click>so some of this lovely traffic is going back to the Library and hopefully generating more interest in our services, from downloading a PDF of the book to purchasing a high res scan of the image. <click>Tags are added from the original book record, including the approximate page number the image came from. <click>Users of Flickr can add their own tags and, as I have mentioned, they have already started doing so.
  16. Posts small illustrations taken almost at random from the digitised book corpus to a Tumblr blog. This experiment with undirected engagement was a by-product of work to uncover the hidden wealth of illustrations within the digitised pages.
  17. 27 Seconds (82 Words) Adam Crymble <Click>wanted to harness the power of playing fun games on arcade machines to help with crowdsourcing the tagging of un-described images. He particularly wanted to engage a younger audience in crowdsourcing. <Click>On the right you can see a replica 1980s arcade machine we built <Click>and on the bottom left some tagging games that were developed through a ‘Games Jam’ for the machine. <Click> Let’s take a closer look at two of the games…<Click>
  18. 18 Seconds (56 Words) Indexing the BL 1 million & Mapping the Maps was led by James Heald in collaboration with others. <Click>They produced an index of 1 million 'Mechanical Curator collection' images on <Click>Wikimedia Commons from a collection of largely un-described images. <Click>This led to 50,000 maps being found within the collection, partly through a map-tag-a-thon. <Click>These are now being geo-referenced. <Click>