Digital Research at the British Library, by Stella Wisdom
1. Digital Research @ British Library
Stella Wisdom, Digital Curator
Contemporary Society and Culture
Doctoral Open Day 2020
2. Digital Research
• Using digital technologies to change the way research is done and make it possible to tackle new research questions.
• Digital tools have transformed the research process, specifically two fundamental aspects of research: search and analysis.
• They help overcome what are traditionally the most difficult aspects of being a researcher: finding information and interpreting it.
OCR/HTR, Data Visualisation, Text and Data Mining, Digital Mapping, Crowdsourcing, 3D Modelling
3. Examples & Benefits
OCR/HTR, Data Visualisation, Text and Data Mining, Digital Mapping, Crowdsourcing, 3D Modelling
• Data mining OCRed newspapers to learn about a certain historical aspect in a specific country
• Creating data visualisations (charts, maps, other viz) from metadata or from mined data
• Benefits:
  • Scale: explore a bigger body of material computationally (distant reading)
  • Perspective: see trends, patterns and relationships not apparent from close reading
  • Speed: test an idea or hypothesis on a large dataset
4. Meet the Digital Research Team
Neil Fitzgerald: Head of Digital Research
Stella Wisdom: Contemporary British
Nora McGregor: Europe & Americas
Dr Mia Ridge: Western Heritage
Dr Adi Keinan-Schoonbaert: Asia & Africa
Dr Rossitza Atanassova: Digitisation
Tom Derrick: 2 Centuries of Indian Print
Daniel van Strien: Living with Machines
Olivia Vane: Living with Machines
Claire Austin: Living with Machines
Dr Giorgia Tolfo: Living with Machines
Deirdre Sullivan: Apprentice
We aim to meet the needs of a fast-changing research landscape and to support scholars
who wish to deeply integrate digital content, data, and methods into their work.
5. How we can help you
We support access to and use of our digital collections through:
• Improving processes for getting content in digital form and online
• Engaging with the global conversation on digital tools, standards and interfaces
• Enhancing digital skills for BL staff through our Digital Scholarship Training Programme
• Collaborative projects
• Offering digital research support and guidance
• Outreach through our blogs and social media
• Events, competitions, and awards
7. What I'll talk about today:
• Research
  – Big Data + Old History
  – Political Meetings Mapper
  – Representation of disease in nineteenth-century newspapers
  – Chronotopic Cartographies of Literature and Litcraft
• Making
  – Artistic Use of the British Library's Mechanical Curator images on Flickr
  – Read Watch Play Online Jams
    • Odyssey Jam 2017
    • Gothic Novel Jam 2018
• Collecting and preserving
  – Emerging Formats
    • Ambient Literature
    • Interactive Fiction in the UK Web Archive
    • EThOS & Multimedia PhD Theses
• Events
  – Interactive Fiction Summer School
  – International Games Week in Libraries
    • WordPlay
    • AdventureX
  – New Media Writing Prize
8. Big Data + Old History
https://youtu.be/tp4y-_VoXdA
9. Political Meetings Mapper
"I was able to do in minutes with a python code what I'd spent the last ten years trying to do by hand!"
Dr. Katrina Navickas, BL Labs Winner 2015
5,519 meetings discovered in 462 towns and villages across the UK
http://politicalmeetingsmapper.co.uk/maps
11. Combining Text Analysis and Geographic Information Systems to investigate the representation of disease in nineteenth-century newspapers
Spatial Humanities: Texts, GIS, Places at Lancaster University
http://www.lancaster.ac.uk/fass/projects/spatialhum.wordpress/
Paul Atkinson (history), Ian Gregory (digital humanities), Andrew Hardie (linguistics), Daniel Kershaw (computer science), Amelia Joulain-Jay (linguistics), Catherine Porter (geography) and Paul Rayson (computer science)
The research focuses on the discussion of disease in nineteenth-century newspapers.
The case study used a London-based newspaper, the Era, which has been digitised and made available by the British Library.
The digitised corpus (1838-1900, over 377 million words) was explored using innovative qualitative and quantitative methods to determine how the Era discussed and portrayed disease.
12. "Being able to link the map and the underlying text allows us to understand how patterns vary from place to place. ... This spatial depiction of disease mentions not only allows us to explore the temporal geography of newspaper interest in different diseases, it also allows for a comparison with other patterns and information such as those found in official reports and statistics."
Places associated with a range of common nineteenth century diseases
Combining Text Analysis and Geographic Information Systems to investigate the representation of disease in nineteenth-century newspapers
13. Litcraft is the outreach/impact element for Chronotopic Cartographies
AHRC Funded Three Year Project (2017-20)
PI: Professor Sally Bushell
RAs: James Butler, Duncan Hay, Rebecca Hutcheon
14. The Litcraft component of the project is a semi-standalone series of developments aimed at encouraging elements of literary environmental criticism for younger audiences.
Primary and secondary English lessons do not typically focus on the descriptions of textual setting; one of our aims is to introduce this analytic field by designing a series of standalone gaming-based resources that engage with landscape and world design.
https://chronotopiccartography.wordpress.com/litcraft/
16. Aims of the Litcraft project
– To have an impact on society by changing attitudes towards our literary culture and heritage
– To work with schools, libraries and museums to re-engage children with literary classics by using a popular gaming environment
– To experience literary worlds in new ways by using the digital environment for reading, writing and interpreting
– To link our educational resources directly to the KS2 curriculum and present them in an easy-to-use way
17. Core Structure
At the heart of LITCRAFT is a structure that moves between reading texts externally, immersing oneself in the game world created from the text, and writing about that experience.
Resources and lesson plans reflect this "virtuous circle".
TEXT [external; verbal]
WORLD [immersive; experiential]
21. Lesson Plans
4 to 6 sessions in Litcraft with a series of linked activities
Template for each session:
• Pre-reading task
• Pre-vocabulary task
• Audio of text to listen to or read, OR whole-class reading session
• Immersive in-game task
• Writing task (building on immersive activity)
• Follow-up task
23. Session 1: My Shore Adventure
PRE-GAME: read extract (short or long) and answer questions
25. Many creative projects have used Public Domain images from the Microsoft Partnership Digitisation Project 2006-8
• 68,000 volumes (47,000+ titles) published in the 19th century, mostly in English
• Excluded: authors who were active 1850-1901 and who died after 1936
• Output: 25 million pages
https://publicdomainreview.org/collections/the-british-librarys-mechanical-curator-million/
26. The illustrations were extracted algorithmically from the digitised books:
XML excerpt shown on the slide (cut off as in the original):
<?xml version="1.0" encoding="UTF-8" ?>
- <mets:mets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:mets="http://www.loc.gov/METS/" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version18/mets.xsd info:lc/xmlns/premis-v2
Image snipped out algorithmically from ALTO XML
Image taken from page 207 of 'London and its Environs. A picturesque survey of the metropolis and the suburbs ... Translated by Henry Frith. With ... illustrations'
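The slides do not describe the extraction algorithm itself. As a rough sketch only, and assuming the OCR output is ALTO XML in which detected pictures are recorded as `Illustration` elements carrying `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes, the regions to snip out of a page image could be located along these lines. The sample document, its namespace version and the function names are hypothetical and are not taken from the Library's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified ALTO page invented for this sketch.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P207" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100" VPOS="100" WIDTH="1800" HEIGHT="400"/>
        <Illustration ID="IL1" HPOS="250" VPOS="600" WIDTH="1500" HEIGHT="1100"/>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so any ALTO version can be matched."""
    return tag.rsplit("}", 1)[-1]


def illustration_boxes(alto_xml):
    """Return (left, top, right, bottom) boxes for every Illustration block."""
    root = ET.fromstring(alto_xml)
    boxes = []
    for element in root.iter():
        if local_name(element.tag) != "Illustration":
            continue
        left = int(float(element.get("HPOS", "0")))
        top = int(float(element.get("VPOS", "0")))
        width = int(float(element.get("WIDTH", "0")))
        height = int(float(element.get("HEIGHT", "0")))
        boxes.append((left, top, left + width, top + height))
    return boxes


print(illustration_boxes(SAMPLE_ALTO))
```

Each box could then be passed to an image library to crop the matching region from the page scan. Depending on the ALTO version, coordinates may be expressed in units other than pixels, so a real pipeline would need to check the file's measurement unit first.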
28. The illustrations were uploaded to Flickr and albums were
created through crowd-sourced tagging
29. Odyssey Jam 2017
https://itch.io/jam/odysseyjam
Writing challenge tied in with Read Watch Play, a partnership of libraries worldwide encouraging themed discussions of books, films, music and games. Each month has a theme; for March 2017 it was #waterread.
30. Odyssey Jam 2017 entries
https://itch.io/jam/odysseyjam/entries
We encouraged entrants to make use of the digitised images on Flickr that the British Library had released under a Creative Commons licence.
Some games used these images, e.g. No One and 108 suitors.
31. The 200th anniversary of the publication of Frankenstein was a perfect opportunity to run a gothic-novel-themed challenge.
Gothic Novel Jam with Read Watch Play: participants were asked to make something creative inspired by the gothic novel genre and share it on the itch.io Gothic Novel Jam site.
Entries could include stories, poetry, art, games, music, films, pictures, soundscapes, or any other type of digital media response.
We wanted participants to use images from the British Library Flickr account as inspiration.
32. Gothic Novel Jam 2018
We received 46 entries, submitted by people from all around the world, including the UK, Australia, America and France.
https://itch.io/jam/gothic-novel-jam/entries
33. We encouraged entrants to use the digitised images on Flickr that the British Library had released as Public Domain. "As a glow brings out a haze" by Eldridge Misnomer is a lovely example of how these illustrations can be used as a key part of the storytelling.
34. The Emerging Formats Project
"The term 'emerging formats' refers to types of publication that are in scope to collect under the UK's Non-Print Legal Deposit Regulations, but whose content and structure are more challenging compared to those currently collected."
(https://www.bl.uk/projects/emerging-formats)
Aims:
Identify publications that are in scope to collect
Identify and explore the collection management needs of these more-complex digital publications
35. Identify Collect
• Outreach activities led us to collect selected works
• Collaborations help us understand the digital publishing landscape and provide us with examples
• We are testing our collection methods to see what works for complex digital publications
36. Collecting 80 Days, Inkle
• PC Version & Android app
• Source code
• Contextual information
43. Interactive Narratives Collection
• Part of E-publishing Trends/Emerging Formats
• 183 sites (and counting!)
• Mix of text, sound and video elements
• Different types of interactions (hypertext, parser-based, choice-based)
• Captured with Heritrix + Webrecorder
https://www.webarchive.org.uk/en/ukwa/collection/1836
55. EThOS & Multimedia PhD Theses
Coral Manton worked on a British Library research placement investigating multimedia and non-text PhD research outputs and how EThOS might develop to meet the challenge of evolving digital theses.
She interviewed doctoral students from various disciplines as case studies.
http://blogs.bl.uk/digital-scholarship/2016/09/multimedia-phd-research-and-non-text-theses.html
https://www.bl.uk/case-studies/sam-martin
https://www.bl.uk/case-studies/rob-sherman
56. BL Research Repository
• An open access repository for the research produced by staff and research associates of the British Library.
• Houses material such as journal articles, conference papers, books and book chapters, reports, datasets, images, exhibition texts and blog posts.
• https://bl.iro.bl.uk/
57. Stay tuned and keep in touch
Web: http://www.bl.uk/digital
Blog: http://britishlibrary.typepad.co.uk/digital-scholarship/
Email: digitalresearch@bl.uk
Twitter: @BL_DigiSchol
Editor's notes
Examples of tools and methods that we tackle at the Library.
Some of these processes help get data ready for analysis (e.g. turning images of items into transcribed and annotated texts), while others support the analysis of large collections or enable public engagement.
OCR/HTR: creating machine-readable text for search/analysis
Data viz: for analysis or publication
Text and data mining: applying classifications to or analysing texts, images or media.
Digital mapping: displaying/analysing spatial data
Crowdsourcing: public participation and learning
3D modelling: creating 3D models of collection items and making them available for use
Other examples:
Using online platforms to compare images from collection items in different locations (IIIF), e.g. illustrations or calligraphy in manuscripts
Using GIS to see the relationship between historical events and their spatial environment
Studying 3D models of collection items (Oracle Bones)
You can:
Explore a bigger body of material computationally: "reading" thousands, or hundreds of thousands, of volumes of text, images or media files
See trends, patterns and relationships not apparent from close reading individual items, or gain a broad overview of a topic
Test an idea or hypothesis on a large dataset; generate classification data about people, places, concepts
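To make the idea of "reading" at scale concrete, here is a minimal sketch that counts how often a term is mentioned per year across a set of dated texts. The function name `mentions_per_year`, the sample documents and the search term are all invented for illustration; they are not drawn from any British Library dataset or tool.

```python
import re
from collections import Counter


def mentions_per_year(documents, term):
    """Count whole-word, case-insensitive mentions of `term` for each year.

    `documents` is an iterable of (year, text) pairs.
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    counts = Counter()
    for year, text in documents:
        counts[year] += len(pattern.findall(text))
    return dict(sorted(counts.items()))


# Invented sample data standing in for OCRed newspaper pages.
SAMPLE_DOCUMENTS = [
    (1849, "Cholera reported in the town. The cholera spreads."),
    (1849, "No news of note this week."),
    (1854, "Cholera returns to the district."),
]

print(mentions_per_year(SAMPLE_DOCUMENTS, "cholera"))  # {1849: 2, 1854: 1}
```

The same loop works unchanged over thousands of documents, which is where the benefits of scale and speed come from; the resulting counts can then be charted to look for trends over time.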
Research Question:
Chartism was the biggest popular movement for democracy in 19th-century British history. The Chartists campaigned for the vote for all men and advertised their meetings in the Northern Star newspaper from 1838 to 1850.
The question is: how many meetings took place, and where? We started with 1841-1845.
Source Collections:
19th Century Digitised Newspapers, specifically the Northern Star newspaper
Digitised and Georeferenced Map of Oxford Street
Digital/Computational Techniques:
The images of the relevant pages of the Northern Star were run through an Optical Character Recognition program (ABBYY FineReader 12) and the resulting text was checked manually.
We developed a set of Python scripts to extract and geocode the place of each meeting, using a gazetteer of places, and to parse the date of the meeting.
Outcome: 5,519 meetings discovered in 462 towns and villages across the UK! http://politicalmeetingsmapper.co.uk/maps/
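The project's actual scripts are not reproduced in these notes. The following is only a sketch of the general technique described above (gazetteer lookup plus date parsing), under stated assumptions: the two-entry gazetteer with approximate coordinates, the date format, the sample notice and the function name `extract_meeting` are all invented for illustration.

```python
import re
from datetime import date

# Tiny illustrative gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Manchester": (53.48, -2.24),
    "Leeds": (53.80, -1.55),
}

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Assumed date style: "14th March 1842" or "14 March, 1842".
DATE_PATTERN = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)?\s+(" + "|".join(MONTH_NAMES) + r"),?\s+(\d{4})\b"
)


def extract_meeting(notice):
    """Geocode the first gazetteer entry found in a notice and parse its date."""
    place = next(
        (name for name in GAZETTEER
         if re.search(rf"\b{re.escape(name)}\b", notice)),
        None,
    )
    meeting_date = None
    match = DATE_PATTERN.search(notice)
    if match:
        day, month, year = match.groups()
        try:
            meeting_date = date(int(year), MONTHS[month], int(day))
        except ValueError:
            # OCR errors can yield impossible dates such as "31st February".
            meeting_date = None
    return {"place": place, "coordinates": GAZETTEER.get(place), "date": meeting_date}


# Invented notice, written in the style of a newspaper announcement.
SAMPLE_NOTICE = "A public meeting will be held at Leeds on Monday, 14th March 1842."
print(extract_meeting(SAMPLE_NOTICE))
```

A real gazetteer would also have to deal with spelling variants, OCR errors and places that share a name, which is one reason manual checking of the text remains part of the workflow.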
Beneficial "combination of the computer's ability to summarise patterns in large volumes of data, with the more traditional humanities skills of understanding subtlety and nuance in documents written by humans"
Litcraft is a resource designed and run by Professor Sally Bushell and Dr James Butler. I am standing in for them today as an RA on the main literary mapping project, of which Litcraft is a part. I'll be presenting on this main project ("Chronotopic Cartographies") at another panel during this conference.
Read from slide
The success of Litcraft comes from its core structure, which creates a linked resource working both outside and inside the game world and connecting the two. At its heart is a virtuous circle that always begins with the text, prepares and sets up for the in-game task, then comes out again at the end.
Our work with libraries draws upon and adapts the core builds made originally for use in schools. We used the same model / structure but adapted it.
The first world we created was for Treasure Island: using the map at the front of Stevenson's novel, we created an accurate scale map in Minecraft.
Our second build was for Kensuke's Kingdom by a living author, Michael Morpurgo. We chose this because it is a shorter, simpler text (although very good and moving), which made the resource more accessible.
The lesson plans are built around the linked resource with pre- and post- tasks.
This slide gives you the contents of Treasure Island, with the highlighted chapters having an in-game task relating to them. When working with schools we assume that the whole book is being read, but in libraries we also created a minimum-read version: a short extract that MUST be read and discussed before going into Minecraft. Tasks are spread across the text.
So, for example, the first task is centred on the chapter called "My Shore Adventure", in which Jim Hawkins runs onto the island and explores it. Children begin by looking at the map from the book (which can also be used to navigate on the Minecraft island since it is an accurate scale model), think about Jim's character, and read the passage. Then they go into Minecraft and undertake the first task, which is a scavenger hunt. In this way they replicate and re-enact what happens in the book so that the virtual world reinforces the textual.
In-game there is always a distinctive starting beacon with chests under it. Each chest corresponds to a challenge activity. The first chest says "Scavenger Hunt". Children open the chest, take out a book and follow the instructions in it. They can read the book on-screen. They can also write in books within the game, and screenshots can be taken to get their writing out of it.
The Library digitised 68,000 predominantly 19th-century books from our collections a few years ago (around 2.7% of the physical total in that period). You can view them from our catalogue or read them on your iPad via the Historical Books app developed by BiblioLabs.
There are 22 million individual page images, along with full-text scans of these images, all of which contain an untold quantity of useful data such as names of people, places, historical events and dates. Microsoft placed no restrictions on their use.
So the question then became: what next? What can 68,000 books tell us?
As the books were scanned for text, this had a fortunate "side effect": the software tries to detect not only the text on the page but also where the images might be. There had already been some interest in the images from the community of researchers, and it seemed easy to extract them.
As part of the Labs competition, Matt Prior attended one of our hack events and, when examining our book data, was very interested in the images from the books.
Meanwhile the algorithm that Ben had written to snip the images from the OCR scans was still churning away. How many were there going to be? The Mechanical Curator could publish them every hour, but was there somewhere we could put them all for people to browse when they wanted? Importantly, if we did put them somewhere, could we get people to help us add descriptions to the individual images, making them infinitely more discoverable?
With an algorithm by Ben O'Steen we snipped out images from digitised books and put them on Flickr on 13 December 2013. There were over a million. The problem was that we knew which books they came from (author/dates), but we didn't have any information about the images themselves. By releasing them on Flickr, we have got people to start tagging them and using them in very creative ways.
Hosting them internally was not an option and there was not sufficient metadata to put them on Wikipedia. Flickr seemed the obvious option as it is a platform that can support high usage, did not require metadata, allowed tagging and is free for public domain images.
Set up in 2010, the team was formed to focus on the changing research landscape in the digital realm. It is now embedded in collection areas and, as you'll see later, members are joining the Library explicitly as part of major digitisation projects.
Main activities:
Working behind the scenes to get content in digital form and online
Offering digital research support and guidance
Supporting collaborative projects
Running events, competitions, and awards
The UK Web Archive is a collection of millions of websites, captured by the British Library on behalf of the six UK Legal Deposit Libraries. Each year, they make one broad crawl of the UK Domain to capture all websites with a UK top-level suffix (i.e., .uk, .scot, .cymru and .wales) plus any others which have been identified as hosted or based in the UK.
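As a small illustration of the suffix rule only, the sketch below checks whether a URL's hostname ends in one of the UK suffixes listed above. The function name `has_uk_suffix` is invented here, and the check deliberately ignores the second category (sites identified as hosted or based in the UK), which cannot be decided from the hostname alone.

```python
from urllib.parse import urlsplit

# Suffixes named in the note above.
UK_SUFFIXES = (".uk", ".scot", ".cymru", ".wales")


def has_uk_suffix(url):
    """Return True if the URL's hostname ends in one of the listed UK suffixes."""
    host = (urlsplit(url).hostname or "").lower()
    return host.endswith(UK_SUFFIXES)


print(has_uk_suffix("https://www.bl.uk/projects/emerging-formats"))  # True
print(has_uk_suffix("https://itch.io/jam/odysseyjam"))               # False
```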
This two-day course on interactive fiction is inspired by our exhibition, Marvellous and Mischievous: Literature's Young Rebels. Create your own interactive digital narrative for children in a course led by digital artist and writer Rob Sherman (former British Library Artist in Residence).