Digital Scholarship at the British Library
1. Digital Scholarship
@ British Library
Stella Wisdom, Digital Curator
@miss_wisdom
Blog: http://britishlibrary.typepad.co.uk/digital-scholarship/
2. www.bl.uk 2
The British Library is the national library of the UK.
By law we receive a copy of every publication produced in the UK and Ireland.
https://www.bl.uk/
3.
Well over 150 million physical items alone are stored in London and in Yorkshire, growing by roughly 3 million each year.
If you saw 5 items a day, it would take you 80,000 years to see the whole collection.
Roughly 3% of this is digital or digitised, though that share is growing.
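The "80,000 years" figure can be checked with a quick back-of-the-envelope calculation. This sketch assumes exactly 150 million items (a lower bound, since the slide says "well over"), 5 items a day, a 365-day year and no collection growth; the function name is purely illustrative.

```python
# Back-of-the-envelope check of the "80,000 years" figure.
# Assumptions: exactly 150 million items (a lower bound, since the slide
# says "well over"), 5 items seen per day, 365 days a year, and no growth.
ITEMS = 150_000_000
ITEMS_PER_DAY = 5
DAYS_PER_YEAR = 365


def years_to_see(items: int, per_day: int, days_per_year: int = DAYS_PER_YEAR) -> float:
    """Years needed to see `items` items at a rate of `per_day` items a day."""
    return items / per_day / days_per_year


print(f"{years_to_see(ITEMS, ITEMS_PER_DAY):,.0f} years")  # 82,192 years

# The collection grows by roughly 3 million items a year, while one viewer
# at this rate sees only 5 * 365 = 1,825 items a year.
growth_ratio = 3_000_000 / (ITEMS_PER_DAY * DAYS_PER_YEAR)
print(f"Growth outpaces viewing about {growth_ratio:,.0f} times over")  # about 1,644 times over
```

The result, roughly 82,000 years, agrees with the slide's rounded figure, and the second line shows why the task only gets harder: the annual intake alone is over a thousand times what one viewer could see in a year.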
4.
The UK Web Archive
http://www.webarchive.org.uk
A collaboration between all of the UK Legal Deposit Libraries:
• Bodleian Libraries, Oxford University
• British Library
• Cambridge University Libraries
• National Library of Scotland
• National Library of Wales
• Trinity College, Dublin
Aims to collect all UK websites at least once per year
5.
Born-Digital Manuscripts and Personal Digital Archives
The Wendy Cope Archive
http://www.wired.co.uk/news/archive/2011-05/10/british-library-digital-archives
6.
Born-Digital Manuscripts and Personal Digital Archives
Enhanced Curation: Hanif Kureishi's writing study
https://www.bl.uk/collection-items/hanif-kureishis-writing-study
Panoramic view of writer Hanif Kureishi’s study created by taking a series of
photographs of the room and digitally stitching them together to make one image.
7.
What is Digital Scholarship?
"Allows research areas to be investigated in new ways, using new tools, leading to new discoveries and analysis to generate new understanding."
Dr Adam Farquhar, Head of Digital Scholarship, British Library
• There’s been a technological and computational shift in scholarship
• Digital tools have transformed the research process, specifically two fundamental aspects of research: search and analysis
• Digital tools help overcome the traditionally most difficult aspects of being a researcher: finding information and interpreting it
8.
Defining Digital Scholarship
Digital scholarship combines methodologies from traditional humanities and social science disciplines with computational tools from the computing disciplines to, for instance:
• Explore a bigger body of material computationally than is possible by individually reading entire texts
• Look for trends, patterns and relationships not apparent from close reading
• Gain a broad overview of a topic
• Test an idea or hypothesis on a large dataset
We draw heavily on the work being done in digital humanities, but are not limited to it.
9.
Where does the Library fit in?
• Our digital collections are only going to grow.
• Meanwhile, digital scholars are already using technology in innovative ways. Expectations have changed: they are seeking access at scale to our collections for computational analysis.
• We have much to gain from understanding digital methods and collaborating more closely with digital scholars. There is a synergy in solving shared issues (e.g. correcting OCR, enriching collections metadata, conquering back-cataloguing).
• Digital scholarship is collaborative and requires input across disciplines and domain expertise; our curatorial experts have an essential role to play in that.
• The Digital Research Team and BL Labs aim to keep pace with this digital turn, understand service requirements and support colleagues keen to make the most of it.
10.
The Digital Scholarship Department
Mission
Enable the use of the British Library’s digital collections for research, inspiration, creativity, and enjoyment.
Goal
Ensure the Library is able to meet the emerging needs of everyone who wants to deeply integrate digital content, data, and methods into their work.
www.bl.uk/digital
11.
Meet the Digital Research Team
The Digital Research Team is a cross-disciplinary mix of curators, researchers, librarians and programmers supporting the creation and innovative use of the British Library's digital collections.
• Neil Fitzgerald: Head of Digital Research
• Stella Wisdom: Contemporary British
• Nora McGregor: Europe & Americas
• Dr Mia Ridge: Western Heritage
• Dr Adi Keinan-Schoonbaert: Asia & Africa
• Dr Rossitza Atanassova: Digitisation
• Tom Derrick: 2 Centuries of Indian Print
12.
The Digital Scholarship Department
Support Digital Scholars
Connect & Share Expertise
Invest in our Staff
Agents for Change
Innovate & Collaborate
• Training Programme & Hack & Yack
• Reading Group/21st Century Talks
• Arabic OCR Competition
• Libcrowds Playbills Crowdsourcing
• LIBER Digital Humanities Working Group
• Data.bl.uk
• Digital Reading Room Pilot
• BL Labs Competition & Awards
• Collaborative PhDs & PhD Placements
13.
Opportunities for researchers
You can:
• Explore a bigger body of material computationally: 'reading' thousands, or hundreds of thousands, of volumes of text, images or media files
• See trends, patterns and relationships not apparent from close reading of individual items, or gain a broad overview of a topic
• Test an idea or hypothesis on a large dataset; generate classification data about people, places and concepts
Adapted from Mia Ridge’s blog post, Some challenges and opportunities for digital scholarship in 2018 (25 April 2018)
Scale
Perspective
Speed
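To make 'reading' thousands of volumes computationally a little more concrete, here is a minimal sketch of one common distant-reading technique: counting how often a term appears per decade across a corpus. The `(year, text)` input format, the function name and the toy sentences are all hypothetical; they are not a British Library dataset or tool.

```python
import re
from collections import Counter


def term_counts_by_decade(records, term):
    """Count whole-word, case-insensitive occurrences of `term` per decade.

    `records` is an iterable of (year, text) pairs, for example OCR text of
    digitised books keyed by publication year (a hypothetical input format).
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    counts = Counter()
    for year, text in records:
        decade = (year // 10) * 10
        counts[decade] += len(pattern.findall(text))
    return dict(sorted(counts.items()))


# Toy sentences invented for illustration; not real collection text.
corpus = [
    (1832, "The railway was a novelty; few had seen a railway."),
    (1845, "Railway shares rose as the railway mania spread."),
    (1847, "Another railway opened this year."),
    (1851, "The Great Exhibition drew crowds by railway."),
]
print(term_counts_by_decade(corpus, "railway"))  # {1830: 2, 1840: 3, 1850: 1}
```

These are raw counts; a real study would normalise by the number of words published per decade so that a growing corpus does not masquerade as a growing trend.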
14.
Political Meetings Mapper
“I was able to do in minutes with a python code what I’d spent the last ten
years trying to do by hand!”
Dr. Katrina Navickas, BL Labs Winner 2015
5,519 meetings discovered in 462 towns and villages across the UK
http://politicalmeetingsmapper.co.uk/maps
16.
The 200th anniversary of the publication of Frankenstein was a perfect opportunity to run a gothic-novel-themed challenge.
We ran the Gothic Novel Jam with Read Watch Play, inviting participants to make something creative inspired by the gothic novel genre and share it on the itch.io Gothic Novel Jam site.
Entries could include stories, poetry, art, games, music, films, pictures, soundscapes, or any other type of digital media response.
We wanted participants to use images from the British Library Flickr account as inspiration.
17.
Microsoft Partnership Digitisation
2006-8
• 68,000 volumes (47,000+ titles) published in the 19th century, mostly in English
• Excluded works by authors active 1850-1901 who died after 1936
• Output: 25 million pages
• Digitised content is public domain
18.
Extracting Images from OCR
[Figure: excerpt of a METS XML record (namespace http://www.loc.gov/METS/, schema http://www.loc.gov/standards/mets/version18/mets.xsd) beside its ALTO XML, with an image snipped out algorithmically from the ALTO XML]
Image taken from page 207 of 'London and its Environs. A picturesque survey of the metropolis and the suburbs ... Translated by Henry Frith. With ... illustrations'
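The slide does not spell out how the images were located. As a minimal sketch of the general idea, assuming the OCR output is ALTO XML whose `Illustration` elements carry `HPOS`/`VPOS`/`WIDTH`/`HEIGHT` attributes in pixel units, the bounding boxes can be read with Python's standard library. This is not the algorithm the Library actually used; the function names and the trimmed sample page are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix, since ALTO versions use different namespaces."""
    return tag.rsplit("}", 1)[-1]


def illustration_boxes(alto_xml: str):
    """Return (id, hpos, vpos, width, height) for every Illustration element.

    Assumes coordinates are in pixels (MeasurementUnit="pixel"); other units
    would need converting before cropping.
    """
    root = ET.fromstring(alto_xml)
    boxes = []
    for elem in root.iter():
        if local_name(elem.tag) == "Illustration":
            x, y, w, h = (round(float(elem.get(a))) for a in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
            boxes.append((elem.get("ID"), x, y, w, h))
    return boxes


def crop_box(box, scale: float = 1.0):
    """Turn an ALTO box into a (left, upper, right, lower) tuple, optionally
    scaled when the page image resolution differs from the ALTO page size."""
    _, x, y, w, h = box
    return (round(x * scale), round(y * scale), round((x + w) * scale), round((y + h) * scale))


# Hypothetical single-page ALTO file, heavily trimmed.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P207" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <Illustration ID="IL1" HPOS="150" VPOS="400" WIDTH="1200" HEIGHT="900"/>
        <TextBlock ID="TB1" HPOS="150" VPOS="1400" WIDTH="1200" HEIGHT="300"/>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

boxes = illustration_boxes(SAMPLE)
print(boxes)               # [('IL1', 150, 400, 1200, 900)]
print(crop_box(boxes[0]))  # (150, 400, 1350, 1300)
```

The `(left, upper, right, lower)` tuple is the shape Pillow's `Image.crop()` expects, so the final snipping step could be a single crop per box on the page image.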
20.
Gothic Novel Jam 2018
We received 46 entries submitted by people from all around the world, including the UK, Australia, America and France.
https://itch.io/jam/gothic-novel-jam/entries
21.
We encouraged entrants to use the digitised images that the British Library had released into the public domain on Flickr. 'As a glow brings out a haze' by Eldridge Misnomer is a lovely example of how these illustrations are used as a key part of the storytelling.
22.
Emerging Formats Project
This project builds our ability to collect publications that are designed for mobile devices, respond to reader interaction or are structured as databases.
It focuses on three format types: eBook mobile apps, web-based interactive narratives and structured data.
https://www.bl.uk/projects/emerging-formats
Set up in 2010, the team was formed to focus on the changing research landscape in the digital realm. Its members are now embedded in collection areas and, as you’ll see later, some join the Library explicitly as part of major digitisation projects.
Main activities:
• Working behind the scenes to get content in digital form and online
• Offering digital research support and guidance
• Supporting collaborative projects
• Running events, competitions, and awards
Research Question:
Chartism was the biggest popular movement for democracy in 19th-century British history. The Chartists campaigned for the vote for all men and advertised their meetings in the Northern Star newspaper from 1838 to 1850.
The question is: how many meetings took place, and where? We started with 1841-1845.
Source Collections:
19th Century Digitised Newspapers, specifically Northern Star newspaper
Digitised and Georeferenced Map of Oxford Street
Digital/Computational Techniques:
The images of the relevant pages of the Northern Star were run through an Optical Character Recognition program (ABBYY FineReader 12), and the resulting text was checked manually.
We developed a set of Python scripts to extract and geocode the place of each meeting, using a gazetteer of places, and to parse the date of the meeting.
Outcome: 5,519 meetings discovered in 462 towns and villages across the UK! http://politicalmeetingsmapper.co.uk/maps/
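The extraction step described above can be sketched roughly as follows. This is not the project's actual code: the mini-gazetteer (with approximate, illustrative coordinates), the notice wording and the assumption that dates are written like "14th June, 1841" are all made up for the example, and `parse_meeting` is a hypothetical name.

```python
import re
from datetime import date

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
# A real project would use a much larger historical gazetteer.
GAZETTEER = {
    "leeds": (53.80, -1.55),
    "bradford": (53.79, -1.76),
    "manchester": (53.48, -2.24),
}

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Matches dates written like "14th June, 1841" or "3 May 1842".
DATE_RE = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)?\s+(" + "|".join(MONTHS) + r"),?\s+(\d{4})",
    re.IGNORECASE,
)


def parse_meeting(notice: str):
    """Return (place, (lat, lon), date) for the first gazetteer place and
    date found in a meeting notice, or None if either is missing."""
    words = re.findall(r"[a-z]+", notice.lower())
    place = next((w for w in words if w in GAZETTEER), None)
    match = DATE_RE.search(notice)
    if place is None or match is None:
        return None
    day, month, year = match.groups()
    return place, GAZETTEER[place], date(int(year), MONTHS[month.lower()], int(day))


# A made-up notice for illustration, not a real Northern Star transcription.
notice = "LEEDS. A public meeting will be held on 14th June, 1841, in the Association Room."
print(parse_meeting(notice))  # ('leeds', (53.8, -1.55), datetime.date(1841, 6, 14))
```

Matching single words keeps the sketch short but misses multi-word places such as "Ashton-under-Lyne" and cannot tell apart towns that share a name; OCR errors in the place name would also need fuzzy matching or the manual checking mentioned above.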
60 seconds
The Library digitised 68,000 predominantly 19th-century books from our collections a few years ago (around 2.7% of the physical total for that period). You can view them from our catalogue or read them on your iPad via the Historical Books app developed by BiblioLabs.
There are 22 million individual page images, along with the full text derived from them by OCR, which together contain an untold quantity of useful data, such as names of people and places, historical events and dates.
Microsoft placed no restrictions on their use.
So the question became: what next? What can 68,000 books tell us?
60 seconds
Scanning the books for text had a fortunate side effect: the OCR software not only tries to detect the text on the page but also where the images might be. There was already some interest in the images from the research community, and it seemed easy to extract them.
As part of the Labs competition, Matt Prior attended one of our hack events and, when examining our book data, became very interested in the images from the books.
Meanwhile, the algorithm Ben had written to snip the images from the OCR scans was still churning away: how many were there going to be? The Mechanical Curator could publish them every hour, but was there somewhere we could put them all for people to browse whenever they wanted? Importantly, if we did put them somewhere, could we get people to help us add descriptions to the individual images, making them far more discoverable?
Using an algorithm by Ben O’Steen, we snipped images out of the digitised books and put them on Flickr on 13 December 2013. There were over a million, but while we knew which books they came from (author, dates), we didn’t have any information about the images themselves. By releasing them on Flickr, we have got people to start tagging them and using them in very creative ways.
Hosting them internally was not an option, and there was not sufficient metadata to put them on Wikipedia. Flickr seemed the obvious choice: it can support high usage, does not require metadata, allows tagging, and is free for public domain images.