(1) The British Library's Digital Scholarship team aims to enable the use of the library's digital collections for research, inspiration, creativity, and enjoyment.
(2) The team is cross-disciplinary and supports the creation and innovative use of the library's digital collections.
(3) Recent projects include making Arabic manuscripts searchable through handwriting recognition software, digitizing South Asian printed books from 1713-1914, and exploring optical character recognition for languages like Bengali.
7th BL Labs Symposium (2019): 12_Digital Research team projects update
1. Neil Fitzgerald, Head of Digital Research
BL Labs Symposium 2019
@N_Fitzgerald
Digital Scholarship Update
2. www.bl.uk
The British Library's Digital Scholarship team
Our mission is to enable the use of the British Library’s digital
collections for research, inspiration, creativity, and enjoyment.
Connect and
share
Support digital
scholars
Agents for
change
Invest in our
staff
Innovate and
collaborate
3. Neil Fitzgerald
Head
Digital Research Team
Mahendra Mahey
Manager
BL Labs
Rossitza Atanassova
Digital Curator
Digitisation
Adi Keinan-Schoonbaert
Digital Curator
Asian & African
Stella Wisdom
Digital Curator
Contemporary British
Mia Ridge
Digital Curator
Western Heritage
Tom Derrick
Digital Curator
Two Centuries Indian Print
Nora McGregor
Digital Curator
European & American
The Digital Scholarship Team is
a cross-disciplinary mix of
curators, researchers,
librarians and programmers
supporting the creation and
innovative use of British
Library's digital collections.
Filipe Bento
Technical Lead
BL Labs
BL Labs Team
Deirdre Sullivan
Digital Research and
Coordinator Apprentice
Maja Maricevic
Head of Higher Education
and Science
4. www.bl.uk
DH Award 2018: Best Blog
The Digital Scholarship Department is delighted to have won
the 2019 DH Award for ‘Best interesting Digital Humanities
Blog Post or Series of Posts’.
The Digital Humanities Awards are a set of annual awards
where the public is able to nominate resources for the
recognition of talent and expertise in the digital humanities
community.
The awards are intended as an awareness-raising activity to
help put interesting Digital Humanities resources in the
spotlight and engage Digital Humanities users (and the
general public) in the work of the community.
https://blogs.bl.uk/digital-scholarship
http://dhawards.org/
5. www.bl.uk 5
Our aim: to make Arabic texts fully
searchable and available for large-scale
analysis
Main objective: To train Handwritten
Text Recognition (HTR) software to read
historical Arabic manuscripts
Collection: Scientific Manuscripts
available on QDL (https://www.qdl.qa/en)
Automatic Transcription of Historical Handwritten
Arabic Texts
Method:
• Running competitions to find an optimal solution for Arabic HTR
• Participants used our ground truth set to train their recognition software and then
evaluate how accurately the software automatically transcribed the text
• Ground Truth: a complete and accurate record of every character and word in the
scanned images
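The evaluation step described above, comparing automatically transcribed text against ground truth, can be sketched as follows. This is a minimal illustration that assumes accuracy is measured as a character error rate based on edit distance; the slides do not state which metric the competition used, and the function names here are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions turning one string into the other."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the hypothesis
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the ground truth text (assumed metric)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative strings only, not taken from the competition data.
ground_truth = "manuscript"
software_output = "manuscrift"
print(character_error_rate(ground_truth, software_output))
```

A lower value means the software output is closer to the ground truth. Because the comparison works on Unicode characters, the same sketch applies to Arabic or Bengali text without changes.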
6. www.bl.uk 6
All ground truth resources will be hosted by the
British Library and made freely available for anyone
wishing to advance the state-of-the-art in text
recognition technology
Resources:
• https://www.bl.uk/projects/arabic-htr
• https://www.primaresearch.org/RASM2019/
• https://blogs.bl.uk/digital-scholarship/2019/02/automatic-transcription-of-historical-arabic-scientific-manuscripts-round-2.html
Automatic Transcription of Historical Handwritten
Arabic Texts
7. www.bl.uk 7
• Digitising and cataloguing rare and unique printed books from
the British Library's South Asian printed books collection, 1713
to 1914, mostly Bengali
• Digital Curator Tom Derrick is exploring OCR technologies for
Bengali print, digital research approaches to Book History and
more
• To support computationally driven research, such as text mining,
we’re providing the digitisation outputs on data.bl.uk under
a public domain licence
Two Centuries of Indian Print
Right: Pleasing Tales designed to improve the understanding, and direct the conduct of young persons, 1825
https://www.bl.uk/projects/two-centuries-of-indian-print
8. www.bl.uk 8
• The project is exploring OCR solutions for Bengali text and Quarterly
Lists (challenging table layouts)
• Benefit: this enables search and research at scale across many items
• Currently running an OCR competition in collaboration with PRImA
(Pattern Recognition and Image Analysis) Research Lab at Salford
University
• Aim: find the best automated text recognition solution for
Bengali and Indian languages
• Resources:
• https://www.primaresearch.org/REID2019/
• https://blogs.bl.uk/digital-scholarship/2019/02/competition-to-automate-text-recognition-for-printed-bangla-books.html
Two Centuries of Indian Print: OCR
9. www.bl.uk 9
Two Centuries of Indian Print: OCR
Quarterly Lists: descriptive catalogue
records of books published quarterly and
by province of British India between 1867
and 1947. The Quarterly Lists are available
to download as searchable PDFs and as
OCR XML via the British Library's datasets
portal, data.bl.uk.
12. www.bl.uk
Living with Machines: data science, digital history
The national institute for data science and
artificial intelligence, The Alan Turing Institute,
offers the expertise to harness this data to
answer research questions at scale.
A five-year, £9.2 million research project combining expertise from the
nation's research library with data-driven analysis
The British Library has digitised millions of pages
from its collections and established a Digital
Scholarship team to enable the use of its digital
collections for research, inspiration, creativity,
and enjoyment
+
https://www.bl.uk/projects/living-with-machines
13. Training library staff in digital scholarship
Digital Curators dedicate 20% of time to training staff throughout the Library in
the opportunities for and practices of digital scholarship, which is primarily
delivered via the Digital Scholarship Training Programme (DSTP).
Our mission:
Provide colleagues with the space and opportunity to delve into and explore all
that digital content and new technologies have to offer in the research domain
today.
Create a variety of opportunities for staff to develop necessary skills and
knowledge to support emerging areas of modern scholarship.
14. Training library staff in digital scholarship
Now in its 7th year, the DSTP includes a wide range of training opportunities:
https://www.bl.uk/projects/digital-scholarship-training-programme
• Formal training courses
• Hands-on workshops
• Monthly Hack & Yacks
• 21st Century Curatorship talks
• Monthly Digital Scholarship Reading Group
In 2018/2019 we delivered 40 training events,
amounting to 224 training days! 848 attendees!
15. www.bl.uk
Computing for Cultural Heritage PGCert
The British Library and partners Birkbeck University and The National
Archives have been awarded £222,420 in funding by the Institute of
Coding (IoC) to co-develop a one-year part-time postgraduate
Certificate (PGCert), Computing for Cultural Heritage, as part of a £4.8
million University skills drive.
Throughout 2019-20, Nora McGregor, Digital Curator, will work closely
with a newly appointed Lecturer at Birkbeck to develop a new part-time
PGCert, covering topics such as:
• Module 1: Demystifying computing for heritage professionals
• Module 2: Analytic tools for cultural heritage professionals
• Module 3: Work-based digital project design and development
Trial
Autumn term 2019: Module 1 (15 credits) 6 hrs week/2 nights for 5 weeks
Spring term 2020: Module 3 (30 credits)
There are fully funded places on the trial for 20 staff from within the
British Library and the National Archives to attend in order to evaluate the
framework and programme content before it is fully launched in Autumn
2020.
Project page: https://www.bl.uk/projects/computingculturalheritage
Contact: nora.mcgregor@bl.uk
16. www.bl.uk 16
Digital Scholarship Training Seasons
A 'season' is a new, flexible format for learning and maintaining skills in the Library, with
training delivered through shorter modules that combine to build your knowledge of a
particular topic over time.
We know that it's hard to find the time to attend a whole day workshop, and that sometimes
you're only interested in specific aspects of a digital method or tool.
Running shorter sessions over a longer time-frame also allows us to respond to the rapid pace
of change for a subject like text and data mining, and gives you time to try out methods
between sessions.
Each season will have an introductory module outlining key concepts and terms, then you can
attend as many or as few sessions as you like, depending on the skills you want to learn,
maintain or put into practice.
17. www.bl.uk 17
Season of Text & Data Mining
Led by Mia Ridge
Text and data mining (TDM) uses automated analytical techniques to analyse text and data for
patterns, trends and other useful information. TDM methods have been applied to digitised and digital
historic, cultural and scientific collections to help scholars answer new research questions, or
investigate questions at scale, analysing hundreds or hundreds of thousands of items.
In addition to supporting new forms of digital scholarship that apply TDM methods, institutions like the
British Library may also be able to use TDM to enhance records to make collection items more
discoverable. TDM in cultural heritage draws on data science, 'distant reading' and other techniques to
categorise items; identify concepts and entities such as people, places and events; apply sentiment
analysis and analyse items at scale.
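As a minimal sketch of one such automated technique, the snippet below counts term frequencies across a small set of items. The item identifiers, sample texts, stopword list and function names are all hypothetical and chosen only for illustration; real TDM work on library collections would typically use larger corpora and dedicated tooling.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and return its runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def term_frequencies(documents, stopwords=frozenset()):
    """Count how often each non-stopword term occurs across all documents."""
    counts = Counter()
    for text in documents.values():
        counts.update(t for t in tokenize(text) if t not in stopwords)
    return counts


# Hypothetical sample items, invented for this example.
documents = {
    "item-1": "The printer of the book",
    "item-2": "A book printed in Calcutta",
}
stopwords = frozenset({"the", "of", "a", "in"})

frequencies = term_frequencies(documents, stopwords)
print(frequencies.most_common(3))
```

The same counting loop runs unchanged over two items or hundreds of thousands, which is the sense in which TDM lets researchers investigate questions at scale.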
Course 120 Content Mining in Digital Scholarship
18. www.bl.uk 18
Season of Place
Led by Adi Keinan-Schoonbaert
Recent season of talks and workshops on Digital Mapping for Cultural Heritage Collections. In
recent years digital mapping technologies have transformed the way we interact with the
world through GPS, mobile apps and spatial data. British Library collection items are replete
with geographic information, for example, place of publication, place-names within content
and many others we might not have considered.
Digital mapping provides a different perspective on Library collections, creating possibilities for
discovery and analysis and supporting new forms of digital scholarship and research. Anyone
with an interest in the collections could search, visualise, and analyse via geospatial web tools
or desktop Geographical Information System (GIS) applications. The digital scholarship ‘Season
of Place’ aims to open up these technologies for use on the library’s collections.
Course 108 Digital Mapping
19. www.bl.uk 19
Season of Emerging Formats
Led by Stella Wisdom
The Digital Scholarship and Contemporary British Collections teams are excited to announce a
season of talks and workshops about 'emerging formats'. These are types of digital publications
that are in scope to collect under the UK’s Non-Print Legal Deposit Regulations, but whose
content and structure are more challenging compared to those currently collected.
Working with the UK legal deposit libraries, the British Library is building its knowledge and
capability before it can collect these publications and make them available onsite to readers. The
British Library's Emerging Formats project focused on three format types:
• eBook mobile apps
• web-based interactive narratives
• structured data
Course 122 Introduction to Emerging Formats
22. www.bl.uk 22
Get in touch!
Web: http://www.bl.uk/subjects/digital-scholarship
Blog: http://britishlibrary.typepad.co.uk/digital-scholarship/
Email: digitalresearch@bl.uk
Twitter: @BL_DigiSchol
Editor's notes
We support Digital Scholars
We promote the use of British Library’s digital collections and data and offer support for anyone wishing to use them in exciting and innovative ways. We work closely with scholars to understand their needs, enable access to content, and provide guidance and technical assistance to fulfil their digital and data-intensive project goals.
o Examples: BL Labs, external training, collaborative PhDs.
We Connect & Share
Through our connection to a global ecosystem of scholars, labs and institutions operating in the digital scholarship domain we maintain awareness of developing trends in this changing research landscape. We share knowledge, expertise and experience across this vibrant community and can leverage the network to connect Library users to the resources they seek.
o Examples: LIBER DH and RLUK working groups
We are Agents for Change
We ensure the Library’s systems, services and policies will meet the needs of anyone wishing to undertake computational and data-driven research based on our digital collections and data. We develop and pilot new digital scholarship services that can be transitioned into production.
o Examples: Data.bl.uk and plans in development for a more ‘Digital Reading Room’
We invest in our Staff
We are building the Library’s capacity to understand and support the emerging needs of digital scholars by investing in our staff skill development. We provide colleagues with the space and opportunity to delve into and explore all that digital content and new technologies have to offer in the research domain today. Through our bespoke programme of workshops, hands-on training, lectures and reading groups we raise awareness of the opportunities new digital methods bring to our users and our profession.
o Examples: Training programme/Digital Curator Matrix
We Innovate & Collaborate
We undertake innovative research, projects and collaborations, applying and experimenting with digital methods on our own collections to find solutions to address barriers to access for users.
o Examples: Bengali/Arabic OCR, Mechanical Curator, Libcrowds/Playbills, IIIF.
We work in the open as much as possible across a range of channels, e.g. our Digital Scholarship blog. Over the last year we’ve written there about the following projects as work progressed.
Earlier this year, the British Library in collaboration with PRImA Research Lab and the Alan Turing Institute launched a competition on the Recognition of Historical Arabic Scientific Manuscripts. This competition was held in the context of the 15th International Conference on Document Analysis and Recognition (ICDAR2019). It was the second competition of this type, following the first one which took place in 2018.
The Library has an extensive collection of Arabic manuscripts, comprising almost 15,000 works. We have been digitising several hundred manuscripts as part of the British Library/Qatar Foundation Partnership, making them available on Qatar Digital Library. A natural next step would be the creation of machine-readable content from scanned images, for enhanced search and whole new avenues of research.
Running a competition helps us identify software providers and tool developers, as well as introduce us to the specific challenges that pattern recognition systems face when dealing with historic, handwritten materials. For this year’s competition we provided a ground truth set of 120 images and associated XML files: 20 pages to be used to train text recognition systems to automatically identify Arabic script, and 100 pages to evaluate the training.
Aside from providing larger training and evaluation sets, for this year’s competition we’ve added an extra challenge – marginalia. Notes written in the margins are often less consistent and less coherent than main blocks of text, and can go in different directions. The competition set out three different challenges: page segmentation, text line detection and Optical Character Recognition (OCR). Tackling marginalia was a bonus challenge!
When evaluating the results, PRImA compared established systems used in industry and academia – Tesseract 4.0, ABBYY FineReader Engine 12 (FRE12), and Google Cloud Vision API. The evaluation approach was the same as last year’s in order to gain an insight into the algorithms.
At the end of 2015, an international partnership led by the British Library received funding from the Newton Fund to digitise rare material from its South Asian printed books collection. The Two Centuries of Indian Print project has digitised more than 1,000 early printed Bengali books which are now available online and is currently digitising a range of the other 22 South Asian languages in our collections to drive digital scholarship opportunities for non-Western materials.
The project is exploring how digital research methods and tools can be applied to this digitised collection. This is especially important as many DH tools are optimised for working with Western language materials.
For the first time the project has made freely available in digital format the library's collection of bound Quarterly Lists. These are descriptive catalogue records of books published quarterly and by province of British India between 1867 and 1947. The Quarterly Lists are available to download as searchable PDFs and as OCR XML via the British Library's datasets portal, data.bl.uk.
The map shows the location of the printers that were active in Kolkata. Clicking on a place marker shows information about that printer: how many books were printed there, the average number of copies printed, and the average number of pages and price across all the books they printed. It uses all the data from July-December 1867 from one of our Quarterly Lists.
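The per-printer figures behind such a map could be derived with a simple grouping step. The sketch below is only an illustration: the record fields, printer names and numbers are invented for the example and are not taken from the Quarterly Lists, whose actual data structure is not described here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, one per book; all values are made up for illustration.
records = [
    {"printer": "Press A", "copies": 500, "pages": 120, "price": 1.0},
    {"printer": "Press A", "copies": 1000, "pages": 80, "price": 0.5},
    {"printer": "Press B", "copies": 250, "pages": 200, "price": 2.0},
]


def summarise_by_printer(rows):
    """Group book records by printer and compute the count and averages."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["printer"]].append(row)
    return {
        printer: {
            "books": len(books),
            "avg_copies": mean(b["copies"] for b in books),
            "avg_pages": mean(b["pages"] for b in books),
            "avg_price": mean(b["price"] for b in books),
        }
        for printer, books in grouped.items()
    }


summary = summarise_by_printer(records)
for printer, stats in summary.items():
    print(printer, stats)
```

Each entry in `summary` corresponds to what a single place marker would display; attaching coordinates to each printer would be a separate step.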
We also want to explore different methods of presenting and providing access to our data. Here the Quarterly Lists data is visualised with Tableau Public, one of the tools we experimented with during a Hack & Yack session. Hack & Yacks are casual, hands-on sessions arranged by the Digital Research Team on the third Tuesday of every month, in which we work through an online tutorial at our own pace but with the support of colleagues. We use them as an opportunity to explore new tools, techniques and applications relevant to digital research and to keep our own skills up to speed. These sessions supplement our larger digital scholarship training programme.
Living with Machines grew out of a desire for The Alan Turing Institute and the British Library to partner with each other. BL and other humanities scholars had been working for some time to engage the Turing with the interesting problems that historical data presents to data science and AI, very much in line with our wider programmes of work in Digital Scholarship.
Running since 2012, this innovative digital skill training initiative has provided the time and space for colleagues to develop digital skills and new ways of thinking. We aim to have something for everyone, from introductory courses aimed at novices to more advanced opportunities. It is very important to us that learning is inclusive and accessible, but also challenging
In 2018/2019 alone the team held 40 different staff training events! Within that, 147 individuals (60% women) attended 15 of our formal courses.
Background
A recent job advertisement for a curatorial role at the British Library reflects the changing nature of the work, and the digital competency requirements, for professionals working in the cultural heritage sector:
-contribute to and undertake work on digitisation and digital projects
-assist in implementing new technologies to make the collections more accessible through online presence or through digital tools
-have experience or familiarity with a variety of information technology skills underpinning digital research methods and practices (e.g. geo-referencing, text mining)
This is no less the case for professionals already working in post, who have often come to their role many years ago, having deep domain expertise in a particular subject, yet now find themselves with increased responsibility for assisting on the design and delivery of complex digital projects, without a foundation in computing to truly empower them. Additionally, due to the scale and diversity of the digital collections held by the BL, and changing Library services and researcher demands, it is of great importance that all staff are aware of the issues, opportunities and strategies involved in working with large-scale digital collections and developing innovative digital projects. This requires having an understanding of approaches used in programming, data science, big data, machine learning, text mining, data analytics, cloud computing, and visualisation.
My colleague Nora McGregor has said: “Over the last seven years, the Digital Curator Team have delivered a ground-breaking Digital Scholarship training programme for staff at British Library. In this time we’ve experienced first-hand the incredible transformations that arise when time, space and opportunity is created for colleagues eager to keep apace of the technological innovations that underpin their work. This is an exciting opportunity to consolidate all that we’ve learned about the skills and knowledge they seek and encode it in a course uniquely designed to meet our needs in the cultural heritage sector."
Over time the delivery of our internal training programme has evolved to reflect the changing needs of staff working in operational and curatorial roles. Following changes implemented over the last year, the programme is now structured in a more flexible way to accommodate the differing needs of all staff.
An example of a related project is our work with Transkribus. Transkribus is software designed to improve automatic handwritten text recognition. It works by training algorithms to understand handwritten text by comparing images of digitised pages with 'ground truth' transcriptions of those pages.
Following a pilot with records from the East India Office (we have 9 miles of holdings just for this one collection), the British Library signed a memorandum of understanding with the READ Project in 2017 and became a founding member of the newly established READ-COOP over the summer of 2019. A European Cooperative Society with limited liability will serve as the basis for sustaining and further developing the Transkribus platform and related services and tools.
Handwritten text recognition (HTR) will be as transformative for handwritten documents as optical character recognition was for printed materials. Our work with this project should help integrate HTR into the BL’s digitisation and digital library workflows.
A detailed overview of the work done for this season will soon be available in a co-authored article in the Journal of Map and Geography Libraries: Special Issue on Information Literacy Instruction.
A project we have also provided support to was a two-day hack event to produce a JavaScript web map with time slider component (Web Maps-T) and specifications for Timeline visualisation. The main aim is to enhance the ability to visualise Linked Open Data (LOD) on web maps.
Outcomes:
Web Maps-T: A GitHub repository containing a Minimum Viable Product (MVP) web maps with time-slides (a component for use within broader systems)
Timeline Visualisations: GitHub repository containing specifications, design outlines and user-stories for visualising temporal data
White Paper: summarising the hack event, position papers, Web Maps-T MVP and timeline, plans for their integration and next steps for the component
The nature of storytelling and publishing is changing through the possibilities offered by digital technologies and the definition of a digital story includes dynamic publications created for mobile devices and the internet.
In the United Kingdom, Legal Deposit Libraries have the right to collect material published digitally such as websites, blogs, eBooks and e-journals. However, what happens when an eBook/app behaves in an unexpected way and needs to turn to external sources of information to explain a story? What tools and methods do libraries need to store these eBooks/apps? What challenges are posed by software and hardware? How is the relationship between creators, libraries, technology companies and user communities changing? What do researchers need to access emerging formats in a library?
Working with colleagues across the Library over the last year this programme of activity enabled us to start to explore these issues and will feed into our plans for the forthcoming year.
If you’re interested to find out more about the range of activities we’re involved in please see the case studies on our webpages.
We have good experience of working with external research partners to attract joint funding from research councils and trusts. We welcome proposals that promise to produce research that leads to mutually beneficial outcomes.
If you’d like to know more, please get in touch or follow developments via the channels on screen.