Melissa Terras, James Baker, James Hetherington, David Beavan, Martin Zaltz Austwick, Anne Welsh, Helen O'Neill, Will Finley, Oliver Duke-Williams, and Adam Farquhar
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Exceptions: quotations, embeds from external sources, logos, and marked images.
Enabling Complex Analysis of Large-Scale Digital Collections
Humanities Research, High Performance Computing, and transforming access to British Library Digital Collections
Data, code, viz: github.com/UCL-dataspring
Overview
Barriers to computational approaches:
● fragmentation of communities, resources, and tools;
● lack of interoperability;
● lack of technical skills
Method
60k books from the British Library:
● 17th-19th century
● 224GB compressed ALTO XML
● UCL High Performance Computing
● 4 humanities researchers
● Research questions to computational queries
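A minimal sketch of what turning a research question into a computational query over ALTO XML might look like. It relies only on the fact that ALTO stores each OCR'd word in the `CONTENT` attribute of a `String` element. The sample document, the function names, and the term counted are hypothetical illustrations, not the project's own code (which is at github.com/UCL-dataspring).

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO fragment for illustration; real files are far larger
# and usually declare an XML namespace.
SAMPLE_ALTO = """<alto>
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="B1">
          <TextLine>
            <String CONTENT="The"/>
            <SP/>
            <String CONTENT="cholera"/>
            <SP/>
            <String CONTENT="spread"/>
          </TextLine>
          <TextLine>
            <String CONTENT="through"/>
            <SP/>
            <String CONTENT="London"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so the same code handles namespaced files."""
    return tag.rsplit("}", 1)[-1]


def alto_words(xml_text: str) -> list[str]:
    """Return the OCR'd words of one ALTO document in document order."""
    root = ET.fromstring(xml_text)
    return [
        el.attrib["CONTENT"]
        for el in root.iter()
        if local_name(el.tag) == "String" and "CONTENT" in el.attrib
    ]


def count_term(xml_text: str, term: str) -> int:
    """'How often is X mentioned?' expressed as a case-insensitive count."""
    return sum(1 for word in alto_words(xml_text) if word.lower() == term.lower())


print(alto_words(SAMPLE_ALTO))
print(count_term(SAMPLE_ALTO, "cholera"))
```

On a cluster, a function like `count_term` would be applied to each book independently and the counts combined afterwards, which is what makes this kind of query straightforward to parallelise.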
UCL’s Legion Cluster supercomputing facility. Photo: Tony Slade, © UCL Creative Media Services (all rights reserved)
Results
It worked!:
● Case Study 1: History of Medicine
● Case Study 2: History of Images
● Technical barriers
● Search ‘recipes’
Case Study 1
History of Medicine
Oliver Duke-Williams, UCL
Case Study 2
History of Images
Will Finley, Sheffield
Technical
Major sticking point:
● Using humanities data on HPCs
Best practice recommendations:
● Derived datasets
● Normalisations
● Documenting decisions
● Fixed/defined dataset
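One way the recommendations above could fit together is sketched below: named normalisation steps are applied to tokens, and the derived dataset is stored alongside a record of those steps and a checksum that fixes its contents. The step names, the record fields, and the sample tokens are assumptions made for illustration, not the project's actual pipeline.

```python
import hashlib
import json
import unicodedata

# Each normalisation step is named so the decision can be recorded with the output.
NORMALISATIONS = [
    # NFKC folds compatibility characters, e.g. the long s and ligatures.
    ("nfkc", lambda s: unicodedata.normalize("NFKC", s)),
    ("lowercase", str.lower),
    ("strip_punctuation", lambda s: s.strip(".,;:!?\"'()[]")),
]


def normalise(token: str) -> str:
    """Apply every normalisation step in the documented order."""
    for _name, step in NORMALISATIONS:
        token = step(token)
    return token


def derive(tokens: list[str], source_id: str) -> dict:
    """Build a derived dataset plus a record of how it was produced."""
    normalised = [t for t in (normalise(tok) for tok in tokens) if t]
    payload = json.dumps(normalised, ensure_ascii=False).encode("utf-8")
    return {
        "source_id": source_id,  # hypothetical identifier
        "normalisations": [name for name, _ in NORMALISATIONS],
        "sha256": hashlib.sha256(payload).hexdigest(),  # pins the exact contents
        "tokens": normalised,
    }


record = derive(["Phyſick,", "and", "Surgery."], source_id="example-book-001")
print(json.dumps(record, ensure_ascii=False, indent=2))
```

Keeping the list of steps and the checksum next to the data means a later reader can tell which decisions shaped the derived dataset and whether it has changed since.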
Generic searches:
● for all variants of a word
● that return keywords in context traced over time
● for a word or phrase that ignore another word or phrase
● for a word when in close proximity to a second word
● based on image metadata
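Several of the text searches listed above can be sketched in a few lines each. The example below is an illustration under stated assumptions: the three-item corpus, the function names, and the default window sizes are all hypothetical, and the image-metadata search is omitted because it depends on fields not described here.

```python
import re

# Hypothetical mini-corpus of (publication year, text) pairs.
CORPUS = [
    (1790, "the physician prescribed a remedy for consumption"),
    (1848, "cholera spread fast and the physicians feared the water"),
    (1866, "cholera returned but the water supply was cleaner"),
]


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def variants(pattern: str, text: str) -> list[str]:
    """All word forms that fully match a regular expression."""
    rx = re.compile(pattern)
    return [t for t in tokens(text) if rx.fullmatch(t)]


def kwic(term: str, text: str, width: int = 2) -> list[str]:
    """Keyword in context: up to `width` words either side of each hit."""
    toks = tokens(text)
    return [
        " ".join(toks[max(0, i - width): i + width + 1])
        for i, t in enumerate(toks)
        if t == term
    ]


def near(a: str, b: str, text: str, window: int = 5) -> bool:
    """True if `a` occurs within `window` words of `b`."""
    toks = tokens(text)
    pos_a = [i for i, t in enumerate(toks) if t == a]
    pos_b = [i for i, t in enumerate(toks) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def has_without(term: str, excluded: str, text: str) -> bool:
    """True if `term` appears and `excluded` does not."""
    toks = tokens(text)
    return term in toks and excluded not in toks


# Tracing keywords in context over time: iterate in year order.
for year, text in sorted(CORPUS):
    print(year, kwic("cholera", text))
```

Working on word positions rather than raw character offsets keeps the proximity and context searches simple, at the cost of discarding punctuation and layout.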
Conclusions
Recommendations for enabling complex analysis of large-scale digital collections in the humanities:
● 1. Invest in research software engineer capacity to deploy and maintain openly licensed large-scale digital collections from across the GLAM sector in order to facilitate research in the arts, humanities and social and historical sciences.
● 2. Invest in training library staff to run these initial queries in collaboration with humanities faculty, to support work with subsets of data that are produced, and to document and manage resulting code and derived data.
Special thanks to UCL Research Computing and British Library Digital Research for their hard work and support!
Data, code, viz: github.com/UCL-
dataspring
Melissa Terras, James
Baker, James
Hetherington, David
Beavan, Martin Zaltz
Austwick, Anne Welsh,
Helen O'Neill, Will Finley,
Oliver Duke-Williams, and
Adam Farquhar

Weitere ähnliche Inhalte

Was ist angesagt?

The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...Robert H. McDonald
 
Mahendra Mahey, British Library Labs
Mahendra Mahey, British Library LabsMahendra Mahey, British Library Labs
Mahendra Mahey, British Library LabsResearchLibrariesUK
 
JCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesJCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesRobert H. McDonald
 
New approaches for data acquisition at europeana iiif, sitemaps and schema.o...
New approaches for data acquisition at europeana  iiif, sitemaps and schema.o...New approaches for data acquisition at europeana  iiif, sitemaps and schema.o...
New approaches for data acquisition at europeana iiif, sitemaps and schema.o...Nuno Freire
 
NBK update briefing October 2017
NBK update briefing October 2017NBK update briefing October 2017
NBK update briefing October 2017Bethan Ruddock
 
Collection Directions - Research collections in the network environment
Collection Directions - Research collections in the network environmentCollection Directions - Research collections in the network environment
Collection Directions - Research collections in the network environmentConstance Malpas
 
Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?OCLC
 
British Library Labs Competition Presentation - Digital Humanities, Universit...
British Library Labs Competition Presentation - Digital Humanities, Universit...British Library Labs Competition Presentation - Digital Humanities, Universit...
British Library Labs Competition Presentation - Digital Humanities, Universit...labsbl
 
COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...
COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...
COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...LIBER Europe
 
BL Labs and Digital Humanities
BL Labs and Digital HumanitiesBL Labs and Digital Humanities
BL Labs and Digital Humanitieslabsbl
 
British Library Labs and Competition Presentation at the Open University
British Library Labs and Competition Presentation at the Open UniversityBritish Library Labs and Competition Presentation at the Open University
British Library Labs and Competition Presentation at the Open Universitylabsbl
 
Showcasing research data tools
Showcasing research data toolsShowcasing research data tools
Showcasing research data toolsJisc RDM
 
IIIF as an Enabler to Interoperability within a Single Institution
IIIF as an Enabler to Interoperability within a Single InstitutionIIIF as an Enabler to Interoperability within a Single Institution
IIIF as an Enabler to Interoperability within a Single InstitutionIIIF_io
 
Linked Open Data Approaches within the ARIADNE Project
Linked Open Data Approaches within the ARIADNE ProjectLinked Open Data Approaches within the ARIADNE Project
Linked Open Data Approaches within the ARIADNE Projectariadnenetwork
 
British Library Labs Virtual Event - 17 May 2013, 1500GMT
British Library Labs Virtual Event - 17 May 2013, 1500GMTBritish Library Labs Virtual Event - 17 May 2013, 1500GMT
British Library Labs Virtual Event - 17 May 2013, 1500GMTlabsbl
 
IIIF Pre-conference - Usability testing conducted on the UV and Mirador
IIIF Pre-conference - Usability testing conducted on the UV and MiradorIIIF Pre-conference - Usability testing conducted on the UV and Mirador
IIIF Pre-conference - Usability testing conducted on the UV and MiradorJulien A. Raemy
 

Was ist angesagt? (20)

The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
 
Mahendra Mahey, British Library Labs
Mahendra Mahey, British Library LabsMahendra Mahey, British Library Labs
Mahendra Mahey, British Library Labs
 
JCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesJCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening Slides
 
New approaches for data acquisition at europeana iiif, sitemaps and schema.o...
New approaches for data acquisition at europeana  iiif, sitemaps and schema.o...New approaches for data acquisition at europeana  iiif, sitemaps and schema.o...
New approaches for data acquisition at europeana iiif, sitemaps and schema.o...
 
NBK update briefing October 2017
NBK update briefing October 2017NBK update briefing October 2017
NBK update briefing October 2017
 
Collection Directions - Research collections in the network environment
Collection Directions - Research collections in the network environmentCollection Directions - Research collections in the network environment
Collection Directions - Research collections in the network environment
 
Dash UCCSC 2016
Dash UCCSC 2016Dash UCCSC 2016
Dash UCCSC 2016
 
Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?
 
British Library Labs Competition Presentation - Digital Humanities, Universit...
British Library Labs Competition Presentation - Digital Humanities, Universit...British Library Labs Competition Presentation - Digital Humanities, Universit...
British Library Labs Competition Presentation - Digital Humanities, Universit...
 
Mahendra Mahay's slides from the Bloomsbury DH Meeting 30/09/2013
Mahendra Mahay's slides from the Bloomsbury DH Meeting 30/09/2013Mahendra Mahay's slides from the Bloomsbury DH Meeting 30/09/2013
Mahendra Mahay's slides from the Bloomsbury DH Meeting 30/09/2013
 
Edina cigs-21-september-2012
Edina cigs-21-september-2012Edina cigs-21-september-2012
Edina cigs-21-september-2012
 
COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...
COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...
COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...
 
BL Labs and Digital Humanities
BL Labs and Digital HumanitiesBL Labs and Digital Humanities
BL Labs and Digital Humanities
 
British Library Labs and Competition Presentation at the Open University
British Library Labs and Competition Presentation at the Open UniversityBritish Library Labs and Competition Presentation at the Open University
British Library Labs and Competition Presentation at the Open University
 
Showcasing research data tools
Showcasing research data toolsShowcasing research data tools
Showcasing research data tools
 
IIIF as an Enabler to Interoperability within a Single Institution
IIIF as an Enabler to Interoperability within a Single InstitutionIIIF as an Enabler to Interoperability within a Single Institution
IIIF as an Enabler to Interoperability within a Single Institution
 
Linked Open Data Approaches within the ARIADNE Project
Linked Open Data Approaches within the ARIADNE ProjectLinked Open Data Approaches within the ARIADNE Project
Linked Open Data Approaches within the ARIADNE Project
 
British Library Labs Virtual Event - 17 May 2013, 1500GMT
British Library Labs Virtual Event - 17 May 2013, 1500GMTBritish Library Labs Virtual Event - 17 May 2013, 1500GMT
British Library Labs Virtual Event - 17 May 2013, 1500GMT
 
Ukla uksg 2013_final
Ukla uksg 2013_finalUkla uksg 2013_final
Ukla uksg 2013_final
 
IIIF Pre-conference - Usability testing conducted on the UV and Mirador
IIIF Pre-conference - Usability testing conducted on the UV and MiradorIIIF Pre-conference - Usability testing conducted on the UV and Mirador
IIIF Pre-conference - Usability testing conducted on the UV and Mirador
 

Andere mochten auch (15)

Gdz ukrainska mova_bilyaev
Gdz ukrainska mova_bilyaevGdz ukrainska mova_bilyaev
Gdz ukrainska mova_bilyaev
 
Data Fusion Poster
Data Fusion PosterData Fusion Poster
Data Fusion Poster
 
ден на победата
ден на победатаден на победата
ден на победата
 
Uusimmat kohteet turkissa Asunto Alanyasta Turkista
Uusimmat kohteet turkissa Asunto Alanyasta TurkistaUusimmat kohteet turkissa Asunto Alanyasta Turkista
Uusimmat kohteet turkissa Asunto Alanyasta Turkista
 
The Hard Disk as the new Paper Archive
The Hard Disk as the new Paper ArchiveThe Hard Disk as the new Paper Archive
The Hard Disk as the new Paper Archive
 
Museum Ceria's Company Profile
Museum Ceria's Company ProfileMuseum Ceria's Company Profile
Museum Ceria's Company Profile
 
Museum Label for Kids ~ Ajeng
Museum Label for Kids ~ AjengMuseum Label for Kids ~ Ajeng
Museum Label for Kids ~ Ajeng
 
Importance on Conference Call Etiquette
Importance on Conference Call EtiquetteImportance on Conference Call Etiquette
Importance on Conference Call Etiquette
 
[SLIDE FACTORY] [CV slide] Vũ Trà Mi
[SLIDE FACTORY] [CV slide] Vũ Trà Mi[SLIDE FACTORY] [CV slide] Vũ Trà Mi
[SLIDE FACTORY] [CV slide] Vũ Trà Mi
 
Abstencionistas, abstenerse
Abstencionistas, abstenerseAbstencionistas, abstenerse
Abstencionistas, abstenerse
 
Tema 4 1 16
Tema 4 1 16Tema 4 1 16
Tema 4 1 16
 
Tema 5 hegemonía y transmisión de la cultura
Tema 5   hegemonía y transmisión de la cultura   Tema 5   hegemonía y transmisión de la cultura
Tema 5 hegemonía y transmisión de la cultura
 
1b) A2 Media - Language Analysis
1b) A2 Media - Language Analysis1b) A2 Media - Language Analysis
1b) A2 Media - Language Analysis
 
Microservices, DevOps, Continuous Delivery – More Than Three Buzzwords
Microservices, DevOps, Continuous Delivery – More Than Three BuzzwordsMicroservices, DevOps, Continuous Delivery – More Than Three Buzzwords
Microservices, DevOps, Continuous Delivery – More Than Three Buzzwords
 
Proposal Company Job Fair Depok
Proposal Company Job Fair DepokProposal Company Job Fair Depok
Proposal Company Job Fair Depok
 

Ähnlich wie Enabling Complex Analysis of Large-Scale Digital Collections: Humanities Research, High Performance Computing, and transforming access to British Library Digital Collections

Frankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee ProjeectFrankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee ProjeectGoethe Univeristy
 
Software and Education at NSF/ACI
Software and Education at NSF/ACISoftware and Education at NSF/ACI
Software and Education at NSF/ACIDaniel S. Katz
 
What is eScience, and where does it go from here?
What is eScience, and where does it go from here?What is eScience, and where does it go from here?
What is eScience, and where does it go from here?Daniel S. Katz
 
How practising open research can benefit you
How practising open research can benefit youHow practising open research can benefit you
How practising open research can benefit youUoLResearchSupport
 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Bertram Ludäscher
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021Gérard Dupont
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...Carole Goble
 
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...Trevor Owens
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...Sarah Anna Stewart
 
Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College LondonSarah Anna Stewart
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...Carole Goble
 
Making Research Data Repositories Visible – The re3data.org Registry
Making Research Data Repositories Visible – The re3data.org RegistryMaking Research Data Repositories Visible – The re3data.org Registry
Making Research Data Repositories Visible – The re3data.org RegistryHeinz Pampel
 
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13DataDryad
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseRDTF-Discovery
 
Impact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationImpact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationMANENDRASINGH30
 
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎Libcorpio
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityOscar Corcho
 

Ähnlich wie Enabling Complex Analysis of Large-Scale Digital Collections: Humanities Research, High Performance Computing, and transforming access to British Library Digital Collections (20)

Frankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee ProjeectFrankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee Projeect
 
Software and Education at NSF/ACI
Software and Education at NSF/ACISoftware and Education at NSF/ACI
Software and Education at NSF/ACI
 
What is eScience, and where does it go from here?
What is eScience, and where does it go from here?What is eScience, and where does it go from here?
What is eScience, and where does it go from here?
 
How practising open research can benefit you
How practising open research can benefit youHow practising open research can benefit you
How practising open research can benefit you
 
E Infrastructure for OA
E Infrastructure for OAE Infrastructure for OA
E Infrastructure for OA
 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...
 
Open science platforms
Open science platformsOpen science platforms
Open science platforms
 
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
 
Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College London
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
 
Making Research Data Repositories Visible – The re3data.org Registry
Making Research Data Repositories Visible – The re3data.org RegistryMaking Research Data Repositories Visible – The re3data.org Registry
Making Research Data Repositories Visible – The re3data.org Registry
 
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcase
 
Impact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationImpact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and Education
 
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎
 
Ppt hk pres_final
Ppt hk pres_finalPpt hk pres_final
Ppt hk pres_final
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibility
 

Mehr von James Baker

1.5 million words of Mary Dorothy George: a computational approach to curator...
1.5 million words of Mary Dorothy George: a computational approach to curator...1.5 million words of Mary Dorothy George: a computational approach to curator...
1.5 million words of Mary Dorothy George: a computational approach to curator...James Baker
 
Digital History in the student learning experience
Digital History in the student learning experienceDigital History in the student learning experience
Digital History in the student learning experienceJames Baker
 
Decolonial Futures for Colonial Metadata, 1838-present
Decolonial Futures for Colonial Metadata, 1838-presentDecolonial Futures for Colonial Metadata, 1838-present
Decolonial Futures for Colonial Metadata, 1838-presentJames Baker
 
The Programming Historian: Open Access, Open Source, Open Project
The Programming Historian: Open Access, Open Source, Open ProjectThe Programming Historian: Open Access, Open Source, Open Project
The Programming Historian: Open Access, Open Source, Open ProjectJames Baker
 
Outlook: Email Archives , 1990-2007
Outlook: Email Archives , 1990-2007Outlook: Email Archives , 1990-2007
Outlook: Email Archives , 1990-2007James Baker
 
Forensic Recovery from Data Storage
Forensic Recovery from Data StorageForensic Recovery from Data Storage
Forensic Recovery from Data StorageJames Baker
 
Digital History in the student learning experience
Digital History in the student learning experienceDigital History in the student learning experience
Digital History in the student learning experienceJames Baker
 
Who is the Digital Historian?
Who is the Digital Historian?Who is the Digital Historian?
Who is the Digital Historian?James Baker
 
Image Recognition with Pastec
Image Recognition with PastecImage Recognition with Pastec
Image Recognition with PastecJames Baker
 
Publication and Dissemination of Data
Publication and Dissemination of DataPublication and Dissemination of Data
Publication and Dissemination of DataJames Baker
 
Library Carpentry: software skills training for library professionals, Chart...
 Library Carpentry: software skills training for library professionals, Chart... Library Carpentry: software skills training for library professionals, Chart...
Library Carpentry: software skills training for library professionals, Chart...James Baker
 
Hard disks as archives of everyday life
Hard disks as archives of everyday lifeHard disks as archives of everyday life
Hard disks as archives of everyday lifeJames Baker
 
Ditching the Digital
Ditching the DigitalDitching the Digital
Ditching the DigitalJames Baker
 
Complex Analysis of Large Scale Digital Collections: reflections on some oppo...
Complex Analysis of Large Scale Digital Collections: reflections on some oppo...Complex Analysis of Large Scale Digital Collections: reflections on some oppo...
Complex Analysis of Large Scale Digital Collections: reflections on some oppo...James Baker
 
The Hard Disk as the new Paper Archive: opportunities and challenges for hist...
The Hard Disk as the new Paper Archive: opportunities and challenges for hist...The Hard Disk as the new Paper Archive: opportunities and challenges for hist...
The Hard Disk as the new Paper Archive: opportunities and challenges for hist...James Baker
 
Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...James Baker
 
Library Carpentry. Week One: Basics
Library Carpentry. Week One: BasicsLibrary Carpentry. Week One: Basics
Library Carpentry. Week One: BasicsJames Baker
 
On Open Access monograph publishing for Arts, Humanities and Social Science R...
On Open Access monograph publishing for Arts, Humanities and Social Science R...On Open Access monograph publishing for Arts, Humanities and Social Science R...
On Open Access monograph publishing for Arts, Humanities and Social Science R...James Baker
 
Me in three minutes
Me in three minutesMe in three minutes
Me in three minutesJames Baker
 
Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...James Baker
 

Mehr von James Baker (20)

1.5 million words of Mary Dorothy George: a computational approach to curator...
1.5 million words of Mary Dorothy George: a computational approach to curator...1.5 million words of Mary Dorothy George: a computational approach to curator...
1.5 million words of Mary Dorothy George: a computational approach to curator...
 
Digital History in the student learning experience
Digital History in the student learning experienceDigital History in the student learning experience
Digital History in the student learning experience
 
Decolonial Futures for Colonial Metadata, 1838-present
Decolonial Futures for Colonial Metadata, 1838-presentDecolonial Futures for Colonial Metadata, 1838-present
Decolonial Futures for Colonial Metadata, 1838-present
 
The Programming Historian: Open Access, Open Source, Open Project
The Programming Historian: Open Access, Open Source, Open ProjectThe Programming Historian: Open Access, Open Source, Open Project
The Programming Historian: Open Access, Open Source, Open Project
 
Outlook: Email Archives , 1990-2007
Outlook: Email Archives , 1990-2007Outlook: Email Archives , 1990-2007
Outlook: Email Archives , 1990-2007
 
Forensic Recovery from Data Storage
Forensic Recovery from Data StorageForensic Recovery from Data Storage
Forensic Recovery from Data Storage
 
Digital History in the student learning experience
Digital History in the student learning experienceDigital History in the student learning experience
Digital History in the student learning experience
 
Who is the Digital Historian?
Who is the Digital Historian?Who is the Digital Historian?
Who is the Digital Historian?
 
Image Recognition with Pastec
Image Recognition with PastecImage Recognition with Pastec
Image Recognition with Pastec
 
Publication and Dissemination of Data
Publication and Dissemination of DataPublication and Dissemination of Data
Publication and Dissemination of Data
 
Library Carpentry: software skills training for library professionals, Chart...
 Library Carpentry: software skills training for library professionals, Chart... Library Carpentry: software skills training for library professionals, Chart...
Library Carpentry: software skills training for library professionals, Chart...
 
Hard disks as archives of everyday life
Hard disks as archives of everyday lifeHard disks as archives of everyday life
Hard disks as archives of everyday life
 
Ditching the Digital
Ditching the DigitalDitching the Digital
Ditching the Digital
 
Complex Analysis of Large Scale Digital Collections: reflections on some oppo...
Complex Analysis of Large Scale Digital Collections: reflections on some oppo...Complex Analysis of Large Scale Digital Collections: reflections on some oppo...
Complex Analysis of Large Scale Digital Collections: reflections on some oppo...
 
The Hard Disk as the new Paper Archive: opportunities and challenges for hist...
The Hard Disk as the new Paper Archive: opportunities and challenges for hist...The Hard Disk as the new Paper Archive: opportunities and challenges for hist...
The Hard Disk as the new Paper Archive: opportunities and challenges for hist...
 
Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...
 
Library Carpentry. Week One: Basics
Library Carpentry. Week One: BasicsLibrary Carpentry. Week One: Basics
Library Carpentry. Week One: Basics
 
On Open Access monograph publishing for Arts, Humanities and Social Science R...
On Open Access monograph publishing for Arts, Humanities and Social Science R...On Open Access monograph publishing for Arts, Humanities and Social Science R...
On Open Access monograph publishing for Arts, Humanities and Social Science R...
 
Me in three minutes
Me in three minutesMe in three minutes
Me in three minutes
 
Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...
 

Enabling Complex Analysis of Large-Scale Digital Collections: Humanities Research, High Performance Computing, and transforming access to British Library Digital Collections

  • 1. Enabling Complex Analysis of Large-Scale Digital Collections: Humanities Research, High Performance Computing, and transforming access to British Library Digital Collections. Melissa Terras, James Baker, James Hetherington, David Beavan, Martin Zaltz Austwick, Anne Welsh, Helen O'Neill, Will Finley, Oliver Duke-Williams, and Adam Farquhar. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Exceptions: quotations, embeds from external sources, logos, and marked images. Data, code, viz: github.com/UCL-dataspring
  • 2. Overview. Barriers to computational approaches: ● fragmentation of communities, resources, and tools; ● lack of interoperability; ● lack of technical skills
  • 3. Method. 60k books from the British Library: ● 17th-19th century ● 224GB compressed ALTO XML ● UCL High Performance Computing ● 4 humanities researchers ● Research questions to computational queries
  • 4. UCL’s Legion Cluster supercomputing facility. Photo: Tony Slade, © UCL Creative Media Services (all rights reserved)
  • 5. Method. 60k books from the British Library: ● 17th-19th century ● 224GB compressed ALTO XML ● UCL High Performance Computing ● 4 humanities researchers ● Research questions to computational queries
  • 6. Results. It worked! ● Case Study 1: History of Medicine ● Case Study 2: History of Images ● Technical barriers ● Search ‘recipes’
  • 7. Case Study 1: History of Medicine. Oliver Duke-Williams, UCL
  • 8. Case Study 2: History of Images. Will Finley, Sheffield
  • 9. Case Study 2: History of Images. Will Finley, Sheffield
  • 10. Technical. Major sticking point: ● Using humanities data on HPCs. Best practice recommendations: ● Derived datasets ● Normalisations ● Documenting decisions ● Fixed/defined dataset
  • 11. Generic searches: ● for all variants of a word ● that return keywords in context traced over time ● for a word or phrase that ignore another word or phrase ● for a word when in close proximity to a second word ● based on image metadata
  • 12. Conclusions. Recommendations for enabling complex analysis of large-scale digital collections in the humanities: ● 1. Invest in research software engineer capacity to deploy and maintain openly licensed large-scale digital collections from across the GLAM sector in order to facilitate research in the arts, humanities and social and historical sciences. ● 2. Invest in training library staff to run these initial queries in collaboration with humanities faculty, to support work with subsets of data that are produced, and to document and manage resulting code and derived data.
  • 13. Special thanks to UCL Research Computing and British Library Digital Research for their hard work and support!
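
The generic searches listed on slide 11 can be illustrated with a small sketch. This is not the project's code (that lives at github.com/UCL-dataspring); it is a minimal Python illustration that assumes the text of each book has already been extracted to a plain string paired with a publication year. All function names, the tokenisation rule, and the sample sentences are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a token is a run of ASCII letters after lowercasing.
TOKEN = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN.findall(text.lower())


def find_variants(tokens, pattern):
    """Return positions of tokens that fully match a regex of word variants."""
    rx = re.compile(pattern)
    return [i for i, tok in enumerate(tokens) if rx.fullmatch(tok)]


def keyword_in_context(tokens, positions, window=3):
    """Return each hit with up to `window` tokens of context on either side."""
    return [" ".join(tokens[max(0, i - window): i + window + 1])
            for i in positions]


def near(tokens, positions, other, distance=5):
    """Keep hits that have the token `other` within `distance` tokens."""
    return [i for i in positions
            if other in tokens[max(0, i - distance): i + distance + 1]]


def not_near(tokens, positions, other, distance=5):
    """Keep hits that do NOT have `other` within `distance` tokens."""
    close = set(near(tokens, positions, other, distance))
    return [i for i in positions if i not in close]


def hits_by_year(books, pattern):
    """Count variant matches per year across (year, text) pairs."""
    counts = defaultdict(int)
    for year, text in books:
        counts[year] += len(find_variants(tokenize(text), pattern))
    return dict(counts)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    books = [
        (1800, "The cholera spread; the physician feared cholera."),
        (1850, "Physicians treated consumption and cholera morbus."),
    ]
    tokens = tokenize(books[0][1])
    hits = find_variants(tokens, r"cholera")
    print(keyword_in_context(tokens, hits, window=2))
    print(near(tokens, hits, "physician", distance=2))
    print(hits_by_year(books, r"physicians?"))
```

Working on token positions rather than raw character offsets keeps the proximity and exclusion searches simple to combine; on a corpus of the size described in the slides, the same per-book functions would presumably be mapped over many files in parallel rather than run in a single loop.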