Building a Collection of the Historical UK Web for scholarly use 
Helen Hockx-Yu 
Head of Web Archiving, British Library
www.bl.uk 
The UK Web Domain 
4th largest TLD after .com, .de and .net 
Over 10 million registered .uk domains 
UK organisations also use non .uk domain names (eg .com or .org) – scale unknown 
Non-print Legal Deposit (since April 2013) applies to 
the open (freely available) web: .uk and other UK-published (non .uk) websites, such as .com, .org… 
also e-journals, e-books, news web pages and other digital publications, either by harvesting or mutual agreement on other delivery methods
Web Archiving at the British Library 
Collect UK digital heritage and provide continued access to archived web resources 
Started web archiving in 2003: Open UK Web Archive 
Selective, topical collections and key sites 
Consortium sharing infrastructure and development effort; agreement on who collects what 
Curating collections with organisations and researchers 
Archiving UK Web for non-print Legal Deposit since April 2013: Legal Deposit UK Web Archive 
Comprehensive national archive with on-site access only 
Joint responsibility of six Legal Deposit Libraries (LDLs)
Collecting strategy for websites 
Domain crawl: 
•Broad sweep of UK domain 
•Once or twice a year 
Events & key sites and news: 
•Events of UK interest 
•High value, high impact sites 
•National & regional news 
Special Collection: 
•Focused, thematic collections 
•Support priority subjects 
UK websites – territoriality explained 
An online work is considered as “published in the UK” and therefore in scope for Legal Deposit, if it meets either of the following criteria: 
(a) it is made available to the public from a website with a domain name which relates to the United Kingdom or to a place within the United Kingdom; or 
(b) it is made available to the public by a person and any of that person’s activities relating to the creation or the publication of the work take place within the United Kingdom 
The Legal Deposit Libraries (Non-Print Works) Regulations, 2013
Territoriality - implementation 
All websites with a .uk domain name 
Including embedded content (eg CSS, images) regardless of where it is hosted 
Non .uk websites have to meet at least one criterion: 
UK Hosting: check external IP geo-location database and add in-scope URLs to the fetch-chain 
UK postal address 
Correspondence 
Professional judgement
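
The scoping rules above could be sketched as follows. This is a minimal illustration under stated assumptions, not the Library's crawler logic: the names `SiteEvidence`, `has_uk_domain` and `in_scope` are hypothetical, the geo-location lookup is assumed to have happened elsewhere and been reduced to a boolean, and only .uk is treated as a UK domain name.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class SiteEvidence:
    """Hypothetical record of evidence gathered for a non .uk website."""
    hosted_in_uk: bool = False       # e.g. result of an IP geo-location lookup
    uk_postal_address: bool = False  # a UK postal address was found
    correspondence: bool = False     # confirmed through correspondence
    curator_judgement: bool = False  # professional judgement


def has_uk_domain(url: str) -> bool:
    """True if the URL's host name ends in the .uk top-level domain.

    The URL must include a scheme (http:// or https://), otherwise
    urlsplit cannot identify the host.
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return host.endswith(".uk")


def in_scope(url: str, evidence: Optional[SiteEvidence] = None) -> bool:
    """Apply the two-step test: .uk domain, else at least one criterion."""
    if has_uk_domain(url):
        return True
    if evidence is None:
        return False
    return any((
        evidence.hosted_in_uk,
        evidence.uk_postal_address,
        evidence.correspondence,
        evidence.curator_judgement,
    ))
```

For example, `in_scope("http://www.bl.uk/")` is true on the domain name alone, while a .com site is in scope only once one of the four kinds of evidence has been recorded for it. Embedded content is not modelled here.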
UK Domain Crawl 
2013 domain crawl stats 
3.86 million seeds 
1.9 billion URLs (web pages, docs, images) 
~31TB 
Duration: 70 days 
2014 domain crawl 
90 million seeds (starting URLs) 
Started on 19th June 2014 
Collected 52TB of data by 9th December (incl. 4.4GB of viruses & 3TB of homepage screenshots) 
Nearly 2 million non .uk domains
The “access” paradoxes 
Completeness versus openness of web archives 
Legal Deposit national collections have restricted access 
Documents-centred versus data driven 
Essentially a scale issue 
Pre-selected or defined collections not relevant to all researchers; difficulty in finding relevant content in large scale web archive. 
Arbitrary (national) boundaries often irrelevant to research question but most heritage institutions operate within certain geographical areas 
…
Web archive as historical document
Collaboration with researchers 
Building collections 
Researchers’ involvement in scoping collections, selecting and describing websites 
Creation of specific (narrow) topical collections 
Formulating research question 
Brain-storm sessions, workshops, discussion, surveys etc. 
Lack of awareness & baseline knowledge 
Challenging: you don’t know what you don’t know 
Co-development of access services 
This is changing how we collect and store data
JISC UK Web Domain dataset (1996-2013) 
Collaboration between the Internet Archive (IA), the Joint Information Systems Committee (JISC) and the British Library 
Extracted copies of UK websites from the Internet Archive’s collection 
1st tranche : 1996 – 2010, 30TB, 2.5 billion URLs 
2nd tranche: 2010 – April 2013, 27.5TB, 1.5 billion URLs (estimated) 
Research agreement between JISC and IA, upholding IA’s Terms of Use 
Access via IA’s Wayback Machine 
Allows replication / extraction of derivative or secondary datasets 
BL hosts the dataset on behalf of JISC 
Data used by research projects 
Institute of Historical Research project: Analytical Access to the Domain Dark Archive (AADDA) 
Oxford Internet Institute project: Big data for political science
Completed work 
Analytical Access to the Domain Dark Archive Project 
Use cases & experimental UI 
Demonstrating the Value of the UK Web Domain Dataset for Social Science Research 
•Analysis of link graph 
Paper accepted for WebSci’14: Mapping the UK Webspace: Fifteen Years of British Universities on the Web 
MA thesis by Jules Mataly: The Three Truths of Margaret Thatcher: Creating and Analysing 
Secondary datasets under open licence 
Format profile, Geoindex, Host Link Graph
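
As a rough illustration of what a host link graph is, the sketch below collapses page-level links into yearly host-to-host edge counts. It is an assumption-laden sketch, not the pipeline used to produce the published dataset: the input shape `(year, source_url, target_url)`, the function names, the decision to drop links within the same host, and the sample rows are all invented for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lower-cased host name of a URL, or '' if none is found."""
    return (urlsplit(url).hostname or "").lower()


def host_link_graph(page_links):
    """Aggregate page-level links into (year, source host, target host) counts.

    page_links is an iterable of (year, source_url, target_url) tuples.
    Links whose source and target share a host are skipped (an assumption).
    """
    edges = Counter()
    for year, source_url, target_url in page_links:
        source, target = host_of(source_url), host_of(target_url)
        if source and target and source != target:
            edges[(year, source, target)] += 1
    return edges


# Invented sample rows, for illustration only.
sample_links = [
    (2004, "http://www.bl.uk/a.html", "http://www.example.co.uk/"),
    (2004, "http://www.bl.uk/b.html", "http://www.example.co.uk/page"),
    (2004, "http://www.bl.uk/a.html", "http://www.bl.uk/b.html"),
    (2005, "http://www.example.ac.uk/", "http://www.bl.uk/"),
]
graph = host_link_graph(sample_links)
```

Counting per year keeps the graph small enough to compare one year with another, which is what makes questions about links to and from a single host over time tractable.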
Exploring Host Link Graph 
Courtesy of Peter Webster, Rainer Simon and Jules Mataly
Visualising links (to and from bl.uk) 
Interactive version How it is done
Evolution of the UK web (2004-2013)
Memento service
Big UK Domain Data for Arts and Humanities 
Funded by the UK Arts and Humanities Research Council as one of the 21 “Big Data” projects 
Collaboration between the Institute of Historical Research, Oxford Internet Institute, British Library and Aarhus University 
Develop theoretical and methodological framework for the study of web archives 
Build on AADDA: researchers and the BL co-produce access tools 
•A major study of the history of UK web space from 1996 to 2013 + sub-projects covering a range of disciplines 
•Also an online training course and peer-reviewed journal articles.
Web archiving researcher bursaries
Query building 
Corpus formation and handling 
Annotation and curation 
In-corpus analysis 
Whole-dataset analysis 
Shine
What’s in it for us? 
Helps researchers understand the value of web archives and explore new ways of using these for scholarly research 
Allows BL to obtain hands-on experience with indexing and processing large scale web archive datasets 
Prototype analytics and visualisations can be applied to our own Legal Deposit collection 
Enables BL to participate in various UK, European and international projects 
Helps curators understand characteristics of large scale digital corpora 
Improves the way we collect and store web archives
Web archives for reference AND for analytics 
Base-line knowledge self-explanatory 
Focus on national events for curated collections; provide means to assemble research corpora 
Link to what we do not have 
Offer a bag of tools to support scholarly use 
The go-to state 
Exploit open licences, changes to copyright law 
Online access to selected websites, metadata and secondary datasets 
The British Library Collection Development Policy for websites 
Lobbying – review of Non-print Legal Deposit Regulations in 2018

Knowledge, skills and reskilling – where does the MSc fit in?
 
Start with the Staff
Start with the StaffStart with the Staff
Start with the Staff
 



Building a Collection of the Historical UK Web for scholarly use

  • 1. Building a Collection of the Historical UK Web for scholarly use Helen Hockx-Yu Head of Web Archiving, British Library
  • 2. www.bl.uk 2 The UK Web Domain: 4th TLD after .com, .de and .net. Over 10 million registered .uk domains. UK organisations also use non .uk domain names (e.g. .com or .org); scale unknown. Non-print Legal Deposit (since April 2013) applies to the open (freely available) web: .uk and other UK-published (non .uk) websites, such as .com, .org…; also e-journals, e-books, news web pages and other digital publications, either by harvesting or mutual agreement on other delivery methods.
  • 3. Web Archiving at the British Library: Collect UK digital heritage and provide continued access to archived web resources. Started web archiving in 2003: Open UK Web Archive (selective, topical collections and key sites; consortium sharing infrastructure and development effort; agreement on who collects what; curating collections with organisations and researchers). Archiving UK Web for non-print Legal Deposit since April 2013: Legal Deposit UK Web Archive (comprehensive national archive with on-site access only; joint responsibility of six Legal Deposit Libraries (LDLs)).
  • 4. Collecting strategy for websites. Domain crawl: broad sweep of UK domain; once or twice a year. Events & key sites and news: events of UK interest; high value, high impact sites; national & regional news. Special collection: focused, thematic collections; support priority subjects.
  • 5. UK websites – territoriality explained: An online work is considered as “published in the UK” and therefore in scope for Legal Deposit, if it meets either of the following criteria: (a) it is made available to the public from a website with a domain name which relates to the United Kingdom or to a place within the United Kingdom; or (b) it is made available to the public by a person and any of that person’s activities relating to the creation or the publication of the work take place within the United Kingdom. (The Legal Deposit Libraries (Non-Print Works) Regulations, 2013)
  • 6. Territoriality - implementation: All websites with a .uk domain name, including embedded content (e.g. CSS, images) regardless of where it is hosted. Non .uk websites have to meet at least one criterion: UK hosting (check external IP geo-location database and add in-scope URLs to the fetch-chain); UK postal address; correspondence; professional judgement.
  • 7. UK Domain Crawl. 2013 domain crawl stats: 3.86 million seeds; 1.9 billion URLs (web pages, docs, images); ~31TB; duration: 70 days. 2014 domain crawl: 90 million seeds (starting URLs); started on 19th June 2014; collected 52TB of data by 9th December (incl. 4.4GB of viruses & 3TB of homepage screenshots); nearly 2 million non .uk domains.
  • 8. The “access” paradoxes. Completeness versus openness of web archives: Legal Deposit national collections have restricted access. Documents-centred versus data driven: essentially a scale issue. Pre-selected or defined collections not relevant to all researchers; difficulty in finding relevant content in large scale web archive. Arbitrary (national) boundaries often irrelevant to research question but most heritage institutions operate within certain geographical areas …
  • 9. Web archive as historical document
  • 10. Collaboration with researchers. Building collections: researchers’ involvement in scoping collections, selecting and describing websites; creation of specific (narrow) topical collections. Formulating research question: brain-storm sessions, workshops, discussion, surveys etc.; lack of awareness & baseline knowledge; challenging: you don’t know what you don’t know. Co-development of access services: this is changing how we collect and store data.
  • 11. JISC UK Web Domain dataset (1996-2013). Collaboration between the Internet Archive (IA), the Joint Information Systems Committee (JISC) and the British Library. Extracted copies of UK websites from the Internet Archive’s collection. 1st tranche: 1996 – 2010, 30TB, 2.5 billion URLs. 2nd tranche: 2010 – April 2013, 27.5TB, 1.5 billion URLs (estimated). Research agreement between JISC and IA, upholding IA’s Terms of Use. Access via IA’s Wayback Machine. Allows replication / extraction of derivative or secondary datasets. BL hosts the dataset on behalf of JISC. Data used by research projects: Institute of Historical Research project, Analytical Access to the Domain Dark Archive (AADDA); Oxford Internet Institute project, Big data for political science.
  • 12. Completed work Analytical Access to the Domain Dark Archive Project Use cases & experimental UI Demonstrating the Value of the UK Web Domain Dataset for Social Science Research Analysis of link graph Paper accepted for WebSci’14: Mapping the UK Webspace: Fifteen Years of British Universities on the Web MA thesis by Jules Mataly: The Three Truths of Margaret Thatcher: Creating and Analysing Secondary datasets under open licence Format profile, Geoindex, Host Link Graph
  • 13. Exploring Host Link Graph (courtesy of Peter Webster, Rainer Simon and Jules Mataly)
  • 14. Visualising links (to and from bl.uk) Interactive version How it is done
  • 15. Visualising links (to and from bl.uk) Interactive version How it is done
  • 16. Evolution of the UK web (2004-2013)
  • 18. Big UK Domain Data for Arts and Humanities. Funded by the UK Arts and Humanities Research Council as one of the 21 “Big Data” projects. Collaboration between the Institute of Historical Research, Oxford Internet Institute, British Library and Aarhus University. Develop theoretical and methodological framework for the study of web archives. Build on AADDA: researchers and the BL co-produce access tools. A major study of the history of UK web space from 1996 to 2013 + sub-projects covering a range of disciplines. Also an online training course and peer-reviewed journal articles.
  • 19. Web archiving researcher bursaries
  • 20. www.bl.uk 20 Query building Corpus formation and handling Annotation and curation In-corpus analysis Whole-dataset analysis Shine
  • 21. What’s in it for us? Helps researchers understand the value of web archives and explore new ways of using these for scholarly research. Allows BL to obtain hands-on experience with indexing and processing large scale web archive datasets. (Prototypes) analytics and visualisations can be applied to our own Legal Deposit collection. Enables BL to participate in various UK, European and international projects. Helps curators understand characteristics of large scale digital corpora. Improve the way we collect and store web archives.
  • 22. www.bl.uk 22 Web archives for reference AND for analytics Base-line knowledge self-explanatory Focus on national events for curated collections; provide means to assemble research corpora Link to what we do not have Offer a bag of tools to support scholarly use The go-to state Exploit open licences, changes to copyright law Online access to selected websites, metadata and secondary datasets The British Library Collection Development Policy for websites Lobbying – review of Non-print Legal Deposit Regulations in 2018