Our historical record is increasingly digital and more of it is being preserved by web archives. As the holdings of web archives grow, archivists are faced with the prospect of subdividing them into collections so they are easier to understand and manage. How they structure collections, however, is not constrained to a single solution. In this work, we review the collection structures of the popular web archive platforms at Archive-It, the National Library of Australia (NLA), the Croatian Web Archive (HAW), the Library of Congress web archive (LC), the United Kingdom Web Archive (UKWA), and Conifer. We note a plethora of different approaches to web archive collection management structure. Some web archive collections support sub-collections and some permit embargoes. Curatorial decisions may be attributed to a single organization or many. Archived web pages are known by many names: mementos, copies, captures, or snapshots. Some platforms restrict a memento to a single collection and some platforms allow a memento to be a member of many collections. Knowledge of these collection structures has implications for many efforts. Visitors will need this knowledge to understand how to navigate through collections. Future archivists will need it to understand what options are available before designing their collections. Platform designers will need it to know what possibilities exist. Tools exist to consume web archive collections, and the developers of these tools also need to understand these collection structures so they can meet the needs of their users. Thus, with this analysis, we provide important information for archivists, visitors, and developers alike.
Creating Structure in Web Archives With Collections: Different Concepts From Web Archivists
1. Creating Structure in Web Archives With Collections: Different Concepts From Web Archivists
Presented By:
Himarsha R. Jayanetti
Department of Computer Science
Old Dominion University, Norfolk, Virginia
@HimarshaJ @WebSciDL @oducs
TPDL ‘22, The 26th International Conference on Theory and Practice of Digital Libraries, Padua, Italy, 20 - 23 September 2022
Himarsha R. Jayanetti, Shawn M. Jones, Martin Klein, Alex Osborne, Paul Koerbin, Michael L. Nelson, and Michele C. Weigle
2. Web Archives Preserve the Content of Web Pages as They Were at a Specific Point in Time
Archived web pages, or mementos, are increasingly used by researchers, including journalists, social scientists, and historians.
A screenshot of the https://oduwsdl.github.io/ live web page
URI-R (Original Resource): https://oduwsdl.github.io/
A screenshot of the https://oduwsdl.github.io/ web page archived on 2020-11-18T23:04:53Z (Memento-Datetime)
URI-M (Memento): https://web.archive.org/web/20201118230453/https://oduwsdl.github.io/
URI-T (TimeMap): https://web.archive.org/web/*/https://oduwsdl.github.io/
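The relationship between the three URIs on this slide can be sketched in a few lines of Python. This is a minimal illustration that assumes the Wayback-style URI-M layout shown above (archive prefix, a 14-digit timestamp, then the URI-R); the function names `split_uri_m` and `timemap_uri` are hypothetical and other archives may lay out their URIs differently.

```python
import re
from datetime import datetime, timezone

# Assumed layout: <archive prefix>/web/<14-digit timestamp>/<URI-R>
URI_M_PATTERN = re.compile(
    r"^(?P<prefix>https?://[^/]+/web/)(?P<ts>\d{14})/(?P<uri_r>.+)$"
)


def split_uri_m(uri_m):
    """Split a Wayback-style URI-M into (prefix, Memento-Datetime, URI-R)."""
    match = URI_M_PATTERN.match(uri_m)
    if match is None:
        raise ValueError(f"not a Wayback-style URI-M: {uri_m}")
    # The 14-digit timestamp is interpreted as UTC (YYYYMMDDhhmmss).
    memento_datetime = datetime.strptime(
        match["ts"], "%Y%m%d%H%M%S"
    ).replace(tzinfo=timezone.utc)
    return match["prefix"], memento_datetime, match["uri_r"]


def timemap_uri(prefix, uri_r):
    """Build the human-readable TimeMap address (URI-T) for a URI-R."""
    return f"{prefix}*/{uri_r}"


if __name__ == "__main__":
    prefix, when, uri_r = split_uri_m(
        "https://web.archive.org/web/20201118230453/https://oduwsdl.github.io/"
    )
    print(uri_r)
    print(when.isoformat())
    print(timemap_uri(prefix, uri_r))
```

Applied to the URI-M on this slide, the sketch recovers the URI-R, the Memento-Datetime, and the URI-T listed above.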
3. The Term “Web Archives” Means the Same Thing Among These Eight Web Archive Platforms
“Old Dominion University Social Media” Collection
at Archive-It
Collections at the Internet Archive
Webpage Snapshots
Captures
Archived copies
4. But There Are Different Terms Used for “Memento” Among Web Archives
Captures
Archived Copies
Webpage Snapshots
Memento
“Old Dominion University Social Media” Collection
at Archive-It
Collections at the Internet Archive
5. The Term “Collection” Has Slightly Different Meanings Among Web Archives
Captures
Archived Copies
Webpage Snapshots
Memento
“Old Dominion University Social Media” Collection
at Archive-It
Collections at the Internet Archive
6. We Differentiated Collections From the Greater Web Archive (Web Archive as a Whole)
Greater Web Archive: Pandora (https://pandora.nla.gov.au/)
Collection: Indigenous Australians (https://pandora.nla.gov.au/subject/12)
7. Some Web Archive Collections Contain Sub-Collections
No Sub-collections
Has Sub-collections
* In Conifer, lists act as sub-collections.
Archive-It: https://archive-it.org/collections/2697
Trove: https://webarchive.nla.gov.au/collection/15003
8. Curated By (Attribution): Single Entity, Different Organizational Collaborators, or the Greater Web Archive
Greater Web Archive
Single account
Organizational collaborators
https://webarchive.nla.gov.au/collection/13842
https://archive-it.org/collections/7635
9. Some Web Archive Platforms Support Private Collections
No Private Collections
Support for Private Collections
https://support.archive-it.org/hc/en-us/articles/208334003-Controlling-access-to-your-web-archives-
https://conifer.rhizome.org/himarshaj
10. Some Platforms Embargo Some of Their Resources
Do not embargo resources
Embargo resources
https://www.webarchive.org.uk/en/ukwa/collection/3942
https://www.loc.gov/item/lcwaN0006607/
11. Navigational Hierarchies: How a Visitor or Crawler Navigates Each Collection for Information
Type 1 Type 2
Type 1 collections are original resource focused. Type 2 collections are archived resource (memento) focused.
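One way to picture the two hierarchy types is as data structures. The sketch below is an illustrative model only: it assumes that a Type 1 collection groups mementos under each original resource (URI-R), while a Type 2 collection lists mementos directly. The class and field names are hypothetical and are not taken from any platform's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Memento:
    uri_m: str
    memento_datetime: str  # e.g. "2020-11-18T23:04:53Z"


@dataclass
class Type1Collection:
    """Original-resource focused: collection -> URI-R -> mementos (assumed)."""
    name: str
    seeds: Dict[str, List[Memento]] = field(default_factory=dict)

    def mementos(self) -> List[Memento]:
        # A visitor or crawler passes through each original resource first.
        return [m for captures in self.seeds.values() for m in captures]


@dataclass
class Type2Collection:
    """Memento focused: collection -> mementos (assumed)."""
    name: str
    members: List[Memento] = field(default_factory=list)

    def mementos(self) -> List[Memento]:
        # Mementos are reachable directly from the collection.
        return list(self.members)


if __name__ == "__main__":
    capture = Memento(
        "https://web.archive.org/web/20201118230453/https://oduwsdl.github.io/",
        "2020-11-18T23:04:53Z",
    )
    type1 = Type1Collection("Example", {"https://oduwsdl.github.io/": [capture]})
    type2 = Type2Collection("Example", [capture])
    print(len(type1.mementos()), len(type2.mementos()))
```

Both models yield the same mementos; what differs is the path a visitor or crawler takes to reach them.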
12. Navigational Hierarchy: Archive-It
Figure panels: Navigational Hierarchy; Collection Landing Page
Type 1
13. Navigational Hierarchy: Library of Congress (LC)
Figure panels: Navigational Hierarchy; Collection Items Page
Type 1
14. Navigational Hierarchy: Croatian Web Archive (HAW)
Figure panels: Navigational Hierarchy; Subcategory Landing Page
Type 1
15. Navigational Hierarchy: Conifer
Figure panels: Navigational Hierarchy; Collection Landing Page
Type 2
16. Navigational Hierarchy: United Kingdom Web Archive (UKWA)
Figure panels: Navigational Hierarchy; Collection Landing Page
Type 2
17. Navigational Hierarchy: National Library of Australia’s (NLA) Trove
Figure panels: Navigational Hierarchy; Collection Landing Page; TEP Page
Type 2
18. Navigational Hierarchy: National Library of Australia’s (NLA) PANDORA
Figure panels: Navigational Hierarchy; Collection Landing Page
Type 1
19. Navigational Hierarchy: National Library of Australia’s (NLA) PANDORA and Trove
20. Navigational Hierarchy: Internet Archive (IA)
Figure panels: Navigational Hierarchy; Collection Landing Page
Type 2
Internet Archive’s (IA) user account web archives.
https://archive.org/details/@shawnmjones?tab=web-archive
21. Key Takeaways
● As web archives grow, archivists create collections to make their web archives simpler to comprehend and handle.
● Similarities among these collection structures:
○ Account-centric and general web archives.
○ Restrict a memento to a single collection or share mementos between collections.
○ Attribute curation to a single entity or different organizational collaborators.
○ Most offer sub-collections and some embargo resources.
● Two types of navigational hierarchies:
○ Type 1: an original resource supports the collection’s theme.
○ Type 2: a memento supports the collection’s theme.
● We explored existing platforms rather than making recommendations on how a web archive collection should be created.
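The type labels shown on the navigational hierarchy slides can be gathered into a small lookup table. The sketch below only restates those labels; the dictionary name `NAVIGATION_TYPE` and the helper `platforms_of_type` are hypothetical names chosen for illustration.

```python
from typing import Dict, List

# Navigational hierarchy type per platform, as labeled on the slides.
NAVIGATION_TYPE: Dict[str, int] = {
    "Archive-It": 1,
    "Library of Congress (LC)": 1,
    "Croatian Web Archive (HAW)": 1,
    "Conifer": 2,
    "United Kingdom Web Archive (UKWA)": 2,
    "NLA Trove": 2,
    "NLA PANDORA": 1,
    "Internet Archive (IA) user account web archives": 2,
}


def platforms_of_type(navigation_type: int) -> List[str]:
    """Return the platforms labeled with the given hierarchy type."""
    return [
        name
        for name, labeled_type in NAVIGATION_TYPE.items()
        if labeled_type == navigation_type
    ]


if __name__ == "__main__":
    for navigation_type in (1, 2):
        print(f"Type {navigation_type}:", ", ".join(platforms_of_type(navigation_type)))
```

A tool that consumes collections could use such a table to decide whether to start from original resources (Type 1) or from mementos (Type 2).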
Himarsha R. Jayanetti
hjaya002@odu.edu
@HimarshaJ
Technical Report at arXiv
22. Backup slides …
23. Different Web Archive Platforms Have Different Names for Mementos
Webpage snapshots
Captures
Archived copies
https://haw.nsk.hr/en/publikacija/4818/
https://webarchive.nla.gov.au/collection/11676
https://www.loc.gov/item/lcwaN0023449/
24. Human-Readable TimeMaps (URI-Ts) Are Rendered as a List or Calendar With Links to Each URI-M
List View
Calendar View
25. Trove’s Machine-Readable TimeMap:
https://webarchive.nla.gov.au/bamboo-service/tep/{TEP_ID}
JSON Viewer
https://webarchive.nla.gov.au/bamboo-service/tep/33161
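As a rough sketch, the URL template on this slide can be filled in and fetched with Python's standard library. The helper names `tep_timemap_uri` and `fetch_tep_timemap` are hypothetical. The slide indicates that the response is JSON, but its fields are not described here, so the sketch returns the decoded document without interpreting it.

```python
import json
from urllib.request import urlopen

# URL template taken from the slide; {TEP_ID} is the numeric TEP identifier.
TEP_TIMEMAP_TEMPLATE = "https://webarchive.nla.gov.au/bamboo-service/tep/{TEP_ID}"


def tep_timemap_uri(tep_id: int) -> str:
    """Fill the slide's URL template with a TEP identifier."""
    if tep_id < 0:
        raise ValueError("TEP identifier must be non-negative")
    return TEP_TIMEMAP_TEMPLATE.format(TEP_ID=tep_id)


def fetch_tep_timemap(tep_id: int, timeout: float = 30.0):
    """Download and decode the JSON document for a TEP (requires network access)."""
    with urlopen(tep_timemap_uri(tep_id), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Builds the example address from the slide without contacting the server.
    print(tep_timemap_uri(33161))
```

Calling `fetch_tep_timemap(33161)` would request the example address shown above, subject to the service being available.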
26. Details on Different Web Archive Platforms’ Collection Structures