Web Archiving: Description and Access
Lily Pregill
NYARC Coordinator & Systems Manager
Metropolitan New York Library Council
Web Archiving Series, Part 3
February 29, 2016
Chocolate + peanut butter approach
Descriptive metadata + full-text indexing are both essential to drive discovery and retrieval of web archives
What is NYARC?
Brooklyn Museum + The Frick Collection + MoMA
2006: New York Art Resources Consortium (NYARC) formed
2009: Launched Arcade, shared Millennium ILS
2010: Archive-It and Auction Catalogs Pilot Project
2012: Mellon Grant: Reframing Collections for a Digital Age
2013: Mellon Grant: Making the Black Hole Gray
2015: Launched NYARC Discovery
Archive-It
Thematic Collections
Art Resources
Artists’ Websites
Auction Houses
Catalogues Raisonnés
NYC Galleries
Restitution of Lost or Looted Art
Institution-based Collections
Brooklyn Museum
The Frick Collection
MoMA
NYARC
10 collections, more than 250 websites, and growing…
http://nyarc.org/webarchive
Accessing Web Archives
URL driven search
Multiple levels of search
Combined full-text and DC metadata search on collection page
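A minimal sketch of what a combined full-text and Dublin Core metadata search does, in Python. This is a toy illustration and not Archive-It's implementation: the `ArchivedSite` class, the `search` function, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivedSite:
    """A captured website: curated Dublin Core fields plus indexed page text."""
    url: str
    dc: dict = field(default_factory=dict)  # e.g. {"title": ..., "subject": ...}
    full_text: str = ""


def search(sites, query):
    """Return (url, matched_in) pairs for sites where every query term
    appears in the DC metadata, the full text, or a mix of the two."""
    terms = query.lower().split()
    hits = []
    for site in sites:
        metadata = " ".join(str(v) for v in site.dc.values()).lower()
        text = site.full_text.lower()
        matched_in = set()
        for term in terms:
            if term in metadata:
                matched_in.add("metadata")
            elif term in text:
                matched_in.add("full text")
            else:
                break  # term found nowhere, so this site is not a hit
        else:
            hits.append((site.url, sorted(matched_in)))
    return hits


# Hypothetical sample data for illustration only.
sites = [
    ArchivedSite("http://example.org/gallery",
                 {"title": "Example Gallery", "subject": "Art galleries"},
                 "Upcoming show of collage works"),
    ArchivedSite("http://example.org/auction",
                 {"title": "Example Auction House"},
                 "Sale results"),
]
results = search(sites, "gallery collage")
```

Here the query only succeeds because one term is found in the curated metadata and the other in the page text, which is the point of searching both together.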
NYARC Discovery
Arcade, NYARC’s classic catalog
http://arcade.nyarc.org
Archive-It
http://nyarc.org/webarchive
NYARC Discovery
http://discovery.nyarc.org
NYARC Discovery
Info icon hover text:
NYARC Discovery
NYARC Discovery: surfacing uncataloged content
Search: maya angelou bearden train
NYARC Discovery: discover local blog posts
Metadata Profile
http://www.nyarc.org/sites/default/files/web-archiving-profile.pdf
583 ## ǂa capture ǂc [date captured] ǂh New York Art Resources Consortium ǂ5 NyNyARC ǂ2 pet [code for PREMIS event type]
Developed by Rebecca Guenther
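The 583 note in the profile could be generated from a capture date along these lines. This is a sketch under assumptions: the function name `capture_action_note` is hypothetical, the ISO date format for ǂc is an assumption (the profile only says "[date captured]"), and ǂ2 is placed before ǂ5 as a choice of this sketch.

```python
from datetime import date

SUBFIELD = "ǂ"  # delimiter glyph used in the profile's examples


def capture_action_note(captured_on: date,
                        institution: str = "New York Art Resources Consortium",
                        org_code: str = "NyNyARC") -> str:
    """Format a 583 action note recording a web capture."""
    subfields = [
        ("a", "capture"),                # the action
        ("c", captured_on.isoformat()),  # date captured (ISO format is an assumption)
        ("h", institution),              # who performed the capture
        ("2", "pet"),                    # code for PREMIS event type
        ("5", org_code),                 # institution the note applies to
    ]
    body = " ".join(f"{SUBFIELD}{code} {value}" for code, value in subfields)
    return f"583 ## {body}"


note = capture_action_note(date(2016, 2, 29))
```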
Metadata Workflow
• Begin cataloging in Connexion
• Use Extract Metadata tool
• Apply Local Constant Data built off the metadata profile
• Upload to WorldCat
• Export to local Millennium system (Arcade)
• Millennium records ingested by Primo/NYARC Discovery weekly
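The "apply constant data" step in the workflow above can be pictured as a merge of profile-derived fields into the extracted record. The sketch below is a simplified model in Python, not how Connexion works internally: records are plain lists of (tag, value) pairs, and `apply_constant_data` and the sample values are hypothetical.

```python
def apply_constant_data(record, constant_data):
    """Add constant-data fields to an extracted record.

    Both arguments are lists of (tag, value) pairs. Fields already in the
    record are kept as they are; a constant-data field is skipped only when
    an identical (tag, value) pair is already present. The result is
    ordered by tag.
    """
    merged = list(record)
    for fld in constant_data:
        if fld not in merged:
            merged.append(fld)
    # sorted() is stable, so repeated tags (e.g. two 336s) keep their order
    return sorted(merged, key=lambda f: f[0])


# Hypothetical extracted record and a few constant-data fields.
extracted = [
    ("245", "Example Gallery"),
    ("856", "40 ǂu http://example.org/ ǂz Live site"),
]
constant = [
    ("336", "text ǂb txt ǂ2 rdacontent"),
    ("336", "still image ǂb sti ǂ2 rdacontent"),
    ("655", "7 Web sites. ǂ2 aat"),
]
full_record = apply_constant_data(extracted, constant)
```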
Metadata Workflow: Constant Data Example
m o d
007 c ǂb r ǂd c ǂe n
040 FXM ǂb eng ǂe rda ǂc FXM
049 FXMA
300 1 online resource : ǂb color illustrations
336 text ǂb txt ǂ2 rdacontent
336 still image ǂb sti ǂ2 rdacontent
337 computer ǂb c ǂ2 rdamedia
338 online resource ǂb cr ǂ2 rdacarrier
520 Summary
583 capture ǂc year ǂh New York Art Resources Consortium ǂ2 pet ǂ5 NyNyARC
588 Description of the resource based on live site viewed on Month, Day, Year, and archived site; title from home page.
655 7 Web sites. ǂ2 aat
856 40 ǂz Live site
856 40 ǂu ǂz Archived site
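The constant data above leaves placeholders for the cataloger: the viewing date in the 588 note and the URLs in the two 856 fields. A minimal Python sketch of filling them in follows. The function name `fill_placeholders` and the example URLs are hypothetical, and placing ǂu before ǂz in both 856 fields is an assumption.

```python
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def fill_placeholders(live_url: str, archived_url: str, viewed_on: date) -> list[str]:
    """Return the 588 and 856 fields with the template placeholders filled in."""
    # Spell the date as "Month Day, Year" to match the wording of the 588 note.
    viewed = f"{MONTHS[viewed_on.month - 1]} {viewed_on.day}, {viewed_on.year}"
    return [
        f"588 Description of the resource based on live site viewed on {viewed}, "
        "and archived site; title from home page.",
        f"856 40 ǂu {live_url} ǂz Live site",
        f"856 40 ǂu {archived_url} ǂz Archived site",
    ]


# Hypothetical URLs for illustration only.
fields = fill_placeholders("http://example.org/",
                           "http://archive.example.org/http://example.org/",
                           date(2016, 2, 29))
```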
Metadata: WorldCat
Metadata: Arcade (local catalog)
Metadata: NYARC Discovery
Where can I learn more?
Archive-It
• OpenSearch API
https://webarchive.jira.com/wiki/display/search/OpenSearch+API
• Metadata in Archive-It
https://webarchive.jira.com/wiki/display/ARIH/Metadata+in+Archive-It
NYARC Web Archiving Reports
• Archive-It and Online Auction Catalogs (2010)
http://www.nyarc.org/sites/default/files/ait_leahy_report.pdf
• Reframing Collections for a Digital Age: Final Report (2013)
http://www.nyarc.org/sites/default/files/reports/reframing_final_report2013.pdf
• Making the Black Hole Gray: Final Report (2016)
http://www.nyarc.org/sites/default/files/making_the_black_hole_gray_final_report.pdf
NYARC Documentation
• Metadata Application Profile
http://www.nyarc.org/sites/default/files/web-archiving-profile.pdf
• Metadata for Web Archived Resources: Recommendations for Further Exploration
http://www.nyarc.org/sites/default/files/Recommendations%20for%20further%20exploration-final.pdf
• NYARC Wiki
http://wiki.nyarc.org
Website coming soon: OCLC Research Partners Web Archiving Metadata Working Group
Thank you!
Lily Pregill
pregill@frick.org
@technelily
