A survey of web-based art resources with
findings applicable to FARL electronic records
collection development
Alison Rhonemus, LIS 698, Seminar and Practicum, Dr. Tula Giannini
Frick Art Reference Library
Deborah Kempe, Chief, Collections Management & Access
Web Survey and Collection Development
Coffee on the terrace
M-LEAD-TWO
Intern enterprises -
"collection assessments, digital resource surveys,
web archiving, provide support for important
consortial programs such as shared resources"
● Brooklyn Museum: Mark Daly, Ronnette Hope,
Project Manager: Emily Atwater
● NYARC Latin American Resources (MOMA):
Ralph Baylor
● FARL: Gretchen Nadasky, Alison Rhonemus
Frick Art Reference Library
In early 2011, the Frick Art Reference Library
and the Thomas J. Watson Library at The
Metropolitan Museum of Art completed a pilot
project to address coordinated collecting of
born-digital auction catalogs using ContentDM
and Archive-It.
FARL web archiving program is situated in Collection Development.
Current plans for website capture include online auction catalogs and art web resources
cataloged by NYARC.
Fellow M-LEAD-TWO intern Gretchen Nadasky has just described online auction
catalogs.
My project focused on NYARC cataloged websites.
Web Archiving
"The Internet Archive is already doing it."
Actually, the IA is providing the tools for
other institutions to use in archiving.
Archive-It
uses open source tools developed by the
Internet Archive
● Heritrix Web Crawler
● Wayback Interface
● WARC format, an ISO standard
Quality Assurance
The report and manual checks
Partner and Wayback interface
• Password-protected sites: cannot be archived.
• JavaScript: more complicated implementations can be
difficult to capture and display. Ongoing area of
development.
• Videos: difficulty with some proprietary formats.
• Form- and database-driven content: may be archived
using a sitemap or other direct links to the content.
Evaluating seeds
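As a rough illustration of the sitemap approach to form- and database-driven content, the Python sketch below pulls page URLs out of a sitemap so they could be added as direct seeds. The sample sitemap, the example.org URLs, and the function name are invented for illustration; this is not part of Archive-It.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def seeds_from_sitemap(xml_text):
    """Return the <loc> URLs listed in a sitemap document, in order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap for a database-driven site (illustrative only).
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/objects?id=1</loc></url>
  <url><loc>https://example.org/objects?id=2</loc></url>
</urlset>"""

if __name__ == "__main__":
    for seed in seeds_from_sitemap(SAMPLE_SITEMAP):
        print(seed)
```

Each URL returned is a direct link a crawler can fetch without submitting a form.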
Robots.txt Blocks
The crawler by default respects all robots.txt files. Check
post-crawl reports for blocked seeds or documents.
If your site is blocked:
a) Contact the site owner and ask if they will unblock it
b) Ask your Partner Specialist to turn on the “ignore robots”
feature in your account
Notes:
/ denotes single directory seed
subdomains.archive.org (add individually or expand seed)
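A robots.txt block can also be spotted before crawling. The sketch below uses Python's standard urllib.robotparser to test candidate seeds against a robots.txt file. The sample rules, the example.org seeds, and the "archive.org_bot" user agent string are assumptions made for illustration; check the user agent your own crawls report.

```python
from urllib.robotparser import RobotFileParser


def blocked_seeds(robots_txt, seeds, user_agent="archive.org_bot"):
    """Return the seeds that the given robots.txt text disallows.

    user_agent is an assumed crawler name, used here for illustration.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [seed for seed in seeds if not parser.can_fetch(user_agent, seed)]


# Hypothetical robots.txt and seed list (illustrative only).
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""

SAMPLE_SEEDS = [
    "https://example.org/",
    "https://example.org/private/catalog.html",
]

if __name__ == "__main__":
    for seed in blocked_seeds(SAMPLE_ROBOTS, SAMPLE_SEEDS):
        print("blocked:", seed)
```

A seed flagged here would be a candidate for either of the two options listed above: contacting the site owner or requesting the "ignore robots" feature.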
Site Survey Criteria
● html/flash/pdf
● images
● embedded material
● links
● directories and subdomains
● terms, rights statements and permissions
Obvious ruse
More of the obvious
Sites created without the intention of
being archived are the sites in need of
archiving.
Survey Says
● 257 cataloged entries
● 168 resources are possible to capture
● 82 resources would require more research or
display definite red flags for web archiving.
● PDFs are available for at least some of the
content in 75 resources.
● Flash was an element in 23 resources
● 16 sites used HTML5
● 54 used a CMS like Drupal or WordPress
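To put these counts in proportion, the short Python sketch below expresses each one as a share of the 257 cataloged entries. The dictionary keys are labels chosen here for illustration. The categories are not assumed to be mutually exclusive, so the shares are not expected to sum to 100.

```python
TOTAL_ENTRIES = 257

# Counts taken from the survey figures above; key names are illustrative.
SURVEY_COUNTS = {
    "possible to capture": 168,
    "needs research or red flags": 82,
    "PDF available": 75,
    "Flash": 23,
    "HTML5": 16,
    "CMS": 54,
}


def share(count, total=TOTAL_ENTRIES):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for label, count in SURVEY_COUNTS.items():
        print(f"{label}: {count} of {TOTAL_ENTRIES} ({share(count)}%)")
```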
There were 3 cataloged resources no longer
available on the live web but viewable through
Internet Archive.
Another 2 defunct resources were not available
through Internet Archive.
The main page for one of these lost resources was
available as a snapshot in WAYBACK but the actual
cataloged resource was not available.
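A check like this can be scripted against the Internet Archive's public Wayback availability endpoint. The Python sketch below builds the query URL and reads the closest snapshot from the JSON reply. The sample replies and function names are invented for illustration, and, as the lost resource above shows, a snapshot of a main page does not guarantee that the cataloged resource itself was captured.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(resource_url):
    """Build the availability query URL for one cataloged resource."""
    return AVAILABILITY_API + "?" + urlencode({"url": resource_url})


def closest_snapshot(response):
    """Return the closest snapshot URL from a parsed reply, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_resource(resource_url):
    """Query the live service (requires network access)."""
    with urlopen(availability_url(resource_url), timeout=30) as reply:
        return closest_snapshot(json.load(reply))


# Hypothetical replies (illustrative only, not real survey data).
SAMPLE_FOUND = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20120101000000/http://example.org/",
            "timestamp": "20120101000000",
            "status": "200",
        }
    }
}
SAMPLE_MISSING = {"archived_snapshots": {}}
```

Running check_resource over each cataloged URL would give a first pass; any resource reported as captured still needs a manual look in the Wayback interface.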
Change is Constant
Archive-It Updates:
● Heritrix 1 series to Heritrix 3 series
(February)
● Archive-It 4.8
(May)
Archive-It 4.8
Plans
● Upcoming grants
● Capture of NYARC institution websites
● Include Wayback interface links in
Arcade catalog records
● Continue to identify websites for
capture and implement capture
Conclusions
○ Digital resources not prevalent enough to
reassign current staff
○ Website capture most costly in terms of staff time
○ Copyright continues to be an issue
○ Long term digital preservation needs yet to be
assessed
○ Capture of Frick Collection and NYARC sites will
pose a challenging test case
