A survey of web-based art resources with
findings applicable to FARL electronic records
collection development
Alison Rhonemus, LIS 698, Seminar and Practicum, Dr. Tula Giannini
Frick Art Reference Library
Deborah Kempe, Chief, Collections Management & Access
Web Survey and Collection Development
Coffee on the terrace
M-LEAD-TWO
Intern enterprises -
"collection assessments, digital resource surveys,
web archiving, provide support for important
consortial programs such as shared resources"
● Brooklyn Museum: Mark Daly, Ronnette Hope,
Project Manager: Emily Atwater
● NYARC Latin American Resources (MoMA):
Ralph Baylor
● FARL: Gretchen Nadasky, Alison Rhonemus
Frick Art Reference Library
In early 2011, the Frick Art Reference Library
and the Thomas J. Watson Library at The
Metropolitan Museum of Art completed a pilot
project to address coordinated collecting of
born-digital auction catalogs using CONTENTdm
and Archive-It.
The FARL web archiving program is situated in Collection Development.
Current plans for website capture include online auction catalogs and art web resources
cataloged by NYARC.
Fellow M-LEAD-TWO intern Gretchen Nadasky has just described online auction
catalogs.
My project focused on NYARC cataloged websites.
Web Archiving
"The Internet Archive is already doing it."
Actually, the IA is providing the tools for
other institutions to use in archiving.
Archive-It
uses open-source tools developed by the
Internet Archive
● Heritrix Web Crawler
● Wayback Interface
● WARC format, an ISO standard
Quality Assurance
The report and manual checks
Partner and Wayback interface
Evaluating seeds
• Password-protected sites: cannot be archived.
• JavaScript: more complicated implementations
can be difficult to capture and display. Ongoing
area of development.
• Videos: difficulty with some proprietary formats.
• Form- and database-driven content: may be
archived using a sitemap or other direct links to the
content.
Robots.txt Blocks
By default, the crawler respects all robots.txt files. Check
post-crawl reports for blocked seeds or documents.
If your site is blocked:
a) Contact the site owner and ask if they will unblock it
b) Ask your Partner Specialist to turn on the “ignore robots”
feature in your account
Notes:
/ denotes single directory seed
subdomains.archive.org (add individually or expand seed)
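As a rough pre-crawl check of the robots.txt behaviour described above, the sketch below uses Python's standard `urllib.robotparser` to list which candidate seeds a robots.txt file would block. The user-agent string `archive.org_bot`, the sample rules, and the sample seeds are assumptions for illustration; they do not come from the slides or from an actual Archive-It configuration.

```python
from urllib.robotparser import RobotFileParser


def blocked_seeds(robots_txt, seeds, user_agent="archive.org_bot"):
    """Return the seeds that the given robots.txt text disallows.

    user_agent is an assumed crawler name; substitute whatever
    agent your crawl actually identifies itself with.
    """
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network is needed.
    parser.parse(robots_txt.splitlines())
    return [seed for seed in seeds if not parser.can_fetch(user_agent, seed)]


# Hypothetical robots.txt and seed list, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

SAMPLE_SEEDS = [
    "http://example.org/catalogs/",
    "http://example.org/private/images/",
]

if __name__ == "__main__":
    for seed in blocked_seeds(SAMPLE_ROBOTS, SAMPLE_SEEDS):
        print("blocked by robots.txt:", seed)
```

A check like this only predicts blocks; the post-crawl report remains the place to confirm what was actually excluded.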
Site Survey Criteria
● html/flash/pdf
● images
● embedded material
● links
● directories and subdomains
● terms, rights statements and permissions
Obvious ruse
More of the obvious
Sites created without the intention of
being archived are the sites in need of
archiving.
Survey Says
● 257 cataloged entries
● 168 resources are possible to capture
● 82 resources would require more research or
display definite red flags for web archiving.
● PDFs are available for at least some of the
content in 75 resources.
● Flash was an element in 23 resources
● 16 sites used HTML5
● 54 used a CMS like Drupal or WordPress
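To put the counts above in proportion, the short sketch below expresses each one as a share of the 257 cataloged entries. The figures are copied from the list; the dictionary keys and the function name are illustrative only.

```python
TOTAL_ENTRIES = 257

# Counts as reported in the survey results above.
SURVEY_COUNTS = {
    "possible to capture": 168,
    "more research or red flags": 82,
    "PDFs for some content": 75,
    "Flash element": 23,
    "HTML5": 16,
    "CMS (Drupal, WordPress, ...)": 54,
}


def share(count, total=TOTAL_ENTRIES):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


# Entries not covered by the first two (capture status) categories.
UNCLASSIFIED = TOTAL_ENTRIES - SURVEY_COUNTS["possible to capture"] \
    - SURVEY_COUNTS["more research or red flags"]

if __name__ == "__main__":
    for label, count in SURVEY_COUNTS.items():
        print(f"{label}: {count} of {TOTAL_ENTRIES} ({share(count)}%)")
    print("outside the two capture-status groups:", UNCLASSIFIED)
```

The format and technology counts overlap (one resource can offer PDFs and also use a CMS), so only the two capture-status figures are meant to be compared against the total.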
There were 3 cataloged resources no longer
available on the live web but viewable through
Internet Archive.
Another 2 defunct resources were not available
through Internet Archive.
The main page for one of these lost resources was
available as a snapshot in WAYBACK but the actual
cataloged resource was not available.
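A check like the one described above can be partly automated with the Internet Archive's public Wayback availability endpoint. The sketch below builds the query and reads the closest snapshot from the JSON reply. The response shape assumed here (`archived_snapshots` containing a `closest` entry with `available` and `url`) should be verified against the current API documentation, and the example URLs are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(resource_url):
    """Build the availability query for one cataloged URL."""
    return AVAILABILITY_API + "?" + urlencode({"url": resource_url})


def closest_snapshot(payload):
    """Return the snapshot URL from a parsed reply, or None if absent."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check(resource_url):
    """Query the live endpoint (requires network access)."""
    with urlopen(availability_url(resource_url), timeout=30) as response:
        return closest_snapshot(json.load(response))


if __name__ == "__main__":
    # Placeholder URL; replace with the exact cataloged resource address.
    print(check("example.org"))
```

As the lost resource above shows, a snapshot of a site's main page does not mean the cataloged resource itself was captured, so the query should use the exact cataloged URL rather than the home page.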
Change is Constant
Archive-It Updates:
● Heritrix 1 series to Heritrix 3 series
(February)
● Archive-It 4.8
(May)
Archive-It 4.8
Plans
● Upcoming grants
● Capture of NYARC institution websites
● Include Wayback interface links in
Arcade catalog records
● Continue to identify websites for
capture and implement capture
Conclusions
○ Digital resources not prevalent enough to
reassign current staff
○ Website capture most costly in terms of staff time
○ Copyright continues to be an issue
○ Long-term digital preservation needs have yet to be
assessed
○ Capture of Frick Collection sites and NYARC will
pose a challenging test case
