SlideShare a Scribd company logo
1 of 37
How collaboration can save [more of] the
web: recent progress in collaborative web
archiving initiatives

Anna Perricci
Columbia University Libraries
METRO Conference 2014
January 15, 2014
Overview
• Web archiving context (Who)
• Benefits of curated web archives/use cases (Why)

• Columbia University Libraries Web Resources Collection
Program (What/How)
• Background
• Selection
• Permissions
• Harvesting
• Description
• Access
Andrew W. Mellon Foundation
support for CUL web archiving
Grant projects

•

•
•

Collection Building for Web Resources (2008-2009)
1 FTE: project librarian

Web Resources Collection Program Development (20092012)
3 FTE: 2 web curators, 1 programmer
Web Resources Archiving Collaboration (2013-2015)
2 FTE: 1 project librarian, 1 bibliographic asst
CUL Web Archive Collections
•

Avery Library Historic Preservation and Urban Planning
97 websites, semiannual

•

Burke Library New York City Religions
225 websites, semiannual

•

Human Rights
591 websites, quarterly

•

Rare Book and Manuscript Library
54 websites, semiannual

•

University Archives
most of columbia.edu domain, plus 75+ affiliated sites
with "external" URLs

•

General
CUL’s Archive-It space: http://www.archive-it.org/organizations/304
Elements of CUL’s web archiving workflows
•
•
•
•
•

Selection or nomination
Archive-It for scoping, crawls, access, storage & QA
Cataloging workflows/output
Firm policy on permissions
Project management tools for growing number of
collaborations
• Workflows can vary a little between collections
Archive-It provides some essential services so CUL
can focus on curation, cataloging and collaboration building
Archive-It’s responsiveness to CUL’s requests

• Metadata: custom field functionality added to
metadata template & all fields made repeatable
• Pilot projects: options related to ignoring robots.txt,
downloading WARC files and export metadata as
XML, improvements to capture for
architecture/urban planning sites via Wayback QA
tools
Challenges: media files & images (using QA tools)
Description and access for archived websites
• Archive-it.org site-level metadata (All thematic
collections, DCMI, copied from MARC records if
possible)
• CLIO collection-level MARC records (Human
rights, Avery, Burke)
• CLIO site-level MARC records (Human rights, Avery)
• Document-level MARC records (selected longish
Avery collection reports, pre-existing IRCR records)
• Human Rights Web Archive portal on CUL website
(using metadata extracted from MARC records)
CLIO public view of MARC collection-level record
Permissions
There is no explicit US Copyright Act libraries exception for web
archiving
CUL policy is to request permission from website owners to
harvest their websites and provide access to archived versions
• Permission request email sent to contact info from
website
• If no response after 3 weeks, follow-up request with
notification of intent to archive website
Permissions statistics as of January 2014
• 983 requests sent
• 51% responded Yes
• .05% responded No
• 48.5% did not respond
Web Resources Archiving Collaboration
Many thanks to the Mellon Foundation
Building collaborations among
•
•
•
•

The web archiving community
Other research libraries
Users and potential users of web archives
Website creators
Incentives grants to
advance web archiving tools

Image source: http://imgur.com/gallery/vG7KE48
Building an efficient, coherent, and scalable
national framework for collecting web content

Image source: http://imgur.com/gallery/1m5MBKf
Designated space for collaborative collecting
Collaborative collection experiment in progress
via partner institutions in Borrow Direct
Collaboration with music librarians
Contemporary composers—the perfect storm?
Collection Development (test case)
Image of color coded spreadsheet removed from
published slides for confidentiality of collaborators
Project tracking
Use cases
Who are the web archives for? Are they being
used? Could we encourage more effective use?
http://hrwa.cul.columbia.edu
Using the Human Rights Web Archive & learning from
human rights scholars’ work (publications, citations)
Citations scraped from articles published in
2010 in select scholarly journals
Isolating URLs from list of citations
(approximately 10% of citations scraped have URLs in them)
Best Practices for site creators: working with
website creators

Image source: http://imgur.com/gallery/NWJ12Pl
Open issues: division and maintenance of cooperative
efforts (communication, software and more)
Process over next 18 months
•
•
•
•
•
•
•
•

Planning, needs assessment (interviews)
Group communication
Ongoing growth (scale of collecting and distribution of effort)
Planning for sharing costs and sustainability
5 year plan for Borrow Direct collaborations
Identifying themes for further collecting
Cataloging resources
Quality Assurance and ongoing evaluation
Leveraging expertise and finding ways
to share responsibilities
• Collecting strategy and establishment of priorities for
collection development
• Suggest seed URLs
• Liaise with site owners to solicit permission to archive
websites
• Conduct detailed quality assurance by browsing the
archived website as a user would (e.g. try to access media
files to ensure they have been successfully captured)
• Assessment of efficacy for users?
• Governance of collaborations
Leveraging expertise of web archivists
• Capture web archives & do initial quality assurance of a
crawl
• Coordinate efforts and field questions regarding
• technical elements of web archiving
• existing CUL web archiving policy
• permissions processing
• needs assessment
• user profiles and use cases
• value and usage assessment? (later)
Larger questions to return to periodically
• Do you have any ideas about what users would want to find in web
archives or use cases?
• Which websites do you and your colleagues consider indispensable for
your day to day work?
• Are there any websites in your institution's domain that should be
harvested and saved (e.g. web pages created by or about faculty
members, academic programs or student groups)?
• Are there any types of sites that would certainly be out of scope or have
proven to be out of scope?
• Is there anything impeding the use of web archives we are creating?
Takeaways

Image source: http://www.thebraiser.com/wp-content/uploads/2012/12/chinese.jpg
The future is bright!

Image source: Flickr user: Tambako the Jaguar (CC BY-ND 2.0)
Opportunities for discussion:
Today and on an ongoing basis

Image source: http://www.idocampaigns.com/Editor/assets/iStock_000014314309_Large.jpg
Thank you!
Anna Perricci
alp2198@columbia.edu

More Related Content

What's hot

Capture All the URLs: First Steps in Web Archiving
Capture All the URLs: First Steps in Web ArchivingCapture All the URLs: First Steps in Web Archiving
Capture All the URLs: First Steps in Web ArchivingKristen Yarmey
 
Tracking Down the Problem: The Development of a Web-Scale Discovery Troublesh...
Tracking Down the Problem: The Development of a Web-Scale Discovery Troublesh...Tracking Down the Problem: The Development of a Web-Scale Discovery Troublesh...
Tracking Down the Problem: The Development of a Web-Scale Discovery Troublesh...NASIG
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...Micah Altman
 
Dulin PermaCC Talk for MIT PIS
Dulin PermaCC Talk for MIT PISDulin PermaCC Talk for MIT PIS
Dulin PermaCC Talk for MIT PISMicah Altman
 
Best Practices for Descriptive Metadata
Best Practices for Descriptive MetadataBest Practices for Descriptive Metadata
Best Practices for Descriptive MetadataOCLC
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceMicah Altman
 
Walk this way: Online content platform migration experiences and collaboration
Walk this way: Online content platform migration experiences and collaboration Walk this way: Online content platform migration experiences and collaboration
Walk this way: Online content platform migration experiences and collaboration NASIG
 
ER&L 2020 - When the grass is greener
ER&L 2020 - When the grass is greenerER&L 2020 - When the grass is greener
ER&L 2020 - When the grass is greenerMatthew Ragucci
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanMicah Altman
 
Impact of the evergreen library automation system on public library users
Impact of the evergreen library automation system on public library usersImpact of the evergreen library automation system on public library users
Impact of the evergreen library automation system on public library usersIndiana Online Users Group
 
Researching Researchers: Avalon's Repository Usage
Researching Researchers: Avalon's Repository UsageResearching Researchers: Avalon's Repository Usage
Researching Researchers: Avalon's Repository UsageAvalon Media System
 
Vision and design principles for site
Vision and design principles for siteVision and design principles for site
Vision and design principles for sitedpuwebservices
 

What's hot (20)

ALA 2016 NISO Standards Update Hillman Bibliographic Roadmap
ALA 2016 NISO Standards Update Hillman Bibliographic RoadmapALA 2016 NISO Standards Update Hillman Bibliographic Roadmap
ALA 2016 NISO Standards Update Hillman Bibliographic Roadmap
 
Capture All the URLs: First Steps in Web Archiving
Capture All the URLs: First Steps in Web ArchivingCapture All the URLs: First Steps in Web Archiving
Capture All the URLs: First Steps in Web Archiving
 
Tracking Down the Problem: The Development of a Web-Scale Discovery Troublesh...
Tracking Down the Problem: The Development of a Web-Scale Discovery Troublesh...Tracking Down the Problem: The Development of a Web-Scale Discovery Troublesh...
Tracking Down the Problem: The Development of a Web-Scale Discovery Troublesh...
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
 
Dulin PermaCC Talk for MIT PIS
Dulin PermaCC Talk for MIT PISDulin PermaCC Talk for MIT PIS
Dulin PermaCC Talk for MIT PIS
 
ALA NISO-BISG Forum - Patron Privacy
ALA NISO-BISG Forum - Patron PrivacyALA NISO-BISG Forum - Patron Privacy
ALA NISO-BISG Forum - Patron Privacy
 
Best Practices for Descriptive Metadata
Best Practices for Descriptive MetadataBest Practices for Descriptive Metadata
Best Practices for Descriptive Metadata
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information Science
 
RA21 Charleston Library Conference Presentation
RA21 Charleston Library Conference Presentation RA21 Charleston Library Conference Presentation
RA21 Charleston Library Conference Presentation
 
Walk this way: Online content platform migration experiences and collaboration
Walk this way: Online content platform migration experiences and collaboration Walk this way: Online content platform migration experiences and collaboration
Walk this way: Online content platform migration experiences and collaboration
 
ER&L 2020 - When the grass is greener
ER&L 2020 - When the grass is greenerER&L 2020 - When the grass is greener
ER&L 2020 - When the grass is greener
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental Scan
 
Impact of the evergreen library automation system on public library users
Impact of the evergreen library automation system on public library usersImpact of the evergreen library automation system on public library users
Impact of the evergreen library automation system on public library users
 
ALA Standards Update -Sassone-ODI
ALA Standards Update -Sassone-ODIALA Standards Update -Sassone-ODI
ALA Standards Update -Sassone-ODI
 
Technology Evaluation and Meeting the Needs of People with Disabilities
Technology Evaluation and Meeting the Needs of People with Disabilities Technology Evaluation and Meeting the Needs of People with Disabilities
Technology Evaluation and Meeting the Needs of People with Disabilities
 
Researching Researchers: Avalon's Repository Usage
Researching Researchers: Avalon's Repository UsageResearching Researchers: Avalon's Repository Usage
Researching Researchers: Avalon's Repository Usage
 
Vision and design principles for site
Vision and design principles for siteVision and design principles for site
Vision and design principles for site
 
Bishop 2
Bishop 2Bishop 2
Bishop 2
 
Connecting the Dots: Constellations in the Linked Data Universe
Connecting the Dots: Constellations in the Linked Data UniverseConnecting the Dots: Constellations in the Linked Data Universe
Connecting the Dots: Constellations in the Linked Data Universe
 
Introduction to Digital Commons
Introduction to Digital CommonsIntroduction to Digital Commons
Introduction to Digital Commons
 

Viewers also liked

Lightning talk on MARC records for the Contemporary Composers Web Archive pre...
Lightning talk on MARC records for the Contemporary Composers Web Archive pre...Lightning talk on MARC records for the Contemporary Composers Web Archive pre...
Lightning talk on MARC records for the Contemporary Composers Web Archive pre...Anna Perricci
 
SAA Web Archiving Roundtable Education Needs Assessment Survey Results
SAA Web Archiving Roundtable Education Needs Assessment Survey ResultsSAA Web Archiving Roundtable Education Needs Assessment Survey Results
SAA Web Archiving Roundtable Education Needs Assessment Survey ResultsAnna Perricci
 
ACRL/NY 2013 poster: Assessment of the Effectiveness of the Human Rights Web ...
ACRL/NY 2013 poster: Assessment of the Effectiveness of the Human Rights Web ...ACRL/NY 2013 poster: Assessment of the Effectiveness of the Human Rights Web ...
ACRL/NY 2013 poster: Assessment of the Effectiveness of the Human Rights Web ...Anna Perricci
 
Activists archiving digital content created through OWS - AMIA - 2012
Activists archiving digital content created through OWS - AMIA - 2012Activists archiving digital content created through OWS - AMIA - 2012
Activists archiving digital content created through OWS - AMIA - 2012Anna Perricci
 
Collaborative Web Archiving with Ivy Plus / Borrow Direct
Collaborative Web Archiving with Ivy Plus / Borrow Direct Collaborative Web Archiving with Ivy Plus / Borrow Direct
Collaborative Web Archiving with Ivy Plus / Borrow Direct Anna Perricci
 
Retention Modeling for the Eastern Academic Scholars' Trust (EAST)
Retention Modeling for the Eastern Academic Scholars' Trust (EAST)Retention Modeling for the Eastern Academic Scholars' Trust (EAST)
Retention Modeling for the Eastern Academic Scholars' Trust (EAST)Anna Perricci
 
Dismantling Silos to Build Robust Shared Print Projects
Dismantling Silos to Build Robust Shared Print ProjectsDismantling Silos to Build Robust Shared Print Projects
Dismantling Silos to Build Robust Shared Print ProjectsAnna Perricci
 
Best Practices Exchange 2013: How collaboration can save [more of] the web: r...
Best Practices Exchange 2013: How collaboration can save [more of] the web: r...Best Practices Exchange 2013: How collaboration can save [more of] the web: r...
Best Practices Exchange 2013: How collaboration can save [more of] the web: r...Anna Perricci
 
Collaboration and Cash: Web Archiving Incentive Awards
Collaboration and Cash: Web Archiving Incentive AwardsCollaboration and Cash: Web Archiving Incentive Awards
Collaboration and Cash: Web Archiving Incentive AwardsAnna Perricci
 
Contemporary Composers Web Archive (CCWA): Progress in Collaboratively Collec...
Contemporary Composers Web Archive (CCWA): Progress in Collaboratively Collec...Contemporary Composers Web Archive (CCWA): Progress in Collaboratively Collec...
Contemporary Composers Web Archive (CCWA): Progress in Collaboratively Collec...Anna Perricci
 
Information sharing about Columbia University Library’s recent web archiving ...
Information sharing about Columbia University Library’s recent web archiving ...Information sharing about Columbia University Library’s recent web archiving ...
Information sharing about Columbia University Library’s recent web archiving ...Anna Perricci
 
Archiving Occupy (presentation for NYC Digital Asset Managers Meetup)
Archiving Occupy (presentation for NYC Digital Asset Managers Meetup)Archiving Occupy (presentation for NYC Digital Asset Managers Meetup)
Archiving Occupy (presentation for NYC Digital Asset Managers Meetup)Anna Perricci
 
Progress Made and Lessons Learned through Collaborative Web Archiving Proj...
Progress Made and Lessons Learned through Collaborative Web Archiving Proj...Progress Made and Lessons Learned through Collaborative Web Archiving Proj...
Progress Made and Lessons Learned through Collaborative Web Archiving Proj...Anna Perricci
 

Viewers also liked (13)

Lightning talk on MARC records for the Contemporary Composers Web Archive pre...
Lightning talk on MARC records for the Contemporary Composers Web Archive pre...Lightning talk on MARC records for the Contemporary Composers Web Archive pre...
Lightning talk on MARC records for the Contemporary Composers Web Archive pre...
 
SAA Web Archiving Roundtable Education Needs Assessment Survey Results
SAA Web Archiving Roundtable Education Needs Assessment Survey ResultsSAA Web Archiving Roundtable Education Needs Assessment Survey Results
SAA Web Archiving Roundtable Education Needs Assessment Survey Results
 
ACRL/NY 2013 poster: Assessment of the Effectiveness of the Human Rights Web ...
ACRL/NY 2013 poster: Assessment of the Effectiveness of the Human Rights Web ...ACRL/NY 2013 poster: Assessment of the Effectiveness of the Human Rights Web ...
ACRL/NY 2013 poster: Assessment of the Effectiveness of the Human Rights Web ...
 
Activists archiving digital content created through OWS - AMIA - 2012
Activists archiving digital content created through OWS - AMIA - 2012Activists archiving digital content created through OWS - AMIA - 2012
Activists archiving digital content created through OWS - AMIA - 2012
 
Collaborative Web Archiving with Ivy Plus / Borrow Direct
Collaborative Web Archiving with Ivy Plus / Borrow Direct Collaborative Web Archiving with Ivy Plus / Borrow Direct
Collaborative Web Archiving with Ivy Plus / Borrow Direct
 
Retention Modeling for the Eastern Academic Scholars' Trust (EAST)
Retention Modeling for the Eastern Academic Scholars' Trust (EAST)Retention Modeling for the Eastern Academic Scholars' Trust (EAST)
Retention Modeling for the Eastern Academic Scholars' Trust (EAST)
 
Dismantling Silos to Build Robust Shared Print Projects
Dismantling Silos to Build Robust Shared Print ProjectsDismantling Silos to Build Robust Shared Print Projects
Dismantling Silos to Build Robust Shared Print Projects
 
Best Practices Exchange 2013: How collaboration can save [more of] the web: r...
Best Practices Exchange 2013: How collaboration can save [more of] the web: r...Best Practices Exchange 2013: How collaboration can save [more of] the web: r...
Best Practices Exchange 2013: How collaboration can save [more of] the web: r...
 
Collaboration and Cash: Web Archiving Incentive Awards
Collaboration and Cash: Web Archiving Incentive AwardsCollaboration and Cash: Web Archiving Incentive Awards
Collaboration and Cash: Web Archiving Incentive Awards
 
Contemporary Composers Web Archive (CCWA): Progress in Collaboratively Collec...
Contemporary Composers Web Archive (CCWA): Progress in Collaboratively Collec...Contemporary Composers Web Archive (CCWA): Progress in Collaboratively Collec...
Contemporary Composers Web Archive (CCWA): Progress in Collaboratively Collec...
 
Information sharing about Columbia University Library’s recent web archiving ...
Information sharing about Columbia University Library’s recent web archiving ...Information sharing about Columbia University Library’s recent web archiving ...
Information sharing about Columbia University Library’s recent web archiving ...
 
Archiving Occupy (presentation for NYC Digital Asset Managers Meetup)
Archiving Occupy (presentation for NYC Digital Asset Managers Meetup)Archiving Occupy (presentation for NYC Digital Asset Managers Meetup)
Archiving Occupy (presentation for NYC Digital Asset Managers Meetup)
 
Progress Made and Lessons Learned through Collaborative Web Archiving Proj...
Progress Made and Lessons Learned through Collaborative Web Archiving Proj...Progress Made and Lessons Learned through Collaborative Web Archiving Proj...
Progress Made and Lessons Learned through Collaborative Web Archiving Proj...
 

Similar to METRO Conference 2014: How collaboration can save [more of] the web: recent progress in collaborative web archiving initiatives

Capture All the URLS: First Steps in Web Archiving
Capture All the URLS: First Steps in Web ArchivingCapture All the URLS: First Steps in Web Archiving
Capture All the URLS: First Steps in Web ArchivingKristen Yarmey
 
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...Anna Perricci
 
NASIG 2020 - Walk this way
NASIG 2020 -  Walk this wayNASIG 2020 -  Walk this way
NASIG 2020 - Walk this wayMatthew Ragucci
 
The OCLC Research Library Partnership
The OCLC Research Library PartnershipThe OCLC Research Library Partnership
The OCLC Research Library PartnershipOCLC
 
Web archiving challenges and opportunities
Web archiving challenges and opportunitiesWeb archiving challenges and opportunities
Web archiving challenges and opportunitiesAhmed AlSum
 
IWMW 1998: About the parallel sessions
IWMW 1998: About the parallel sessionsIWMW 1998: About the parallel sessions
IWMW 1998: About the parallel sessionsIWMW
 
IWMW 1999: Parallel sessions on day 2
IWMW 1999: Parallel sessions on day 2IWMW 1999: Parallel sessions on day 2
IWMW 1999: Parallel sessions on day 2IWMW
 
Marc and beyond: 3 Linked Data Choices
 Marc and beyond: 3 Linked Data Choices  Marc and beyond: 3 Linked Data Choices
Marc and beyond: 3 Linked Data Choices Richard Wallis
 
Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Roxanne Missingham
 
SAA 2015 Web Archiving Roundtable
SAA 2015 Web Archiving RoundtableSAA 2015 Web Archiving Roundtable
SAA 2015 Web Archiving Roundtablerosalielack
 
The Canadian Linked Data Initiative: Charting a Path to a Linked Data Future
The Canadian Linked Data Initiative: Charting a Path to a Linked Data FutureThe Canadian Linked Data Initiative: Charting a Path to a Linked Data Future
The Canadian Linked Data Initiative: Charting a Path to a Linked Data FutureNASIG
 
CIL 2020 - Bringing Collections to the Screen
CIL 2020 - Bringing Collections to the ScreenCIL 2020 - Bringing Collections to the Screen
CIL 2020 - Bringing Collections to the ScreenMatthew Ragucci
 
WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)
WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)
WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)TimelessFuture
 
archana singh 102 (mlis)1.pptx
archana singh 102 (mlis)1.pptxarchana singh 102 (mlis)1.pptx
archana singh 102 (mlis)1.pptxSatishKumarGupta20
 
Maine Shared Collections Strategy: Collaborating to Preserve Our Print Collec...
Maine Shared Collections Strategy: Collaborating to Preserve Our Print Collec...Maine Shared Collections Strategy: Collaborating to Preserve Our Print Collec...
Maine Shared Collections Strategy: Collaborating to Preserve Our Print Collec...Maine_SharedCollections
 
Research Data Publishing
Research Data PublishingResearch Data Publishing
Research Data PublishingBrian Hole
 
Crossing the Streams: Trialing and Investigating New Format Resources
Crossing the Streams: Trialing and Investigating New Format ResourcesCrossing the Streams: Trialing and Investigating New Format Resources
Crossing the Streams: Trialing and Investigating New Format ResourcesLindsay Cronk
 

Similar to METRO Conference 2014: How collaboration can save [more of] the web: recent progress in collaborative web archiving initiatives (20)

Web decay and Internet Archive
Web decay and Internet ArchiveWeb decay and Internet Archive
Web decay and Internet Archive
 
Capture All the URLS: First Steps in Web Archiving
Capture All the URLS: First Steps in Web ArchivingCapture All the URLS: First Steps in Web Archiving
Capture All the URLS: First Steps in Web Archiving
 
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
 
NASIG 2020 - Walk this way
NASIG 2020 -  Walk this wayNASIG 2020 -  Walk this way
NASIG 2020 - Walk this way
 
Rusbridge Feb 8 Improving Clarity around Continuing Access
Rusbridge Feb 8 Improving Clarity around Continuing AccessRusbridge Feb 8 Improving Clarity around Continuing Access
Rusbridge Feb 8 Improving Clarity around Continuing Access
 
The OCLC Research Library Partnership
The OCLC Research Library PartnershipThe OCLC Research Library Partnership
The OCLC Research Library Partnership
 
Web archiving challenges and opportunities
Web archiving challenges and opportunitiesWeb archiving challenges and opportunities
Web archiving challenges and opportunities
 
IWMW 1998: About the parallel sessions
IWMW 1998: About the parallel sessionsIWMW 1998: About the parallel sessions
IWMW 1998: About the parallel sessions
 
IWMW 1999: Parallel sessions on day 2
IWMW 1999: Parallel sessions on day 2IWMW 1999: Parallel sessions on day 2
IWMW 1999: Parallel sessions on day 2
 
Marc and beyond: 3 Linked Data Choices
 Marc and beyond: 3 Linked Data Choices  Marc and beyond: 3 Linked Data Choices
Marc and beyond: 3 Linked Data Choices
 
Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012
 
SAA 2015 Web Archiving Roundtable
SAA 2015 Web Archiving RoundtableSAA 2015 Web Archiving Roundtable
SAA 2015 Web Archiving Roundtable
 
The Canadian Linked Data Initiative: Charting a Path to a Linked Data Future
The Canadian Linked Data Initiative: Charting a Path to a Linked Data FutureThe Canadian Linked Data Initiative: Charting a Path to a Linked Data Future
The Canadian Linked Data Initiative: Charting a Path to a Linked Data Future
 
CIL 2020 - Bringing Collections to the Screen
CIL 2020 - Bringing Collections to the ScreenCIL 2020 - Bringing Collections to the Screen
CIL 2020 - Bringing Collections to the Screen
 
WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)
WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)
WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)
 
Gaurav web mining
Gaurav web miningGaurav web mining
Gaurav web mining
 
archana singh 102 (mlis)1.pptx
archana singh 102 (mlis)1.pptxarchana singh 102 (mlis)1.pptx
archana singh 102 (mlis)1.pptx
 
Maine Shared Collections Strategy: Collaborating to Preserve Our Print Collec...
Maine Shared Collections Strategy: Collaborating to Preserve Our Print Collec...Maine Shared Collections Strategy: Collaborating to Preserve Our Print Collec...
Maine Shared Collections Strategy: Collaborating to Preserve Our Print Collec...
 
Research Data Publishing
Research Data PublishingResearch Data Publishing
Research Data Publishing
 
Crossing the Streams: Trialing and Investigating New Format Resources
Crossing the Streams: Trialing and Investigating New Format ResourcesCrossing the Streams: Trialing and Investigating New Format Resources
Crossing the Streams: Trialing and Investigating New Format Resources
 

More from Anna Perricci

Introduction to Web Archiving
Introduction to Web ArchivingIntroduction to Web Archiving
Introduction to Web ArchivingAnna Perricci
 
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising Anna Perricci
 
Ethics & Archiving the Web - presentation at ACH 2019 closing plenary
Ethics & Archiving the Web - presentation at ACH 2019 closing plenaryEthics & Archiving the Web - presentation at ACH 2019 closing plenary
Ethics & Archiving the Web - presentation at ACH 2019 closing plenaryAnna Perricci
 
No one said this would be easy: Sustaining Webrecorder as a robust web archiv...
No one said this would be easy: Sustaining Webrecorder as a robust web archiv...No one said this would be easy: Sustaining Webrecorder as a robust web archiv...
No one said this would be easy: Sustaining Webrecorder as a robust web archiv...Anna Perricci
 
Archiving for Now and Later - workshop at Common Field Convening 2019
Archiving for Now and Later - workshop at Common Field Convening 2019Archiving for Now and Later - workshop at Common Field Convening 2019
Archiving for Now and Later - workshop at Common Field Convening 2019Anna Perricci
 
Webrecorder: Web Archiving for All!
Webrecorder: Web Archiving for All!Webrecorder: Web Archiving for All!
Webrecorder: Web Archiving for All!Anna Perricci
 
Archiver le web pour les artistes : Atelier Webrecorder
Archiver le web pour les artistes : Atelier WebrecorderArchiver le web pour les artistes : Atelier Webrecorder
Archiver le web pour les artistes : Atelier WebrecorderAnna Perricci
 
Webrecorder: Building, Maintaining & Growing
Webrecorder: Building, Maintaining & GrowingWebrecorder: Building, Maintaining & Growing
Webrecorder: Building, Maintaining & GrowingAnna Perricci
 
Social Contexts of Web Archiving: Collaboration and Ethical Collection Building
Social Contexts of Web Archiving: Collaboration and  Ethical Collection BuildingSocial Contexts of Web Archiving: Collaboration and  Ethical Collection Building
Social Contexts of Web Archiving: Collaboration and Ethical Collection BuildingAnna Perricci
 
Web Archiving Intro (circa 2015)
Web Archiving Intro (circa 2015)Web Archiving Intro (circa 2015)
Web Archiving Intro (circa 2015)Anna Perricci
 
Slides for Web Archiving in the Heritage and Archive Sectors
Slides for Web Archiving in the Heritage and Archive SectorsSlides for Web Archiving in the Heritage and Archive Sectors
Slides for Web Archiving in the Heritage and Archive SectorsAnna Perricci
 
Webrecorder: Web Archiving for All!
Webrecorder: Web Archiving for All!Webrecorder: Web Archiving for All!
Webrecorder: Web Archiving for All!Anna Perricci
 

More from Anna Perricci (12)

Introduction to Web Archiving
Introduction to Web ArchivingIntroduction to Web Archiving
Introduction to Web Archiving
 
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising
 
Ethics & Archiving the Web - presentation at ACH 2019 closing plenary
Ethics & Archiving the Web - presentation at ACH 2019 closing plenaryEthics & Archiving the Web - presentation at ACH 2019 closing plenary
Ethics & Archiving the Web - presentation at ACH 2019 closing plenary
 
No one said this would be easy: Sustaining Webrecorder as a robust web archiv...
No one said this would be easy: Sustaining Webrecorder as a robust web archiv...No one said this would be easy: Sustaining Webrecorder as a robust web archiv...
No one said this would be easy: Sustaining Webrecorder as a robust web archiv...
 
Archiving for Now and Later - workshop at Common Field Convening 2019
Archiving for Now and Later - workshop at Common Field Convening 2019Archiving for Now and Later - workshop at Common Field Convening 2019
Archiving for Now and Later - workshop at Common Field Convening 2019
 
Webrecorder: Web Archiving for All!
Webrecorder: Web Archiving for All!Webrecorder: Web Archiving for All!
Webrecorder: Web Archiving for All!
 
Archiver le web pour les artistes : Atelier Webrecorder
Archiver le web pour les artistes : Atelier WebrecorderArchiver le web pour les artistes : Atelier Webrecorder
Archiver le web pour les artistes : Atelier Webrecorder
 
Webrecorder: Building, Maintaining & Growing
Webrecorder: Building, Maintaining & GrowingWebrecorder: Building, Maintaining & Growing
Webrecorder: Building, Maintaining & Growing
 
Social Contexts of Web Archiving: Collaboration and Ethical Collection Building
Social Contexts of Web Archiving: Collaboration and  Ethical Collection BuildingSocial Contexts of Web Archiving: Collaboration and  Ethical Collection Building
Social Contexts of Web Archiving: Collaboration and Ethical Collection Building
 
Web Archiving Intro (circa 2015)
Web Archiving Intro (circa 2015)Web Archiving Intro (circa 2015)
Web Archiving Intro (circa 2015)
 
Slides for Web Archiving in the Heritage and Archive Sectors
Slides for Web Archiving in the Heritage and Archive SectorsSlides for Web Archiving in the Heritage and Archive Sectors
Slides for Web Archiving in the Heritage and Archive Sectors
 
Webrecorder: Web Archiving for All!
Webrecorder: Web Archiving for All!Webrecorder: Web Archiving for All!
Webrecorder: Web Archiving for All!
 

Recently uploaded

4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptxmary850239
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleCeline George
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1GloryAnnCastre1
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxDhatriParmar
 

Recently uploaded (20)

4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP Module
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 

METRO Conference 2014: How collaboration can save [more of] the web: recent progress in collaborative web archiving initiatives

  • 1. How collaboration can save [more of] the web: recent progress in collaborative web archiving initiatives Anna Perricci Columbia University Libraries METRO Conference 2014 January 15, 2014
  • 2. Overview • Web archiving context (Who) • Benefits of curated web archives/use cases (Why) • Columbia University Libraries Web Resources Collection Program (What/How) • Background • Selection • Permissions • Harvesting • Description • Access
  • 3. Andrew W. Mellon Foundation support for CUL web archiving Grant projects • • • Collection Building for Web Resources (2008-2009) 1 FTE: project librarian Web Resources Collection Program Development (20092012) 3 FTE: 2 web curators, 1 programmer Web Resources Archiving Collaboration (2013-2015) 2 FTE: 1 project librarian, 1 bibliographic asst
  • 4. CUL Web Archive Collections • Avery Library Historic Preservation and Urban Planning 97 websites, semiannual • Burke Library New York City Religions 225 websites, semiannual • Human Rights 591 websites, quarterly • Rare Book and Manuscript Library 54 websites, semiannual • University Archives most of columbia.edu domain, plus 75+ affiliated sites with "external" URLs • General
  • 5. CUL’s Archive-It space: http://www.archive-it.org/organizations/304
  • 6. Elements of CUL’s web archiving workflows • • • • • Selection or nomination Archive-It for scoping, crawls, access, storage & QA Cataloging workflows/output Firm policy on permissions Project management tools for growing number of collaborations • Workflows can vary a little between collections
  • 7. Archive-It provides some essential services so CUL can focus on curation, cataloging and collaboration building
  • 8. Archive-It’s responsiveness to CUL’s requests • Metadata: custom field functionality added to metadata template & all fields made repeatable • Pilot projects: options related to ignoring robots.txt, downloading WARC files and export metadata as XML, improvements to capture for architecture/urban planning sites via Wayback QA tools
  • 9. Challenges: media files & images (using QA tools)
  • 10. Description and access for archived websites • Archive-it.org site-level metadata (All thematic collections, DCMI, copied from MARC records if possible) • CLIO collection-level MARC records (Human rights, Avery, Burke) • CLIO site-level MARC records (Human rights, Avery) • Document-level MARC records (selected longish Avery collection reports, pre-existing IRCR records) • Human Rights Web Archive portal on CUL website (using metadata extracted from MARC records)
  • 11. CLIO public view of MARC collection-level record
  • 12. Permissions There is no explicit US Copyright Act libraries exception for web archiving CUL policy is to request permission from website owners to harvest their websites and provide access to archived versions • Permission request email sent to contact info from website • If no response after 3 weeks, follow-up request with notification of intent to archive website Permissions statistics as of January 2014 • 983 requests sent • 51% responded Yes • .05% responded No • 48.5% did not respond
  • 13. Web Resources Archiving Collaboration Many thanks to the Mellon Foundation Building collaborations among • • • • The web archiving community Other research libraries Users and potential users of web archives Website creators
  • 14. Incentives grants to advance web archiving tools Image source: http://imgur.com/gallery/vG7KE48
  • 15. Building an efficient, coherent, and scalable national framework for collecting web content Image source: http://imgur.com/gallery/1m5MBKf
  • 16. Designated space for collaborative collecting
  • 17. Collaborative collection experiment in progress via partner institutions in Borrow Direct
  • 20. Collection Development (test case) Image of color coded spreadsheet removed from published slides for confidentiality of collaborators
  • 23. Who are the web archives for? Are they being used? Could we encourage more effective use?
  • 25. Using the Human Rights Web Archive & learning from human rights scholars’ work (publications, citations)
  • 26. Citations scraped from articles published in 2010 in select scholarly journals
  • 27. Isolating URLs from list of citations (approximately 10% of citations scraped have URLs in them)
  • 28. Best Practices for site creators: working with website creators Image source: http://imgur.com/gallery/NWJ12Pl
  • 29. Open issues: division and maintenance of cooperative efforts (communication, software and more)
  • 30. Process over next 18 months • • • • • • • • Planning, needs assessment (interviews) Group communication Ongoing growth (scale of collecting and distribution of effort) Planning for sharing costs and sustainability 5 year plan for Borrow Direct collaborations Identifying themes for further collecting Cataloging resources Quality Assurance and ongoing evaluation
  • 31. Leveraging expertise and finding ways to share responsibilities • Collecting strategy and establishment of priorities for collection development • Suggest seed URLs • Liaise with site owners to solicit permission to archive websites • Conduct detailed quality assurance by browsing the archived website as a user would (e.g. try to access media files to ensure they have been successfully captured) • Assessment of efficacy for users? • Governance of collaborations
  • 32. Leveraging expertise of web archivists • Capture web archives & do initial quality assurance of a crawl • Coordinate efforts and field questions regarding • technical elements of web archiving • existing CUL web archiving policy • permissions processing • needs assessment • user profiles and use cases • value and usage assessment? (later)
  • 33. Larger questions to return to periodically • Do you have any ideas about what users would want to find in web archives or use cases? • Which websites do you and your colleagues consider indispensable for your day to day work? • Are there any websites in your institution's domain that should be harvested and saved (e.g. web pages created by or about faculty members, academic programs or student groups)? • Are there any types of sites that would certainly be out of scope or have proven to be out of scope? • Is there anything impeding the use of web archives we are creating?
  • 35. The future is bright! Image source: Flickr user: Tambako the Jaguar (CC BY-ND 2.0)
  • 36. Opportunities for discussion: Today and on an ongoing basis Image source: http://www.idocampaigns.com/Editor/assets/iStock_000014314309_Large.jpg

Editor's Notes

  1. Foster technical innovationBuild collaborative web archiving modelsDevelop use cases (who are we trying to serve; how can we make sure our efforts effective?)Use Columbia University’s web resources as a test case (e.g. are we successfully getting the right stuff, and can we get feedback and cooperative efforts from site creators)