4. What is an Archive?
An authoritative collection
Preserved and professionally managed in perpetuity
History, institutional commitment & policy, integrity re:
preservation
“…information needed for society’s memory.”
"Schellenberg in Cyberspace," American Archivist 61:2 (Fall 1998), pp. 309-327.
Preservation first
5. What is a Repository?
“A place where things can be stored and maintained; a
storehouse.”
[Society of American Archivists Glossary]
“Depository” means the same
also a library that receives government documents for public
access
Not all repositories are archives
6. Why Care?
“Preserving information for decades or even centuries has
proved important. Shang dynasty (12th century BC)
Chinese astronomers inscribed eclipse observations on
“oracle bones” (animal bones and tortoise shells).
About 3200 years later researchers used these records,
together with one from 1302BC, to estimate that the
accumulated clock error was just over 7 hours, and
from this derived a value for the viscosity of the Earth's
mantle as it rebounds from the weight of the glaciers.”
7. Why Care?
“These timescales of many decades, even centuries,
contrast with the typical 5-year lifetime for computing
hardware and digital media”
“A Fresh Look at the Reliability of Long-term Digital Storage.” Baker, Mary, et al.
EuroSys '06, April 18-21, 2006
8. Why Care?
Preservation: Digital information is impermanent
Publisher: Safety
to ensure ongoing availability of your content
Your library customers: Custodianship
to ensure continuity of the record of scientific
progress
Very long view: epistemology, history of
science and culture
9. What Should be Preserved?
Scholarly content
Research materials
Web-based, born-digital content
10. How e-Archives Differ
Mission: collection v. preservation
Access control, dark v. light
Deposits
Why: voluntary v. mandated
Who: author v. publisher
What: manuscripts v. final work
When: backfile v. current content
Future format migration
Rights transfer
Costs
12. Types of Archives:
National archives
Institutional repositories
Community-based archives
Product solution archives
13. Types of Archives:
National
Dutch National library
Koninklijke Bibliotheek (KB)
British Library
NIH – PubMed Central?
“NIH’s digital repository for biomedical research”
Library of Congress?
14. KB: Dutch National Library
Mission: Legal deposit library
“…collect, catalogue and preserve all publications
appearing in the Netherlands. ”
Capable of ingesting 60,000 articles/day
Deposits: Source files from publishers
Automated, strict
Costs?
Access Control:
Local patron access
Publisher sets remote access rules
15. KB: Dutch National Library
Migration: Preservation research leader
Committed to format migration
Archiving agreements with:
OUP, Sage, Blackwell, Elsevier, Kluwer Academic, etc.
16. The British Library
Legal Deposit Pilot
Mission: Legal deposit library
UK-published (to start)
Pilot: Legal deposit for e-journals
23 volunteer publishers
Secure infrastructure
Uses DigiTool by Ex Libris
Shared with the other UK legal deposit libraries
To “scope and test” ingest, storage, retrieval
Cost?
17. The British Library:
Preservation and Migration
BL’s future for managing digital assets
preserve any type of digital material in perpetuity
Migration
ensure that users can view the material with contemporary
applications
preserve the original look-and-feel where possible
Access Control
“appropriate permissions”
18. PMC: US National Library of
Medicine Journal Archive
Mission: Make research more accessible
Free full-text archive of 230 journals
Deposit: publishers submit source files
Migration
Access Control
Cost?
19. PMC: Depository for
NIH-Funded Research Articles
Authors of NIH-funded articles “encouraged” to
deposit final manuscript
“After all modifications due to …peer review”
MS Word, PDF, etc.
With supplementary information
Publisher can replace with published version
To be required soon?
20. Library of Congress
National Digital Information Infrastructure and Preservation
Program (NDIIPP) – formed in 2000
Members: National Library of Medicine, the National
Agricultural Library, the National Institute of Standards and
Technology, the Research Libraries Group, the OCLC
Online Computer Library Center, and the Council on
Library and Information Resources
Preliminary investigation and software development phase
Primarily e-journal deposit
Future …???
21. Types of Archives:
Institutional
University with expansive focus
Stanford Digital Repository
Automated
LOCKSS
22. Stanford Digital Repository
Stanford Univ. Libraries initiative
Digital preservation serving
Stanford University
Broader academic community
Publishers
Principles: Trust, Security, Transparency
Costs?
23. LOCKSS
Technology to preserve local library collection
Automated, self-correcting cache servers
Requires LOCKSS server at library
Requires publisher participation
Builds collection of all resources which the institution
licenses
Goes online to users if data source becomes unavailable
Provides access to static “HTML images” of source
Costs
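One way to picture “self-correcting” caches is a majority vote over content hashes. The sketch below is illustrative only and is not the LOCKSS protocol itself: the function names `digest` and `repair` are hypothetical, and it assumes each cache can fetch the same item from a few peer caches.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fingerprint a copy of the content so copies can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()


def repair(local: bytes, peer_copies: list[bytes]) -> bytes:
    """Return the copy this cache should keep (hypothetical helper).

    The local copy is replaced only when a strict majority of all copies
    (local plus peers) agree on a different version.
    """
    copies = [local, *peer_copies]
    votes = Counter(digest(c) for c in copies)
    winner, count = votes.most_common(1)[0]
    if count <= len(copies) // 2:
        return local  # no strict majority: keep what we have
    if digest(local) == winner:
        return local  # local copy agrees with the majority
    # Local copy is in the minority: adopt the majority version.
    return next(c for c in copies if digest(c) == winner)
```

For example, a cache holding a damaged copy alongside three agreeing peers would adopt the peers' version, while a cache that agrees with the majority keeps its own copy.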
25. Portico
Mission: scholarly preservation
Standalone archive
Initiated by JSTOR, with grant funding
Deposits: source files from publisher
Migration: planned
Costs
Publishers: annual fee $250 to $75,000
based on annual revenue
Libraries: annual fee $1,500 to $24,000
based on Library Materials Expenditure
26. Portico: Access Control
Member libraries get access:
“when specific trigger events occur, and when titles are
no longer available from the publisher or other source.”
Trigger events include:
Publisher stops operations
Publisher ceases to publish a title
Publisher no longer offers back issues
Catastrophic and sustained failure of a publisher’s delivery
platform
Can also fulfill “perpetual access” subscription
obligations
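The access rule quoted above can be read as a simple check: a trigger event has occurred and the title is no longer available elsewhere. The sketch below is a hypothetical model of that rule, not Portico's actual system; the names `TriggerEvent`, `TitleStatus`, and `member_may_access` are invented for illustration, and the separate “perpetual access” case is left out.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TriggerEvent(Enum):
    """The four trigger events listed on the slide."""
    PUBLISHER_STOPS_OPERATIONS = auto()
    TITLE_CEASES_PUBLICATION = auto()
    BACK_ISSUES_NO_LONGER_OFFERED = auto()
    PLATFORM_FAILURE = auto()


@dataclass
class TitleStatus:
    """Hypothetical record of one archived title's state."""
    title: str
    available_elsewhere: bool  # still offered by the publisher or another source
    trigger_events: frozenset = frozenset()


def member_may_access(status: TitleStatus, is_member: bool) -> bool:
    """Open the archive copy only to member libraries, only after a trigger
    event, and only when the title is unavailable from any other source."""
    if not is_member:
        return False
    return bool(status.trigger_events) and not status.available_elsewhere
```

Under this reading, a trigger event alone is not enough: if the title is still obtainable from another source, the archive copy stays dark.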
28. CLOCKSS (Controlled LOCKSS)
Long-term global archiving solution
Community-managed, failsafe repository for scholarly content
Serve libraries & publishers in the event of a long-term business
interruption
Publisher participation is voluntary
A small number of library participants maintain the archive on behalf
of the larger community
Libraries preserve member publisher content whether they subscribe or
not
Release only after a trigger event
Publishers, libraries, and societies decide collaboratively to release
“cost sharing” for system, not access
Costs?
30. Summary:
How Repositories Differ
Stated purpose
Dark v. light
Complete backfile v. current only
Deposits
Who: author v. publisher
What: manuscripts v. final work
Why: voluntary v. mandated
Rights transfer
Access control
Costs
32. Why Archive?
SAGE’s commitment to customers and partners
Critical to society arrangements
Essential for new e-sales (consortia + single
institutions) – Perpetual access
Business continuity
Long-term preservation
We are not archiving experts!
33. Where to Archive?
Dutch KB
CLOCKSS
LOCKSS
Portico
Library of Congress
British Library
34. How to Archive?
Provide details of digital availability
Provide sample of content
Provide details of content format (DTD)
Send all backfile for loading
Set up content flow for ongoing content
35. SAGE Experience with
Dutch KB
Contract and negotiation
Contact with technical team
Delivery of samples and details of scope
Follow-up questions
Visit KB – Find out what’s happening
Delivery of back content
Delivery of ongoing issues
Ongoing issue discrepancies