Dcc endeavour-2006
1. a centre of expertise in data curation and preservation
Experience is a hard teacher…
Curation and the Digital Record
Chris Rusbridge
Endeavor EndUser 2006
Funded by:
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5
UK: Scotland License. To view a copy of this license, (a) visit
http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or, (b) send a letter to
Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
2. a centre of expertise in data curation and preservation
“Experience is a hard teacher
because she gives the test first,
the lesson afterwards”
• Vernon Sanders Law, ex-baseball player
• (Or perhaps, in the case of digital preservation,
the test occurs long after you are dead?)
3. a centre of expertise in data curation and preservation
Contents
• Curation
• Sustainability
• Data resources
• Preservation & curation issues
• OAIS Review
4. a centre of expertise in data curation and preservation
Curation
• Data increasingly important as evidence
• Experimental verifiability (the basis of science)
• Unrepeatable observations & experiments
(particularly environmental in broadest sense)
• Legal, compliance & transactions
• Cultural resources
• For evidential value, data must be curated
5. a centre of expertise in data curation and preservation
Curation
• “Maintaining and adding value to a trusted
body of digital information for current and
future use”
6. a centre of expertise in data curation and preservation
Lynch remarks
• Closing the 2005 Curation Conference
• 3 views of digital curation
• Collection as a living thing
• Whole life process, evolving object(s)
• Finite process, handover to preservation
7. a centre of expertise in data curation and preservation
8. a centre of expertise in data curation and preservation
• This is what you do!
10. a centre of expertise in data curation and preservation
Sustainability and exit strategy
• Most critical resource for curation: present
and future money supply!
• Plan for the long term, but have a succession
plan
• Sustained approach not project mentality
• This is what you do!
11. a centre of expertise in data curation and preservation
Some illustrations: UK census
• 1881 census (UKDA)
• Hand-written individual return forms: data conversion issue
(reference form available): digitisation and access issues
• 1961 census (TNA/NDAD)
• First census to use computers for analysis (first major UK-wide
computer project?); individual returns closed until 2062: data
preservation issue!!!
• 2001 census (ONS/CDU)
• Data corrections and adjustments: curation issue
12. a centre of expertise in data curation and preservation
Curation of emails
• Lots of metadata and context (RFC 822)
• Often highly distributed
• Split conversations
• Unknown numbers of copies
• Personal choice of clients
• Legal requirements!
• Controlled filing and controlled deletion
needed…
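The header metadata mentioned on this slide can be read with Python's standard `email` package. The following is only a minimal sketch of how split conversations might be regrouped by following `In-Reply-To` headers. The two messages, their addresses and their Message-IDs are invented for illustration, and real mail would need more care (missing headers, the `References` header, duplicate copies).

```python
from email import message_from_string

# Two hypothetical messages; addresses and IDs are invented for illustration.
RAW_MESSAGES = [
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: Census data\n"
    "Date: Mon, 02 Oct 2006 09:00:00 +0000\n"
    "Message-ID: <1@example.org>\n"
    "\n"
    "First message.\n",
    "From: bob@example.org\n"
    "To: alice@example.org\n"
    "Subject: Re: Census data\n"
    "Date: Mon, 02 Oct 2006 10:30:00 +0000\n"
    "Message-ID: <2@example.org>\n"
    "In-Reply-To: <1@example.org>\n"
    "\n"
    "Reply.\n",
]


def thread_root(msg_id, parents):
    """Follow In-Reply-To links back to the first message of a conversation."""
    seen = set()
    while msg_id in parents and msg_id not in seen:
        seen.add(msg_id)  # guard against accidental reply loops
        msg_id = parents[msg_id]
    return msg_id


def group_by_thread(raw_messages):
    """Return {root Message-ID: [Message-IDs in that conversation]}."""
    messages = [message_from_string(raw) for raw in raw_messages]
    parents = {
        m["Message-ID"]: m["In-Reply-To"]
        for m in messages
        if m["In-Reply-To"] is not None
    }
    threads = {}
    for m in messages:
        root = thread_root(m["Message-ID"], parents)
        threads.setdefault(root, []).append(m["Message-ID"])
    return threads


threads = group_by_thread(RAW_MESSAGES)
```

Here both messages end up filed under the Message-ID of the first one, which is the kind of grouping controlled filing would need before any controlled deletion.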
13. a centre of expertise in data curation and preservation
14. a centre of expertise in data curation and preservation
Online Public Access Catalogues
• Long term, curated databases
• Often high quality (not always)
• Well known interchange standards (MARC),
classification standards (several), name
authorities…
• Still significant problems combining sources
15. a centre of expertise in data curation and preservation
16. a centre of expertise in data curation and preservation
19. a centre of expertise in data curation and preservation
2MASS (Infrared)
SDSS (Visual)
Slide from Rajendra Bose
20. a centre of expertise in data curation and preservation
Slide from Rajendra Bose
21. a centre of expertise in data curation and preservation
Example…
• National Virtual Observatory
• Johns Hopkins press release: “Scientists working to create the
NVO, an online portal for astronomical research unifying dozens of
large astronomical databases, confirmed discovery of [a] new
brown dwarf recently. The star emerged from a computerized
search of information on millions of astronomical objects in two
separate astronomical databases. Thanks to an NVO prototype,
that search, formerly an endeavor requiring weeks or months of
human attention, took approximately two minutes.”
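The press release does not say how the two databases were searched. One common way to combine sky catalogues is a positional cross-match, pairing objects whose coordinates agree within a small radius, and the sketch below assumes that approach. The catalogue entries, object names and the 2 arcsecond radius are all invented for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))


def cross_match(catalogue_a, catalogue_b, radius_arcsec=2.0):
    """Pair each catalogue_a object with its nearest catalogue_b object within the radius."""
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for name_a, ra_a, dec_a in catalogue_a:
        best = None
        for name_b, ra_b, dec_b in catalogue_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (name_b, sep)
        if best is not None:
            matches.append((name_a, best[0]))
    return matches


# Invented positions: (name, right ascension, declination), both in degrees.
infrared = [("IR-1", 150.0000, 2.0000), ("IR-2", 151.0000, 2.5000)]
optical = [("OPT-1", 150.0002, 2.0001), ("OPT-2", 152.0000, 3.0000)]
matches = cross_match(infrared, optical)
```

The nested loop is quadratic and only suitable for a toy example; catalogues with millions of objects would need a spatial index. Objects that appear in one catalogue but have no counterpart in the other (here `IR-2`) are the interesting candidates in a search of this kind.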
22. a centre of expertise in data curation and preservation
Context
• Data meaningless without context
• Linkage
• Metadata of many kinds
• Workflow!
• Provenance
• Computational lineage
• Authenticity
23. a centre of expertise in data curation and preservation
Access and re-use
• Ethics and rights control access
• Current practice is weak at expressing these controls for the long term
• Collaboration tools
• Annotation, discussion, review
• Re-use leading to change and development
• “Publication”
• Not just in “print”
• Underlying data should be “published”, too
• Citation…
24. a centre of expertise in data curation and preservation
Citation
• Needs a stable resource to cite…
OWL Web Ontology Language
Reference
W3C Proposed Recommendation 15 December 2003
This version:
http://www.w3.org/TR/2003/PR-owl-ref-20031215/
Latest version:
http://www.w3.org/TR/owl-ref/
Previous version:
http://www.w3.org/TR/2003/CR-owl-ref-2003081
25. a centre of expertise in data curation and preservation
Citation…
• The date alone (as in common web citation
approaches) is not enough!
[6] The CIA World Factbook. www.cia.gov/cia/publications/factbook/. Retrieved on 8 Jan 2006.
• Cited object likely to have changed…
• Citation should link to the cited object as it was!
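One way to make a citation point at the object "as it was" is to record a fingerprint of the content alongside the retrieval date. The sketch below is an illustration only: the `PinnedCitation` type, its field names and the sample content are hypothetical, not an established citation format.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedCitation:
    """A citation that records what was cited, not just where and when."""
    title: str
    url: str
    retrieved: str  # date the content was read, e.g. "2006-01-08"
    sha256: str     # fingerprint of the content as it was when cited


def cite(title, url, retrieved, content: bytes) -> PinnedCitation:
    """Build a citation pinned to the exact bytes that were read."""
    return PinnedCitation(title, url, retrieved, hashlib.sha256(content).hexdigest())


def still_matches(citation: PinnedCitation, current_content: bytes) -> bool:
    """True if the resource still has the content that was cited."""
    return hashlib.sha256(current_content).hexdigest() == citation.sha256


# Invented page content, standing in for what a reader saw on the day.
as_cited = b"Population: 60,441,457 (July 2005 est.)"
citation = cite("The CIA World Factbook",
                "www.cia.gov/cia/publications/factbook/",
                "2006-01-08", as_cited)

unchanged = still_matches(citation, as_cited)
changed = still_matches(citation, b"Population: 60,609,153 (July 2006 est.)")
```

A fingerprint only detects that the cited object has changed. Getting the old state back still requires an archive that keeps past versions.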
26. a centre of expertise in data curation and preservation
Citation needs…
• An efficient way to reference and access “archived”
past states of a changing dataset (work in progress,
Buneman et al)
• Less important for original observations
• Don’t mess with those data
• Less important for incremental datasets
• Later stuff should not invalidate earlier
• Very important for revisable datasets
• E.g. genomics… datasets that result from the combined work
of curators, or contain opinions or facts likely to change
27. a centre of expertise in data curation and preservation
XMLArch: System Architecture
• Relational Database → Data Extractor → XML Snapshot at time t
• XML Archiver (Pre-processor, Version Merger): XML Snapshot at time t + XML Archive at time t - 1 → XML Archive at time t
Carwyn Edwards
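The idea of merging each new snapshot into an archive of earlier versions can be sketched as follows. This is a simplified illustration using plain dictionaries rather than XML, and it is not the XMLArch implementation: the function names, the keyed-record layout and the sample records are all assumptions made for the example.

```python
def merge_snapshot(archive, snapshot, t):
    """Merge the snapshot taken at time t into the archive.

    archive:  {key: [(time_first_seen, value), ...]}; a value of None marks deletion.
    snapshot: {key: value}, the full state of the dataset at time t.
    """
    for key, value in snapshot.items():
        history = archive.setdefault(key, [])
        if not history or history[-1][1] != value:
            history.append((t, value))  # store a new entry only when the value changed
    for key, history in archive.items():
        if key not in snapshot and history[-1][1] is not None:
            history.append((t, None))   # record that the key disappeared at time t
    return archive


def state_at(archive, t):
    """Reconstruct the dataset as it was at time t."""
    state = {}
    for key, history in archive.items():
        current = None
        for when, value in history:
            if when <= t:
                current = value
        if current is not None:
            state[key] = current
    return state


# Invented records standing in for rows extracted from a database.
archive = {}
merge_snapshot(archive, {"record-1": "name A", "record-2": "name B"}, t=1)
merge_snapshot(archive, {"record-1": "name A (revised)"}, t=2)

version_1 = state_at(archive, 1)
version_2 = state_at(archive, 2)
```

Unchanged values are stored once however many snapshots they appear in, and any past state can be rebuilt for citation. Because `None` is used as the deletion marker, this sketch cannot archive records whose real value is `None`.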
28. a centre of expertise in data curation and preservation
Preservation & curation
• Use preserves
• Money preserves
• Redundancy good, monoculture bad?
• LOCKSS-type & other approaches…
• Bits are fragile and robust
• Don’t rely on portable media
• Look after them well
• Technology changes…
• How fast? What impact?
• Metadata matters! (Know what you’ve got)
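Redundancy only helps if damaged copies can be noticed. A minimal sketch of a fixity check across replicas follows, using checksums and a simple majority vote. It is loosely in the spirit of LOCKSS-type approaches but is not the LOCKSS protocol, and the site names and file content are invented.

```python
import hashlib
from collections import Counter


def fingerprint(data: bytes) -> str:
    """SHA-256 digest used as a fixity value for one copy."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas):
    """replicas: {location: bytes}. Return (majority digest, locations that disagree)."""
    digests = {location: fingerprint(data) for location, data in replicas.items()}
    majority_digest, _count = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(loc for loc, digest in digests.items() if digest != majority_digest)
    return majority_digest, damaged


# Three invented copies of the same object; one has a simulated flipped character.
replicas = {
    "site-a": b"census return 1881",
    "site-b": b"census return 1881",
    "site-c": b"census return 1381",
}
majority_digest, damaged = audit_replicas(replicas)
```

A majority vote assumes that most copies are intact and fail independently, which is one argument against keeping every copy on the same technology.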
29. a centre of expertise in data curation and preservation
Formats, migration, significant
properties…
• “We MUST preserve the look and feel!”
• Well…
• Think about a book like “Kenilworth” by Walter
Scott
• Think about the BBC Domesday emulation
• You may be better with a preserved
“desiccated” version… than nothing at all!
30. a centre of expertise in data curation and preservation
The Project Gutenberg EBook of Kenilworth, by Sir Walter Scott
This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever. You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.org
Title: Kenilworth
Author: Sir Walter Scott
Release Date: February 21, 2006 [EBook #1606]
Language: English
Character set encoding: ASCII
*** START OF THIS PROJECT GUTENBERG EBOOK KENILWORTH ***
Produced by An Anonymous Volunteer and David Widger
KENILWORTH.
by Sir Walter Scott, Bart.
INTRODUCTION
A certain degree of success, real or supposed, in the delineation of
Queen Mary, naturally induced the author to attempt something similar
respecting "her sister and her foe," the celebrated Elizabeth. He
will not, however, pretend to have approached the task with the same
feelings; for the candid Robertson himself confesses having felt the
prejudices with which a Scottishman is tempted to regard the subject;
…
31. a centre of expertise in data curation and preservation
• But there ARE limits!
34. a centre of expertise in data curation and preservation
Preservation is not cheap
• But it’s not expensive…
• Compared with the alternative!
• Postcard sent to me anonymously
35. a centre of expertise in data curation and preservation
Curation: whose job is it?
• Yours!
• With your archivists
• And your Records Managers
• And your scientists and scholars…
36. a centre of expertise in data curation and preservation
Preservation & curation
• We can’t do it alone
• Collective responsibility
• We can’t rely on anyone else
• Institutional responsibility
37. a centre of expertise in data curation and preservation
It’s about time…
• From the very short
• Good management (don’t under-estimate but don’t
over-estimate)
• Through the medium term
• Curation: use it or lose it
• Gather ye metadata while ye may!
• Preservation relay
• To the very long term
• High commitment, high cost, high risk
• Harder to do en masse
38. a centre of expertise in data curation and preservation
Supplier role?
• Work together with libraries…
• Multi-supplier, Multi-platform
• Open source mix
• The library is not simple any more
• Library 2.0?
• Power of crowds, economy of attention,
generation X…
• Wikicat?
39. a centre of expertise in data curation and preservation
40. a centre of expertise in data curation and preservation
Supplier role?
• Work together with libraries…
• Multi-supplier, Multi-platform
• Open source mix
• The library is not simple any more
• Library 2.0?
• Power of crowds, economy of attention, generation X…
• Wikicat?
• Web 2.0?
• Mix, mashup
• What you see is… not there?
41. a centre of expertise in data curation and preservation
BEWARE WEB 2.0!!!
42. a centre of expertise in data curation and preservation
OAIS
• “Announcement of a Comment Period for the Five
Year Review of the Reference Model for an Open
Archival Information System (OAIS) Standard”
• “… must be reviewed every five years and a determination
made to reaffirm, modify, or withdraw the existing standard.”
• “…any revision must remain backward compatible with
regard to major terminology and concepts.”
• “… we do not plan to expand the general level of detail”
• “… reduce ambiguities and fill in any missing or weak
concepts”
• Make suggestions and express interest until 30/10/06
• OAIS-support@delight.gsfc.nasa.gov
43. a centre of expertise in data curation and preservation
To close…
• Your library is currently taking the curation
test…
• Your children will learn the answer!
• But
Editor's notes
Initially we have concentrated on data extracted from relational databases, mainly because this is where the IUPHAR data is.
1) Extract to XML (friendly hierarchical format).
2) Next we want to merge with the archive containing the previous versions.
3) Process and merge.
4) New archive with latest version added.
Demo ....