A dry-run of content I wanted to present to an Australian Society of Archivists workshop on 21 October 2016.
This trial run was at Archives New Zealand on 28 September 2016.
4. 2014-06-20: Play It Again Conference Report:
http://bit.ly/2d8Bnw0
(playitagain.org)
2014-11-25: The Reality of Digital Transfer:
http://bit.ly/2ctxocQ
(slideshare.net)
5. We (Archives NZ) have got quite far… But
there's still a lot more to do…
6. So let's remind ourselves: What is the point?
● Work in concert with agencies and their consultants.
● Generate better information and records management
● Cleaner transfers...
● Create a more open and transparent government where the digital record is
concerned...
● DIA’s line... Support New Zealanders to build strong communities by providing
access to trusted information and knowledge.
7. And! Digital Preservation
● At this point in time, idiomatic methods of preservation are still forming...
● Whatever the future of archival custodianship...
● Or the future of digital preservation...
● Techniques need to be developed to support agencies with information and records
management, and memory institutions with long-term custodianship.
● Don't fall into the processing trap...
8. What can we identify as important?
● Infrastructure/team, supported by the organisation
● Some things work, some don’t; some change... be flexible.
● Work iteratively...
● Look at what you can do...
● Continue to develop... evidence, real use-cases
11. Policy...
● Has been a constant in my time here.
● Was a draw to me starting in NZ
● Sets the rules by which we can play…
● Literally, play: bend don’t break
● Achieved through careful stakeholder consultation and consideration of
impact.
● Sign-off process at director level.
● Two favourite policies: checksum and pre-conditioning.
12. Team...
● We could always do with more people…
● But we recognise that we've been allowed more folk dedicated to this
than some places.
● The team is supported in their decision making and their skills.
● Breakdown: Curious; driven; up-to-date; drive to ‘solve’ born-digital
transfer; different but complementary skills… *passion*!
● (And opinionated! ;-) )
● It doesn’t always look that way but there is a certain amount of leeway
from IT support too...
13. Technology...?
Rosetta by Ex Libris is the long-term preservation system; it allows us to manage some
quite complex bits 'n' pieces… but:
● Does not yet enable transfer from Agency-to-Archives (it supports)
● Is not a clearing house for records
● Does not spot preservation risks up-front
● Doesn't 'do' sentencing…
● Does not build ingest packages…
● Does not 'do' archival description...
● Does not contain every tool under the sun to handle all the file formats…
Machine Learning: http://nautil.us/blog/the-fundamental-limits-of-machine-learning
14. The processes we need are biased toward transfer
and ingest…
Rosetta can only help so much…
[Timeline: Creation to Transfer (life of a record, ~25 years), followed by the life of an archive (~∞)]
The other processes we will still need will be
about (active) long term custodianship…
Rosetta is still only beginning that journey...
15. The miscellany in this presentation...
A story about the tools that can help us...
● Technical Registries (of practice)
● DROID/Siegfried Analysis Report
● Fuzzy Hashes
18. With everything we need to do…
We cannot action it all at the same time...
19. Knowledge needs to remain alive and accessible, record it:
Source: https://commons.wikimedia.org/wiki/Category:Kanban#/media/File:Simple_Task_Kanban.jpg
22. DROID/Siegfried Analysis Report
● Example of changing needs and capability
● Initially a plain-text reporting tool
● Evolved into a 'team' tool…
● Evolving into an organisation’s tool…
● Hopefully a community tool…
● Our first port of call for any transfer...
* Marriage of DROID and Siegfried: http://bit.ly/2ddS0IP
* A little bit more about the tool: http://bit.ly/2dii3jP
23. DROID/Siegfried Analysis Report
● Available to all the community (December 2013): http://bit.ly/2cB8gFY
● Maps DROID and Siegfried output to an SQLite database for querying power and speed.
● Aside from Python, ZERO-dependencies – user needs to be able to download it and go...
● Complete flexibility over output.
● TXT, HTML, Rogues, Heroes… write your own!
● Normalization via database layer – abstracted for multiple ID tools
● The tools each do what they're supposed to well, the dissection of output can be left to others.
* Marriage of DROID and Siegfried (OPF Blog): http://bit.ly/2ddS0IP
* A little bit more about the tool (OPF Blog): http://bit.ly/2dii3jP
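The idea of mapping identification output to SQLite can be sketched as follows. This is a minimal illustration, not the tool's actual code: the column names are an assumed, simplified subset of a CSV export from an identification tool, the rows are made-up sample data, and the function and table names (`load_export`, `format_counts`, `unidentified`, `files`) are hypothetical.

```python
import csv
import io
import sqlite3

# Made-up sample rows; the column names are an assumed, simplified
# subset of what an identification tool's CSV export might contain.
SAMPLE_EXPORT = """FILE_PATH,NAME,SIZE,PUID,FORMAT_NAME
/transfer/a.pdf,a.pdf,1024,fmt/18,Acrobat PDF 1.4
/transfer/b.pdf,b.pdf,2048,fmt/18,Acrobat PDF 1.4
/transfer/c.doc,c.doc,512,fmt/40,Microsoft Word Document
/transfer/d.bin,d.bin,64,,
"""


def load_export(conn, export_text):
    """Copy each CSV row into a single 'files' table."""
    conn.execute(
        "CREATE TABLE files (file_path TEXT, name TEXT, size INTEGER, "
        "puid TEXT, format_name TEXT)"
    )
    rows = csv.DictReader(io.StringIO(export_text))
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
        [
            (r["FILE_PATH"], r["NAME"], int(r["SIZE"]), r["PUID"], r["FORMAT_NAME"])
            for r in rows
        ],
    )


def format_counts(conn):
    """Number of identified files per format identifier, most common first."""
    return conn.execute(
        "SELECT puid, COUNT(*) FROM files WHERE puid != '' "
        "GROUP BY puid ORDER BY COUNT(*) DESC, puid"
    ).fetchall()


def unidentified(conn):
    """Paths of files that received no format identifier."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT file_path FROM files WHERE puid = '' ORDER BY file_path"
        )
    ]


conn = sqlite3.connect(":memory:")
load_export(conn, SAMPLE_EXPORT)
print(format_counts(conn))
print(unidentified(conn))
```

Once the rows are in a database, each report (TXT, HTML, or your own) becomes a matter of writing another query rather than re-parsing the export.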
28. Benefits...
● Sets a baseline for a lingua franca… beginners and experts
alike...
● Definitions contributed by our archivists!
● Easier on the eye
● Re-factored to be more flexible
● Give it a try! Let us know how it goes!
31. Checksums
● Looking to be unique
– De-duplication
– Fixity
● No connection between
– Security function
– Cannot reverse
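The two uses named on this slide, de-duplication and fixity, can be sketched with Python's standard `hashlib`. SHA-256 is chosen here only for illustration; the file names, contents, and helper names (`checksum`, `fixity_ok`) are made up for the example.

```python
import hashlib
from collections import defaultdict


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as hex."""
    return hashlib.sha256(data).hexdigest()


def fixity_ok(data: bytes, recorded: str) -> bool:
    """Fixity: the content still produces the checksum recorded earlier."""
    return checksum(data) == recorded


# Hypothetical file contents: two exact copies and one small revision.
files = {
    "report-v1.txt": b"Annual report 2016, draft.",
    "report-copy.txt": b"Annual report 2016, draft.",
    "report-v2.txt": b"Annual report 2016, final.",
}

# De-duplication: identical content always yields an identical digest.
by_digest = defaultdict(list)
for name, content in files.items():
    by_digest[checksum(content)].append(name)
duplicates = [sorted(names) for names in by_digest.values() if len(names) > 1]

recorded = checksum(files["report-v1.txt"])
print(duplicates)
print(fixity_ok(files["report-v1.txt"], recorded))
print(fixity_ok(files["report-v2.txt"], recorded))
```

The revised file differs by a single word, yet its digest tells us only that it is different, not how closely it is related.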
32. But every file has a connection...
● Binary
● File Format
● Textual Content
● Embedded Content
● Template
● Author
● Like DNA, with many different strands to dissect...
● Fuzzy Hashing!
35. And they look like...
● aad371039d588b43e02887f87e570f6d2b1a7f1da89667ef11227d9b3e706610d8e12d
● 0dc36013dd088b43e02983f87e534e6d2b1a7f1da88627ef11267d8b3e716610d9e16d
● Not that different from regular checksums!
● But help us to demonstrate a closer relationship between files…
● “The sum of the parts is greater than the whole.”
~ Arist!otle
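Why two fuzzy hashes can look alike when two checksums never do can be shown with a deliberately simplified toy. This is not the algorithm used by any real fuzzy hashing tool: it hashes fixed-size blocks and keeps one character per block, whereas real tools choose block boundaries from the content and score matches differently. The names `toy_piecewise_hash` and `positional_similarity` are hypothetical.

```python
import hashlib


def toy_piecewise_hash(data: bytes, block_size: int = 16) -> str:
    """Toy signature: one hex character per fixed-size block of the input."""
    signature = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        signature.append(hashlib.sha256(block).hexdigest()[0])
    return "".join(signature)


def positional_similarity(sig_a: str, sig_b: str) -> float:
    """Share of positions where the two signatures carry the same character."""
    longest = max(len(sig_a), len(sig_b))
    if longest == 0:
        return 1.0
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / longest


original = b"Minutes of the records management meeting. " * 8
# Overwrite a single byte in the middle; the length stays the same.
edited = original[:100] + b"X" + original[101:]

# A regular checksum only says "different".
full_digests_match = (
    hashlib.sha256(original).hexdigest() == hashlib.sha256(edited).hexdigest()
)

# The piecewise signature changes in at most the one block that was edited.
score = positional_similarity(
    toy_piecewise_hash(original), toy_piecewise_hash(edited)
)
print(full_digests_match)
print(round(score, 2))
```

Because only the edited block can change its character, the two signatures stay almost identical, which is the "closer relationship" a fuzzy hash is meant to expose. Fixed-size blocks break down as soon as bytes are inserted or deleted, which is one reason real tools are more sophisticated.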
40. How can we use this?
● Sentencing... while still teaching our machines, we can still close
the net while looking at records manually…
● Discovery: Amazon-like results: You might also like this record!
41. The experiment continues...
● Matches are relative to themselves...
● Algorithms make a difference...
● And perhaps, like genetics... some traits are more dominant than
others...
● Consider working with content in different ways...
– Utilize format bias... normalize
– Separate content from structure and analyse?
● Keep trying things, but at minimum cost... (another agile concept:
minimum viable product)
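One way to read "separate content from structure and analyse" is to strip the markup from two records before comparing them. The sketch below assumes HTML purely as an example format and uses Python's standard `html.parser` and `difflib`; the two sample documents and the helper names (`TextExtractor`, `extract_text`, `similarity`) are made up for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of a document, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(markup: str) -> str:
    """Return the text content with whitespace collapsed to single spaces."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Same words, different structure (made-up sample records).
doc_a = "<html><body><h1>Annual Report</h1><p>Records were transferred in 2016.</p></body></html>"
doc_b = "<div class='page'><b>Annual Report</b><br/><span>Records were transferred in 2016.</span></div>"

raw_score = similarity(doc_a, doc_b)
content_score = similarity(extract_text(doc_a), extract_text(doc_b))
print(raw_score < 1.0)
print(content_score)
```

Compared as raw files the two records look different; compared on content alone they are the same record. Which view matters depends on the question being asked.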
43. Conclusion: A bit more miscellany
● Keyword: Interim
● Our needs change constantly, and there's a lot to do…
● Don't suffer paralysis by analysis.
● Do a requirements analysis
● Look at what you can do (minimum viable product) and iterate...
44. Conclusion: A bit more miscellany
● Lots of hints to bits 'n' pieces I haven't been able to talk about:
● Role of the community… (They/We're here to help! Same problems!)
● Communication and sharing… (Do it!)
● Software development skills… (There are other ways to be involved)
What's the point? (OPF Blog): http://bit.ly/2ddXnaY
● Maybe also a seed for discussion.