Preservation Capability Miscellany
By Ross Spencer
Twitter: @beet_keeper
A brief ‘provenance’ note…
2014-06-20: Play It Again Conference Report:
http://bit.ly/2d8Bnw0
(playitagain.org)
2014-11-25: The Reality of Digital Transfer:
http://bit.ly/2ctxocQ
(slideshare.net)
We (Archives NZ) have got quite far… But
there's still a lot more to do…
So let's remind ourselves: What is the point?
● Work in concert with agencies and their consultants.
● Generate better information and records management
● Cleaner transfers...
● Create a more open and transparent government where the digital record is
concerned...
● DIA’s line... Support New Zealanders to build strong communities by providing
access to trusted information and knowledge.
And! Digital Preservation
● At this point in time, idiomatic methods of preservation are still forming...
● Whatever the future of archival custodianship...
● Or the future of digital preservation...
● Techniques need to be developed to support agencies with information and records
management, and memory institutes with long-term custodianship.
● Don't fall into the processing trap...
What can we identify as important?
● Infrastructure/team, supported by the organisation
● Some things work, some don’t; some change... be flexible.
● Work iteratively...
● Look at what you can do...
● Continue to develop... evidence, real use-cases
Is it all there for us..?
No, but we have a good foundation…
Policy...
● Has been a constant in my time here.
● Was a draw for me in coming to NZ.
● Sets the rules by which we can play…
● Literally, play: bend, don’t break.
● Achieved through careful stakeholder consultation and consideration of
impact.
● Sign-off process at director level.
● Two favourite policies: checksum and pre-conditioning.
Team...
● We could always do with more people…
● But we recognise that we've been allowed more folk dedicated to this
than some places.
● The team is supported in their decision making and their skills.
● Breakdown: Curious; driven; up-to-date; drive to ‘solve’ born-digital
transfer; different but complementary skills… *passion*!
● (And opinionated! ;-) )
● It doesn’t always look that way but there is a certain amount of leeway
from IT support too...
Technology...?
Rosetta by Ex Libris is the long-term preservation system. It allows us to manage some
quite complex bits 'n' pieces… but:
● Does not yet enable transfer from Agency-to-Archives (it supports)
● Is not a clearing house for records
● Does not spot preservation risks up-front
● Doesn't 'do' sentencing…
● Does not build ingest packages…
● Does not 'do' archival description...
● Does not contain every tool under the sun to handle all the file formats…
Machine Learning: http://nautil.us/blog/the-fundamental-limits-of-machine-learning
The processes we need are biased toward transfer
and ingest…
Rosetta can only help so much…
||----------------||---------------------------------------------------------------------------------------------------||
Creation Transfer (Life of a record ~25 years) Life of an archive ~∞
The other processes we will still need will be
about (active) long term custodianship…
Rosetta is still only beginning that journey...
The miscellany in this presentation...
A story about the tools that can help us...
● Technical Registries (of practice)
● DROID/Siegfried Analysis Report
● Fuzzy Hashes
With everything we need to do…
We cannot action it all at the same time...
Knowledge needs to remain alive and accessible, record it:
Source: https://commons.wikimedia.org/wiki/Category:Kanban#/media/File:Simple_Task_Kanban.jpg
Trello: is one option...
Features...
● Kanban
● Teams
● Ownership
● Visibility
● Accessibility
● Reduce transitory records
● Create temporality
● Centralize knowledge
● Invite external colleagues
DROID/Siegfried Analysis Report
● Example of changing needs and capability
● Initially a plain-text reporting tool
● Evolved into a 'team' tool…
● Evolving into an organisation’s tool…
● Hopefully a community tool…
● Our first port of call for any transfer...
* Marriage of DROID and Siegfried: http://bit.ly/2ddS0IP
* A little bit more about the tool: http://bit.ly/2dii3jP
DROID/Siegfried Analysis Report
● Available to all the community (December 2013): http://bit.ly/2cB8gFY
● Maps DROID and Siegfried output to an SQLite database for querying power and speed.
● Aside from Python, ZERO-dependencies – user needs to be able to download it and go...
● Complete flexibility over output.
● TXT, HTML, Rogues, Heroes… write your own!
● Normalization via database layer – abstracted for multiple ID tools
● The tools each do what they're supposed to well, the dissection of output can be left to others.
* Marriage of DROID and Siegfried (OPF Blog): http://bit.ly/2ddS0IP
* A little bit more about the tool (OPF Blog): http://bit.ly/2dii3jP
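To illustrate the idea of mapping identification output to SQLite for querying, here is a minimal sketch. It is not the report tool's own code. The column names are a trimmed-down subset modelled on a DROID-style CSV export, and the sample rows, table name and query are invented for illustration.

```python
import csv
import io
import sqlite3

# Invented sample rows; the columns are an assumed subset of an
# identification tool's CSV export, not a complete or official schema.
SAMPLE_CSV = """NAME,SIZE,PUID,FORMAT_NAME
minutes.doc,34816,fmt/40,Microsoft Word Document
agenda.doc,28160,fmt/40,Microsoft Word Document
budget.xls,51200,fmt/61,Microsoft Excel 97 Workbook (xls)
scan.pdf,204800,fmt/18,Acrobat PDF 1.4 - Portable Document Format
"""

# An in-memory database keeps the sketch dependency-free and disposable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (name TEXT, size INTEGER, puid TEXT, format_name TEXT)"
)

rows = [
    (r["NAME"], int(r["SIZE"]), r["PUID"], r["FORMAT_NAME"])
    for r in csv.DictReader(io.StringIO(SAMPLE_CSV))
]
conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", rows)

# Once the data is in a table, a report becomes a query:
# here, how many files were identified per format identifier.
format_counts = conn.execute(
    "SELECT puid, COUNT(*) AS n FROM files GROUP BY puid ORDER BY n DESC, puid"
).fetchall()

for puid, n in format_counts:
    print(puid, n)
```

Because the output of each identification tool is normalised into the same table shape, the reporting queries do not need to know which tool produced the rows.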
● Plain-text example...
● HTML Example…
Let’s have a look…
http://bit.ly/2dircst
Benefits...
● Sets a baseline for a lingua franca… beginners and experts
alike...
● Definitions contributed by our archivists!
● Easier on the eye
● Re-factored to be more flexible
● Give it a try! Let us know how it goes!
Checksums
● Look like:
– MD5: d41d8cd98f00b204e9800998ecf8427e
– SHA1: da39a3ee5e6b4b0d3255bfef95601890afd80709
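The two values shown are the MD5 and SHA-1 digests of empty input (zero bytes). A small sketch using Python's standard hashlib module reproduces them:

```python
import hashlib

# Zero bytes of input: the "empty file" case.
empty = b""

md5_hex = hashlib.md5(empty).hexdigest()
sha1_hex = hashlib.sha1(empty).hexdigest()

print("MD5: ", md5_hex)   # d41d8cd98f00b204e9800998ecf8427e
print("SHA1:", sha1_hex)  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```

Seeing either value in a transfer is a quick hint that a file is zero bytes long.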
Checksums
● Looking to be unique
– De-duplication
– Fixity
● No connection between the checksums of similar files
– Security function
– Cannot reverse
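A minimal sketch of the two uses above, de-duplication and fixity, using hashlib. The file names and contents are hypothetical and held in memory so the example is self-contained.

```python
import hashlib
from collections import defaultdict


def sha1_of(data: bytes) -> str:
    """Return the SHA-1 digest of some bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical files: two exact copies and one with a single extra character.
files = {
    "report-v1.txt": b"Annual report 2016, draft",
    "report-v1-copy.txt": b"Annual report 2016, draft",
    "report-v2.txt": b"Annual report 2016, draft.",
}

# De-duplication: identical content always produces the identical digest.
by_digest = defaultdict(list)
for name, data in files.items():
    by_digest[sha1_of(data)].append(name)

duplicates = [sorted(names) for names in by_digest.values() if len(names) > 1]

# Fixity: record a digest now, recompute later, and compare.
recorded = sha1_of(files["report-v1.txt"])
fixity_ok = sha1_of(files["report-v1.txt"]) == recorded

# One extra character is enough to fail the comparison, and the new
# digest gives no hint of how close the two files actually are.
fixity_after_change = sha1_of(files["report-v2.txt"]) == recorded

print(duplicates, fixity_ok, fixity_after_change)
```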
But every file has a connection...
● Binary
● File Format
● Textual Content
● Embedded Content
● Template
● Author
● Like DNA, with many different strands to dissect...
● Fuzzy Hashing!
Fuzzy Hashing: SSDEEP
Source: https://github.com/KLDavies/ssdeep/
Fuzzy Hashing: tlsh
Source: https://github.com/trendmicro/tlsh
And they look like...
● aad371039d588b43e02887f87e570f6d2b1a7f1da89667ef11227d9b3e706610d8e12d
● 0dc36013dd088b43e02983f87e534e6d2b1a7f1da88627ef11267d8b3e716610d9e16d
● Not that different from regular checksums!
● But help us to demonstrate a closer relationship between files…
● “The sum of the parts is greater than the whole.”
~ Arist!otle
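As a rough illustration of "a closer relationship", the sketch below counts how many character positions the two digests above share, and does the same for the MD5 digests of two nearly identical inputs. This naive positional comparison is an assumption made for illustration only. It is not how ssdeep or tlsh score similarity; each library has its own comparison function.

```python
import hashlib


def positional_similarity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length strings agree."""
    if len(a) != len(b):
        raise ValueError("digests must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# The two fuzzy digests from the slide, joined onto one line each.
fuzzy_a = "aad371039d588b43e02887f87e570f6d2b1a7f1da89667ef11227d9b3e706610d8e12d"
fuzzy_b = "0dc36013dd088b43e02983f87e534e6d2b1a7f1da88627ef11267d8b3e716610d9e16d"
fuzzy_score = positional_similarity(fuzzy_a, fuzzy_b)

# Two inputs that differ by one character, hashed with a regular checksum.
md5_a = hashlib.md5(b"hello world").hexdigest()
md5_b = hashlib.md5(b"hello world!").hexdigest()
md5_score = positional_similarity(md5_a, md5_b)

print(f"fuzzy digests agree at {fuzzy_score:.0%} of positions")
print(f"MD5 digests agree at {md5_score:.0%} of positions")
```

The fuzzy digests agree at most positions, while the MD5 digests agree at almost none, even though the MD5 inputs differ by a single character.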
Which we're about to find out!
Workshop!
Results!
Results!
How can we use this?
● Sentencing... while still teaching our machines, we can still close
the net while looking at records manually…
● Discovery: Amazon like results: You might also like this record!
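A hedged sketch of the "you might also like this record" idea: rank the other records by a similarity score and return the closest. Here difflib.SequenceMatcher from the standard library stands in for a fuzzy-hash comparison, and the record identifiers and text are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical records, keyed by an invented identifier.
records = {
    "R1": "Minutes of the finance committee meeting, March 2015",
    "R2": "Minutes of the finance committee meeting, April 2015",
    "R3": "Building maintenance schedule for the Wellington office",
}


def similarity(a: str, b: str) -> float:
    """Stand-in score between 0 and 1; a fuzzy-hash comparison could replace it."""
    return SequenceMatcher(None, a, b).ratio()


def you_might_also_like(record_id: str, top_n: int = 2):
    """Return the top_n other records, most similar first."""
    target = records[record_id]
    scored = [
        (other, similarity(target, text))
        for other, text in records.items()
        if other != record_id
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


suggestions = you_might_also_like("R1")
for other, score in suggestions:
    print(other, round(score, 2))
```

The same ranking could support sentencing: an archivist reviewing one record is shown its nearest neighbours and can decide on them together.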
The experiment continues...
● Matches are relative to themselves...
● Algorithms make a difference...
● And perhaps, like genetics... some traits are more dominant than
others...
● Consider working with content in different ways...
– Utilize format bias... normalize
– Separate content from structure and analyse?
● Keep trying things, but at minimum cost... (another agile concept:
minimum viable product)
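One way to picture "separate content from structure": strip the markup from two documents that carry the same text in different wrappers, then compare again. This is a sketch under assumptions. The two HTML snippets are invented, and difflib.SequenceMatcher is used as a stand-in for a fuzzy-hash comparison.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text nodes of a document, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(markup: str) -> str:
    """Return the text content with whitespace collapsed."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Same words, different structure (both snippets are invented).
doc_a = "<html><body><h1>Transfer plan</h1><p>Agency records, 1990 to 2015.</p></body></html>"
doc_b = (
    '<div class="page"><span style="font-size:20px">Transfer plan</span>'
    "<br/><span>Agency records, 1990 to 2015.</span></div>"
)

raw_score = SequenceMatcher(None, doc_a, doc_b).ratio()
text_score = SequenceMatcher(
    None, extract_text(doc_a), extract_text(doc_b)
).ratio()

print(f"with structure: {raw_score:.2f}, content only: {text_score:.2f}")
```

With the structure removed the two documents compare as identical, which is the kind of effect normalisation is meant to expose before hashing.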
Conclusion: A bit more miscellany
● Keyword: Interim
● Our needs change constantly, and there's a lot to do…
● Don't suffer paralysis by analysis.
● Do a requirements analysis.
● Look at what you can do (minimum viable product) and iterate...
Conclusion: A bit more miscellany
● Lots of hints to bits 'n' pieces I haven't been able to talk about:
● Role of the community… (They/We're here to help! Same problems!)
● Communication and sharing… (Do it!)
● Software development skills… (There are other ways to be involved)
What's the point? (OPF Blog): http://bit.ly/2ddXnaY
● Maybe also a seed for discussion.
Thank you!

ASA Trial Workshop Slides for Archives NZ [2016-09-28]
