3. Managing digital collections, trying to quantify digital files that are stored on different carriers and in multiple locations…
Isn’t this exactly what digital archivists are trying to do, day after day?
https://blog.archiveshub.jisc.ac.uk/2012/07/07/the-modern-archivist-working-with-people-and-technology/
5. The VIAA model
• professional and efficient digitization
• sustainable digital preservation
• a non-exclusive right to use the CP’s content on our own platforms
archiving, digitization, (re-)use
10. After 6 years of digitization we noticed the need among VIAA’s CPs for a new service:
a large-scale acquisition and preservation service for (born) digital archives
13. Questionnaire
• Sent out in 2016 to 100 CPs
• Questions about:
  • the types of digital collections and variations of file formats
    • Audio, video, images (photo collections)
    • MPEG, WAV, MP3, TIFF, JPEG, …
  • the type of storage that is used to store these collections at the CPs (risk management)
    • Hard disks
    • NAS
    • External services
  • the urgency for an external service that provides long-term preservation
14. Answers from 64 organisations
59 organisations replied that they need an extra service for the preservation of their digital collections
Breakdown by type of organisation (feedback questionnaire, 2016):
Archives: 22
Museums: 12
Heritage cells: 15
Broadcasters: 7
16. Size of audio collections
Heritage cells have the largest audio collections, followed by the archives and museums
17. Size of photo collections
Heritage cells and archives have the largest photo collections, followed by the museums and libraries
18. State Library of Queensland
Pilot projects
VIAA made a selection based upon
• the urgency for the CP and
• the type of collection
as registered in the questionnaire
19. Why pilot projects?
• Get more hands-on experience with the different types of collections and new file formats
• Identify all needs to be able to deliver a scalable service (today 150+ CPs) for long-term preservation
20. Pilot projects during 2017-2018
• Broadcasters
• VRT
• Focus-WTV
• BRUZZ
• RTV
• Performing arts
• Rosas
• Ultima Vez
• Museums
• Industriemuseum
• Huis van Alijn
• Heritage cell
• Erfgoedcel K.ERF
23. Learnings
1. The timing of a pilot project was very unpredictable because it depended on the time the CP could spend on the project
2. Need for more technological knowledge and user-friendly tools at the CP
3. Very heterogeneous collections; technical expertise is required
25. Learnings put into practice
1. Clear project scope definition
2. The current tools are too technical => user-friendly tools + training
3. Project lead time must be shorter and continuity must be monitored => extra manpower at VIAA + project management to watch over scope and timing
4. Heterogeneous collections = complexity => further development of technical expertise
26. 1. Scope definition
IN SCOPE
Type of file: Single files
Type of collection:
• Audio
• Video
• Photo (scans)
NOT IN SCOPE
Type of file: Complex objects
Type of collection:
• Websites
• Documents
• Games
• Mail
• …
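The in-scope / out-of-scope split above can be sketched as a simple filter. This is only an illustration: the extension list is an assumption based on the formats named in the questionnaire (MPEG, WAV, MP3, TIFF, JPEG), and the function names are hypothetical. A file extension is also only a hint; a signature-based tool such as DROID identifies formats more reliably.

```python
from pathlib import PurePath

# Assumed extension list, derived from the formats named in the questionnaire.
# It is NOT an official VIAA list.
IN_SCOPE_EXTENSIONS = {
    ".wav", ".mp3",                      # audio
    ".mpg", ".mpeg",                     # video
    ".tif", ".tiff", ".jpg", ".jpeg",    # photo (scans)
}


def is_in_scope(filename: str) -> bool:
    """Return True when the file extension is on the assumed in-scope list."""
    return PurePath(filename).suffix.lower() in IN_SCOPE_EXTENSIONS


def split_by_scope(filenames):
    """Split filenames into (in_scope, out_of_scope) lists, preserving order."""
    in_scope, out_of_scope = [], []
    for name in filenames:
        (in_scope if is_in_scope(name) else out_of_scope).append(name)
    return in_scope, out_of_scope
```

Everything that lands in the out-of-scope list (websites, documents, mail, …) would be set aside for a later phase rather than discarded.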
27. 2. User-friendly tools + training
• Learn how to make an inventory (using DROID)
• Make a selection (In scope vs out of scope)
• What makes a good archive master?
• Create md5 checksums
• Metadata mapping
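As an illustration of the checksum step, here is a minimal sketch using Python's standard `hashlib` module. The function name and chunk size are assumptions; the slides do not say which tool the CPs actually use. Reading in chunks keeps memory use low for large audio and video files.

```python
import hashlib


def md5_checksum(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recomputing the checksum after transfer and comparing it with the value recorded before transfer shows whether the file arrived unchanged.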
33. 3. Project management
New methodology to deploy the service on a large scale
• Working with fixed milestones
• 2 types of projects:
• Continuous integration
• Batch intake
Extra tooling for monitoring the projects
36. Project planning
2 types of projects:
1. Continuous integration (using FTP upload protocol)
1. More development needed
2. Upload file + metadata (in VIAA datamodel format) on FTP
2. Batch intake
1. Define a batch (part of your collection)
2. Create a file based inventory
3. Calculate md5 checksums for each file
4. Export your metadata in a structured format (csv, xml)
5. Metadata mapping to the VIAA datamodel
6. Transfer the file via FTP or VIAA hard disk
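Steps 4 and 5 of the batch intake (structured export, then metadata mapping) might look roughly like the sketch below. The source column names and the target field names are purely hypothetical: the slides do not list the fields of the VIAA datamodel, so `FIELD_MAP` stands in for whatever mapping a CP and VIAA agree on.

```python
import csv
import io

# Hypothetical mapping: column in the CP's export -> field in the target model.
# The real VIAA datamodel fields are not given in the slides.
FIELD_MAP = {
    "Title": "title",
    "Date": "date_created",
    "File name": "filename",
}


def map_record(record, field_map=FIELD_MAP):
    """Rename known columns and list the columns that have no mapping yet."""
    mapped = {
        target: record[source]
        for source, target in field_map.items()
        if source in record
    }
    unmapped = sorted(set(record) - set(field_map))
    return mapped, unmapped


def map_csv(text, field_map=FIELD_MAP):
    """Map every row of a CSV export (given as text) to the target field names."""
    reader = csv.DictReader(io.StringIO(text))
    return [map_record(row, field_map)[0] for row in reader]
```

Reporting the unmapped columns instead of silently dropping them gives the CP a list of fields that still need a decision before the batch is transferred.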
38. What has happened so far?
• 55 CPs are included in born-digital projects
• Pilot projects + 2 waves
• Total sum = 85 TB
• 27 different extensions and PRONOM IDs
• 18 MIME types
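Figures like the number of distinct extensions and MIME types can be approximated from a plain list of file names. The sketch below uses Python's standard `mimetypes` module, which guesses from the file name only; it is an assumption about how such a report could be produced, not a description of VIAA's tooling. PRONOM IDs in particular require a signature-based tool such as DROID.

```python
import mimetypes
from collections import Counter
from pathlib import PurePath


def format_report(filenames):
    """Count files per extension and per guessed MIME type."""
    extensions = Counter()
    mime_types = Counter()
    for name in filenames:
        # Normalise case so ".JPG" and ".jpg" are counted together.
        extensions[PurePath(name).suffix.lower() or "(none)"] += 1
        guessed, _encoding = mimetypes.guess_type(name)
        mime_types[guessed or "unknown"] += 1
    return extensions, mime_types
```

`len(extensions)` and `len(mime_types)` then give the number of distinct values in a batch.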
41. Future work: Gradually extend the scope and service
• Complex objects
• PILOTS:
• Ingest of newspaper digitisation projects
• RESEARCH:
• How to build a generic ingest workflow for complex objects
• Ongoing
43. Questions?
• To white list or not to white list?
• We are still investigating the scope of a white list
• Pre or post ingest transformation / transcoding?
• Pros and cons?
• Levels of preservation planning
• Today = bit level preservation
• Reporting on file formats (codec and containers)
• Future work = preservation watch, risk management and transformation / migration
Editor's notes
What you see in this picture is an artwork by the Belgian artist Jan Fabre. The work is called ‘the man who is measuring clouds’.
It may look far-fetched… but during this presentation I’m going to convince you of the similarities between a digital archivist, an artist and maybe a weatherman (because I’ll talk a lot about clouds, waves, etc.).
So in a way I’m happy to notice that what we are doing at VIAA with digital collections feels somewhat like being an artist :) During this presentation I will give you more insight into how many times we fell off the ladder and went back up to try to measure the clouds, how these digital collections have the same characteristics as clouds, and how we found tools to measure them.
What you see in this artwork is an artist standing on a ladder, reaching as far as possible to measure the impossible. You can almost imagine the artist falling down, struggling to get back up and continuing his struggle. Well, there are some similarities with how we experience working with digital collections.
Our national audiovisual heritage is … almost everywhere! Scattered amongst libraries, archives, museums, broadcasters, universities, arts organisations, private collections, private companies, research centers, governmental bodies, … and these institutions almost never have the technical infrastructure, expertise or the money to cope with this.
Measuring is only the start: once you know what is inside your collection, you can start managing your digital collection.
Actually, these are already the solutions, not the needs.
You could say:
Needs:
- tools are often still too technical => more user-friendly tooling for VIAA and content partners
- lead time for projects must be shorter => more time needed for project follow-up and guidance by VIAA
- collections at content partners are often heterogeneous, e.g. partly already registered, sometimes not, sometimes partly on file servers, Adlib or other systems => if we want to scale this up, standardisation is needed before we can import.
Scoping: start with the easiest part and get things done.