1. The Hourglass Model:
metadata in AV archiving beyond the buzzwords
Brecht Declercq, VIAA, Belgium
Sexto Seminario Internacional de Archivos Sonoros y Audiovisuales
Mexico City, 24.06.2014
14. Linked metadata
Linked open data
User generated metadata
Automatically extracted metadata
Preservation metadata
Descriptive metadata
etc ...
Production metadata
26. BBC World Service Archive Prototype
Multi-combinations:
Production metadata (description) + automatically generated metadata (ASR) + user generated metadata
Different methods reinforcing each other!
27. Key competences
• To know the strengths and weaknesses of the various ways to create descriptive metadata.
• To know your collection: what characteristics does it have?
• To know which goal(s) you’re aiming at before choosing a way to create your metadata.
Making the right decision about which tool to use is a key competence of the next generation of AV archivists.
A few weeks ago:
Project by University of Oxford together with BBC Information & Archives.
Automatic cataloging of faces, objects, quotes, etc. in news broadcasts of the last five years!
Wonderful project, isn’t it?
Those who agree, raise your hands!
Maybe this research project came right on time, didn’t it? Because ...
Challenges for audiovisual archives these days are huge!
- We have to deal with ever bigger quantities...
We’re supposed to open up the collection and make it accessible in all kinds of ways...
The collection has to be made accessible for education for example, we heard it only yesterday...
And then of course we’re supposed to link the collection with other collections, enhance and contextualize it...
Now metadata, it won’t surprise you, is in this context more important than ever before...
But hey!
There too, things are changing...
Because these days are probably over...
But we have to deal with metadata coming directly from production nowadays...
There’s user generated metadata:
crowdsourcing the metadata by creating a community and asking them to help with the description of your content.
And then there’s this Loch Ness monster of AV archiving:
Of which many have heard... but which only few have seen in action:
Automatic feature extraction... Such as face recognition.
... Or even ... identification.
And although it’s now done in all kinds of electronic databases or MAM systems ...
A little manual work is still needed, isn’t it?
So with these huge challenges and all these new kinds of metadata...
As a cataloger, and even as an audiovisual archives manager
It is ever more difficult to keep the overview... Right?
Because linked metadata.... Can they be user generated?
And preservation metadata... Sometimes they’re automatically extracted... Right?
And production metadata... Is that something like preservation metadata?
These buzzwords... The only thing they seem to do is add to the confusion!
So let’s stand back for a second.
Look where we are...
And where we are going.
Because if we just put these fancy terms together...
That’s what we like to do as archivists.
And if we focus particularly on the qualifiers...
It appears that some are talking about the origin of the metadata...
And some about the purpose, the goal, the result they aim at.
So what I’d like to do, is to offer a model...
In which someone once recognized an hourglass.
With a creation level.
A purpose level.
And a processing level in between the two.
And now we can start to fill in this model.
First on the creation level: let’s fill in the four big categories in which metadata are created these days.
And then let’s go down.
And fill in all the goals these metadata are serving.
And then it’s up to the archivist to make some decisions, right?
Because this one is very classic: manually describing the content so the users can find it.
And who doesn’t use metadata from production to manage the collection?
But just think about this one: using user generated metadata... for digital preservation... E.g.: could labor-intensive quality control of digitization some day be done ... by users?
Or can automatically generated metadata be used as the basis for enhancement and contextualisation, for example by transforming it into linked data?
The conclusion is, that if you start to make combinations, old ways of thinking still fit in, while new possibilities arise.
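The hourglass model described above can be sketched as a simple combination table. This is purely a hypothetical illustration — the source and purpose names are my paraphrases of the talk, not an official taxonomy:

```python
# Hypothetical sketch of the hourglass model: four creation-level
# sources at the top, purpose-level goals at the bottom, and the
# archivist's choices as the narrow processing level in between.
SOURCES = [
    "production",               # metadata coming directly from production
    "manual description",       # classic cataloguing
    "user generated",           # crowdsourced descriptions
    "automatically extracted",  # e.g. face recognition, speech-to-text
]
PURPOSES = [
    "search & retrieval",
    "collection management",
    "digital preservation",
    "enhancement & contextualisation",
]

def pairings(sources, purposes):
    """Enumerate every source-to-purpose combination the model allows."""
    return [(s, p) for s in sources for p in purposes]

combos = pairings(SOURCES, PURPOSES)
# The classic pairing and a novel one both fit in the same model:
assert ("manual description", "search & retrieval") in combos
assert ("user generated", "digital preservation") in combos
```

The point of the model is exactly this: every source can in principle be routed to every purpose, so old pairings and new ones sit in one table.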
But let’s take this just one step further...
Because we can also make multi-combinations.
Indeed, several sources can be used to serve one or more purposes.
Of course, it’s up to the archivist then to manage the process, and to control the quality of it.
Because there’s this wonderful example of a BBC project, the so-called World Service Archive Prototype.
It uses no less than three different sources to increase search and retrieval.
Do have a look at their website; it’s explained far better there than I can in the short minutes I have.
The nice thing is... that different methods reinforce each other... The production metadata, written on the boxes, is actually quite poor, but it can serve as a first identification.
Then there’s the speech-to-text, which already helps a lot, albeit with lots of errors...
And then there’s the crowd, whose user generated metadata corrects those errors.
Because then it will allow you...
To exchange your screwdriver... that multipurpose goody you sometimes also used as a hammer, or as a knife...
For a whole fancy toolbox.
Of well suited tools...
For particular circumstances.
And as my very last slide... I would like to come back to this BBC and Oxford University project...
Of course... Who am I to criticise the BBC? It will always hold my admiration for so many reasons.
And it’s wonderful that research on feature extraction tools like face recognition is continued...
But to make a collection of very recent newscasts searchable... there’s maybe another method.
Way less hi-tech...
But also way more efficient...
And less error prone...
How about some metadata from production?
Subtitling for the hearing impaired?
Journalist’s texts from iNews?