New life for old media - Investigations into Speech Synthesis and Deep Learning-based Colorization for Audiovisual Archive

Rudy Marsman, Victor de Boer, Themistoklis Karavellas, Johan Oomen

Netherlands Institute for Sound and Vision (NISV)

70% audio-visual heritage material
More than 1,000,000 hours of:
TV (public broadcasters)
Radio, music, documentaries, film, commercials, etc.
Photographs, objects, …

Openbeelden.nl / openimages.eu
CC BY-SA as preferred license
3,000 items in “Internet Quality”
Polygoon newsreels
Supporting a national and European Audiovisual Commons
Public outreach by embracing new technologies and ‘participatory culture’

Explore AI techniques to enrich this archival material and allow for new types of engagement
1. Text-to-speech engine based on a limited corpus from a single narrator
2. Colorization of old black-and-white video footage

Philip Bloemendal
Famous anchorman
Iconic voice
tiny.cc/voiceNL
(not a virus)

Limited Domain Speech Synthesis
Can the current corpus of audio recordings of Bloemendal be used to construct a TTS engine?
• What percentage of the Dutch language can be generated with the current corpus?
• What can we do to improve coverage?
• How recognizable is the text-to-speech engine as Philip Bloemendal?
• How understandable are the constructed audio files?

Slot-and-filler Text-to-speech
Text: The Dutch football team played Germany
Audio: the.wav dutch.wav football.wav
Spoken Language Elements Repository (35,000 words; source: 3,300 newsreels, speech recognition)
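
The slot-and-filler idea above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `index` dictionary, the clip file names, and both function names are assumptions made for the example, and the clips are assumed to be WAV files with identical audio parameters.

```python
import re
import wave


def plan_utterance(sentence, index):
    """Map each word of a sentence to a clip path; collect words with no clip."""
    clips, missing = [], []
    for word in re.findall(r"\w+", sentence.lower()):
        if word in index:
            clips.append(index[word])
        else:
            missing.append(word)
    return clips, missing


def concatenate_clips(clip_paths, out_path):
    """Join WAV clips end to end.

    Assumes all clips share the same channel count, sample width and rate.
    """
    if not clip_paths:
        raise ValueError("no clips to concatenate")
    with wave.open(out_path, "wb") as out:
        for i, path in enumerate(clip_paths):
            with wave.open(path, "rb") as clip:
                if i == 0:
                    # Copy the audio format from the first clip.
                    out.setparams(clip.getparams())
                out.writeframes(clip.readframes(clip.getnframes()))


# Hypothetical index: word -> clip cut from a newsreel.
index = {
    "the": "the.wav", "dutch": "dutch.wav", "football": "football.wav",
    "team": "team.wav", "played": "played.wav",
}
clips, missing = plan_utterance("The Dutch football team played Germany", index)
```

Here `missing` lists the words that have no recording, which is exactly the coverage gap the following slides try to close.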

How to expand the coverage of the index?
• Many (contemporary) words have never been pronounced by Philip Bloemendal
• Multiple strategies
– Change format (lowercase, diaeresis)
– Numbers
– Finding synonyms
– Decompounding
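
The "change format" strategy can be illustrated as a chain of progressively looser spellings. The sketch below assumes the index keys are lowercase and free of diacritics; the function names are hypothetical, and number handling is left out because the slides do not say how it was done.

```python
import unicodedata


def strip_diacritics(word):
    """Remove combining marks such as the diaeresis: 'België' -> 'Belgie'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def lookup(word, index):
    """Try progressively looser spellings; return the first clip found, else None."""
    candidates = [word, word.lower(), strip_diacritics(word.lower())]
    for candidate in candidates:
        if candidate in index:
            return index[candidate]
    return None
```

For example, `lookup("België", {"belgie": "belgie.wav"})` falls through the exact and lowercase attempts and matches on the diacritic-free form.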

Finding Synonyms
• Open Dutch Wordnet: a Dutch lexical semantic database (Postma et al. 2016)
• Yields synsets (e.g. Hoofdmeester -> Rector, Schoolhoofd)
• Computationally expensive lookup
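
A synonym fallback might look like the sketch below. It does not call Open Dutch Wordnet; the `synsets` dictionary is a hypothetical, precomputed stand-in for that lookup (precomputing is one way to avoid paying the expensive lookup per word), and the function name is an assumption.

```python
def find_spoken_synonym(word, synsets, index):
    """Return (synonym, clip) for the first synonym with a recording, else None."""
    for synonym in synsets.get(word.lower(), []):
        clip = index.get(synonym.lower())
        if clip is not None:
            return synonym, clip
    return None


# Hypothetical data: synsets stands in for an Open Dutch Wordnet lookup.
synsets = {"hoofdmeester": ["rector", "schoolhoofd"]}
index = {"schoolhoofd": "schoolhoofd.wav"}
match = find_spoken_synonym("Hoofdmeester", synsets, index)
```

With this toy data, "Hoofdmeester" has no recording of its own, "rector" is not in the index either, and the lookup settles on "schoolhoofd".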

Decompounding
• Dutch allows words to be compounded, and each compound is a distinct word in the corpus
• Decompounding is computationally expensive (for large corpora and long words)
• Constructed bigrams and trigrams
School, hoofd -> Schoolhoofd
Regen, water -> Regenwater
Staat, hoofd -> Staatshoofd (with linking “s”)
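
One possible reading of the examples above is a splitter that breaks an unknown word into two or three known words, optionally skipping a linking "s". The sketch below is an illustration under that assumption, not the authors' method; `decompound`, `min_len` and `max_parts` are hypothetical names.

```python
def decompound(word, vocabulary, min_len=3, max_parts=3):
    """Split a compound into known words, or return None if no split is found.

    A linking 's' between parts (as in 'staatshoofd') is skipped when needed.
    """
    def split(rest, parts_left):
        if rest in vocabulary:
            return [rest]
        if parts_left <= 1:
            return None
        for i in range(min_len, len(rest) - min_len + 1):
            head, tail = rest[:i], rest[i:]
            if head not in vocabulary:
                continue
            candidates = [tail]
            if tail.startswith("s") and len(tail) > min_len:
                candidates.append(tail[1:])  # try again without the linking 's'
            for candidate in candidates:
                result = split(candidate, parts_left - 1)
                if result:
                    return [head] + result
        return None

    return split(word.lower(), max_parts)


# Hypothetical vocabulary of words that have a recording.
vocabulary = {"school", "hoofd", "regen", "water", "staat"}
```

Limiting the split to `max_parts=3` mirrors the bigram and trigram restriction on the slide and keeps the search cheap for long words.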

Evaluation
4 corpora to test against
• Contemporary news articles (same domain, different time) | 50 articles | 2,743 unique words
• 1970s news articles (same domain, same time) | 50 articles | 16,191 words
• E-books (different domain, various times) | 6 books | 2,657 words
• Tweets (different domain, different time) | 1,000 tweets | 27,180 words
• Evaluation measures
– Number of distinct words
– Number of sentences
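
The two evaluation measures can be computed as in the following sketch. It assumes a sentence counts as covered only when every one of its words is in the index; the tokenizer and function names are hypothetical, and the example sentences are made up.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def coverage(sentences, index):
    """Return (share of distinct words in index, share of fully covered sentences)."""
    distinct_words = set()
    covered_sentences = 0
    for sentence in sentences:
        words = tokenize(sentence)
        distinct_words.update(words)
        if words and all(w in index for w in words):
            covered_sentences += 1
    found = sum(1 for w in distinct_words if w in index)
    word_share = found / len(distinct_words) if distinct_words else 0.0
    sentence_share = covered_sentences / len(sentences) if sentences else 0.0
    return word_share, sentence_share


# Made-up example data.
sentences = ["De koningin opent de brug", "De tweet ging viral"]
index = {"de", "koningin", "opent", "brug", "ging"}
word_share, sentence_share = coverage(sentences, index)
```

Sentence coverage is the stricter measure: a single missing word, such as "tweet" here, makes the whole sentence unavailable.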

How recognizable are sentences?
• 8 people tested the software
• Philip Bloemendal was recognized (or ‘that news guy’)
• Words with more consonants were easier to recognize
• Recognition was higher when users entered their own sentences
• Recognition was lower when sentences were played without subtitles
• Speed of the software / GUI limited testing capabilities

The use of Deep Neural Networks in colorizing video

Neural Networks
Recent progress in computational power has made the implementation of Deep Neural Nets possible
Neural Networks trained on a large training set can make accurate predictions on real-world examples

Existing Literature
Zhang et al. (2016) trained a neural net on over a million images for colorization
http://richzhang.github.io/colorization/

Implementation on Video
• Extract individual frames from the video using FFmpeg
• Colorize each individual frame
• Re-compile the video and attach the original audio file
Pipeline: Extract 200x200 frames at 24 fps (ffmpeg) -> Zhang et al. implemented in TensorFlow -> Combine into videos (ffmpeg) -> Outcome
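
The three steps could be wired together roughly as follows. This is a sketch under assumptions, not the project's actual script: the directory layout, file names and function names are hypothetical, `colorize_frames` is a caller-supplied stand-in for the colorization model, and the encoder settings (`libx264`, `aac`) assume an ffmpeg build that includes them and a source video that has an audio track.

```python
import os
import subprocess


def extract_frames_cmd(video, frame_pattern, size=200, fps=24):
    """ffmpeg command that writes scaled frames as numbered images."""
    return ["ffmpeg", "-i", video,
            "-vf", f"fps={fps},scale={size}:{size}",
            frame_pattern]


def combine_cmd(frame_pattern, original_video, out_video, fps=24):
    """ffmpeg command that encodes frames and takes audio from the original."""
    return ["ffmpeg",
            "-framerate", str(fps), "-i", frame_pattern,  # input 0: frames
            "-i", original_video,                          # input 1: audio source
            "-map", "0:v:0", "-map", "1:a:0",
            "-c:v", "libx264", "-pix_fmt", "yuv420p",
            "-c:a", "aac", "-shortest",
            out_video]


def colorize_video(video, out_video, colorize_frames, workdir="work"):
    """Extract frames, colorize them, and rebuild the video with its audio.

    colorize_frames(gray_dir, color_dir) must write one colorized image per
    input frame, keeping the same file names.
    """
    gray_dir = os.path.join(workdir, "gray")
    color_dir = os.path.join(workdir, "color")
    os.makedirs(gray_dir, exist_ok=True)
    os.makedirs(color_dir, exist_ok=True)
    subprocess.run(
        extract_frames_cmd(video, os.path.join(gray_dir, "%06d.png")), check=True)
    colorize_frames(gray_dir, color_dir)
    subprocess.run(
        combine_cmd(os.path.join(color_dir, "%06d.png"), video, out_video),
        check=True)
```

Building the commands as lists and running them with `check=True` makes a failing ffmpeg step stop the pipeline instead of silently producing an incomplete video.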

Outcome
• Colorized videos are more ‘tangible’ and ‘alive’ than black-and-white ones
• Showing colorized Polygoonjournaals can augment the TTS engine
• Generally positive responses to the technology may increase attention to the NISV collection

Challenges
• Each frame is considered independent and is colorized as such, so artifacts appear between frames
• Slow performance without an Nvidia GPU
• Low resolution
• Predicted colors are still far from perfect

Hosted on the Openbeelden platform
www.openbeelden.nl/tags/ingekleurd
One of the colorized videos received 61,000+ views and 1,700 likes and was shared 521 times, illustrating the potential to engage new audiences.
tiny.cc/colorNL

Take home
• Collection-specific TTS systems can provide audio enrichments of archive material or multimedia applications.
• Colorization of old media allows for a new view on existing images.
• NISV will continue investigating these emerging technologies to enable new types of interaction and to further engage new audiences with archival material in unexpected ways:
– In the media museum
– On its public-facing online channels

Thank you

Annex: Results (sentences)
Dataset | Unique sentences | Unique sentences found | After synsets | After decompounding
Contemporary news | 1022 | 106 | 110 | 186
Old news | 2626 | 183 | 190 | 301
Tweets | 8937 | 174 | 181 | 296
Books | 56106 | 9387 | 11385 | 18271
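
To compare the datasets, the counts above can be turned into percentages of unique sentences. The snippet below only restates the annex figures; the variable and function names are hypothetical.

```python
# (unique sentences, found, after synsets, after decompounding), from the annex
results = {
    "Contemporary news": (1022, 106, 110, 186),
    "Old news": (2626, 183, 190, 301),
    "Tweets": (8937, 174, 181, 296),
    "Books": (56106, 9387, 11385, 18271),
}


def coverage_percentages(row):
    """Express each 'found' count as a percentage of the unique sentences."""
    total, *found_counts = row
    return [round(100 * n / total, 1) for n in found_counts]


for dataset, row in results.items():
    print(dataset, coverage_percentages(row))
```

For contemporary news, for instance, this gives roughly 10.4% of sentences found initially and 18.2% after decompounding.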
