By Rudy Marsman, Victor de Boer, Themistoklis Karavellas, Johan Oomen. Presented at NEM Summit 2017, 29/30 November 2017 in Madrid. Paper can be found here: http://publications.beeldengeluid.nl/pub/570
New life for old media - Investigations into Speech Synthesis and Deep Learning-based Colorization for Audiovisual Archive
1. New Life for Old Media
Investigations into Speech Synthesis and Deep Learning-based Colorization for
Audiovisual Archive
Rudy Marsman, Victor de Boer, Themistoklis Karavellas, Johan Oomen
3. 70% audio-visual heritage material
More than 1,000,000 hours of
TV (public broadcasters)
Radio, Music, Documentaries, Film, Commercials,
etc.
Photographs, objects, …
5. CC BY - SA as preferable license
3000 items “Internet Quality”
Polygoon newsreels
Supporting a National and
European Audiovisual Commons
Public outreach by embracing
new technologies and
‘participatory culture’
Openbeelden.nl / openimages.eu
6. Explore AI techniques to enrich this archival
material to allow for new types of engagement
1. Text-To-Speech engine based on a limited corpus from a single narrator
2. Colorization of old black-and-white video footage
8. Limited Domain Speech Synthesis
Can the current corpus of audio recordings
of Bloemendal be used to construct a TTS
engine?
• What percentage of the Dutch language can be
generated with the current corpus?
• What can we do to improve?
• How recognizable is the text-to-speech engine
as Philip Bloemendal?
• How understandable are the constructed
audio files?
9. Text: The Dutch football team played Germany
Audio: the.wav dutch.wav football.wav
Spoken Language Elements Repository (35,000 words)
Source: 3,300 newsreels, speech recognition
Slot-and-filler Text-to-speech
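The slot-and-filler idea on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical repository directory holding one `<word>.wav` clip per word, all with the same sample rate, sample width and channel count. The function name `synthesize` is made up for this example.

```python
import wave
from pathlib import Path


def synthesize(sentence, repo_dir, out_path):
    """Concatenate per-word WAV clips; return the words that had no clip."""
    repo = Path(repo_dir)
    missing = []   # words not found in the repository
    frames = []    # raw audio of the words that were found
    params = None  # format of the first clip, reused for the output
    for word in sentence.lower().split():
        clip = repo / f"{word}.wav"
        if not clip.exists():
            missing.append(word)
            continue
        with wave.open(str(clip), "rb") as src:
            if params is None:
                params = src.getparams()
            frames.append(src.readframes(src.getnframes()))
    # Only write an output file if at least one word was found.
    if params is not None:
        with wave.open(str(out_path), "wb") as dst:
            dst.setnchannels(params.nchannels)
            dst.setsampwidth(params.sampwidth)
            dst.setframerate(params.framerate)
            dst.writeframes(b"".join(frames))
    return missing
```

The returned list of missing words is what the coverage strategies on the following slides try to shrink.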
10. How to expand the coverage of the index?
• Many (contemporary) words have not
been pronounced by Philip Bloemendal
• Multiple strategies
– Change format (lowercase, diaeresis)
– Numbers
– Finding synonyms
– Decompounding
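The first three strategies can be read as a chain of fallback lookups. The sketch below is one possible arrangement, not the authors' code: the `index` set, the `number_words` mapping and the `synonyms` mapping are all hypothetical inputs that a real system would have to supply (for example from the speech-recognition index and a synonym resource).

```python
import unicodedata


def strip_diacritics(word):
    """Remove combining marks such as the diaeresis (reünie -> reunie)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lookup(word, index, synonyms=None, number_words=None):
    """Return a list of index words that cover `word`, or None."""
    synonyms = synonyms or {}
    number_words = number_words or {}
    # 1. Format changes: exact form, lowercase, lowercase without diacritics.
    for candidate in (word, word.lower(), strip_diacritics(word.lower())):
        if candidate in index:
            return [candidate]
    # 2. Numbers: spell out digits using a caller-supplied table.
    parts = number_words.get(word)
    if parts and all(p in index for p in parts):
        return list(parts)
    # 3. Synonyms: accept the first synonym that is in the index.
    for alt in synonyms.get(word.lower(), []):
        if alt in index:
            return [alt]
    return None
```

Ordering the cheap format changes first keeps the original wording wherever possible; synonyms change the text, so they come last.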
12. Decompounding
• The Dutch language allows
compounding of words; each compound
is a distinct word in the corpus
• Decompounding is
computationally expensive (for
large corpora, long words)
• Constructed Bigrams and Trigrams
School, hoofd -> Schoolhoofd
Regen, water -> regenwater
Staat, hoofd -> StaatShoofd
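For illustration, the examples above can also be handled by splitting a compound into known parts. Note that this is a different direction from the slide, which mentions constructing bigrams and trigrams from known words; the recursive splitter below is only a sketch under assumptions. The `index` set and the `min_len` threshold are hypothetical, and only the linking "s" (as in "staatshoofd") is handled.

```python
def decompound(word, index, min_len=3):
    """Split `word` into parts found in `index`; return the parts or None."""
    word = word.lower()
    if word in index:
        return [word]
    # Try every split point that leaves both halves at least min_len long.
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        if head not in index:
            continue
        # Try the tail as-is, then with a linking "s" removed.
        options = [tail]
        if tail.startswith("s"):
            options.append(tail[1:])
        for rest in options:
            if len(rest) >= min_len:
                parts = decompound(rest, index, min_len)
                if parts:
                    return [head] + parts
    return None
```

Without memoization the recursion can revisit the same suffix many times, which is consistent with the slide's remark that decompounding gets expensive for long words and large corpora.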
13. 4 corpora to test against
• News articles (same domain, different time) | 50 articles | 2,743 unique words
• 1970s news articles (same domain, same time) | 50 articles | 16,191 words
• E-books (different domain, various times) | 6 books | 2,657 words
• Tweets (different domain, different time) | 1,000 tweets | 27,180 words
• Evaluation
– Number of distinct words
– Number of sentences
Evaluation
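The two evaluation measures can be computed along these lines. This is a sketch under assumptions: the tokenizer is a simple letter-run regex, a sentence counts as covered only when every one of its words is in the index, and `index` is a hypothetical set of words available in the repository.

```python
import re


def tokenize(text):
    """Lowercase the text and return runs of letters (digits are dropped)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def coverage(sentences, index):
    """Count distinct words and sentences, and how many the index covers."""
    words = set()
    covered_sentences = 0
    for sentence in sentences:
        tokens = tokenize(sentence)
        words.update(tokens)
        # A sentence is covered only if all of its words are available.
        if tokens and all(t in index for t in tokens):
            covered_sentences += 1
    return {
        "distinct_words": len(words),
        "covered_words": sum(1 for w in words if w in index),
        "sentences": len(sentences),
        "covered_sentences": covered_sentences,
    }
```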
15. • 8 people tested the software
• Philip was recognized (or ‘that news guy’)
• Words with more consonants were easier to recognize
• Recognition was higher when users entered their own sentences
• Recognition was lower when sentences were played without subtitles
• Speed of software / GUI limited testing capabilities
How recognisable are sentences?
16. The use of Deep Neural Networks in colorizing video
17. Neural Networks
Recent progress in computational power has made the implementation
of Deep Neural Nets possible
Neural Networks trained on a large training set can make accurate
predictions on real-world examples
18. Zhang et al. (2016) trained a neural net
on over a million images for colorization
http://richzhang.github.io/colorization/
Existing Literature
19. • Extract individual frames from video using FFMPEG
• Colorize each individual frame
• Re-compile video and attach original audio file
Outcome
Extract 200x200
frames 24fps
(ffmpeg)
Zhang et al. implemented in
TensorFlow
Combine into
videos (ffmpeg)
Implementation on Video
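The three steps (extract, colorize, re-compile) could be driven from a script such as the one below. It is a sketch, not the authors' pipeline: `colorize_frame` is a hypothetical callable standing in for the colorization model, the directory layout and file naming are assumptions, and the ffmpeg options shown (`fps` and `scale` filters, `-framerate`, `-map`) are one reasonable choice among several.

```python
import subprocess
from pathlib import Path


def extract_cmd(video, frames_dir, fps=24, size=200):
    """ffmpeg command that writes size x size PNG frames at the given rate."""
    return ["ffmpeg", "-i", str(video),
            "-vf", f"fps={fps},scale={size}:{size}",
            f"{frames_dir}/%06d.png"]


def combine_cmd(frames_dir, original_video, out_video, fps=24):
    """ffmpeg command that rebuilds the video and reuses the original audio."""
    return ["ffmpeg", "-framerate", str(fps),
            "-i", f"{frames_dir}/%06d.png",
            "-i", str(original_video),
            "-map", "0:v", "-map", "1:a?",  # "?" tolerates a missing audio track
            "-c:v", "libx264", "-pix_fmt", "yuv420p",
            "-shortest", str(out_video)]


def colorize_video(video, out_video, colorize_frame, workdir="work"):
    """Run extract -> per-frame colorize -> combine."""
    raw = Path(workdir) / "raw"
    colorized = Path(workdir) / "colorized"
    raw.mkdir(parents=True, exist_ok=True)
    colorized.mkdir(parents=True, exist_ok=True)
    subprocess.run(extract_cmd(video, raw), check=True)
    # Each frame is processed independently, as described on the slide.
    for frame in sorted(raw.glob("*.png")):
        colorize_frame(frame, colorized / frame.name)
    subprocess.run(combine_cmd(colorized, video, out_video), check=True)
```

Keeping the command builders separate from the driver makes it possible to inspect or log the exact ffmpeg invocation before running it.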
20. • Colorized videos are more ‘tangible’ and ‘alive’ than black-and-white ones
• Showing colorized Polygoon newsreels can augment the TTS engine
• Generally positive responses to the technology may increase attention to the NISV collection
Outcome
22. • Each frame is considered
independent and is colorized as such
--> Artifacts appear between frames
• Slow performance without use of
Nvidia GPU
• Low resolution
• Predicted colors still far from perfect
Challenges
24. • Collection-specific TTS systems for audio-enrichments of archive
material or multimedia applications.
• Colorization of old media allows for a new view on existing images
• NISV will continue investigating these emerging technologies to
enable new types of interaction and to further engage new
audiences with archival material in unexpected ways.
– In the media museum
– On its public-facing online channels.
Take home
25. New Life for old Media:
Investigations into Speech Synthesis and Deep Learning-based Colorization for
Audiovisual Archive
Rudy Marsman, Victor de Boer, Themistoklis Karavellas, Johan Oomen
Thank you
26. Annex: Results (sentences)
Dataset | Unique sentences | Unique sentences found | After synsets | After decompounding
Contemporary news | 1022 | 106 | 110 | 186
Old news | 2626 | 183 | 190 | 301
Tweets | 8937 | 174 | 181 | 296
Books | 56106 | 9387 | 11385 | 18271
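To compare the datasets, the counts above can be turned into percentages of unique sentences. The snippet assumes the columns mean: total unique sentences, sentences found with the base index, after adding synsets, and after adding decompounding. That reading of the column headers is an interpretation, not something the slide states explicitly.

```python
# (total, found, after synsets, after decompounding), copied from the table
results = {
    "Contemporary news": (1022, 106, 110, 186),
    "Old news": (2626, 183, 190, 301),
    "Tweets": (8937, 174, 181, 296),
    "Books": (56106, 9387, 11385, 18271),
}


def percentages(total, found, synsets, decompounded):
    """Share of unique sentences covered at each stage, in percent."""
    return tuple(round(100 * n / total, 1)
                 for n in (found, synsets, decompounded))


for name, row in results.items():
    print(name, percentages(*row))
```

For contemporary news, for example, this gives roughly 10.4%, 10.8% and 18.2% for the three stages.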