Speech synthesis based on a
limited speech corpus
Rudy Marsman | VU University | NISV
Netherlands Institute for Sound and Vision
(NISV) | Beeld & Geluid
Beeld en Geluid
• collects, preserves and opens the Dutch audiovisual heritage for as
many users as possible
• one of the largest audiovisual archives in Europe. The institute
manages over 70 percent of the Dutch audiovisual heritage
• Was interested in ways to re-use old Polygoonjournaals footage
• Text-to-speech (TTS) engine based on the voice of Philip Bloemendal
Philip Bloemendal
• Famous anchorman
• Iconic voice
• https://www.youtube.com/watch?v=31tClHJ2tfQ
Research
• Can the current corpus of audio recordings of Bloemendal be used
to construct a TTS engine?
• What percentage of the Dutch language can be constructed with the
current corpus?
• What can we do to improve performance?
• How recognizable is the text-to-speech engine as Philip Bloemendal?
• How comprehensible are the constructed audio files?
What percentage of the Dutch language can
be constructed with the current corpus?
• Constructing the corpus
• How many ‘Polygoonjournaals’
• Openbeelden – OAI (Open Archives Initiative)
• Extract audio
• Speech analysis – roughly 35000 distinct words
• XML files
• Evaluation
• Metrics
• Corpora
• Language changes
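A minimal sketch of how word-level timings from the speech analysis could be indexed for lookup. The XML layout (a `transcript` element with `word` children carrying `start` and `end` attributes), the file name, and the function name `build_word_index` are assumptions for illustration; the slides do not show the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript format: the real XML schema is not shown in the slides.
SAMPLE_XML = """
<transcript source="journaal_001.wav">
  <word start="0.00" end="0.42">Goedenavond</word>
  <word start="0.42" end="0.80">dames</word>
  <word start="0.80" end="0.95">en</word>
  <word start="0.95" end="1.40">heren</word>
</transcript>
"""


def build_word_index(xml_text, index=None):
    """Map each lowercased word to the first (file, start, end) segment where it occurs."""
    index = {} if index is None else index
    root = ET.fromstring(xml_text.strip())
    source = root.get("source")
    for node in root.iter("word"):
        word = (node.text or "").strip().lower()
        if word:
            segment = (source, float(node.get("start")), float(node.get("end")))
            # Keep the first occurrence; later occurrences of the same word are ignored.
            index.setdefault(word, segment)
    return index


word_index = build_word_index(SAMPLE_XML)
print(len(word_index), "distinct words indexed")
```

Passing the same `index` dictionary to repeated calls would accumulate the distinct words across many recordings.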
What percentage of the Dutch language can
be constructed with the current corpus?
• Approach: 4 corpora to test against
• Contemporary news articles (same domain, different time) | 50 articles
• News articles from the 1970s (same domain, same time) | 50 articles
• E-books (different domain, various times) | 6 books
• Tweets (different domain, different time) | 1000 tweets
• Evaluation
• Number of distinct words
• Number of sentences
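The two evaluation metrics can be sketched as follows. This assumes that a sentence counts as constructible only when every one of its words is in the corpus vocabulary; the slides do not spell out the exact definition. The tokenizer, the sentence splitter, and all function names are simplifications chosen for illustration.

```python
import re


def tokenize(text):
    """Lowercase a text and return its word tokens (hyphens and apostrophes kept inside words)."""
    return re.findall(r"[^\W_]+(?:['-][^\W_]+)*", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_coverage(text, vocabulary):
    """Return (distinct words found in the vocabulary, distinct words in the text)."""
    distinct = set(tokenize(text))
    return len(distinct & set(vocabulary)), len(distinct)


def sentence_coverage(text, vocabulary):
    """Return (distinct sentences fully covered by the vocabulary, distinct sentences)."""
    vocab = set(vocabulary)
    sentences = set(split_sentences(text))
    covered = [s for s in sentences if all(w in vocab for w in tokenize(s))]
    return len(covered), len(sentences)


# Toy vocabulary standing in for the words spoken in the corpus.
corpus_vocabulary = {"de", "koningin", "opent", "het", "museum"}
test_text = "De koningin opent het museum. De minister opent het museum."

print(word_coverage(test_text, corpus_vocabulary))
print(sentence_coverage(test_text, corpus_vocabulary))
```

A single missing word makes a whole sentence unconstructible, which is why sentence coverage is expected to be much lower than word coverage.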
What can we do to improve performance?
• It is to be expected that many (contemporary) words have not
been pronounced by Philip
• Various approaches
• Change format (lowercase, diaereses)
• Numbers
• Finding synonyms
• Decompounding
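The format change can be sketched as a normalization step applied to both the corpus words and the input text before lookup. This is one possible reading of "lowercase, diaereses": it assumes that diaereses are simply stripped. Note that this sketch removes all combining marks, so other accents are removed as well. The function name `normalize` is hypothetical.

```python
import unicodedata


def normalize(word):
    """Lowercase a word and strip combining marks such as the diaeresis."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


for example in ["Coördinatie", "België", "REGENWATER"]:
    print(example, "->", normalize(example))
```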
Finding Synonyms
• Open Dutch Wordnet: Dutch lexical semantic database
• Maarten Postma et al.
• Yields synsets (e.g. Hoofdmeester -> Rector, Schoolhoofd)
• Computationally expensive
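The synonym fallback can be sketched as below. The synsets here are a hand-written toy list based on the example on the slide; in practice they would come from Open Dutch Wordnet, whose API is not shown here. The function name `find_substitute` is hypothetical.

```python
# Toy synsets for illustration only; real synsets would come from Open Dutch Wordnet.
SYNSETS = [
    {"hoofdmeester", "rector", "schoolhoofd"},
]


def find_substitute(word, vocabulary, synsets=SYNSETS):
    """Return the word itself if known, otherwise a synonym in the vocabulary, else None."""
    if word in vocabulary:
        return word
    for synset in synsets:
        if word in synset:
            # Sorted so that the choice among several known synonyms is deterministic.
            for candidate in sorted(synset):
                if candidate in vocabulary:
                    return candidate
    return None


spoken_words = {"rector", "school"}
print(find_substitute("hoofdmeester", spoken_words))
```

Scanning every synset for every unknown word is linear in the size of the lexicon, which is one reason this step can become expensive; an inverted index from word to synset would avoid the scan.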
Decompounding
• Dutch language allows for compounding words
• School, hoofd -> Schoolhoofd
• Regen, water -> regenwater
• Staat, hoofd -> Staatshoofd (with a linking 's')
• Each compound is a distinct word in the corpus
• Decompounding is computationally expensive, especially for large
corpora and long words
• Constructed Bigrams and Trigrams
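A minimal sketch of dictionary-based decompounding, assuming a recursive split into known words with an optional linking 's' between parts. It is not necessarily the approach used in the thesis. The function name `decompound` and the minimum part length of 3 are assumptions.

```python
from functools import lru_cache


def decompound(word, vocabulary, min_part=3):
    """Split a word into known parts, allowing a linking 's'; return None if impossible."""
    vocab = frozenset(vocabulary)

    @lru_cache(maxsize=None)
    def split(rest):
        if rest in vocab:
            return (rest,)
        for i in range(min_part, len(rest) - min_part + 1):
            head, tail = rest[:i], rest[i:]
            if head not in vocab:
                continue
            # Try the plain remainder first, then the remainder without a linking 's'.
            candidates = [tail]
            if tail.startswith("s") and len(tail) - 1 >= min_part:
                candidates.append(tail[1:])
            for candidate in candidates:
                parts = split(candidate)
                if parts is not None:
                    return (head,) + parts
        return None

    return split(word.lower())


known = {"school", "hoofd", "regen", "water", "staat"}
for compound in ["Schoolhoofd", "regenwater", "Staatshoofd", "fiets"]:
    print(compound, "->", decompound(compound, known))
```

Every split point of every remainder is tried, so the cost grows quickly with word length; the memoization via `lru_cache` only avoids repeating work for remainders that were already examined.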
Results (words)
Dataset           | Unique words | Unique words found | After synsets | After decompounding
Contemporary news | 2743         | 2019               | 2106          | 2448
Old news          | 16191        | 7703               | 8261          | 11541
Tweets            | 27180        | 7692               | 8446          | 13440
Books             | 26575        | 11440              | 12922         | 20207
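The word counts are easier to compare as percentages of the unique words in each dataset. The sketch below assumes that the "after synsets" and "after decompounding" columns are cumulative counts of words found. Under that reading, coverage of contemporary news rises from about 74% to about 89%, and coverage of tweets from about 28% to about 49%.

```python
# Rows: dataset -> (unique words, found, after synsets, after decompounding), from the slide.
WORD_RESULTS = {
    "Contemporary news": (2743, 2019, 2106, 2448),
    "Old news": (16191, 7703, 8261, 11541),
    "Tweets": (27180, 7692, 8446, 13440),
    "Books": (26575, 11440, 12922, 20207),
}


def coverage_percentages(results):
    """Convert each found-count into a percentage of the dataset's unique words."""
    percentages = {}
    for name, (total, *found_counts) in results.items():
        percentages[name] = tuple(round(100 * n / total, 1) for n in found_counts)
    return percentages


word_percentages = coverage_percentages(WORD_RESULTS)
for name, pcts in word_percentages.items():
    print(f"{name:18s} " + "  ".join(f"{p:5.1f}%" for p in pcts))
```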
Results (sentences)
Dataset           | Unique sentences | Unique sentences found | After synsets | After decompounding
Contemporary news | 1022             | 106                    | 110           | 186
Old news          | 2626             | 183                    | 190           | 301
Tweets            | 8937             | 174                    | 181           | 296
Books             | 56106            | 9387                   | 11385         | 18271
How comprehensible / recognizable are the
sentences?
• 8 people tested the software
• Philip was recognized (or ‘that news guy’)
• Words with more consonants were easier to recognize
• Recognition was higher when users entered their own sentences
• Recognition was lower when sentences were played without subtitles
• Speed of software / GUI limited testing capabilities
The use of Deep Neural
Networks in colorizing video
Rudy Marsman | VU University | NISV
Neural Networks
• Recent progress in computational power made implementation of
Deep Neural Nets possible
• A neural net trained on a large training set can make accurate
predictions on real-world examples
Zhang et al.
• Richard Zhang et al. trained a neural net to colorize images
• Trained on over a million images
• Fools humans into thinking a colorized photo is the original 20% of the time
• Resizes image to fit input layer of 200x200 pixels
• Gained popularity on news websites and forums
Zhang et al.
Implementation on video
• Extract individual frames from video using FFmpeg
• Colorize each individual frame
• Re-compile video and attach original audio file
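The three steps above can be sketched as two FFmpeg invocations around a per-frame colorization step. The sketch only builds and prints the commands. The file names, the directory layout, the frame rate of 25 fps, and the encoder settings are assumptions, and the colorization itself (the network by Zhang et al.) is left out.

```python
import shlex


def extract_frames_cmd(video, frames_dir):
    """Command that writes every frame of the video as a numbered PNG."""
    return ["ffmpeg", "-i", video, f"{frames_dir}/%06d.png"]


def rebuild_video_cmd(colorized_dir, original_video, output, fps=25):
    """Command that encodes the colorized frames and copies the original audio track."""
    return [
        "ffmpeg",
        "-framerate", str(fps),              # assumed; must match the source frame rate
        "-i", f"{colorized_dir}/%06d.png",   # input 0: colorized frames
        "-i", original_video,                # input 1: original file, used for audio only
        "-map", "0:v:0", "-map", "1:a:0",    # video from the frames, audio from the original
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "copy",
        "-shortest",
        output,
    ]


extract_cmd = extract_frames_cmd("journaal.mp4", "frames")
rebuild_cmd = rebuild_video_cmd("colorized", "journaal.mp4", "journaal_color.mp4")
print(shlex.join(extract_cmd))
# ... colorize each PNG in frames/ and write the result to colorized/ (not shown) ...
print(shlex.join(rebuild_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the frame directories exist.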
Example
• https://www.youtube.com/watch?v=olsO2rOy_i4
Applications
• Colorized videos are more ‘tangible’ and ‘alive’ than black/white
• Showing colorized Polygoonjournaals can augment TTS engine
• Generally positive responses to the technology may increase attention
to the NISV collection
• NISV employees were enthusiastic
Issues
• Each frame is treated independently and colorized on its own
• Artifacts appear between frames
• Slow performance without use of Nvidia GPU
• Low resolution
• Predicted colors still far from perfect
Conclusions
• The current corpus covers many frequently used words
• The various implemented approaches increase coverage
• Low coverage for sentences -> real-world approach may need
improvement
• Audio is recognizable and understandable
• Neural Networks may be used to colorize video footage
Discussion

Rudy Marsman's thesis presentation slides: Speech synthesis based on a limited speech corpus
