PyCon JP 2015
Renyuan Lyu
呂仁園
Chun-Han Lai
賴俊翰
Karaoke-style Read-aloud System
Chang Gung Univ.
Taiwan
Oct/10/ Saturday 2 p.m.–2:30 p.m. in 会議室1/Conference Room 1
CguTextKaraoke
a Karaoke-style Read-aloud System
Using Speech Alignment and Text-to-Speech Technology
Chun-Han Lai (賴俊翰)
Renyuan Lyu (呂仁園)
Chang Gung University (長庚大學)
Taiwan (台灣)
Abstract
• A procedure for creating a speech-to-text synchronization file from an original text-only file
– The file can be used to display highlighted text, just like a karaoke machine.
– This is very useful for language learning.
• Cloud-based TTS (text-to-speech) technology, such as Google TTS
• Speech-recognition technology, such as HTK, for temporal alignment
Introduction
• Starting from a text-only file, we use a cloud-based text-to-speech (TTS) technology, such as Google Translate/TTS, together with a speech-recognition technology, such as the Hidden Markov Model Toolkit (HTK), to generate an associated timed-text file that aligns the text with the speech waveform along the time axis.
• Python serves not only as the glue that links very different software resources, such as Google Translate and HTK, but also as a powerful tool for all the text-processing tasks in this project.
• From such a timed-text file, a JavaScript-based web app and a Python GUI program display time-aligned, highlighted text at the word level, like a karaoke machine. We consider this very useful for language learning.
a Karaoke-style Text Read-aloud System
https://www.youtube-nocookie.com/embed/9a5KoXNCagM?start=180
• Karaoke (カラオケ) is a form of interactive
entertainment in which an amateur singer sings
along with recorded music.
• Lyrics are usually displayed on a video screen, along
with a moving symbol, changing color, or music
video images, to guide the singer.
• Here is one of my favorite examples (the video linked above).
(Definition from https://en.wikipedia.org/wiki/Karaoke)
Speech Shadowing Technique
for Language Learning
• The motivation for this project: speech shadowing
– https://en.wikipedia.org/wiki/Speech_shadowing
– Speech shadowing is a language-learning technique in which subjects repeat speech immediately after hearing it.
– A demonstration can be viewed at the following YouTube link:
• “English Speaking Practice: How to improve your English Speaking and Fluency: SHADOWING”
• https://www.youtube.com/watch?v=GVWFGIyNswI
Text-to-Speech Synthesis
Given: a piece of text, e.g.,
Wikipedia is a multilingual, web-based, free-content encyclopedia project supported by the Wikimedia Foundation and based on a model of openly editable content. The name "Wikipedia" is a portmanteau of the words wiki (a technology for creating collaborative websites, from the Hawaiian word wiki, meaning "quick") and encyclopedia. Wikipedia's articles provide links designed to guide the user to related pages with additional information.
The goal is to obtain its speech.
Google TTS API
in a Python module
• pip install gTTS
from gtts import gTTS
aText = 'Wikipedia is a multilingual, ...'
aLang = 'en'
tts = gTTS(text=aText, lang=aLang)
tts.save("aSpeech.mp3")
Input: aText; output: aSpeech.mp3
https://github.com/pndurette/gTTS
FFmpeg
• About FFmpeg
– https://en.wikipedia.org/wiki/FFmpeg
– FFmpeg is a free software project that produces libraries and programs for handling multimedia data.
– It is one of the leading multimedia frameworks, able to perform many DSP tasks, including:
• decode, encode,
• transcode, mux, demux, stream, filter, and play
ffmpeg -i aSpeech.mp3 -y -vn -acodec pcm_s16le -ac 1 -ar 16000 -f wav aSpeech.wav
Input: aSpeech.mp3; output: aSpeech.wav
– -acodec pcm_s16le: PCM, 16 bits/sample, little-endian
– -ac 1: 1 (mono) channel
– -ar 16000: 16000 samples/sec
ffplay aSpeech.wav
Verify the result by seeing and hearing it, or use an interactive audio tool such as Audacity.
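Since Python is described as the glue of this project, the conversion step above could be driven from Python. The following is a minimal sketch, not the authors' code: the function names ffmpeg_wav_command and mp3_to_wav are hypothetical, and it assumes the ffmpeg executable is on the PATH. The flags are exactly those shown on the slide.

```python
import subprocess


def ffmpeg_wav_command(src, dst, sample_rate=16000):
    """Build the ffmpeg argument list for mp3 -> 16-bit mono PCM wav.

    Hypothetical helper; the flags mirror the command shown on the slide.
    """
    return [
        "ffmpeg",
        "-i", src,               # input file
        "-y",                    # overwrite the output without asking
        "-vn",                   # ignore any video stream
        "-acodec", "pcm_s16le",  # PCM, 16 bits/sample, little-endian
        "-ac", "1",              # mono
        "-ar", str(sample_rate), # samples per second
        "-f", "wav",             # output container
        dst,
    ]


def mp3_to_wav(src, dst, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_wav_command(src, dst, sample_rate), check=True)
```

Calling mp3_to_wav("aSpeech.mp3", "aSpeech.wav") would then reproduce the command line above. Passing the arguments as a list avoids shell quoting problems with file names.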
Audacity (audio editor)
• Audacity is a powerful, free, open-source digital audio editor.
– Its features include:
• Recording and playing back sounds
• Importing and exporting WAV, MP3, ...
• Viewing and editing via cut, copy, paste, ...
[Figure: aSpeech.mp3 and aSpeech.wav]
Text-to-Speech Alignment
Given: a piece of text and its speech, e.g.,
Wikipedia is a multilingual, web-based, free-content encyclopedia project supported by the Wikimedia Foundation and based on a model of openly editable content. The name "Wikipedia" is a portmanteau of the words wiki (a technology for creating collaborative websites, from the Hawaiian word wiki, meaning "quick") and encyclopedia. Wikipedia's articles provide links designed to guide the user to related pages with additional information.
The goal is to obtain a ‘timed text’:
start (s)  end (s)  label
0.000      0.080    sil
0.080      0.870    wikipedia
0.870      0.990    is
0.990      1.080    a
1.080      2.010    multilingual
2.010      2.140    sil
2.160      2.240    sil
2.240      3.020    webbased
3.020      3.180    sil
3.204      3.354    sil
3.354      4.284    freecontent
4.284      5.374    encyclopedia
5.374      5.774    project
5.774      6.454    supported
6.454      6.754    by
6.754      6.904    the
6.904      7.574    wikimedia
7.574      8.414    foundation
8.414      8.514    sil
8.532      8.622    sil
8.622      8.852    and
8.852      9.242    based
9.242      9.382    on
9.382      9.432    a
9.432      9.982    model
9.982      10.032   of
10.032     10.592   openly
10.592     11.212   editable
11.212     11.802   content
11.802     11.932   sil
:
Wav splitting
At the sentence level, this is straightforward: the time information is extracted from the TTS mp3 files, which are received sentence by sentence.
[Figure: sentence boundaries]
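The sentence-level bookkeeping described above can be sketched as follows. This is an illustration under assumptions, not the project's code: the function names are hypothetical, the durations are read from the converted wav files with the standard wave module (the slides do not say how the authors measured them), and the example durations 2.160 s and 1.044 s are inferred from the sentence start times 2.160 and 3.204 in the timed text.

```python
import wave
from itertools import accumulate


def wav_duration(path):
    """Duration of a PCM wav file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def sentence_offsets(durations):
    """Start time of each sentence when the sentence wavs are concatenated."""
    return [0.0] + list(accumulate(durations))[:-1]


def sentence_boundaries(durations):
    """(start, end) in seconds for each sentence, rounded to milliseconds."""
    starts = sentence_offsets(durations)
    return [(round(s, 3), round(s + d, 3)) for s, d in zip(starts, durations)]
```

For example, sentence_boundaries([2.160, 1.044]) places the second sentence between 2.160 s and 3.204 s.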
Phonetic Transcription
• Speech recognition technology needs the text transcribed into phonetic symbols in order to build phone models.
Original English text (ASCII only, perhaps!):
“Wikipedia is a multilingual, web-based, free-content encyclopedia project.”
Transcription in IPA (needs Unicode):
“wikipedia ɪz ə məltilɪŋwəl, wɛb- best, fri- kɑntɛnt ənsɑjkləpidiə prɑdʒɛkt.”
Transcription in SAMPA (ASCII only, including non-alphabet symbols):
“wikipedia Iz @ m@ltilINw@l, wEb- best, fri- kAntEnt @nsAykl@pidi@ prAdZEkt.”
http://upodn.com/phon.asp
• Post-processing of the phonetic transcription
• Map, or simply remove, all undesired symbols from the various styles of output
– (usually Unicode characters or non-alphabetic symbols)
• For plain English (en),
– The original text is used as an approximation of the phone sequence.
– Although this seems too simple, it has worked well so far.
• For Traditional Chinese (zh-tw),
– Google Translate is used to get the phonetic symbols in Pinyin (拼音, pīnyīn), which are then reduced to plain romanized letters by removing the tone marks.
• For Japanese (ja),
– MeCab has recently been used to get the katakana (片仮名, カタカナ) reading.
– Romkan is used to convert the katakana to romaji (Kunrei-shiki).
• Python does most of the work at this stage of processing!
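The tone-mark removal mentioned for Pinyin can be sketched with the standard unicodedata module. This is one possible approach, not necessarily the one used in the project: the function names and the sample input string are assumptions for illustration.

```python
import unicodedata


def strip_tone_marks(pinyin):
    """Remove combining diacritics, e.g. 'pīnyīn' -> 'pinyin'.

    NFD normalization splits each accented letter into a base letter plus
    combining marks; the marks are then dropped. Note that this also turns
    'ü' into 'u', which may or may not be what you want.
    """
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def pinyin_to_phn(pinyin):
    """Lowercase, strip tone marks, and join the words with underscores."""
    return "_".join(strip_tone_marks(pinyin).lower().split())
```

With the hypothetical input "Wéijī bǎikē shì", pinyin_to_phn produces an all-ASCII string in the same underscore-joined style as the phn strings shown on the slides.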
• Phonetic transcription for English
– Using the regular expression module
phn = text2phn_en(enText)
enText = '''Wikipedia is a multilingual, web-based,
free-content encyclopedia project.'''
phn = '''wikipedia_is_a_multilingual_webbased_freecontent_encyclopedia_project'''
import re
# Remove quotes, hyphens, leading/trailing underscores, commas, periods, and parentheses.
pats = r"'|\"|-|^_|_$|,|\.|\(|\)"
phn = re.sub(pats, '', phn)
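The slide shows only the final clean-up step of text2phn_en. A complete function that reproduces the example output might look like the sketch below. The body is an assumption: it uses a character class rather than the alternation pattern on the slide, and the lowercasing and whitespace handling are inferred from the example input and output.

```python
import re


def text2phn_en(text):
    """Approximate an English phone sequence by the cleaned-up text itself."""
    phn = text.lower().strip()
    phn = re.sub(r"\s+", "_", phn)        # word boundaries become underscores
    phn = re.sub(r"[^a-z0-9_]", "", phn)  # drop punctuation, quotes, hyphens
    phn = re.sub(r"_+", "_", phn)         # collapse repeated underscores
    return phn.strip("_")                 # no leading/trailing underscore


enText = """Wikipedia is a multilingual, web-based,
free-content encyclopedia project."""
print(text2phn_en(enText))
```

Dropping everything outside a to z, digits, and the underscore is stricter than the slide's pattern, which keeps the output ASCII-only, as HTK requires for pronunciation symbols.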
• Phonetic transcription for Traditional Chinese
– Using the Google Translate/TTS API
phn = text2phn_tc(tcText)
tcText = '維基百科是一個自由內容'
phn = 'weiji_baike_shi_yige_ziyou_neirong'
import urllib.request
GOOGLE_TTS_URL = 'https://translate.google.com.tw/translate_a/single?dt=bd&dt=ex&dt=at&'
# data: the rest of the query string (not shown on the slide)
req = urllib.request.Request(GOOGLE_TTS_URL + data)
• Phonetic transcription for Japanese
– Using MeCab and Romkan
phn = text2phn_jp(jpText)
jpText = '''ウィキペディアは、
信頼されるフリーなオンライン百科事典、'''
phn = '''wikipedyia_wa_sil_sinrai_sa_reru_furi-_na_onrain_hyakka_ziten'''
import MeCab
import romkan
y = MeCab.Tagger().parse(text)
...
kun = romkan.to_kunrei(phn)
At the Halfway Point
• A bundle of wav/lab files
Speech recognition technology
• Hidden Markov Model Toolkit (HTK)
– http://htk.eng.cam.ac.uk/
– Given a speech utterance and its phone sequence, the speech can be aligned accurately with the phones by the ‘forced alignment’ technique of the HMM approach.
– HTK, a set of HMM tools, provides a convenient way to apply the HMM approach.
• The HTK overview
HTK processing (abstract)
• #[00] setting the working dir
• #[01] creating the (hmm) model prototype
• #[02] label processing
• #[03] feature extraction
• #[04] model initialization
• #[05] model training
• #[06] forced alignment
• #[07] post file moving operation
HTK processing (detail)
#[00] setting the working dir
dirName= ./_wav/
#[01] creating the (hmm) model prototype
CreateHProto....
myHmmPro
N = 3 M = 6
#[02] label processing
000, 0,----> ._htkhled -A -i spLab00.mlf -n spLab00.lst -S spLab.scp hL
001, 0,----> ._htkhled -A -i spLab.mlf -n spLab.lst -S spLab.scp hLed.l
002, 0,----> ._htkhled -A -i spLab_p.mlf -n spLab_p.lst -S spLab.scp -I
#[03] feature extraction
003, 0,----> ._htkHCopy -A -C hCopy.conf -S spWav2Mfc.scp 1>> 1.htk.out 2>>
#[04] model initialization
004, 1,----> mkdir hmms_p
005, 0,----> ._htkHCompV -A -m -C hInit.conf -S spMfc.scp -I spLab_p.mlf -M
#[05] model training
006, 0,----> ._htkHERest -A -C hErest.conf -S spMfc.scp -p 1 -t 2000.0 -w 3
007, 0,----> ._htkHERest -A -C hErest.conf -p 0 -t 2000.0 -w 3 -v 0.05 -I sp
: (repeating several times...)
:
#[06] forced alignment
016, 0,----> ._htkHVite -A -a -C hVite.conf -S spMfc.scp -d hmms_p/ -i s
#[07] post file moving operation
017, 1,----> mkdir outDir
018, 1,----> copy spLab_aligned.mlf outDir./_wav_aligned.mlf
HLed
[Diagram: three HLed runs. Files shown: spLab.scp, hLed00.led, spLab00.mlf, spLab00.lst; hLed.led, spLab.mlf, spLab.lst; hLed.led, spLab_p.dic, spLab_p.mlf, spLab_p.lst]
HCopy
[Diagram: HCopy. Files shown: hCopy.conf, spWav2Mfc.scp, *.wav, *.mfc]
HCompV
[Diagram: HCompV. Files shown: HCompV.conf, *.mfc, hmms_p/*, spMfc.scp, spLab_p.mlf, myHmmPro]
HERest
[Diagram: HERest, N iterations (N=5). Files shown: hErest.conf, *.mfc, hmms_p/*, spMfc.scp, spLab_p.mlf, spLab_p.lst, hmms_p/HER1.acc]
HVite
[Diagram: HVite. Files shown: hVite.conf, *.mfc, spMfc.scp, spLab_p.lst, spLab_aligned.mlf, spLab.mlf, spLab_p.dic, hmms_p/]
HTK summary
HTK tools used: HLed, HCopy, HCompV, HERest, HVite
#!MLF!#
"./_wav/SN0.rec"
0 800000 sil -578.044434
800000 8700000 wikipedia -5636.368652
8700000 9900000 is -855.988770
9900000 10800000 a -693.554871
10800000 20100000 multilingual -7268.197266
20100000 21400000 sil -791.746216
.
"./_wav/SN1.rec"
0 800000 sil -541.083069
800000 8600000 webbased -5977.622070
8600000 10200000 sil -1048.225220
.
"./_wav/SN2.rec"
0 1500000 sil -1100.892822
1500000 10800000 freecontent -7094.197266
10800000 21700000 encyclopedia -8148.633789
21700000 25700000 project -3247.493896
25700000 32500000 supported -5594.979492
32500000 35500000 by -2412.487305
35500000 37000000 the -1176.310547
37000000 43700000 wikimedia -5128.852051
43700000 52100000 foundation -5995.618164
52100000 53100000 sil -695.872864
.
.
.
spLab_aligned.mlf
wavDir/
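Comparing this aligned label file with the timed text shown earlier, each time appears to be the HTK value divided by 10,000,000 (HTK label times are in units of 100 ns) plus the start offset of its sentence. A minimal sketch of that conversion follows. The function names are hypothetical, and the offsets 0.0 and 2.160 are read off the timed-text listing rather than computed here.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in 100 ns units

SAMPLE_MLF = '''#!MLF!#
"./_wav/SN0.rec"
0 800000 sil -578.044434
800000 8700000 wikipedia -5636.368652
.
"./_wav/SN1.rec"
0 800000 sil -541.083069
800000 8600000 webbased -5977.622070
.
'''


def parse_mlf(text):
    """Parse an HTK master label file into {entry: [(start, end, label)]}."""
    entries = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"'):          # a new per-sentence entry
            current = entries.setdefault(line.strip('"'), [])
        elif line == ".":                 # end of the current entry
            current = None
        elif current is not None:         # start end label [score]
            start, end, label = line.split()[:3]
            current.append((int(start), int(end), label))
    return entries


def to_timed_text(entries, offsets):
    """Flatten entries into (start_s, end_s, label), shifted per sentence."""
    rows = []
    for segments, offset in zip(entries.values(), offsets):
        for start, end, label in segments:
            rows.append((round(offset + start / HTK_UNITS_PER_SECOND, 3),
                         round(offset + end / HTK_UNITS_PER_SECOND, 3),
                         label))
    return rows
```

For instance, to_timed_text(parse_mlf(SAMPLE_MLF), [0.0, 2.160]) places "wikipedia" at 0.080 to 0.870 s and "webbased" at 2.240 to 3.020 s, matching the timed text. The sketch relies on dictionaries preserving insertion order (Python 3.7 and later).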
The major algorithm in HTK
‘Holiday Shopping’ = ‘h’ + ‘o’ + ‘l’ + ‘i’ + ‘d’ + ‘ay’ + ‘sil’ + ‘sh’ + ‘o’ + ‘p’ + ‘I’ + ‘ng’
[Figure: phone units ‘h’, ‘o’, ..., ‘ng’]
• Forced alignment in HTK
– 1. Given a speech signal
– 2. Perform the pronunciation transcription
• Pronunciation symbols must be ASCII only!
– 3. Train the HMM models
– 4. Run the Viterbi search for the optimal path (alignment):
#!MLF!#
"wavDir/SN0001.rec"
0 800000 sil -567.865356
800000 8700000 wikipedia -5670.471680
8700000 10000000 is -951.059692
10000000 10600000 a -489.843994
10600000 20000000 multilingual -7398.754395
20000000 20700000 sil -416.119415
.
"wavDir/SN0002.rec"
0 900000 sil -632.964050
900000 8600000 webbased -6000.767578
8600000 9900000 sil -914.236206
.
"wavDir/SN0003.rec"
0 2100000 sil -1373.137817
2100000 9000000 freecontent -5306.260742
9000000 18500000 encyclopedia -6654.958984
18500000 25600000 project -5698.730469
25600000 32700000 supported -5713.494141
32700000 33200000 by -429.306763
33200000 34800000 the -1205.477539
34800000 41500000 wikimedia -5115.318359
41500000 50000000 foundation -6074.208496
50000000 52000000 and -1746.236938
52000000 56200000 based -3267.695801
56200000 57000000 on -585.264404
57000000 57700000 a -577.346130
57700000 63200000 model -3769.413574
63200000 63800000 of -524.015503
63800000 65300000 sil -1129.348633
.
wavDir.align
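The Viterbi search in step 4 can be illustrated with a toy dynamic program. This is a simplified sketch, not HTK's HVite implementation: it assumes the per-frame log-likelihood of every phone is already available as a matrix, ignores transition probabilities, and treats each phone as a single state that must be visited in order. All names are hypothetical.

```python
import math


def forced_align(loglik):
    """Best monotonic state path for loglik[t][s] (frame t, state s).

    The path starts in state 0, ends in the last state, and at each frame
    either stays in the same state or advances by exactly one.
    """
    n_frames, n_states = len(loglik), len(loglik[0])
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")
    score = [[-math.inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = loglik[0][0]
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else -math.inf
            prev = s - 1 if move > stay else s
            score[t][s] = max(stay, move) + loglik[t][s]
            back[t][s] = prev
    path = [n_states - 1]                 # backtrace from the final state
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def path_to_segments(path, labels):
    """Turn a per-frame path into (start_frame, end_frame, label) runs."""
    segments, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((start, t, labels[path[start]]))
            start = t
    return segments
```

The resulting segments play the same role as the start and end columns in the aligned label file, except that they are counted in frames rather than in 100 ns units.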
Now it’s time
to KaraOke!
A Browser in JavaScript and HTML
for Text-KaraOke
• https://youtu.be/11-ltx0yv_o
A Browser in Python using Tkinter
for Text-KaraOke
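The core of either browser is deciding which word to highlight at the current playback time. The slides do not show that logic, so the following is only a sketch under assumptions: the rows are (start, end, label) tuples in seconds sorted by start time, as in the timed text, and the function names and the bracket rendering are hypothetical stand-ins for real GUI highlighting.

```python
from bisect import bisect_right


def current_word(rows, t):
    """Index of the row whose [start, end) interval contains t, else None."""
    starts = [row[0] for row in rows]
    i = bisect_right(starts, t) - 1       # last row starting at or before t
    if i >= 0 and t < rows[i][1]:
        return i
    return None


def render(rows, t):
    """Plain-text karaoke line: the current word is wrapped in brackets."""
    i = current_word(rows, t)
    words = []
    for k, (_, _, label) in enumerate(rows):
        if label == "sil":                # silence is never displayed
            continue
        words.append(f"[{label}]" if k == i else label)
    return " ".join(words)
```

A Tkinter or JavaScript front end would call something like current_word on a timer and restyle the matching word instead of printing brackets.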
Conclusion & Future Work
• Make the process more automatic.
• Make the user interface friendlier.
• Make the program more robust.
• We welcome your help in improving it.
• Thank you for listening!
Thank you for Listening.
ご聴取 有り難う 御座いました。
感謝您的收聽。

Ry pyconjp2015 karaoke

  • 1. 1 PyCon JP 2015 Renyuan Lyu 呂仁園 Chun-Han Lai 賴俊翰 Karaoke-style Read-aloud System Chang Gung Univ. Taiwan Oct/10/ Saturday 2 p.m.–2:30 p.m. in 会議室1/Conference Room 1
  • 2. CguTextKaraoke a Karaoke-style Read-aloud System Using Speech Alignment and Text-to-Speech Technology Chun-Han Lai (賴俊翰) Renyuan Lyu (呂仁園) Chang Gung University (長庚大學) Taiwan (台灣) 2
  • 3. Abstract • A procedure to create a Speech-to-Text Synchronization file from an original text-only file – can be used to show high-light text just like a Karaoke machine – very useful for language learning purpose. • TTS (Text-to-speech) technology on clouds, like Google TTS • Speech-recognition technology, like HTK, for temporal alignment 3
  • 4. Introduction • Starting from a text-only file, using a cloud-based text-to-speech (TTS) technology, like Google Translate/TTS, and also a speech- recognition technology, like Hidden Markov Model Toolkits (HTK), we could generate its associated timed-text file which aligns up text with speech waveform file in the temporal axis. • Python is used not only as a glue to link all different styles of software resources, like Google Translate and HTK, but also as a powerful tool to deal with all text processing tasks in this project. • From such a kind of timed text file, we have also provided a JavaScript based web-app and also a Python GUI software to demonstrate the time-aligned high-lighted text like a karaoke machine in word level, which are considered very useful for the language learning purpose. 4
  • 5. a Karaoke-style Text Read-aloud System https://www.youtube-nocookie.com/embed/9a5KoXNCagM?start=180 • Karaoke (カラオケ) is a form of interactive entertainment in which an amateur singer sings along with recorded music. • Lyrics are usually displayed on a video screen, along with a moving symbol, changing color, or music video images, to guide the singer. • Here is an example of my favorites https://en.wikipedia.org/wiki/Karaoke 5
  • 6. Speech Shadowing Technique for Language Learning • The motivation of this project » https://en.wikipedia.org/wiki/Speech_shadowing –Speech shadowing • is an Language Learning technique in which subjects repeat speech immediately after hearing it. – The technique is used in language learning. – A demonstration can be viewed at the following Youtube link. • “English Speaking Practice: How to improve your English Speaking and Fluency: SHADOWING” • https://www.youtube.com/watch?v=GVWFGIyNswI6
  • 7. Text-to-Speech Synthesis 7 Wikipedia is a multilingual, web-based, free-content encyclopedia project supported by the Wikimedia Foundation and based on a model of openly editable content. The name "Wikipedia" is a portmanteau of the words wiki (a technology for creating collaborative websites, from the Hawaiian word wiki, meaning "quick") and encyclopedia. Wikipedia's articles provide links designed to guide the user to related pages with additional information. Given: a piece of Text and its speech, e.g., The goal is to obtain its speech
  • 8. Google TTS API in a Python module 8 • pip install gTTS from gtts import gTTS aText= 'Wikipedia is a multilingual, ...' aLang= 'en' tts= gTTS(text= aText, lang= aLang) tts.save("aSpeech.mp3") aSpeech.mp3aText https://github.com/pndurette/gTTS
  • 9. FFmpeg • About Ffmpeg – [https://en.wikipedia.org/wiki/FFmpeg] – FFmpeg is a free software project that produces libraries and programs for handling multimedia data. – It is one of the leading multimedia frameworks, able to do many DSP tasks, including ... • decode, encode, • transcode, mux, demux, stream, filter and play 9
  • 10. 10 FFmpeg -i aSpeech.mp3 -y - vn -acodec pcm_s16le -ac 1 -ar 16000 -f wav aSpeech.wav aSpeech.mp3 aSpeech.wav Pcm, 16 bits/sample Little endian 1 (mono) channel 16000 samples/sec FFplay aSpeech.wav Verifying by seeing and hearing Or using an interactive audio tool, like Audacity.
  • 11. Audacity (audio editor)
    • Audacity is a powerful, free, open-source digital audio editor.
      – Its features include:
        • Recording and playing back sounds
        • Importing and exporting WAV, MP3, ...
        • Viewing and editing via cut, copy, and paste, ...
    • Example files: aSpeech.mp3, aSpeech.wav
  • 12. Text-to-Speech Alignment
    • Given: a piece of text and its speech, e.g.,
      Wikipedia is a multilingual, web-based, free-content encyclopedia project supported by the Wikimedia Foundation and based on a model of openly editable content. The name "Wikipedia" is a portmanteau of the words wiki (a technology for creating collaborative websites, from the Hawaiian word wiki, meaning "quick") and encyclopedia. Wikipedia's articles provide links designed to guide the user to related pages with additional information.
    • The goal is to obtain a ‘Timed-Text’ (start time, end time, label):
        0.000 0.080 sil
        0.080 0.870 wikipedia
        0.870 0.990 is
        0.990 1.080 a
        1.080 2.010 multilingual
        2.010 2.140 sil
        2.160 2.240 sil
        2.240 3.020 webbased
        3.020 3.180 sil
        3.204 3.354 sil
        3.354 4.284 freecontent
        4.284 5.374 encyclopedia
        5.374 5.774 project
        5.774 6.454 supported
        6.454 6.754 by
        6.754 6.904 the
        6.904 7.574 wikimedia
        7.574 8.414 foundation
        8.414 8.514 sil
        8.532 8.622 sil
        8.622 8.852 and
        8.852 9.242 based
        9.242 9.382 on
        9.382 9.432 a
        9.432 9.982 model
        9.982 10.032 of
        10.032 10.592 openly
        10.592 11.212 editable
        11.212 11.802 content
        11.802 11.932 sil
        : : :
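To show how such a timed-text file can drive karaoke-style highlighting, here is a minimal sketch. It assumes each line holds a start time, an end time (both in seconds), and a label separated by whitespace; `parse_timed_text` and `word_at` are hypothetical names, not part of the authors' software.

```python
from bisect import bisect_right


def parse_timed_text(lines):
    """Turn 'start end label' lines into (start, end, label) tuples in seconds."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue                      # skip blank or malformed lines
        try:
            entries.append((float(parts[0]), float(parts[1]), parts[2]))
        except ValueError:
            continue                      # skip filler lines such as ': : :'
    return entries


def word_at(entries, t):
    """Return the label spoken at time t, or None in a gap or outside the range."""
    starts = [start for start, _, _ in entries]
    i = bisect_right(starts, t) - 1       # last entry starting at or before t
    if i >= 0 and entries[i][0] <= t < entries[i][1]:
        return entries[i][2]
    return None
```

A player would call `word_at(entries, current_playback_time)` on every screen refresh and highlight the returned word; `sil` entries and gaps between sentences leave nothing highlighted.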
  • 13. Wav splitting
    • At the sentence level, this is straightforward: the time information is extracted from the TTS mp3 files, which are received sentence by sentence.
    • (Figure: sentence boundaries)
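One way to picture the sentence-level bookkeeping: if each sentence arrives as its own audio clip, the sentence boundaries are the running sum of the clip durations, and word times measured inside a clip can be shifted onto the shared time axis. The sketch below assumes the durations are already known in seconds; `sentence_offsets` and `to_global` are hypothetical names.

```python
from itertools import accumulate


def sentence_offsets(durations):
    """Start time of each sentence when the clips are played back to back."""
    if not durations:
        return []
    return [0.0] + list(accumulate(durations))[:-1]


def to_global(words_per_sentence, durations):
    """Shift per-sentence (start, end, label) tuples onto one shared time axis."""
    merged = []
    for offset, words in zip(sentence_offsets(durations), words_per_sentence):
        for start, end, label in words:
            merged.append((round(offset + start, 3), round(offset + end, 3), label))
    return merged
```

For example, a first clip lasting 2.160 s places a second-sentence word aligned at 0.08–0.86 s within its own clip at 2.240–3.020 s on the global axis.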
  • 14. Phonetic Transcription
    • Speech recognition technology needs to transcribe text into phonetic symbols in order to build phone models.
    • Original English text (ASCII only, perhaps!): “Wikipedia is a multilingual, web-based, free-content encyclopedia project.”
    • Transcription in IPA (needs Unicode): “wikipedia ɪz ə məltilɪŋwəl, wɛb- best, fri- kɑntɛnt ənsɑjkləpidiə prɑdʒɛkt.”
    • Transcription in SAMPA (ASCII only, including non-alphabetic symbols): “wikipedia Iz @ m@ltilINw@l, wEb- best, fri- kAntEnt @nsAykl@pidi@ prAdZEkt.”
    • http://upodn.com/phon.asp
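The step from the IPA line to the ASCII line is a character-for-character substitution. The sketch below uses only the symbol pairs that can be read off the example on this slide; it is not a complete SAMPA table, and the `j` to `y` pair follows the slide's output rather than standard SAMPA. `ipa_to_ascii` is a hypothetical name.

```python
# Symbol pairs read off the slide's example; NOT a complete SAMPA table.
IPA_TO_ASCII = str.maketrans({
    "ɪ": "I",
    "ə": "@",
    "ŋ": "N",
    "ɛ": "E",
    "ɑ": "A",
    "ʒ": "Z",
    "j": "y",   # the slide's output writes 'y' here
})


def ipa_to_ascii(ipa):
    """Replace each known IPA symbol with its ASCII counterpart."""
    return ipa.translate(IPA_TO_ASCII)
```

Symbols missing from the table pass through unchanged, so a real pipeline would still need the clean-up step for anything left outside ASCII.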
  • 15. Post-processing of phonetic transcription
    • Map, or simply remove, all undesired symbols from the various output styles (usually Unicode or other non-alphabetic symbols).
    • For plain English (en):
      – The original text is used as an approximation of the phone sequence.
      – Although this seems too simple, it has worked well so far.
    • For Traditional Chinese (zh-tw):
      – Google Translate was used to get phonetic symbols in Pinyin (拼音, pīnyīn), which were then reduced to plain romanized text by removing the tone marks.
    • For Japanese (ja):
      – MeCab has recently been used to get the katakana (片仮名, カタカナ).
      – Romkan has been used to convert katakana to romaji (kunrei).
    • Thanks to Python, which does most of the work at this stage of processing!
  • 16. Phonetic transcription for English
    – Using the regular expression module
        enText = '''Wikipedia is a multilingual, web-based, free-content encyclopedia project.'''
        phn = text2phn_en(enText)
        # phn == '''wikipedia_is_a_multilingual_webbased_freecontent_encyclopedia_project'''
    – Cleaning step:
        import re
        pats = r"'|\"|-|^_|_$|,|\.|\(|\)"   # '.', '(' and ')' must be escaped
        phn = re.sub(pats, '', phn)
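A self-contained version of the English step might look like the sketch below. It is not the authors' `text2phn_en`: instead of listing the symbols to delete, it keeps only letters and whitespace, which is an assumption that happens to reproduce the example output on this slide.

```python
import re


def text2phn_en(text):
    """Approximate an English phone sequence by the cleaned-up text itself."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", "", text)   # drop punctuation, hyphens, digits
    return "_".join(text.split())          # one underscore between words
```

Dropping hyphens rather than splitting on them is what turns "web-based" into the single label `webbased`, matching the labels in the aligned output.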
  • 17. Phonetic transcription for Traditional Chinese
    – Using the Google Translate/TTS API
        tcText = '維基百科是一個自由內容'
        phn = text2phn_tc(tcText)
        # phn == 'weiji_baike_shi_yige_ziyou_neirong'
    – Request:
        GOOGLE_TTS_URL = 'https://translate.google.com.tw/translate_a/single?dt=bd&dt=ex&dt=at&'
        req = urllib.request.Request(GOOGLE_TTS_URL + data)
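Removing the tone marks from Pinyin needs no external service. The sketch below assumes the Pinyin string has already been obtained (the input format used here is an illustration, not the actual Google Translate response) and uses Unicode decomposition to drop the combining marks; `strip_tone_marks` and `pinyin_to_phn` are hypothetical names.

```python
import unicodedata


def strip_tone_marks(pinyin):
    """Remove diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def pinyin_to_phn(pinyin):
    """Lower-case, strip tone marks, and join the syllable groups with underscores."""
    return "_".join(strip_tone_marks(pinyin).lower().split())
```

Note that this approach also turns `ü` into `u`, since the diaeresis is a combining mark as well; whether that merge is acceptable depends on the phone set being modeled.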
  • 18. Phonetic transcription for Japanese
    – Using MeCab and Romkan
        jpText = '''ウィキペディアは、信頼されるフリーなオンライン百科事典、'''
        phn = text2phn_jp(jpText)
        # phn == '''wikipedyia_wa_sil_sinrai_sa_reru_furi-_na_onrain_hyakka_ziten'''
    – Processing:
        import MeCab
        import romkan
        y = MeCab.Tagger().parse(text)
        ...
        kun = romkan.to_kunrei(phn)
  • 19. At the Halfway Point
    • A bundle of wav/lab files
  • 20. Speech recognition technology: the HMM Toolkit (HTK)
    – http://htk.eng.cam.ac.uk/
    – Given a speech utterance and its phone sequence, the speech can be aligned with the phones by the ‘forced alignment’ technique of the HMM approach.
    – HTK, the Hidden Markov Model Toolkit, provides a convenient way to apply the HMM approach.
  • 21. The HTK overview
  • 22. HTK processing (abstract)
    • #[00] setting the working dir
    • #[01] creating the (hmm) model prototype
    • #[02] label processing
    • #[03] feature extraction
    • #[04] model initialization
    • #[05] model training
    • #[06] forced alignment
    • #[07] post file moving operation
  • 23. HTK processing (detail)
    #[00] setting the working dir
    dirName= ./_wav/
    #[01] creating the (hmm) model prototype
    CreateHProto.... myHmmPro N = 3 M = 6
    #[02] label processing
    000, 0,----> ._htkhled -A -i spLab00.mlf -n spLab00.lst -S spLab.scp hL
    001, 0,----> ._htkhled -A -i spLab.mlf -n spLab.lst -S spLab.scp hLed.l
    002, 0,----> ._htkhled -A -i spLab_p.mlf -n spLab_p.lst -S spLab.scp -I
    #[03] feature extraction
    003, 0,----> ._htkHCopy -A -C hCopy.conf -S spWav2Mfc.scp 1>> 1.htk.out 2>>
    #[04] model initialization
    004, 1,----> mkdir hmms_p
    005, 0,----> ._htkHCompV -A -m -C hInit.conf -S spMfc.scp -I spLab_p.mlf -M
    #[05] model training
    006, 0,----> ._htkHERest -A -C hErest.conf -S spMfc.scp -p 1 -t 2000.0 -w 3
    007, 0,----> ._htkHERest -A -C hErest.conf -p 0 -t 2000.0 -w 3 -v 0.05 -I sp
    : (repeating several times...)
    :
    #[06] forced alignment
    016, 0,----> ._htkHVite -A -a -C hVite.conf -S spMfc.scp -d hmms_p/ -i s
    #[07] post file moving operation
    017, 1,----> mkdir outDir
    018, 1,----> copy spLab_aligned.mlf outDir./_wav_aligned.mlf
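A log like the one above can be produced by a small Python driver that runs each external tool in turn. The sketch below is an assumption about how such glue code might look, not the authors' script: it guesses that the second column of the log is the process return code, and `run_steps` and `format_row` are hypothetical names.

```python
import subprocess
import sys


def run_steps(commands):
    """Run each command in order; return (index, return code, command text) rows."""
    rows = []
    for i, cmd in enumerate(commands):
        done = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        rows.append((i, done.returncode, " ".join(cmd)))
    return rows


def format_row(row):
    """Format a row in the style of the log: step number, return code, command."""
    index, returncode, text = row
    return f"{index:03d}, {returncode},----> {text}"


if __name__ == "__main__":
    # Stand-in commands; a real run would list the HTK tool invocations here.
    for row in run_steps([[sys.executable, "-c", "pass"]]):
        print(format_row(row))
```

Passing each command as an argument list avoids shell quoting problems, which matters once file names contain spaces or non-ASCII characters.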
  • 29. HTK summary
    • wavDir/ → HTK tools (HLed, HCopy, HCompV, HERest, HVite) → spLab_aligned.mlf
    • spLab_aligned.mlf:
        #!MLF!#
        "./_wav/SN0.rec"
        0 800000 sil -578.044434
        800000 8700000 wikipedia -5636.368652
        8700000 9900000 is -855.988770
        9900000 10800000 a -693.554871
        10800000 20100000 multilingual -7268.197266
        20100000 21400000 sil -791.746216
        .
        "./_wav/SN1.rec"
        0 800000 sil -541.083069
        800000 8600000 webbased -5977.622070
        8600000 10200000 sil -1048.225220
        .
        "./_wav/SN2.rec"
        0 1500000 sil -1100.892822
        1500000 10800000 freecontent -7094.197266
        10800000 21700000 encyclopedia -8148.633789
        21700000 25700000 project -3247.493896
        25700000 32500000 supported -5594.979492
        32500000 35500000 by -2412.487305
        35500000 37000000 the -1176.310547
        37000000 43700000 wikimedia -5128.852051
        43700000 52100000 foundation -5995.618164
        52100000 53100000 sil -695.872864
        . . .
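HTK writes label times in units of 100 ns, so 800000 corresponds to 0.080 s. A minimal sketch of reading such an aligned MLF into per-file word timings follows; `parse_mlf` is a hypothetical name, and the parser only covers the layout shown on this slide.

```python
HTK_TICKS_PER_SECOND = 10_000_000   # HTK label times are in units of 100 ns


def parse_mlf(text):
    """Parse an aligned MLF into {file name: [(start_s, end_s, label, score), ...]}."""
    result, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"'):                 # a new per-file section
            current = result.setdefault(line.strip('"'), [])
        elif line == ".":                        # end of the current section
            current = None
        elif current is not None:
            fields = line.split()
            if len(fields) < 3:
                continue                         # not a 'start end label' line
            start = round(int(fields[0]) / HTK_TICKS_PER_SECOND, 3)
            end = round(int(fields[1]) / HTK_TICKS_PER_SECOND, 3)
            score = float(fields[3]) if len(fields) > 3 else None
            current.append((start, end, fields[2], score))
    return result
```

The times returned are relative to the start of each sentence file; adding the sentence start offsets yields the global timed text.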
  • 30. The major algorithm in HTK
    • ‘Holiday Shopping’ = ‘h’+‘o’+‘l’+‘i’+‘d’+‘ay’+‘sil’+‘sh’+‘o’+‘p’+‘I’+‘ng’
    • Forced alignment in HTK
      – 1. Given a speech signal
      – 2. Do the pronunciation transcription
        • Pronunciation symbols must be ASCII only!
      – 3. Train to get the HMM models (‘h’, ‘o’, ..., ‘ng’)
  • 31. Forced alignment in HTK (continued)
      – 4. Do the Viterbi search for the optimal path (alignment) through the models ‘h’, ‘o’, ..., ‘ng’.
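The idea behind the Viterbi search in forced alignment can be illustrated with a toy dynamic program. This is a simplified sketch, not HTK's HVite: it assumes the per-frame log-likelihoods for each phone are already computed, uses one state per phone, ignores transition probabilities, and only allows each frame to stay in the current phone or advance to the next one. `forced_align` and `to_segments` are hypothetical names.

```python
def forced_align(loglik):
    """loglik[t][s]: log-likelihood of frame t under phone s (phones in fixed order).

    Returns the best phone index for every frame, starting at phone 0, ending at
    the last phone, and never moving backwards or skipping a phone.
    """
    if not loglik or len(loglik) < len(loglik[0]):
        raise ValueError("need at least one frame per phone")
    n_frames, n_phones = len(loglik), len(loglik[0])
    neg_inf = float("-inf")
    score = [[neg_inf] * n_phones for _ in range(n_frames)]
    back = [[0] * n_phones for _ in range(n_frames)]
    score[0][0] = loglik[0][0]
    for t in range(1, n_frames):
        for s in range(n_phones):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg_inf
            best, prev = (move, s - 1) if move > stay else (stay, s)
            if best > neg_inf:
                score[t][s] = best + loglik[t][s]
                back[t][s] = prev
    path = [n_phones - 1]                       # the path must end in the last phone
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def to_segments(path, labels):
    """Collapse a per-frame path into (start frame, end frame, label) segments."""
    segments, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((start, t, labels[path[start]]))
            start = t
    return segments
```

Multiplying the frame indices by the frame shift used in feature extraction converts the segments into start and end times like those in the aligned label file.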
  • 32. wavDir.align
        #!MLF!#
        "wavDir/SN0001.rec"
        0 800000 sil -567.865356
        800000 8700000 wikipedia -5670.471680
        8700000 10000000 is -951.059692
        10000000 10600000 a -489.843994
        10600000 20000000 multilingual -7398.754395
        20000000 20700000 sil -416.119415
        .
        "wavDir/SN0002.rec"
        0 900000 sil -632.964050
        900000 8600000 webbased -6000.767578
        8600000 9900000 sil -914.236206
        .
        "wavDir/SN0003.rec"
        0 2100000 sil -1373.137817
        2100000 9000000 freecontent -5306.260742
        9000000 18500000 encyclopedia -6654.958984
        18500000 25600000 project -5698.730469
        25600000 32700000 supported -5713.494141
        32700000 33200000 by -429.306763
        33200000 34800000 the -1205.477539
        34800000 41500000 wikimedia -5115.318359
        41500000 50000000 foundation -6074.208496
        50000000 52000000 and -1746.236938
        52000000 56200000 based -3267.695801
        56200000 57000000 on -585.264404
        57000000 57700000 a -577.346130
        57700000 63200000 model -3769.413574
        63200000 63800000 of -524.015503
        63800000 65300000 sil -1129.348633
        .
  • 34. A Browser in JavaScript and HTML for Text-KaraOke
    • https://youtu.be/11-ltx0yv_o
  • 35. A Browser in Python using Tkinter for Text-KaraOke
  • 36. Conclusion & Future Work
    • Make the process more automatic.
    • Make the user interface friendlier.
    • Make the program more robust.
    • Your help in improving it is welcome.
    • Thank you for listening!
  • 37. PyCon JP 2015
    • Renyuan Lyu (呂仁園), Chun-Han Lai (賴俊翰)
    • Karaoke-style Read-aloud System
    • Oct/10/ Saturday 2 p.m.–2:30 p.m. in 会議室1/Conference Room 1
    • Thank you for listening. ご聴取 有り難う 御座いました。 感謝您的收聽。