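# 自己紹介 = self-introduction; 私 = "I" (used here in place of self)
class SelfIntro自己紹介:
    def __init__(私):
        私.名 = 'Renyuan Lyu, 呂 仁園'  # name
        私.職業 = 'University Professor, 大学の先生'  # occupation
        私.研究分野 = 'Speech Recognition, 音声認識'  # research field
        私.職場 = 'Chang Gung Univ (CGU), 長庚大學'  # workplace
        私.国 = 'TAIWAN, 台灣'  # country
        # 誇り = pride; カラオケさん = "Mr. Karaoke"
        私.誇り = '''
            PyCon JP speaker (2015~2017, 2019),
            カラオケさん'''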
https://youtu.be/O1-9Yv9cB8Q
https://youtu.be/cUewj2kRrbk?t=2434
Lightning Talks at PyCon JP 2016, 2017
Real-time Pitch Detection
and Speech Recognition
in Python
via PyAudio, Pygame & VPython
Renyuan Lyu (呂仁園),
Chang Gung University (長庚大學),
TAIWAN (台灣)
@ PyCon JP 2019
The System
Multilingual
Lyric Transcription
(Speech Recognition)
Pitch
Detection
(Melody Recognition)
https://youtu.be/XF3oGwEsPac
The System
Singing
Voice
Multilingual
Lyric Transcription
(Speech Recognition)
Pitch
Detection
(Melody Recognition)
Lyrics (歌詞)
“Twinkle Twinkle Little Star”
“きらきらひかる”
“一閃一閃亮晶晶”
Pitch (musical notes, 音符)
“C C G G A A G –”
Data (Voice) acquisition
• Audio Signal Processing
• samplingRate= 16000 samples/sec,
• bitsPerSample= 16 bits/sample = 2 bytes/sample
• channelNumber= 3 (L, R, humming)
• Frame-wise short-time processing
Frame01
Frame02
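A minimal sketch of frame-wise short-time processing, using the sampling rate and sample size from the slide. The frame length (512 samples), the hop (256 samples), and the helper name `split_into_frames` are assumptions made for illustration; they are not values from the talk.

```python
import numpy as np

SAMPLING_RATE = 16000   # samples/sec, as on the slide
FRAME_LEN = 512         # samples per frame (assumed: 32 ms at 16 kHz)
HOP_LEN = 256           # step between frame starts (assumed: 50% overlap)

def split_into_frames(signal, frame_len=FRAME_LEN, hop_len=HOP_LEN):
    """Return a 2-D array with one frame per row; a trailing partial frame is dropped."""
    if len(signal) < frame_len:
        return np.empty((0, frame_len), dtype=signal.dtype)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    return np.stack([signal[i * hop_len: i * hop_len + frame_len]
                     for i in range(n_frames)])

# One second of a 440 Hz tone stored as 16-bit samples (2 bytes/sample).
t = np.arange(SAMPLING_RATE) / SAMPLING_RATE
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

frames = split_into_frames(tone)
# Short-time energy: one value per frame.
energy = (frames.astype(np.float64) ** 2).mean(axis=1)
```

Each row of `frames` can then be passed to a per-frame analysis such as energy, pitch, or spectrum.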
Digital Signal Processing:
Spectrogram
• A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time.
• It is computed using the Fast Fourier Transform (FFT).
https://youtu.be/bCRL5yw8fXA
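The definition above can be sketched with NumPy's FFT routines. This is an illustrative sketch, not the code behind the demo: the frame length, hop, Hann window, and the function name `spectrogram` are all assumptions.

```python
import numpy as np

SAMPLING_RATE = 16000   # samples/sec, as in the talk
FRAME_LEN = 512         # assumed frame length
HOP_LEN = 256           # assumed hop between frames

def spectrogram(signal, frame_len=FRAME_LEN, hop_len=HOP_LEN):
    """Magnitude spectrogram: rows are time frames, columns are frequency bins."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    rows = []
    for i in range(n_frames):
        frame = signal[i * hop_len: i * hop_len + frame_len] * window
        rows.append(np.abs(np.fft.rfft(frame)))   # FFT of one windowed frame
    return np.array(rows)

# A one-second 1000 Hz test tone.
t = np.arange(SAMPLING_RATE) / SAMPLING_RATE
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)
freqs = np.fft.rfftfreq(FRAME_LEN, d=1 / SAMPLING_RATE)
peak_hz = freqs[spec.mean(axis=0).argmax()]   # strongest frequency bin on average
```

Plotting `spec` as an image, with time on one axis and `freqs` on the other, gives the familiar spectrogram picture.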
A Real-time Spectrogram
http://friture.org/
https://youtu.be/1sbtXqZaGXE
• Friture is a program written in Python,
designed to analyze audio input in
real time.
• It displays audio data as a scope, a
spectrum analyzer, or a rolling
2D spectrogram.
• I found this program around 2012~2013,
and it convinced me that I could
move to the Python world to
continue my career.
Using Audacity
to get an audio signal
https://youtu.be/o9DF9SVdcVo
The first step in audio signal processing
is to record some audio yourself
and play with it.
WAVE PCM
soundfile format
(.wav)
• http://soundfile.sapp.org/doc/WaveFormat/
• Compared with text data,
audio data is much bigger,
and it is usually stored in
binary form.
• Being familiar with the data
format is crucial for processing it.
“See” the audio signal in the raw format
Extract audio header information
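As a sketch of header extraction (not the code shown on the slide), the following writes a short silent 3-channel WAV into memory with Python's standard `wave` module and unpacks the header with `struct`. The helper names `make_wav_bytes` and `parse_header` are hypothetical, and the fixed 44-byte layout is an assumption that holds for plain PCM files without extra chunks.

```python
import io
import struct
import wave

def make_wav_bytes(n_channels=3, sample_width=2, rate=16000, n_frames=160):
    """Write a short silent PCM WAV file into memory and return its bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(n_channels)
        w.setsampwidth(sample_width)
        w.setframerate(rate)
        w.writeframes(b"\x00" * (n_frames * n_channels * sample_width))
    return buf.getvalue()

def parse_header(data):
    """Unpack the canonical 44-byte PCM header (all fields little-endian)."""
    (riff, _chunk_size, wave_id, fmt_id, _fmt_size, audio_format, n_channels,
     rate, byte_rate, block_align, bits, data_id, data_size) = struct.unpack(
        "<4sI4s4sIHHIIHH4sI", data[:44])
    if (riff, wave_id, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical PCM WAV header")
    return {
        "audio_format": audio_format,   # 1 means uncompressed PCM
        "channels": n_channels,
        "sampling_rate": rate,
        "byte_rate": byte_rate,         # rate * channels * bytes per sample
        "block_align": block_align,     # bytes per multi-channel sample
        "bits_per_sample": bits,
        "data_bytes": data_size,
    }

info = parse_header(make_wav_bytes())
```

For real files, `wave.open(path).getparams()` returns the same fields without manual unpacking.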
Visualize the audio signal as a waveform
• Once you can visualize the
audio signal, you can verify that
you are reading it correctly,
• and then you can do further
processing with advanced signal
processing algorithms
• like pitch detection and speech
recognition.
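Before plotting, the raw bytes have to be decoded into per-channel sample lists. The sketch below assumes interleaved 16-bit little-endian PCM with the three channels from the talk (L, R, humming); the function name `deinterleave` and the synthetic test data are illustrative only. A per-channel peak value stands in for the plot as a quick sanity check.

```python
import array
import math
import struct
import sys

N_CHANNELS = 3   # L, R, humming

def deinterleave(raw, n_channels=N_CHANNELS):
    """Split interleaved 16-bit little-endian PCM bytes into one list per channel."""
    samples = array.array("h")        # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()            # WAV data is little-endian
    return [list(samples[ch::n_channels]) for ch in range(n_channels)]

# Synthetic data: L and R are silent, the humming channel carries a 440 Hz sine.
raw = b"".join(
    struct.pack("<hhh", 0, 0, int(10000 * math.sin(2 * math.pi * 440 * n / 16000)))
    for n in range(160)
)

left, right, humming = deinterleave(raw)
peaks = [max(abs(v) for v in ch) for ch in (left, right, humming)]
```

Each list can then be handed to a plotting library of your choice to draw the waveform.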
Human-aided pitch tracking
by humming
• Pitch detection on a real music
signal is not easy.
• To simplify the task, I use
a trick.
• I hum the song and record it in
another channel while listening
to the music.
• I use this “clean” humming
voice to detect the pitch.
Multi-Threading Programming
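import threading

# Methods of the real-time recorder class (the class statement is not shown on the slide).
def init(self):
    # One thread each for recording, energy, pitch (fundamental frequency), and speech recognition.
    self.錄音線 = threading.Thread(target=self.錄音線程)
    self.能量線 = threading.Thread(target=self.f1_能量)
    self.基頻線 = threading.Thread(target=self.f4_基頻)
    self.語音辨認線 = threading.Thread(target=self.f6_語音辨認)

def start(self):
    self.錄音線.start()
    self.能量線.start()
    self.基頻線.start()
    self.語音辨認線.start()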
• For a real-time system,
multi-threaded
programming is crucial.
• At the very least, an independent
thread for data
acquisition is necessary.
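A minimal sketch of why acquisition gets its own thread: one thread produces audio blocks while another consumes them, so analysis never stalls recording. Note that this sketch hands blocks over through a `queue.Queue`, which is a swapped-in technique rather than the shared circular buffer used in the talk; the function names and the fake audio blocks are hypothetical.

```python
import queue
import threading

def recorder(out_queue, n_blocks):
    """Stand-in for the recording thread: pushes fake audio blocks, then None."""
    for i in range(n_blocks):
        out_queue.put(bytes([i % 256]) * 64)   # a 64-byte dummy block
    out_queue.put(None)                        # sentinel: recording finished

def analyzer(in_queue, results):
    """Consumes blocks as they arrive; here it only records their lengths."""
    while True:
        block = in_queue.get()                 # waits until a block is available
        if block is None:
            break
        results.append(len(block))

blocks = queue.Queue()
lengths = []
t_rec = threading.Thread(target=recorder, args=(blocks, 5))
t_ana = threading.Thread(target=analyzer, args=(blocks, lengths))
t_rec.start()
t_ana.start()
t_rec.join()
t_ana.join()
```

In a real recorder the producer would read from the audio device instead of generating dummy bytes.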
audio recording “Thread”
A circular buffer
to store the real-time
audio signal
I set up a buffer in RAM to store 16 sec of voice.
Its size is 16 sec * 16000 samples/sec * 2 bytes/sample * 3 channels = 1,536,000 bytes.
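The buffer arithmetic and the wrap-around logic can be sketched as follows. The class name `RingBuffer` and its methods `write` and `latest` are hypothetical; the talk does not show its buffer implementation, so treat this as one possible design under that assumption.

```python
SECONDS, RATE, BYTES_PER_SAMPLE, CHANNELS = 16, 16000, 2, 3
BUFFER_SIZE = SECONDS * RATE * BYTES_PER_SAMPLE * CHANNELS   # 1,536,000 bytes

class RingBuffer:
    """Fixed-size byte buffer; new data overwrites the oldest data."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.size = size
        self.write_pos = 0   # index where the next byte will be written
        self.filled = 0      # number of valid bytes, at most size

    def write(self, chunk):
        chunk = bytes(chunk)[-self.size:]                  # keep only what fits
        first = min(len(chunk), self.size - self.write_pos)
        self.data[self.write_pos:self.write_pos + first] = chunk[:first]
        rest = len(chunk) - first
        self.data[0:rest] = chunk[first:]                  # part that wraps around
        self.write_pos = (self.write_pos + len(chunk)) % self.size
        self.filled = min(self.size, self.filled + len(chunk))

    def latest(self, n):
        """Return the most recent n bytes in chronological order."""
        n = min(n, self.filled)
        start = (self.write_pos - n) % self.size
        if start + n <= self.size:
            return bytes(self.data[start:start + n])
        return bytes(self.data[start:]) + bytes(self.data[:start + n - self.size])

# Tiny demonstration with an 8-byte buffer.
rb = RingBuffer(8)
rb.write(b"abcdef")
rb.write(b"ghij")      # wraps around and overwrites "ab"
tail = rb.latest(5)
```

When several threads share such a buffer, access to `write` and `latest` would also need a lock, which is omitted here.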
Pitch Detection Algorithm
• Zoom into a speech signal at a scale of 0.01 sec, and
periodic patterns become visible.
• The duration of one periodic pattern is called
the “pitch period”.
• For the A-440 note, the pitch period =
1/440 ≈ 0.0023 sec
• A traditionally popular pitch detection
algorithm is based on the autocorrelation
method.
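A minimal sketch of autocorrelation-based pitch detection on a single frame. The search range (80 to 1000 Hz), the frame length, and the function name `detect_pitch` are assumptions; the talk does not give its exact parameters.

```python
import numpy as np

RATE = 16000   # samples/sec, as in the talk

def detect_pitch(frame, rate=RATE, fmin=80.0, fmax=1000.0):
    """Estimate pitch in Hz from the autocorrelation peak within [fmin, fmax]."""
    frame = frame - frame.mean()
    # Keep non-negative lags only: corr[k] is the correlation at lag k.
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(rate / fmax)                        # shortest period considered
    lag_max = min(int(rate / fmin), len(corr) - 1)    # longest period considered
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return rate / lag                                 # pitch period in samples -> Hz

# A 1024-sample frame of the A-440 note.
t = np.arange(1024) / RATE
a440 = np.sin(2 * np.pi * 440 * t)
pitch = detect_pitch(a440)
```

Because the lag is an integer number of samples, the estimate is only approximate: at 16 kHz the true period of A-440 is about 36.4 samples, so the result lands a few Hz away from 440.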
Pitch Detection Thread
Pitch Sampling at slower intervals
Pitch Quantization
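One common way to quantize a detected frequency is to round it to the nearest semitone of the equal-tempered scale. The sketch below assumes that approach, with A4 = 440 Hz and MIDI note numbering; the slide does not say which scheme the system uses, and the function name `quantize_pitch` is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def quantize_pitch(freq_hz, a4_hz=440.0):
    """Round a frequency to the nearest equal-tempered semitone; return (midi, name)."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))   # MIDI note 69 is A4
    return midi, f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

# Slightly off-pitch inputs still map to clean note names.
notes = [quantize_pitch(f)[1] for f in (261.6, 392.0, 444.4)]
```

Dropping the octave digit from these names gives note sequences like the “C C G G A A” pattern shown earlier.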
Speech Recognition
• http://shorturl.at/rxLM4
Speech recognition
needs a large-scale database
to train the system.
Nowadays, deep-learning
algorithms play the major role
and achieve the best
performance.
Speech Recognition in Python
https://pypi.org/project/SpeechRecognition/
Google has a great Speech Recognition API.
This API converts speech (from the microphone)
into written text (Python strings).
The ASR Thread
Get a segment (M frames) of speech ➔ x
Transform x into an “AudioData” object and then
send it to the Google Speech Recognition engine
to get the recognized “text”.
Getting speech data out of a circular buffer is
the tricky part of the implementation!
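import speech_recognition as sr  # the SpeechRecognition package from PyPI

# 語音辨認 = speech recognition; 辨 = recognizer; 音 = audio; 文 = text;
# 取樣率 = sampling rate; 樣本寬 = sample width in bytes
def 語音辨認(私):
    辨 = sr.Recognizer()
    while 私.語音辨認中:
        #
        # Get x as "singingVoice" to be 音.
        # x (raw audio bytes) and lang are set in code not shown on the slide.
        #
        音 = sr.AudioData(x, 私.取樣率, 私.樣本寬)
        #
        # Do ASR to get the recognition result as 文
        #
        try:
            if lang in ('ja', 'en', 'zh-TW'):
                文 = 辨.recognize_google(音, language=lang)
            else:
                文 = ''  # assumption: an unsupported language yields empty text
            私.文 = '{} ({})'.format(文, lang)
        except Exception:
            私.文 = 'exceptionOccurs!!'
    return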
Lyric Transcription
• Melodic voice (singing) recognition
• Timed text generation
• Needs speech recognition and
segmentation
• Currently this is done by a human,
not yet by machine.
Kara OK
• Pitch Tracking
• Timed Text Displaying
https://youtu.be/F1_Xz1c5AEE
Final
Demo
https://youtu.be/0cdo6ZnBZc8
ご清聴ありがとうございました。
Thank you for listening.
感謝聆聽。
@ PyCon JP 2019
Renyuan Lyu
From TAIWAN
