7. Digital Signal Processing:
Spectrogram
• A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time.
• It is typically computed using the Fast Fourier Transform (FFT).
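As a rough illustration of the idea, here is a minimal sketch in pure Python: the signal is cut into overlapping frames, each frame is windowed, and an FFT gives one column of the spectrogram. The function names, the frame size (256), the hop (128), and the Hann window are assumptions for this example, not taken from the slides. A real program would normally use an optimized FFT library.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def spectrogram(samples, frame_size=256, hop=128):
    """Return one magnitude spectrum per frame (rows = time, columns = frequency)."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size)
              for i in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        spectrum = fft(frame)
        # Keep only the non-negative frequencies (0 .. Nyquist).
        frames.append([abs(c) for c in spectrum[: frame_size // 2 + 1]])
    return frames

# Example input (made up for this sketch): a 1 kHz tone sampled at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(2048)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * rate / 256  # frequency of the strongest bin in the first frame
```

Each row of `spec` is one time slice. Drawing the rows side by side with magnitude mapped to color gives the usual spectrogram picture.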
https://youtu.be/bCRL5yw8fXA
8. A Real-time Spectrogram
http://friture.org/
https://youtu.be/1sbtXqZaGXE
• Friture is a program written in Python
that analyzes audio input in
real time.
• It displays audio data as a scope, a
spectrum analyzer, or a rolling
2D spectrogram.
• I found this program around 2012-2013,
and it convinced me that I could
move into the Python world to
continue my career.
9. Using Audacity
to get audio signal
https://youtu.be/o9DF9SVdcVo
The first step in audio signal processing
is to record some audio yourself
and play with it.
10. WAVE PCM
soundfile format
(.wav)
• http://soundfile.sapp.org/doc/WaveFormat/
• Compared with text data,
audio data is much larger,
and it is usually stored in
binary form.
• Being familiar with the data
format is crucial for processing it.
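To get a feel for the format, the following sketch writes a short tone to an in-memory .wav file with Python's standard `wave` module, then reads the header fields back. The helper names `write_tone` and `read_header`, and the tone parameters, are made up for this example.

```python
import io
import math
import struct
import wave

def write_tone(fileobj, rate=16000, n_samples=1600, freq=440.0):
    """Write a mono, 16-bit PCM sine tone into a WAV file object."""
    samples = [int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate))
               for i in range(n_samples)]
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 2 bytes per sample = 16-bit
        w.setframerate(rate)   # samples per second
        # PCM samples are stored as little-endian signed integers.
        w.writeframes(struct.pack("<%dh" % n_samples, *samples))

def read_header(fileobj):
    """Return the basic format fields of a WAV file object."""
    with wave.open(fileobj, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_width": w.getsampwidth(),
            "rate": w.getframerate(),
            "frames": w.getnframes(),
        }

buf = io.BytesIO()
write_tone(buf)
raw = buf.getvalue()

# The file starts with the RIFF chunk descriptor: "RIFF", chunk size, "WAVE".
riff_id, chunk_size, wave_id = struct.unpack("<4sI4s", raw[:12])
info = read_header(io.BytesIO(raw))
```

The same `read_header` call works on a file recorded with Audacity if you pass an opened file or a file name instead of the `BytesIO` object.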
13. Visualize the audio signal as a waveform
• Once you can visualize the
audio signal, you can confirm
that you have read it correctly,
• and then you can go on to
more advanced signal
processing algorithms
• such as Pitch Detection and Speech
Recognition.
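A minimal sketch of that check, assuming 16-bit little-endian PCM data: decode the raw bytes into numbers, build a time axis, and plot. The function names are hypothetical, and the plotting part assumes the third-party matplotlib package is installed, so it is imported only when the plot is requested.

```python
import struct

def pcm16_to_floats(raw):
    """Decode 16-bit little-endian PCM bytes into floats in [-1.0, 1.0)."""
    n = len(raw) // 2
    ints = struct.unpack("<%dh" % n, raw[: 2 * n])
    return [v / 32768.0 for v in ints]

def time_axis(n, rate):
    """Time in seconds of each of the n samples."""
    return [i / rate for i in range(n)]

def plot_waveform(samples, rate):
    """Draw amplitude against time (needs matplotlib, a third-party package)."""
    import matplotlib.pyplot as plt
    plt.plot(time_axis(len(samples), rate), samples)
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.show()

# Tiny hand-made input: four samples packed the way a .wav file stores them.
rate = 16000
raw = struct.pack("<4h", 0, 16384, -16384, 32767)
samples = pcm16_to_floats(raw)
# plot_waveform(samples, rate)  # uncomment to see the plot
```

If the plot looks like noise instead of a smooth wave, the usual suspects are a wrong sample width, a wrong byte order, or channels that were not separated.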
14. Human-aided pitch tracking
by Humming
• Pitch detection on a real music
signal is not easy.
• To simplify the task, I use
a trick.
• I hum the song and record it in
another channel while listening
to the music.
• I use this “clean” humming
voice to detect the pitch.
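With a two-channel recording, the humming can be pulled apart from the music by de-interleaving the samples. This is a small sketch assuming 16-bit stereo PCM. Which channel holds the humming is an assumption here (left = music, right = humming), and the sample values are made up.

```python
import struct

def split_stereo(raw):
    """Split interleaved 16-bit stereo PCM bytes into (left, right) sample lists."""
    n = len(raw) // 2
    ints = struct.unpack("<%dh" % n, raw[: 2 * n])
    # Stereo PCM stores samples as L0, R0, L1, R1, ...
    return list(ints[0::2]), list(ints[1::2])

# Three stereo frames with made-up values.
raw = struct.pack("<6h", 10, 1, 20, 2, 30, 3)
music, humming = split_stereo(raw)
```

Only the `humming` list is then passed on to the pitch detector.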
15. Multi-Threading Programming
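import threading

# Methods of the real-time audio class (the class statement is not shown on the slide).
def init(self):
    # One worker thread per task.
    self.錄音線 = threading.Thread(target=self.錄音線程)        # recording
    self.能量線 = threading.Thread(target=self.f1_能量)         # energy
    self.基頻線 = threading.Thread(target=self.f4_基頻)         # pitch (fundamental frequency)
    self.語音辨認線 = threading.Thread(target=self.f6_語音辨認)  # speech recognition

def start(self):
    # Launch all worker threads.
    self.錄音線.start()
    self.能量線.start()
    self.基頻線.start()
    self.語音辨認線.start()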
• For a real-time system,
multi-threaded
programming is crucial.
• At the very least, an independent
thread for data
acquisition is necessary.
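The reason for a separate acquisition thread can be sketched with a producer and a consumer connected by a queue: acquisition never waits for the analysis to finish. This is a simplified stand-in, not the code of the system above. The function names and the fake frames are assumptions for this example.

```python
import queue
import threading

def acquire(frames, out_q):
    """Producer: stands in for the recording thread and pushes frames as they arrive."""
    for frame in frames:
        out_q.put(frame)
    out_q.put(None)  # sentinel: no more data

def compute_energy(in_q, results):
    """Consumer: computes the energy (sum of squares) of each frame."""
    while True:
        frame = in_q.get()
        if frame is None:
            break
        results.append(sum(s * s for s in frame))

# Made-up frames in place of real microphone input.
frames = [[1, 2, 3], [0, 0, 0], [4, -4, 0]]
q = queue.Queue()
energies = []

producer = threading.Thread(target=acquire, args=(frames, q))
consumer = threading.Thread(target=compute_energy, args=(q, energies))
producer.start()
consumer.start()
producer.join()
consumer.join()
```

`queue.Queue` is thread-safe, so no explicit lock is needed in this sketch.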
17. A circular buffer
to store the real-time
audio signal
I set up a buffer in RAM to store 16 sec of voice.
Its size is 16*16000*2*3 = 1,536,000 bytes.
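A circular buffer of this kind can be sketched as a fixed-size `bytearray` with a write position that wraps around. The class below is a minimal, single-threaded illustration. The name `RingBuffer` and its methods are made up for this example, and a real-time system sharing it between threads would also need a lock.

```python
class RingBuffer:
    """Fixed-size byte buffer that overwrites the oldest data when full."""

    def __init__(self, capacity):
        self.data = bytearray(capacity)
        self.capacity = capacity
        self.write_pos = 0   # index where the next byte will be written
        self.filled = 0      # number of valid bytes (at most capacity)

    def write(self, chunk):
        """Append a chunk, wrapping around the end of the buffer if needed."""
        chunk = bytes(chunk)[-self.capacity:]  # keep only what can fit
        n = len(chunk)
        end = self.write_pos + n
        if end <= self.capacity:
            self.data[self.write_pos:end] = chunk
        else:
            first = self.capacity - self.write_pos
            self.data[self.write_pos:] = chunk[:first]
            self.data[:n - first] = chunk[first:]
        self.write_pos = end % self.capacity
        self.filled = min(self.capacity, self.filled + n)

    def latest(self, n):
        """Return the most recent n bytes in time order."""
        n = min(n, self.filled)
        start = (self.write_pos - n) % self.capacity
        if start + n <= self.capacity:
            return bytes(self.data[start:start + n])
        # The requested span wraps: stitch the tail and the head together.
        return bytes(self.data[start:]) + bytes(self.data[:(start + n) % self.capacity])

# Small demonstration with an 8-byte buffer instead of 1,536,000 bytes.
rb = RingBuffer(8)
rb.write(b"abcdef")
rb.write(b"ghij")        # wraps around and overwrites "ab"
recent = rb.latest(4)
everything = rb.latest(8)
```

For the 16-second buffer described above, the capacity would be `16*16000*2*3` bytes.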
18. Pitch Detection Algorithm
• Zoom into a speech signal at a scale of 0.01 sec
and periodic patterns become visible.
• The duration of one periodic pattern is called
the “pitch period”.
• For the A-440 note, the pitch period =
1/440 ≈ 0.0023 sec.
• A traditionally popular pitch detection
algorithm is based on the auto-correlation
method.
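The auto-correlation idea can be sketched as follows: compare the frame with a delayed copy of itself, and take the delay (lag) with the strongest match as the pitch period. This is a bare-bones illustration, not the detector used in the system above. The function name, the search range (80 to 1000 Hz), and the test tone are assumptions for this example.

```python
import math

def detect_pitch(samples, rate, fmin=80.0, fmax=1000.0):
    """Estimate the pitch in Hz from the lag with the largest autocorrelation."""
    min_lag = int(rate / fmax)   # shortest pitch period considered, in samples
    max_lag = int(rate / fmin)   # longest pitch period considered, in samples
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return rate / best_lag if best_lag else 0.0

# Made-up test signal: an A-440 sine tone sampled at 16 kHz.
rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(1024)]
pitch = detect_pitch(tone, rate)
```

Because the lag is a whole number of samples, the estimate lands near 440 Hz rather than exactly on it. Practical detectors refine the peak by interpolation and guard against octave errors.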
23. Speech Recognition
Speech recognition
needs a large-scale database
to train the system.
Nowadays, deep-learning
algorithms play the major role
and achieve the best
performance.
24. Speech Recognition in Python
https://pypi.org/project/SpeechRecognition/
Google has a great Speech Recognition API.
This API converts speech (from a microphone)
into written text (Python strings).
25. The ASR Thread
Get a segment (M frames) of speech ➔ x
Transform x into an “AudioData” object, then
send it to the Google Speech Recognition engine
to get the recognition output “text”.
Getting speech data out of a circular buffer is
the tricky part of the implementation.
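26. The ASR Thread (code)
import speech_recognition as sr

def 語音辨認(私):
    # 私 plays the role of `self`; x and lang come from code not shown on the slide.
    辨 = sr.Recognizer()  # recognizer
    while 私.語音辨認中:  # loop while recognition is switched on
        #
        # Get x as "singingVoice" to be 音 (audio)
        #
        音 = sr.AudioData(x, 私.取樣率, 私.樣本寬)  # raw bytes, sample rate, sample width
        #
        # Do ASR to get the recognition result as 文 (text)
        #
        try:
            if lang == 'ja':
                文 = 辨.recognize_google(音, language='ja')
            elif lang == 'en':
                文 = 辨.recognize_google(音, language='en')
            elif lang == 'zh-TW':
                文 = 辨.recognize_google(音, language='zh-TW')
            else:
                文 = ''  # assumption: an unsupported language yields empty text
            私.文 = '{} ({})'.format(文, lang)
        except Exception:
            私.文 = 'exceptionOccurs!!'
    return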
27. Lyric Transcription
• Melodic voice (singing) recognition
• Timed Text Generation
• Requires speech recognition and
segmentation
• Currently this is done by humans,
not yet by machine.
28. Kara OK
• Pitch Tracking
• Timed Text Displaying
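Timed text display can be sketched as a lookup: given lyric lines with start times, find the line that should be on screen at playback time t. The data layout (a sorted list of `(start_seconds, text)` pairs), the function name, and the sample lyrics are assumptions made up for this example.

```python
import bisect

def current_line(timed_lyrics, t):
    """Return the lyric line active at time t (seconds), or "" before the first line."""
    starts = [start for start, _ in timed_lyrics]
    # Index of the last line whose start time is <= t.
    i = bisect.bisect_right(starts, t) - 1
    return timed_lyrics[i][1] if i >= 0 else ""

# Made-up timed lyrics, sorted by start time.
lyrics = [(0.0, "la la la"), (2.5, "do re mi"), (5.0, "fa so la")]
shown = current_line(lyrics, 3.0)
```

Calling this on every screen refresh with the current playback time keeps the displayed line in step with the music.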
https://youtu.be/F1_Xz1c5AEE