Speech Signal
Processing
Murtadha Al-Sabbagh
• Speech processing is the study
of speech signals and the methods used to process
them. The signals are usually processed
in a digital representation, so speech processing
can be regarded as a special case of digital signal
processing applied to speech signals. Aspects of
speech processing include the acquisition,
manipulation, storage, transfer and output
of speech signals.
• Speech processing can generally be divided into:
• 1- Recognition (will be discussed here).
• 2- Synthesis (will not be discussed here).
Disciplines related to speech
Processing
•1. Signal Processing
•The process of extracting information from speech in an efficient
manner
•2. Physics
•The science of understanding the relationship between the speech
signal and physiological mechanisms
•3. Pattern recognition
•The set of algorithms for creating patterns and matching data to them
according to the degree of likeness
4. Computer Science
To make efficient algorithms for implementing the methods of a speech
recognition system in hardware or software
5. Linguistics
The relationship between sounds and words in a language, the meaning of
those words, and the overall meaning of sentences
Speech (phonemes)
•Sentences consist of words, which consist of
phonemes
•A phoneme is a basic unit of a language's spoken sounds
Speech Waveform
Characteristics
•Loudness
Voiced/Unvoiced.
Voiced: vocal cords vibrating (periodic)
Unvoiced: vocal cords not vibrating (aperiodic)
Pitch.
•Spectral envelope:
•Formants: the spectral peaks of the sound spectrum
Aspects of speech processing
we will cover
• Pre-processing
• Feature extraction (Analysis)
• Recognition
Pre-processing
Pre-processing
•We can treat (pre-process) the speech signal, after it has been
received as an analog signal, in three general ways:
•1-In the time domain (Speech Wave)
2-In the frequency domain (Spectral Envelope)
3-Combination of both (Spectrogram)
[Figure: spectral envelope, energy vs. frequency, with 1 kHz and 2 kHz marked.]
Time domain
Speech is captured by a microphone and
sampled periodically (e.g. at 16 kHz) by an analogue-to-digital converter (ADC).
Each sample is converted to 16-bit data.
If the sampling rate is too low, aliasing occurs (Nyquist theorem).
• A sound is sampled at 22 kHz with 16-bit resolution.
How many bytes are needed to store the sound wave
for 10 seconds?
• Answer:
• One second has 22K samples, so for 10 seconds: 22K
x 2 bytes x 10 seconds = 440K bytes
• *note: 2 bytes are used because 16-bit = 2 bytes
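As a quick check, here is a minimal sketch of the same calculation in C (the values are taken from the example above):

```c
/* Storage needed for PCM audio: sample_rate x bytes_per_sample x duration. */
#include <stdio.h>

int main(void) {
    long sample_rate      = 22000; /* 22 kHz sampling */
    long bytes_per_sample = 2;     /* 16-bit resolution = 2 bytes */
    long seconds          = 10;

    long bytes = sample_rate * bytes_per_sample * seconds; /* = 440,000 bytes */
    printf("Storage needed: %ld bytes\n", bytes);
    return 0;
}
```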
Time framing
• Since our ear cannot respond to very fast changes in speech
content, we normally cut the speech data into frames
before analysis (similar to watching fast-changing still pictures to
perceive motion).
• Frame size is 10-30 ms.
• Frames can be overlapped; normally the overlapping region
ranges from 0 to 75% of the frame size.
Time framing: Continued…
• For a 22 kHz / 16-bit sampling speech wave, the frame size is 15 ms and the frame
overlapping period is 40% of the frame size. Draw the frame block diagram.
• Answer:
• Number of samples in one frame (N) = 15 ms / (1/22k) = 330
• Overlapping samples = 132, m = N - 132 = 198.
• x = Overlapping time = 132 * (1/22k) = 6 ms
• Time in one frame = 330 * (1/22k) = 15 ms.
[Figure: frame block diagram. Windows i=1 and i=2 of the signal s(n), each of length N, separated by a frame shift of m samples and overlapping by x along the time axis.]
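The same frame parameters can be computed directly; a minimal sketch in C using the numbers from the worked example:

```c
/* Frame-blocking parameters: 22 kHz sampling, 15 ms frames, 40% overlap. */
#include <stdio.h>

int main(void) {
    double fs = 22000.0;    /* sampling rate in Hz */
    double frame_ms = 15.0; /* frame size in ms */
    double overlap = 0.40;  /* overlap as a fraction of the frame size */

    int N = (int)(frame_ms / 1000.0 * fs + 0.5); /* samples per frame = 330 (rounded) */
    int n_overlap = (int)(overlap * N + 0.5);    /* overlapping samples = 132 */
    int m = N - n_overlap;                       /* frame shift = 198 samples */

    printf("N = %d, overlap = %d, shift m = %d\n", N, n_overlap, m);
    return 0;
}
```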
The frequency domain
•Use DFT or FFT to transform the wave from time domain to
frequency domain (i.e. to spectral envelope).
Input (time domain): S_0, S_1, ..., S_{N-1}  (N real samples)
Output (frequency domain): X_0, X_1, ..., X_{N/2}  (N/2 + 1 complex numbers)

X_m = FT\{S_k\} = \sum_{k=0}^{N-1} S_k \, e^{-j 2\pi k m / N}, \qquad m = 0, 1, \ldots, N/2

where e^{j\theta} = \cos\theta + j\sin\theta and j = \sqrt{-1}, so X_m is complex.

|X_m| = \sqrt{\mathrm{real}^2 + \mathrm{imaginary}^2}
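As an illustration, a minimal sketch of the magnitude spectrum of one short frame using a naive DFT (an FFT would be used in practice; the sample values here are made up):

```c
/* Naive DFT magnitude spectrum |X_m| of one frame (O(N^2); use an FFT for real work). */
#include <math.h>
#include <stdio.h>

#define N 8  /* frame length, kept small for illustration */
static const double PI = 3.14159265358979323846;

int main(void) {
    double s[N] = {1, 3, 2, 1, 4, 1, 2, 4};  /* hypothetical frame samples */
    for (int m = 0; m <= N / 2; m++) {
        double re = 0.0, im = 0.0;
        for (int k = 0; k < N; k++) {
            double angle = 2.0 * PI * k * m / N;
            re += s[k] * cos(angle);   /* real part of X_m */
            im -= s[k] * sin(angle);   /* imaginary part of X_m */
        }
        printf("|X_%d| = %f\n", m, sqrt(re * re + im * im));
    }
    return 0;
}
```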
The frequency domain :Continued
[Figure: example spectral envelope, energy vs. frequency, with 1 kHz and 2 kHz marked.]
The spectrogram: to see the spectral
envelope as time moves forward
Spectrogram: the white
bands are the formants,
which represent the high-
energy frequency
contents of the speech
signal
Feature Extraction
(Analysis)
Feature extraction techniques
(A) Filtering
• Ways to find the spectral envelope
• Filter banks: uniform
• Filter banks can also be non-uniform
• LPC and Cepstral LPC parameters
[Figure: a bank of filters (filter 1, 2, 3, ...) whose outputs sample the energy of the spectral envelope in different frequency bands.]
Speech recognition idea using 4 linear
filters, each with a bandwidth of 2.5 kHz
• Two sounds with two spectral envelopes SEar and SEei, e.g. spectral
envelope (SE) "ar" and spectral envelope "ei"
[Figure: Spectrum A (envelope SEar = "ar") and Spectrum B (envelope SEei = "ei"), each passed through filters 1-4 covering 0-10 kHz; the filter outputs are v1, v2, v3, v4 and w1, w2, w3, w4 respectively.]
Difference between two sounds (or
spectral envelopes SE and SE')
• Difference between two sounds, e.g.
• SEar={v1,v2,v3,v4}="ar",
• SEei={w1,w2,w3,w4}="ei"
• A simple measure of the difference is
• Dist = sqrt(|v1-w1|^2 + |v2-w2|^2 + |v3-w3|^2 + |v4-w4|^2)
• where |x| = magnitude of x
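A minimal sketch of this distance measure in C (the filter-output values are made up purely for illustration):

```c
/* Euclidean distance between two 4-band filter-bank outputs. */
#include <math.h>
#include <stdio.h>

double filterbank_dist(const double *v, const double *w, int nbands) {
    double sum = 0.0;
    for (int i = 0; i < nbands; i++) {
        double d = v[i] - w[i];
        sum += d * d;              /* |v_i - w_i|^2 */
    }
    return sqrt(sum);
}

int main(void) {
    double v[4] = {5.0, 3.0, 1.0, 0.5};  /* hypothetical outputs for "ar" */
    double w[4] = {2.0, 4.0, 3.0, 1.0};  /* hypothetical outputs for "ei" */
    printf("Dist = %f\n", filterbank_dist(v, w, 4));
    return 0;
}
```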
(B) Linear Predictive Coding (LPC)
•The concept is to find a set of parameters a1, a2, a3, a4, ..., ap (e.g. p=8) to represent the
same waveform (typical values of p are 8 to 13)
[Figure: the input waveform is cut into 30 ms time frames (frame y, y+1, y+2, ...). Each frame of 512 samples (S0, S1, ..., S511), i.e. 512 16-bit integers, is represented by one set of 8 floating-point LPC coefficients (a1..a8, a'1..a'8, a''1..a''8, ...), so the data is compressed. The waveform can be reconstructed from these LPC codes.]
For example
\begin{bmatrix}
r_0 & r_1 & r_2 & \cdots & r_{p-1} \\
r_1 & r_0 & r_1 & \cdots & r_{p-2} \\
r_2 & r_1 & r_0 & \cdots & r_{p-3} \\
\vdots & & & \ddots & \vdots \\
r_{p-1} & r_{p-2} & r_{p-3} & \cdots & r_0
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix}
=
\begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_p \end{bmatrix}
• A speech waveform S has the values
s0,s1,s2,s3,s4,s5,s6,s7,s8= [1,3,2,1,4,1,2,4,3]. The frame
size is 4.
• Find auto-correlation parameter r0, r1, r2 for the first frame.
• If we use LPC order 2 for our feature extraction system, find
LPC coefficients a1, a2.
Answer:
• Frame size=4, first frame is [1,3,2,1]
• r0=1x1+ 3x3 +2x2 +1x1=15
• r1= 3x1 +2x3 +1x2=11
• r2= 2x1 +1x3=5
\begin{bmatrix} r_0 & r_1 \\ r_1 & r_0 \end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \end{bmatrix}
=
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
\;\Rightarrow\;
\begin{bmatrix} a_1 \\ a_2 \end{bmatrix}
=
\mathrm{inv}\!\begin{bmatrix} 15 & 11 \\ 11 & 15 \end{bmatrix}
\begin{bmatrix} 11 \\ 5 \end{bmatrix}
=
\begin{bmatrix} 1.0577 \\ -0.4423 \end{bmatrix}
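A minimal sketch in C that reproduces this worked example (autocorrelation of the first frame, then solving the 2x2 system by Cramer's rule):

```c
/* Autocorrelation and order-2 LPC coefficients for the frame [1, 3, 2, 1]. */
#include <stdio.h>

#define FRAME 4

int main(void) {
    double s[FRAME] = {1, 3, 2, 1};
    double r[3];

    /* r[m] = sum over n of s[n] * s[n+m] within the frame */
    for (int m = 0; m <= 2; m++) {
        r[m] = 0.0;
        for (int n = 0; n + m < FRAME; n++)
            r[m] += s[n] * s[n + m];
    }

    /* Solve [r0 r1; r1 r0] [a1; a2] = [r1; r2] by Cramer's rule */
    double det = r[0] * r[0] - r[1] * r[1];
    double a1 = (r[1] * r[0] - r[1] * r[2]) / det;
    double a2 = (r[0] * r[2] - r[1] * r[1]) / det;

    printf("r0=%g r1=%g r2=%g\n", r[0], r[1], r[2]);   /* 15, 11, 5 */
    printf("a1=%.4f a2=%.4f\n", a1, a2);               /* 1.0577, -0.4423 */
    return 0;
}
```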
(C) Cepstrum
A new word formed by reversing the first four letters of "spectrum":
cepstrum.
It is the spectrum of the (log) spectrum of a signal.
Glottis and cepstrum
Speech wave (S) = Excitation (E) . Filter (H)
[Figure: the glottal excitation (E) from the vocal cords (glottis) passes through the vocal tract filter (H) to produce the output speech (S). The voiced excitation has strong harmonic frequency content; in the cepstrum we can easily identify and remove the glottal excitation.]
Cepstral analysis
• Signal(s)=convolution(*) of
• glottal excitation (e) and vocal_tract_filter (h)
• s(n)=e(n)*h(n), n is time index
• After Fourier transform FT: FT{s(n)}=FT{e(n)*h(n)}
• Convolution(*) becomes multiplication (.)
• n (time) → w (frequency),
• S(w) = E(w).H(w)
• Find Magnitude of the spectrum
• |S(w)| = |E(w)|.|H(w)|
• log10 |S(w)|= log10{|E(w)|}+ log10{|H(w)|}
Ref: http://iitg.vlab.co.in/?sub=59&brch=164&sim=615&cnt=1
Cepstrum
• C(n)=IDFT[log10 |S(w)|]=
• IDFT[ log10{|E(w)|} + log10{|H(w)|} ]
• In c(n), you can see E(n) and H(n) at two different positions
• Application: useful for (i) glottal excitation (ii) vocal tract filter
analysis
[Block diagram: s(n) → windowing → x(n) → DFT → X(w) → log|X(w)| → IDFT → C(n),
where n = time index, w = frequency, and IDFT = inverse discrete Fourier transform.
The window suppresses the two sides of the frame; C(n) = IDFT(log|X(w)|) gives the cepstrum,
in which the vocal tract cepstrum and the glottal excitation cepstrum appear at different positions.]
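A minimal sketch of this pipeline in C, using a naive DFT/IDFT on a tiny made-up frame (an FFT and a proper analysis window would be used in practice):

```c
/* Real cepstrum of one frame: C(n) = IDFT[ log10 |X(w)| ]. */
#include <math.h>
#include <stdio.h>

#define N 16
static const double PI = 3.14159265358979323846;

int main(void) {
    double x[N], logmag[N];

    /* Hypothetical "windowed" frame: a crude periodic pulse train standing in for voiced speech */
    for (int n = 0; n < N; n++)
        x[n] = (n % 4 == 0) ? 1.0 : 0.1;

    /* Step 1: log-magnitude spectrum log10|X(w)| */
    for (int w = 0; w < N; w++) {
        double re = 0.0, im = 0.0;
        for (int n = 0; n < N; n++) {
            re += x[n] * cos(2.0 * PI * w * n / N);
            im -= x[n] * sin(2.0 * PI * w * n / N);
        }
        logmag[w] = log10(sqrt(re * re + im * im) + 1e-12); /* small offset avoids log(0) */
    }

    /* Step 2: inverse DFT of the log-magnitude spectrum gives the cepstrum C(n) */
    for (int n = 0; n < N; n++) {
        double c = 0.0;
        for (int w = 0; w < N; w++)
            c += logmag[w] * cos(2.0 * PI * w * n / N); /* log|X| is even, so the real part suffices */
        printf("C(%d) = %f\n", n, c / N);
    }
    return 0;
}
```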
Liftering (to remove glottal
excitation)
• Low-time liftering:
• Magnify (or inspect) the
low-time part to find the
vocal tract filter
cepstrum
• High-time liftering:
• Magnify (or inspect) the
high-time part to find the
glottal excitation
cepstrum (remove this
part for speech
recognition).
[Figure: cepstrum plot. The low-quefrency part is the vocal tract cepstrum (used for speech recognition); the high-quefrency part is the glottal excitation cepstrum (useless for speech recognition). Frequency = FS / quefrency, where FS = sample frequency = 22050. The cut-off between the two regions is found by experiment.]
Reasons for liftering
Cepstrum of speech
• Why do we need this?
• Answer: to remove the ripples of the
spectrum caused by the glottal excitation.
[Figure: the input speech signal x and, after a Fourier transform, the spectrum of x. There are too many ripples in the spectrum caused by vocal cord vibrations (glottal excitation), but we are more interested in the speech envelope for recognition and reproduction.]
http://isdl.ee.washington.edu/people/stevenschimmel/sphsc503/files/notes10.pdf
Speech Recognition
Speech Recognition
• Speech recognition (SR) is the translation of
spoken words into text. It is also known as
"automatic speech recognition" (ASR),
"computer speech recognition", or just "speech
to text" (STT).
Speech recognition procedure
We will apply all the methods covered so far to connect the
dots and clarify the recognition system. Note that only step
(4) belongs to the recognition process itself; the other steps
correspond to the parts covered before.
Steps
1. End-point detection
2. (2a) Frame blocking and (2b) Windowing
3. Feature extraction: find cepstral coefficients by LPC
3.1 Auto-correlation analysis
3.2 LPC analysis
3.3 Find cepstral coefficients
4. Distortion measure calculations
Step 1: Get one frame and
execute end-point detection
• To determine the start and end points of the speech
sound
• It is not always easy, since the energy at the start of the
sound is often low.
• Determined by energy & zero-crossing rate
[Figure: recorded signal s(n) vs. n with the detected end-points marked; in our example the detected speech is about 1 second long.]
Step 2(a): Frame blocking and Windowing
• Choose the frame size (N samples) and adjacent frames
separated by m samples.
• E.g. for a 16 kHz sampling signal, a 10 ms window has N=160
samples and m=40 samples.
[Figure: signal sn vs. n with window l=1 and window l=2, each of length N, separated by m samples.]
Step 2(b): Windowing
•To smooth out the discontinuities at the beginning and end.
•Hamming or Hanning windows can be used.
•Hamming window
•Tutorial: write a program segment to find the result of passing
a speech frame, stored in an array int s[1000], into the
Hamming window.
\tilde{S}(n) = S(n)\,W(n), \qquad W(n) = 0.54 - 0.46\cos\!\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1
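A possible answer to the tutorial, as a sketch in C (assuming the frame length equals the array size of 1000 samples):

```c
/* Pass a speech frame stored in int s[1000] through a Hamming window. */
#include <math.h>

#define N 1000  /* frame length, assumed equal to the array size */
static const double PI = 3.14159265358979323846;

void hamming_window(const int s[N], double s_windowed[N]) {
    for (int n = 0; n < N; n++) {
        double w = 0.54 - 0.46 * cos(2.0 * PI * n / (N - 1)); /* Hamming window W(n) */
        s_windowed[n] = s[n] * w;                             /* S~(n) = S(n) * W(n) */
    }
}
```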
Effect of Hamming window

\tilde{S}(n) = S(n)\,W(n)

W(n) = 0.54 - 0.46\cos\!\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1

[Figure: plots of the original frame S(n), the window W(n), and the windowed frame \tilde{S}(n).]
Step 3.1: Auto-correlation analysis
• The auto-correlation of every frame (l = 1, 2, ...) of the
windowed signal is calculated.
• If the required output is a p-th order LPC,
• the auto-correlation for the l-th frame is
r_l(m) = \sum_{n=0}^{N-1-m} \tilde{S}_l(n)\,\tilde{S}_l(n+m), \qquad m = 0, 1, \ldots, p
Step 3.2: LPC calculation
To calculate the LPC coefficient vector, solve

\begin{bmatrix}
r_0 & r_1 & r_2 & \cdots & r_{p-1} \\
r_1 & r_0 & r_1 & \cdots & r_{p-2} \\
r_2 & r_1 & r_0 & \cdots & r_{p-3} \\
\vdots & & & \ddots & \vdots \\
r_{p-1} & r_{p-2} & r_{p-3} & \cdots & r_0
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix}
=
\begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_p \end{bmatrix}
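Because the matrix is Toeplitz, this system is usually solved with the Levinson-Durbin recursion rather than a general matrix inverse. A minimal sketch in C (using order 2 and the autocorrelation values from the earlier example so the result can be checked):

```c
/* Levinson-Durbin recursion: solve the Toeplitz system above for a[1..P]. */
#include <stdio.h>

#define P 2  /* LPC order (8..13 in practice; 2 here to match the earlier example) */

void levinson_durbin(const double r[P + 1], double a[P + 1]) {
    double e = r[0];           /* prediction error energy */
    double tmp[P + 1];

    for (int i = 1; i <= P; i++) {
        double acc = r[i];
        for (int j = 1; j < i; j++)
            acc -= a[j] * r[i - j];
        double k = acc / e;    /* reflection coefficient */

        tmp[i] = k;
        for (int j = 1; j < i; j++)
            tmp[j] = a[j] - k * a[i - j];
        for (int j = 1; j <= i; j++)
            a[j] = tmp[j];

        e *= (1.0 - k * k);
    }
}

int main(void) {
    double r[P + 1] = {15.0, 11.0, 5.0};  /* from the earlier frame [1, 3, 2, 1] */
    double a[P + 1] = {0};
    levinson_durbin(r, a);
    printf("a1 = %.4f, a2 = %.4f\n", a[1], a[2]);  /* 1.0577, -0.4423 */
    return 0;
}
```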
Step 3.3: LPC to Cepstral
coefficients conversion
• Cepstral coefficients are more accurate in describing the
characteristics of the speech signal
• Normally, cepstral coefficients of order 1<=m<=p are
enough to describe the speech signal.
• Calculate c1, c2, c3, ..., cp from the LPC coefficients a1, a2, a3, ..., ap
c_0 = r_0

c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad 1 \le m \le p

c_m = \sum_{k=m-p}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad m > p \ \text{(if needed)}
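A minimal sketch of this recursion in C (the LPC values in main are made up purely for illustration):

```c
/* Convert LPC coefficients a[1..P] into cepstral coefficients c[1..M_MAX]. */
#include <stdio.h>

#define P 8        /* LPC order */
#define M_MAX 12   /* number of cepstral coefficients wanted */

void lpc_to_cepstrum(const double a[P + 1], double c[M_MAX + 1]) {
    for (int m = 1; m <= M_MAX; m++) {
        c[m] = (m <= P) ? a[m] : 0.0;          /* the a_m term exists only for m <= p */
        int k_start = (m <= P) ? 1 : m - P;
        for (int k = k_start; k < m; k++)
            c[m] += ((double)k / m) * c[k] * a[m - k];
    }
}

int main(void) {
    /* Hypothetical LPC coefficients, for illustration only */
    double a[P + 1] = {0, 1.2, -0.5, 0.3, -0.1, 0.05, -0.02, 0.01, -0.005};
    double c[M_MAX + 1] = {0};
    lpc_to_cepstrum(a, c);
    for (int m = 1; m <= M_MAX; m++)
        printf("c%d = %f\n", m, c[m]);
    return 0;
}
```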
Step (4) Matching method:
Dynamic programming (DP)
• Correlation is a simple method for pattern
matching, BUT:
• The most difficult problem in speech
recognition is time alignment. No two
speech sounds are exactly the same, even
when produced by the same person.
• Align the speech features by an elastic
matching method -- DP.
(B) Dynamic programming algorithm
• Step 1: calculate the distortion matrix dist( )
• Step 2: calculate the accumulated matrix
• by using
[Figure: the cell D(i, j) and its three neighbours D(i-1, j), D(i-1, j-1), D(i, j-1).]

D(i, j) = dist(i, j) + \min\{\, D(i-1, j-1),\; D(i-1, j),\; D(i, j-1) \,\}
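A minimal sketch of these two steps in C, using dist(i,j) = (x - x')^2 and the 'YES' template and unknown input from the example later in this deck (it reproduces the minimum accumulated distortion of 13):

```c
/* DP/DTW matching: distortion matrix, then accumulated matrix D(i,j). */
#include <stdio.h>

#define LEN 9

int main(void) {
    double yes[LEN]   = {2, 4, 6, 9, 3, 4, 5, 8, 1};  /* 'YES' cepstrum codes */
    double input[LEN] = {3, 5, 5, 8, 4, 2, 3, 7, 2};  /* unknown input codes */

    double dist[LEN][LEN], D[LEN][LEN];

    /* Step 1: distortion matrix dist(i,j) = (input_i - yes_j)^2 */
    for (int i = 0; i < LEN; i++)
        for (int j = 0; j < LEN; j++) {
            double d = input[i] - yes[j];
            dist[i][j] = d * d;
        }

    /* Step 2: accumulated matrix D(i,j) = dist(i,j) + min{D(i-1,j-1), D(i-1,j), D(i,j-1)} */
    for (int i = 0; i < LEN; i++)
        for (int j = 0; j < LEN; j++) {
            double best;
            if (i == 0 && j == 0)      best = 0.0;          /* start of the path */
            else if (i == 0)           best = D[i][j - 1];  /* first row: horizontal moves only */
            else if (j == 0)           best = D[i - 1][j];  /* first column: vertical moves only */
            else {
                best = D[i - 1][j - 1];
                if (D[i - 1][j] < best) best = D[i - 1][j];
                if (D[i][j - 1] < best) best = D[i][j - 1];
            }
            D[i][j] = dist[i][j] + best;
        }

    printf("Minimum accumulated distortion D(9,9) = %g\n", D[LEN - 1][LEN - 1]);  /* 13 */
    return 0;
}
```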
To find the optimal path in the accumulated matrix
(and the minimum accumulated distortion/distance)
• Starting from the top row and right most column, find
the lowest cost D (i,j)t : it is found to be the cell at
(i,j)=(3,5), D(3,5)=7 in the top row. *(this cost is called
the “minimum accumulated distance” , or “minimum
accumulated distortion”)
• From the lowest cost position p(i,j)t, find the next
position (i,j)t-1 =argument_min_i,j{D(i-1,j), D(i-1,j-1),
D(i,j-1)}.
• E.g. p(i,j)t-1 = argument_min_i,j{11, 5, 12} = 5 is selected.
• Repeat above until the path reaches the left most
column or the lowest row.
• Note: argument_min_i,j{cell1, cell2, cell3} means the
argument i,j of the cell with the lowest value is selected.
Optimal path
• It should be from any element in the top row or right
most column to any element in the bottom row or left
most column.
• The reason is noise may be corrupting elements at the
beginning or the end of the input sequence.
• However, in actual processing the path should be
restrained near the 45-degree diagonal (from bottom left
to top right); see the attached diagram. The path cannot
pass through the restricted regions. The user can set these
regions manually. That is a way to prohibit
unrecognizable matches. See next page.
Optimal path and restricted
regions
[Figure: the optimal path constrained near the 45-degree diagonal, with restricted regions on either side.]
Example: for DP
• The cepstrum codes of the speech sounds of 'YES' and 'NO'
and an unknown 'input' are shown. Is the 'input' = 'YES' or
'NO'?
YES' 2 4 6 9 3 4 5 8 1
NO' 7 6 2 4 7 6 10 4 5
Input 3 5 5 8 4 2 3 7 2
distortion dist(x, x') = (x - x')^2
• Answer
• Starting from the top row and
right most column, find the
lowest cost D(i,j)t: it is found
to be the cell at (i,j)=(9,9),
D(9,9)=13.
• From the lowest cost position
(i,j)t, find the next position
(i,j)t-1
• = argument_min_i,j{D(i-1,j), D(i-
1,j-1), D(i,j-1)}. E.g. position
(i,j)t-1
= argument_min_i,j{48, 12, 47}
= (9-1,9-1) = (8,8), which contains
"12", is selected.
• Repeat the above until the path
reaches the left most column
or the lowest row.
• Note: argument_min_i,j{cell1,
cell2, cell3} means the
argument i,j of the cell with the
lowest value is selected.
[Figure: the 9x9 accumulated matrix D for the 'YES' template against the input, with the optimal path marked.]
Thank you ^_~