A presentation given as part of the final-year project in Semester 8 of an undergraduate engineering degree course.
It covers one module of the project "Speaker and Speech Recognition based Embedded System Design for User Authentication and Remote Device Control": the Speech Recognition Module.
It explains the Dynamic Time Warping algorithm used for speech recognition and how it is combined with a PIC 16F676 microcontroller to control remote devices connected to the system.
Dynamic time warping and PIC 16F676 for control of devices
1.
2. 1st
• Introduction
• Proposed System Overview
• A Simple Speech Recognition System and its Types
• Acquisition of Speech Signal and its Analysis
• Dynamic Time Warping Algorithm for Digit Recognition
2nd
• Introduction
• RS-232-C and Serial Communication with MatlabR2011b
• Serial Communications with PIC 16F676 for Device Control
• Interfacing Circuit Schematics and Design
3rd
• Summary
• Conclusion and Results
• Future Work
4. The discussion so far covered the implementation of speaker recognition for user authentication.
The goal of the project is to give the authenticated user control of the devices connected to the system.
Speaker Recognition -> Speech Recognition -> Device Control
Devices are controlled by recognizing the spoken device ID (a digit from 1 to 8) of a device connected to the system.
The device ID is recognized using DTW-based, speaker-independent, isolated-word recognition.
10. The DTW algorithm is based on dynamic programming and provides a systematic way of comparing two sequences of acoustic feature vectors.
It measures the similarity between two time series that may vary in time or speed.
Speech is represented by a series of feature vectors that are computed every 10 ms.
DTW finds the optimal assignment between two time series of acoustic feature vectors.
One of the time series is "warped" non-linearly by stretching or shrinking it along its time axis so that the two series become comparable in length; this technique is called "time warping".
11. A whole word comprises dozens of feature vectors. The number of vectors depends on how fast we speak.
Consider a word w whose vector sequence x̂ is to be compared with a known sequence ŵ.
We need to measure the distance between these vector sequences to determine their similarity.
To compute this distance we need to find an "optimal assignment" between the individual vector pairs and compute the distances between the paired vectors.
Words whose vector sequences differ in length also need to be taken into account; for that purpose, consider the following diagram.
12.
13. • The length Lp of the path P is determined by the maximum of the number of vectors in x̂ and ŵ.
• The assignment between x̂ and ŵ is given by P and can be interpreted as a time warping between the time axes of x̂ and ŵ.
• Time warping thus compensates for vector sequences of different lengths.
• For a given path P, the distance D(x̂, ŵ, P) between the vector sequences can now be computed as the sum of the distances d(gl) between the individual vectors along the path.
• d(gl) denotes the vector distance for the time indices i and j defined by the grid point gl = (i, j); this distance is the Euclidean distance.
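The path distance described above can be sketched in a few lines of Python. This is only an illustration: the function names, the 2-D feature vectors, and the example path are hypothetical and not taken from the project, where the vectors would be acoustic features. Index i runs over the known sequence ŵ and j over the unknown sequence x̂.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def path_distance(x, w, path):
    """Sum of the local distances d(g_l) over the grid points g_l = (i, j).

    i indexes the known sequence w, j indexes the unknown sequence x.
    """
    return sum(euclidean(w[i], x[j]) for i, j in path)

# Hypothetical 2-D feature vectors, for illustration only.
w = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]              # known sequence, 3 vectors
x = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 1.0)]  # unknown sequence, 4 vectors

# One admissible path: starts at (0, 0), ends at (2, 3), and reuses w[1] once,
# so its length equals max(len(w), len(x)).
path = [(0, 0), (1, 1), (1, 2), (2, 3)]
total = path_distance(x, w, path)
print(total)
```

Only the last grid point contributes a non-zero local distance in this example, because the first three assigned vector pairs are identical.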
14. • The criterion for the optimal path Popt is to minimize the distance D(x̂, ŵ, P).
• However, it is not necessary to compute every path P and its distance D to determine the optimum.
• Since feature vectors are measured at short time intervals, we restrict time warping to reasonable boundaries. For this purpose we need to understand local path alternatives.
• The first and last vectors of X and W should be assigned to each other.
• To locally warp the duration of the speech signal we only "reuse" the preceding vectors, which restricts the time warping. With these restrictions we can draw the local path alternatives.
• The grid point (i, j) can have the predecessors (i − 1, j), (i − 1, j − 1), and (i, j − 1).
• Popt will be the concatenation of these local path alternatives.
15. • Now that we have defined the local path alternatives, we can use Bellman's principle to find the optimal path Popt.
• Bellman's principle states the following:
If Popt is the optimal path through the matrix of grid points beginning at (0, 0) and ending at (TW − 1, TX − 1), and the grid point (i, j) is part of path Popt, then the optimal partial path from (0, 0) to (i, j) is also part of Popt.
• There are only 3 possible predecessors: (i − 1, j), (i − 1, j − 1), and (i, j − 1).
• Assume that, for each of these 3 predecessors, we have already calculated the optimal partial path and its accumulated distance.
• We can now find the optimal path from (0, 0) to grid point (i, j) by selecting exactly the one path hypothesis that minimizes the accumulated distance.
• Since the decision for the best predecessor path hypothesis reduces the number of paths leading to grid point (i, j) to exactly one, the possible path hypotheses are said to be recombined during the optimization step.
𝛿(i, j): accumulated distance of the optimal partial path from (0, 0) to grid point (i, j)
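The optimization over the three predecessors can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the project's implementation: every local step is weighted equally (some formulations weight the diagonal step differently or normalize by path length), and the names dtw, recognize, and the templates dictionary are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def dtw(w, x):
    """Return (accumulated distance, optimal path) between sequences w and x."""
    tw, tx = len(w), len(x)
    inf = float("inf")
    delta = [[inf] * tx for _ in range(tw)]  # accumulated distance per grid point
    back = [[None] * tx for _ in range(tw)]  # best predecessor per grid point
    for i in range(tw):
        for j in range(tx):
            d = euclidean(w[i], x[j])
            if i == 0 and j == 0:
                delta[i][j] = d  # the path always starts at (0, 0)
                continue
            candidates = []
            if i > 0:
                candidates.append((delta[i - 1][j], (i - 1, j)))
            if i > 0 and j > 0:
                candidates.append((delta[i - 1][j - 1], (i - 1, j - 1)))
            if j > 0:
                candidates.append((delta[i][j - 1], (i, j - 1)))
            # Recombination: keep only the best predecessor hypothesis.
            best, back[i][j] = min(candidates)
            delta[i][j] = best + d
    # Backtrack from the end point (TW - 1, TX - 1) to (0, 0).
    path = [(tw - 1, tx - 1)]
    while back[path[-1][0]][path[-1][1]] is not None:
        i, j = path[-1]
        path.append(back[i][j])
    path.reverse()
    return delta[tw - 1][tx - 1], path

def recognize(x, templates):
    """Return the label of the template with the smallest DTW distance to x."""
    return min(templates, key=lambda label: dtw(templates[label], x)[0])

# Hypothetical 1-D "feature vectors" standing in for two stored digit templates.
templates = {1: [(0.0,), (1.0,), (2.0,)], 2: [(5.0,), (5.0,), (5.0,)]}
utterance = [(0.0,), (1.0,), (1.0,), (2.0,)]  # a slower version of template 1
print(recognize(utterance, templates))
```

Because each grid point keeps a single best predecessor, the work grows with TW × TX rather than with the number of possible paths.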
20. • The RS-232-C convention specifies that, with respect to ground, a voltage more negative than -3 V is interpreted as a 1 bit and a voltage more positive than +3 V as a 0 bit.
• Serial communications according to RS-232-C require that transmitter and receiver agree on a communications protocol, for example the baud rate and the number of data, parity, and stop bits.
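The voltage convention can be illustrated with a short Python sketch. The function names are hypothetical, and the 8N1 frame layout (one start bit, eight data bits sent least significant bit first, no parity, one stop bit) is assumed here only as a common example of a protocol both sides might agree on; the slides do not state which framing the project used.

```python
def rs232_bit(voltage):
    """Map a line voltage (in volts, relative to ground) to a logic bit.

    Returns 1 below -3 V, 0 above +3 V, and None in the undefined
    region between the two thresholds.
    """
    if voltage < -3.0:
        return 1
    if voltage > 3.0:
        return 0
    return None

def frame_8n1(byte):
    """Bits of one assumed 8N1 frame: start bit 0, data LSB first, stop bit 1."""
    data = [(byte >> k) & 1 for k in range(8)]
    return [0] + data + [1]

print(rs232_bit(-9.0), rs232_bit(9.0), rs232_bit(0.5))
print(frame_8n1(0x41))
```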
21.
22. Serial communication in Matlab R2011b is possible by writing scripts that initialize a special variable to keep track of a serial connection: the serial object.
Unlike normal variables, which have a single value, objects have many "attributes" or parameters that can be set (e.g. port number, baud rate, buffer size). The port number is a label that identifies the port your device is connected to.
In order to send or receive data through the serial port object, it must be open.
When not in use it can be closed (which is not the same as deleting it). We can have many different serial objects in memory.
They can all send and receive data at the same time as long as each is on a different port. There can even be several objects associated with the same physical port. However, only one of the objects associated with a given port can be open (sending or receiving data) at any time.
23. a. Creating a Serial Port Object:
serialPort = serial('com1')
Resulting initializations:
1. Serial Port Object: Serial-COM1
2. Communication Settings
Port: COM1
BaudRate: 9600
3. Communication State
Status: closed
RecordStatus: off
4. Terminator: 'LF'
5. Read/Write State
TransferStatus: idle
BytesAvailable: 0
ValuesReceived: 0
ValuesSent: 0
b. Setting the Parameters
get(serialPort, 'baudrate')
ans = 9600
set(serialPort, 'BaudRate', 19200)
get(serialPort, 'BaudRate')
ans = 19200
24. The method described previously is cumbersome if there are many settings to change. A better way is to set them when you create the serial object.
serialPort_new = serial('com1', 'baudrate', 19200, 'terminator', 'CR')
• Writing To The Serial Port
Before you can write to the serial port, you need to open it by passing the serial object to fopen:
fopen(serialPort_new)
• Writing Binary Data
Use the command fwrite to send four bytes of binary data:
fwrite(serialPort_new, [0, 12, 117, 251]);
• Reading From The Serial Port
You can use fread to read in binary data (not text). It can automatically format the data for you. For example, say the buffer currently holds 2 bytes of data:
a = fread(serialPort_new, 2); % reads two bytes and returns them as a vector
25. Overview of the system
• Establish serial port communication with Matlab
• Acquire the results of user authentication
• Display the results for the authenticated user
• Display the speech recognition menu and accept the device ID uttered by the authenticated user
• Send the identified device ID via the serial port to the PIC to toggle the current status of the device
26. Registers used in Asynchronous Mode
1. The SPBRG register is set up for the selected baud rate.
2. Asynchronous reception is enabled by clearing the SYNC bit in the TXSTA register and setting the SPEN bit in the RCSTA register.
27. 3. To enable the receive data interrupt, the RCIE, GIE, and PEIE bits must be set.
4. Reception is activated by setting the CREN bit in RCSTA.
5. When reception has concluded, the RCIF bit in the PIR1 register is set.
6. Received data is retrieved by reading RCREG.
7. If an error occurred, the CREN bit must be cleared to reset it.
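Step 1 can be made concrete with the baud rate formula commonly given for the PIC16 USART in asynchronous mode: baud = Fosc / (64 (X + 1)) with BRGH = 0 and baud = Fosc / (16 (X + 1)) with BRGH = 1, where X is the SPBRG value. The Python sketch below only does this arithmetic on a PC; the 4 MHz oscillator frequency and the function names are assumptions for illustration, and the formula should be checked against the datasheet of the device actually used.

```python
def spbrg_value(fosc_hz, baud, brgh):
    """Nearest SPBRG value X for the requested baud rate.

    Assumes baud = fosc / (divisor * (X + 1)), with divisor 16 when
    BRGH = 1 (high speed) and 64 when BRGH = 0 (low speed).
    """
    divisor = 16 if brgh else 64
    return round(fosc_hz / (divisor * baud)) - 1

def actual_baud(fosc_hz, spbrg, brgh):
    """Baud rate actually produced by a given SPBRG value."""
    divisor = 16 if brgh else 64
    return fosc_hz / (divisor * (spbrg + 1))

FOSC_HZ = 4_000_000  # assumed oscillator frequency, not stated in the slides
TARGET_BAUD = 9600   # default baud rate of the Matlab serial object

x_high = spbrg_value(FOSC_HZ, TARGET_BAUD, brgh=True)
baud_high = actual_baud(FOSC_HZ, x_high, brgh=True)
error_percent = 100.0 * (baud_high - TARGET_BAUD) / TARGET_BAUD
print(x_high, round(baud_high, 1), round(error_percent, 2))
```

With these assumed values the high-speed setting lands much closer to 9600 baud than the low-speed setting, which is why the choice of BRGH matters when matching the PC side.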
28.
30. This presentation has discussed the aspects involved in speaker and speech recognition and the techniques used to achieve them.
It covered the acquisition of acoustic feature vectors, matching those vectors against existing models in the database using vector quantization optimized with the LBG algorithm, and word identification using DTW.
It also presented serial communication between Matlab and the PIC via the serial port using the RS-232-C standard and, finally, the process of granting the authenticated user access for device control.
31. User (Speaker Id) | Speaker Recognition: No. of attempts | Speaker Recognition: Correctly recognized | Speech Recognition: No. of attempts | Speech Recognition: Correctly recognized | Accuracy in % (Speaker/Speech)
1 | 10 | 8 | 10 | 9 | 80/90
2 | 10 | 9 | 10 | 8 | 90/80
3 | 10 | 8 | 10 | 9 | 80/90
4 | 10 | 9 | 10 | 9 | 90/90
Total | 40 | 34 | 40 | 35 | 85/87.5
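The accuracy figures follow directly from the attempt counts as 100 × correct / attempts. A small Python sketch of that arithmetic, using the counts from the table (the variable and function names are illustrative):

```python
# Per-user counts: (speaker attempts, speaker correct, speech attempts, speech correct)
results = {
    1: (10, 8, 10, 9),
    2: (10, 9, 10, 8),
    3: (10, 8, 10, 9),
    4: (10, 9, 10, 9),
}

def accuracy(correct, attempts):
    """Recognition accuracy in percent."""
    return 100.0 * correct / attempts

speaker_attempts = sum(r[0] for r in results.values())
speaker_correct = sum(r[1] for r in results.values())
speech_attempts = sum(r[2] for r in results.values())
speech_correct = sum(r[3] for r in results.values())

speaker_total = accuracy(speaker_correct, speaker_attempts)
speech_total = accuracy(speech_correct, speech_attempts)
print(speaker_total, speech_total)
```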
32. • Insert a class ID
• Set the speech signal duration, sampling frequency fs, and number of bits per sample
• Acquire the speech signal via the microphone using the audiorecorder function
• Feature extraction using Mfcc(s,fs)
• Frame blocking using a Hamming window
• Mel-frequency filter bank
34. • Set the speech signal duration, sampling frequency fs, and number of bits per sample
• Acquire the speech signal via the microphone using the audiorecorder function
• Feature extraction using Mfcc(s,fs)
• Frame blocking using a Hamming window
• Mel-frequency filter bank
• Feature matching using Vqlbg(d,k)
• VQ codebook
35. • VQ codebook from the training phase
• VQ codebook from the testing phase
• Comparison of Euclidean distances
• The user ID with the lowest Euclidean distance is authenticated
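The comparison step can be sketched in Python as a nearest-codebook search. This is a hedged illustration, not the project's Matlab code: it assumes the distance between test vectors and a trained codebook is the mean Euclidean distance from each test vector to its nearest codeword, and the names average_distortion, identify, and the toy 2-D codebooks are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def average_distortion(test_vectors, codebook):
    """Mean distance from each test vector to its nearest codeword."""
    nearest = [min(euclidean(v, c) for c in codebook) for v in test_vectors]
    return sum(nearest) / len(nearest)

def identify(test_vectors, codebooks):
    """Return the user ID whose trained codebook gives the lowest distortion."""
    return min(codebooks, key=lambda uid: average_distortion(test_vectors, codebooks[uid]))

# Hypothetical 2-D codebooks from a training phase, one per user ID.
codebooks = {
    1: [(0.0, 0.0), (1.0, 1.0)],
    2: [(5.0, 5.0), (6.0, 6.0)],
}
test_vectors = [(0.1, 0.0), (0.9, 1.0)]  # vectors from the testing phase
print(identify(test_vectors, codebooks))
```

A practical system would also compare the lowest distortion against a threshold so that an unknown speaker is rejected rather than mapped to the closest enrolled user; the slides do not say whether the project did this.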
38. • Selection of the optimal path
• The recognized word is sent to the COM port
• The signal (device ID) is received by the PIC and the corresponding device is toggled
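The toggle step can be modelled on a PC with a few lines of Python. This is a model of the logic only, not PIC firmware: it assumes the eight devices are driven by the bits of one 8-bit output port, with device ID 1 on bit 0 and device ID 8 on bit 7, a mapping the slides do not specify.

```python
def toggle_device(port_state, device_id):
    """Return the new 8-bit port state after toggling one device.

    Assumed mapping: device ID n (1 to 8) drives bit n - 1 of the port.
    """
    if not 1 <= device_id <= 8:
        raise ValueError("device id must be between 1 and 8")
    return port_state ^ (1 << (device_id - 1))

state = 0b00000000               # all devices off
state = toggle_device(state, 3)  # device 3 switched on
state = toggle_device(state, 8)  # device 8 switched on
state = toggle_device(state, 3)  # device 3 switched off again
print(format(state, "08b"))
```

Using XOR means the same received ID switches a device on when it is off and off when it is on, which matches the "toggle the current status" behaviour described in the overview.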
39. • The proposed system could be improved considerably by using more efficient models for speaker identification, such as Hidden Markov Models (HMMs). An HMM uses statistical theory to arrange the feature vectors into a Markov chain that stores the probabilities of state transitions.
• Along with speaker recognition, speech recognition could provide an added level of voice-based biometric security: after verifying who the user is, the system would acquire a specific keyword unique to the system. Integrating mobile-phone-based system access would also allow a system to be controlled from almost anywhere in the world.
• The fuzzy c-means clustering technique improves VQ performance at the classification stage. FVQ performance can be improved further by using the fuzzy-based hierarchical clustering approach proposed by Haipeng.
• GMM performs better than the other classifiers, even though FVQ improves ASR performance significantly compared with the other VQ techniques. Additional work on enhanced or alternative fuzzy clustering techniques is appropriate.