This document summarizes a seminar presentation that compares two acoustic features for speech recognition: MFCC (mel-frequency cepstral coefficients) and a cepstrum based on the chirp z-transform (CZT). MFCC is the most commonly used feature in speech recognition, and the presentation evaluates CZT as an alternative. An experiment with 6 speakers measured recognition rates using MFCC, CZT, and a combination of the two. Combining MFCC and CZT-based features achieved the highest recognition rate, 86.25% (69 of 80 tests correct), which the presenters attribute to CZT's ability to enhance the frequency resolution of transient speech signals.
1. SEMINAR ON
Acoustic Feature Comparison of MFCC and
CZT-based Cepstrum for Speech Recognition
Guided by: Prof. R.V. Pawar
Presented by: Neehal B. Jiwane
2. Introduction
Mel-Frequency Cepstral Coefficients (MFCC) are the most widely used features in the speech recognition field.
In automatic speech recognition (ASR) systems, MFCC computation forms the feature-extraction stage.
MFCC parameters give better recognition accuracy than other feature types.
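MFCC feature extraction is usually described as a fixed chain: window the frame, take its power spectrum, pool the spectrum with triangular filters spaced on the mel scale, take logarithms, and apply a discrete cosine transform. The sketch below follows that textbook recipe in plain Python; it is not the presenters' implementation. The 8 kHz sampling rate, 20 filters, 12 coefficients, and the 2595·log10(1 + f/700) mel formula are assumptions, and the default 300 to 3000 band is borrowed from the fl and fh values in the testing tables (assumed to be in Hz).

```python
import cmath
import math


def hz_to_mel(f):
    # One common mel-scale convention (assumed; other formulas exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame and return |DFT|^2 for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        s = sum(w * cmath.exp(-2j * math.pi * k * i / n) for i, w in enumerate(windowed))
        spec.append(abs(s) ** 2)
    return spec


def mel_filterbank(num_filters, n_fft, fs, f_low, f_high):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    mel_pts = [lo + i * (hi - lo) / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / fs) for m in mel_pts]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):
            filt[k] = (k - left) / (center - left)  # rising edge
        for k in range(center, right):
            filt[k] = (right - k) / (right - center)  # falling edge, peak 1.0 at center
        bank.append(filt)
    return bank


def dct2(values, num_coeffs):
    """Unnormalized DCT-II of `values`, keeping the first num_coeffs terms."""
    n = len(values)
    return [sum(v * math.cos(math.pi * q * (i + 0.5) / n) for i, v in enumerate(values))
            for q in range(num_coeffs)]


def mfcc(frame, fs, num_filters=20, num_coeffs=12, f_low=300.0, f_high=3000.0):
    """MFCC of one frame: power spectrum -> mel filterbank -> log -> DCT."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), fs, f_low, f_high)
    energies = [sum(w * p for w, p in zip(filt, spec)) for filt in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # small floor avoids log(0)
    return dct2(log_energies, num_coeffs)


fs = 8000  # assumed sampling rate (Hz)
frame = [math.sin(2 * math.pi * 440 * t / fs) for t in range(256)]  # one 32 ms frame
coeffs = mfcc(frame, fs)
print(len(coeffs))  # 12
```

The naive DFT keeps the sketch dependency-free; a practical front end would use an FFT and process overlapping frames across the whole utterance.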
8. Testing

Testing Set   Testing Number   Correct Number   Percentage (%)
0             8                6                75
1             8                7                87.5
2             8                8                100
3             8                4                50
4             8                5                62.5
5             8                8                100
6             8                7                87.5
7             8                8                100
8             8                8                100
9             8                8                100
Conditions: fl = 300, fh = 3000, M = 512
Table 3. Recognition Rate of the MFCC+CZT-Based

Cepstral Coefficients   MFCC    MFCC&CZT-based
Testing Number          80      80
Correct Number          66      69
Percentage (%)          82.5    86.25
Conditions: fl = 300, fh = 3000, M = 256
Table 4. Different Cepstral Coefficients
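Because every testing set in Table 3 contains 8 tests, each percentage corresponds to a whole number of correct tests, and the overall recognition rate is simply total correct divided by total tests. A short arithmetic check in Python, using only the figures in the tables:

```python
# Per-set recognition percentages as listed in Table 3
table3_percent = [75, 87.5, 100, 50, 62.5, 100, 87.5, 100, 100, 100]
tests_per_set = 8  # every testing set in Table 3 has 8 tests


def rate(correct, total):
    """Recognition rate in percent."""
    return 100.0 * correct / total


# Convert each percentage back into a whole number of correct tests
correct_per_set = [round(p * tests_per_set / 100) for p in table3_percent]
total_correct = sum(correct_per_set)
total_tests = tests_per_set * len(table3_percent)

print(correct_per_set)                   # [6, 7, 8, 4, 5, 8, 7, 8, 8, 8]
print(total_correct, total_tests)        # 69 80
print(rate(total_correct, total_tests))  # 86.25 (MFCC&CZT-based, Table 4)
print(rate(66, 80))                      # 82.5 (MFCC alone, 66 of 80 in Table 4)
```

Since all sets have the same size, the mean of the ten per-set percentages equals the pooled rate of 86.25%.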
9. Conclusion
From the design and implementation of the experiment, we draw the following conclusions. A new approach, called the CZT-based algorithm, was developed to extract features from speech signals that are highly transient in nature.
Combining the CZT-based method with MFCC proved superior to the previously reported MFCC-only method: it substantially enhanced the frequency resolution of highly transient speech signals and raised recognition accuracy (69 of 80 tests correct, versus 66 of 80 for MFCC alone). With such gains in accuracy, widespread integration of speech recognition technology into end-user applications lies ahead.
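The slides do not spell out the CZT-based algorithm, so the following is only a sketch under stated assumptions. The chirp z-transform evaluates X_k = Σ x[n]·z_k^(-n) at points z_k = A·W^(-k); placing A and W on the unit circle concentrates M spectral samples between fl and fh instead of spreading N/2 DFT bins over 0 to fs/2, which is the kind of frequency-resolution gain the conclusion describes. Here the CZT is computed by direct summation rather than Bluestein's FFT-based algorithm, and a DCT-II of the log band spectrum (mirroring the last MFCC step) is assumed to produce the cepstrum. The names czt_band and czt_cepstrum, the 8 kHz sampling rate, and that final DCT step are hypothetical choices, not the presenters' method.

```python
import cmath
import math


def czt_band(x, fs, f_low, f_high, m):
    """Chirp z-transform of x at m points on the unit circle from f_low to f_high (Hz).

    Direct O(N*M) evaluation of X_k = sum_n x[n] * z_k**(-n), with
    z_k = A * W**(-k), A = exp(j*2*pi*f_low/fs), W = exp(-j*2*pi*df/fs).
    """
    df = (f_high - f_low) / (m - 1)           # spacing between analysis frequencies
    a = cmath.exp(2j * math.pi * f_low / fs)  # starting point on the unit circle
    w = cmath.exp(-2j * math.pi * df / fs)    # ratio between successive points
    out = []
    for k in range(m):
        z_k = a * w ** (-k)
        out.append(sum(xn * z_k ** (-n) for n, xn in enumerate(x)))
    return out


def czt_cepstrum(frame, fs, f_low=300.0, f_high=3000.0, m=64, num_coeffs=12):
    """Hypothetical CZT-based cepstrum: window -> band CZT -> log power -> DCT-II."""
    n = len(frame)
    windowed = [v * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, v in enumerate(frame)]
    spectrum = czt_band(windowed, fs, f_low, f_high, m)
    log_power = [math.log(abs(s) ** 2 + 1e-10) for s in spectrum]
    # Assumed final step: DCT-II of the log band spectrum, as in MFCC
    return [sum(v * math.cos(math.pi * q * (i + 0.5) / m) for i, v in enumerate(log_power))
            for q in range(num_coeffs)]


fs = 8000  # assumed sampling rate (Hz)
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(256)]
spec = czt_band(tone, fs, 300.0, 3000.0, 28)  # 28 points, 100 Hz apart
peak = max(range(len(spec)), key=lambda k: abs(spec[k]))
print(300.0 + peak * 100.0)  # 1000.0
print(len(czt_cepstrum(tone, fs)))  # 12
```

With fl = 0, fh = fs·(N − 1)/N and M = N, czt_band reduces to the ordinary N-point DFT, which is a convenient sanity check. For realistic frame lengths, an FFT-based CZT (for example `scipy.signal.czt` in recent SciPy releases) would replace the O(N·M) loop. How the presenters combined these coefficients with MFCC is not stated; concatenating the two coefficient vectors per frame is only one possibility.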