SlideShare ist ein Scribd-Unternehmen logo
1 von 7
Downloaden Sie, um offline zu lesen
International Journal of Electrical and Computer Engineering (IJECE)
Vol. 12, No. 1, February 2022, pp. 376~382
ISSN: 2088-8708, DOI: 10.11591/ijece.v12i1.pp376-382  376
Journal homepage: http://ijece.iaescore.com
Comparative study to realize an automatic speaker recognition
system
Fadwa Abakarim, Abdenbi Abenaou
Research Team of Applied Mathematics and Intelligent Systems Engineering, National School of Applied Sciences,
Ibn Zohr University, Agadir, Morocco
Article Info ABSTRACT
Article history:
Received Jan 15, 2021
Revised Jul 13, 2021
Accepted Jul 26, 2021
In this research, we present an automatic speaker recognition system based
on adaptive orthogonal transformations. To obtain the informative features
with a minimum dimension from the input signals, we created an adaptive
operator, which helped to identify the speaker’s voice in a fast and efficient
manner. We test the efficiency and the performance of our method by
comparing it with another approach, mel-frequency cepstral coefficients
(MFCCs), which is widely used by researchers as their feature extraction
method. The experimental results show the importance of creating the
adaptive operator, which gives added value to the proposed approach. The
performance of the system achieved 96.8% accuracy using Fourier transform
as a compression method and 98.1% using Correlation as a compression
method.
Keywords:
Adaptive orthogonal transform
Automatic speech recognition
DTW
MFCCs
Speaker recognition system This is an open access article under the CC BY-SA license.
Corresponding Author:
Fadwa Abakarim
Research Team of Applied Mathematics and Intelligent Systems Engineering, National School of Applied
Sciences, Ibn Zohr University
B.P 1136, Agadir 80000, Morocco
Email: fadwa.abakarim@gmail.com
1. INTRODUCTION
Automatic speech recognition is a computer technique that is commonly used in several systems. In
car systems, speech recognition identifies the driver's voice to start responding to their commands such as
playing music, activating global positioning system (GPS), launching phone calls, and selecting radio
stations. In the field of language education, speech recognition can teach proper pronunciation and help
people to develop their oral expression, and also it facilitates education for blind students. In this context, the
research proposed a method to identify the speaker's voice using adaptive orthogonal transformations [1] and
comparing it with the method of mel-frequency cepstral coefficients (MFCCs) [2]-[4].
In order to identify the speaker’s voice several methods are used to extract the special features of
each voice, among them mel-frequency cepstral coefficients. Although numerous researchers chose it as their
feature extraction method because of its several advantages [5], [6], it reaches its limit in the improvement of
automatic speaker recognition system as described by references [7]-[9]. It needs a large voice training
dataset and a long execution time to identify the voice of each speaker [10] and the same goes for other
approaches such as principal component analysis (PCA), discrete wavelet transform (DWT) and empirical
modal decomposition (EMD) as revealed by reference [11].
Janse et al. [12] presented a comparative study between mel-frequency cepstral coefficients and
discrete wavelet transform, where it mentioned that MFCCs values are not very robust in the presence of
additive noise and that DWT requires a longer compression time. Winursito et al. [13] combined MFCCs
with data reduction methods with the aim of improving the accuracy and increasing the computational speed
Int J Elec & Comp Eng ISSN: 2088-8708 
Comparative study to realize an automatic speaker recognition system (Fadwa Abakarim)
377
of the classification process by decreasing the dimensions of the feature data. The data reduction process is
designed in two versions: MFCC+SVD version 1 and MFCC+PCA version 2. The results showed a
performance improvement for the proposed approach. Wang and Lawlor [14] proposed a method for a
speaker recognition system by combining MFCCs with back-propagation neural networks. It revealed that
this approach works successfully only when the number of unfamiliar speakers is not too large.
From these research studies, we deduced that many authors have used MFCCs as a feature
extraction approach and to strengthen their methodologies, they have used other approaches in the
classification process to obtain the desired results. In addition, other authors have developed new methods by
addressing the limitations of MFCCs in order to obtain an improved algorithm, which is not sensitive to
noise, and has a fast execution time. The goal of this study is to solve the problems mentioned above by
developing a fast algorithm based on adaptive orthogonal transformations for the extraction of the
informative features from the voice signal using the smallest possible training dataset, inspired by references
[1], [15]. This paper is organized as follows: Section 2 describes the new approach of orthogonal operators,
then the comparison results obtained between MFCCs and the proposed method are discussed in section 3, and
finally section 4 concludes the paper.
2. RESEARCH METHOD
2.1. Pre-processing
Before starting to apply the proposed approach, it is first necessary to pre-process the signals as
shown in Figure 1. This involves firstly removing silence, then secondly detecting the beginning and the end
of the speech by using the zero-crossing rate (ZCR) [16]-[19]. The third step is making them equal in length
by using zero padding [20], [21] because the training dataset can contain several signals that do not have the
same length. The final step is compressing their size without losing quality to avoid the problem of system
slowness by using Fourier transform [22], [23] or correlation [24], [25]. Figure 2 shows the input signal
before and after removing silence with detection of the beginning and the end of speech. Figure 3 shows the
speech signal after applying the Fourier transform method to detect the informative intervals. Figure 4 shows
the speech signal after applying correlation to detect the informative intervals.
Figure 1. The pre-processing part
Figure 2. The input signal before and after removing silence with detection of the beginning and the end of
speech
 ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 376-382
378
Figure 3. The speech signal after applying the Fourier transform
Figure 4. The speech signal after applying correlation
2.2. Theoretical background
Our approach consists in searching the informative features of the signal by using the operator H,
which is a matrix operator of the transform (dimension N×N) whose number of rows corresponds to the
number of basic functions. To decompose the vector X, the calculation of the discrete spectrum Y with the
numerical methods can be represented by the following matrix [1], [15], [26]:
𝑌 =
1
𝑁
𝐻𝑋 (1)
where
𝑋 = [𝑥1, 𝑥2, … , 𝑥𝑁]𝑇
is the initial signal to be transformed (size Ν = 2𝑛
).
𝑌 = [𝑦1, 𝑦2, … , 𝑦𝑁]𝑇
is the vector of the spectral coefficients, calculated by the operator orthogonal H.
The calculation of the spectrum 𝑌 by using (1) requires 𝑁2
multiplication and addition operations.
The most efficient way to reduce the number of operations is to use a sparse matrix [27] where most of its
elements are zero, which will make the calculation and execution time of the algorithm faster. The method of
Good [28] which is used in the construction of fast transformation algorithms consists in expressing the
Int J Elec & Comp Eng ISSN: 2088-8708 
Comparative study to realize an automatic speaker recognition system (Fadwa Abakarim)
379
orthogonal spectral operator 𝐻 as a product of sparse matrices 𝐺𝑖 composed by minimum dimensional
matrices called spectral kernels, where these matrices take the following form:
𝑉𝑖,𝑗(𝛼𝑖,𝑗) = [
cos (𝛼𝑖,𝑗) sin (𝛼𝑖,𝑗)
sin (𝛼𝑖,𝑗) −cos (𝛼𝑖,𝑗)
] (2)
with 𝛼 𝜖 [0,2𝜋].
Then 𝐺𝑖 will be written as (3):
𝐺𝑖 =
[
𝑉𝑖,1 0 … 0
0 𝑉𝑖,2 0 0
⋮ 0 ⋱ 0
0 0 0 𝑉𝑖,𝑁
2
⁄ ]
(3)
with 𝑖 = 1 … 𝑛 , 𝑛 = log2 𝑁 which is the number of matrices 𝐺𝑖, then each matrix 𝐺𝑖 contains
𝑁
2
spectral
kernels of 𝑉𝑖,𝑗(𝛼𝑖,𝑗) of dimension 2 × 2 .
So, the formula of 𝑌 will be:
𝑌 =
1
𝑁
𝐻𝑋 =
1
𝑁
𝐺1𝐺2 … 𝐺𝑛𝑋 =
1
𝑁
∏ 𝐺𝑖𝑋
𝑛
𝑖=1 (4)
The algorithm goes through a procedure of adaptation of the operator 𝐻 to a class of input signals. It
consists in calculating the average of the statistical features at the pre-processing part (section 2 part 1) to
form the standard vector 𝑅
̂𝑠𝑑. We can say that the operator 𝐻 is adapted to a class of signals represented by a
standard vector 𝑅
̂𝑠𝑑 if it verifies the following condition:
1
𝛮
𝐻𝑎𝑅
̂𝑠𝑑 = 𝑌𝑡 = [𝑦𝑡1, 0, … ,0]𝑇
with 𝑦𝑡1 ≠ 0 (5)
where 𝑌𝑡 is the target vector that constructs the adaptation criterion of the operator 𝐻𝑎 to 𝑅
̂𝑠𝑑 .
The target vector 𝑌𝑡 is calculated as (6):
𝑌𝑖 = 𝐺𝑖𝑌𝑖−1 (6)
with 𝑖 = 1 … log2 𝑁 and 𝑌0 = 𝑅
̂𝑠𝑑.
In a simplified way, the synthesis procedure of the operator of orthogonal transformation is as follows:
For 𝑖 = 1, 𝑌1 = 𝐺1𝑅
̂𝑠𝑑 with 𝑌1 contains
𝑁
21 non-zero number of elements.
For 𝑖 = 2, 𝑌2 = 𝐺2𝑌1 with 𝑌2 contains
𝑁
22 non-zero number of elements.
For 𝑖 = 𝑛, 𝑌𝑛 = 𝑌𝑡 = 𝐺𝑛𝑌𝑛−1 with 𝑌𝑡 contains
𝑁
2𝑛 non-zero number of elements.
Then the calculation of the orthogonal spectral operator is:
𝐻𝑎 = 𝐺𝑛𝐺𝑛−1 … 𝐺1 (7)
Figure 5 shows the overall process to extract the informative features from the input signals by using
our approach:
As shown in Figure 5, the extraction of the informative features consists of 7 steps:
− Step 1: Input signals go through a pre-processing process (section 2 part 1).
− Step 2: We calculate the average of the statistical features obtained during the pre-processing part (using
Fourier transform and correlation).
− Step 3: The operator synthesis algorithm is applied to the average of the statistical features obtained from
the previous step.
− Step 4: The output of the algorithm is the adaptive operator H.
− Step 5: The projection multiplication is applied between the operator H and the rest of the statistical
features.
− Step 6: The result of the previous operation is a set of informative features that characterize each signal of
the class.
− Step 7: The average of the feature vectors is calculated. The result is an informative feature vector with a
minimum dimension that characterizes the whole class.
 ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 376-382
380
Figure 5. The overall process to extract the informative features from the input signals by using the adaptive
orthogonal transform method
2.3. Datasets
The speech signals are recorded and filtered by the Audacity program. Each signal is recorded by
default at 22 kHz with a duration between 1 s and 2 s. The training dataset contains 10 classes, where each
class contains 100 voice recordings of the speaker (Class 1 of Speaker A contains 100 of his voice
recordings, the same applies to Class 2 of Speaker B up to Class 10 of Speaker J).
The test dataset contains 6000 voice recordings of speaker A, B, …, J and other unfamiliar speakers.
To test the similarity between the speaker’s voice in the training dataset and the speaker’s voice in the test
dataset, dynamic time wrapping (DTW) is used. Dynamic time wrapping or DTW [29]-[32] consists in
comparing two voice signals by considering the Euclidean distance between the two vectors obtained by the
applied method, which is defined by (8):
𝐷𝑖 = √∑ (𝑎𝑖 − 𝑏𝑖)2
𝑛
𝑖=1 (8)
with
𝐷𝑖 : The distance between the 𝑖 vector of the spectrum 𝑎 and the 𝑖 vector of the spectrum 𝑏.
𝑛 : Dimension of 𝑎 and 𝑏 spectrum.
Therefore, the vector 𝑎𝑖 will correspond to class i if 𝐷𝑖 = min (𝐷𝑖=1…𝐶) where 𝐶 is the number of classes.
3. RESULTS AND DISCUSSION
The quality of recognition is measured by calculating the recognition rate which is defined as (9).
𝑅𝑎𝑡𝑒 =
𝑡ℎ𝑒 𝑟𝑒𝑐𝑜𝑔𝑛𝑖𝑧𝑒𝑑 𝑠𝑝𝑒𝑎𝑘𝑒𝑟𝑠 𝑐𝑜𝑢𝑛𝑡
𝑡ℎ𝑒 𝑡𝑒𝑠𝑡 𝑑𝑎𝑡𝑎𝑠𝑒𝑡 𝑠𝑖𝑧𝑒
∗ 100 (9)
Tables 1 and 2 show the voice recognition rate according to the size of the interval of the analysis. From
these tables, we observe that the adaptive orthogonal transform method gives good results compared to the
MFCCs approach. As mentioned in section 2 part 1, we used correlation and Fourier transform to work only
with the informative intervals of the signal instead of working with the whole signal. As we can see there is a
47.5% difference in voice identification rates with Fourier transform intervals between using our approach
(96.8%) and MFCCs (49.3%). On the other hand, we found a 45.0% difference in voice identification rates
with correlation intervals between using our approach (98.1%) and the MFCCs (53.1%). Correlation intervals
Int J Elec & Comp Eng ISSN: 2088-8708 
Comparative study to realize an automatic speaker recognition system (Fadwa Abakarim)
381
give us better results than Fourier transform intervals, either for our approach or for the MFCCs. The
proposed method has succeeded in identifying 5886 voice recordings among 6000 voice recordings of the
test dataset (rate 98.1%) compared to MFCCs that identified only 3186 voice recordings (rate 53.1%), and
these results show the efficiency of our algorithm.
Table 1. The voice recognition rate according to the size of the interval using Fourier transform with MFCCs
and the adaptive orthogonal transform method.
Size of interval using Fourier
Transform
The voice recognition rate with
MFCCs (%)
The voice recognition rate with the adaptive orthogonal
transform method (%)
128
256
512
1024
2048
4096
23.8
32.8
35.6
38.5
45.6
49.3
68.9
75.3
82.3
91.5
93.1
96.8
Table 2. The voice recognition rate according to the size of the interval using correlation with MFCCs and
the adaptive orthogonal transform method
Size of interval using
Correlation
The voice recognition rate with MFCCs
(%)
The voice recognition rate with the adaptive orthogonal
transform method (%)
128
256
512
1024
2048
4096
25.6
30.3
37.3
43.2
49.1
53.1
65.8
73.2
81.7
90.2
95.6
98.1
4. CONCLUSION
MFCCs is one of the most usable and well-known methods in the field of signal processing.
However, it needs a large training dataset and a long execution time to extract the important features if the
number of test dataset unfamiliar speakers is large, so for these reasons we developed a new method based on
the creation of the operator H which is adaptable to any input signal. Even though its creation goes through
several iterations log2 𝑁 iterations where N is the length of the signal, an advantage of working with a sparse
matrix where most of its elements are zero is that it makes the calculation and execution time of the
algorithm faster. Our future goal is to increase the number of voice recordings in the test dataset and to
decrease the number of voice recordings in the training dataset to see if the method continues to give
successful results or not. In addition, we will combine it with other methods that are commonly used as
classification methods such as hidden markov model (HMM) and artificial neural networks (ANN).
ACKNOWLEDGEMENTS
A special thanks to Mr. T. Hobson from Anglosphere English Center for reviewing for spelling and
grammatical mistakes, and to all the participants who recorded their voices for this research.
REFERENCES
[1] A. Abenaou, F. Ataa Allah, and B. Nsiri, “Towards an automatic speech recognition system in amazigh based on orthogonal
transformations,” Asinag, pp. 133-145, 2014.
[2] N. Easwari and P. Ponmuthuramalingam, “A comparative study on feature extraction technique for isolated word speech
recognition,” International Journal of Engineering and Techniques, vol. 1, no. 6, pp. 108-115, 2015.
[3] L. Muda, M. Begam, and I. Elamvazuthi, “Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and
dynamic time warping (DTW) techniques,” Journal of Computing, vol. 2, no. 3, 2010.
[4] P. P. Singh and P. Rani., “An approach to extract feature using MFCC,” IOSR Journal of Engineering, vol. 4, no. 8, pp. 21-25,
2014, doi: 10.9790/3021-04812125.
[5] X. Wu, H. Gong, P. Chen, Z. Zhong, and Y. Xu, “Surveillance robot utilizing video and audio information,” Journal of Intelligent
and Robotic Systems, vol. 55, no. 4, pp. 403-421, 2009, doi: 10.1007/s10846-008-9297-3.
[6] A. N. A. Kumar and S. A. Muthukumaraswamy, “Text dependent voice recognition system using MFCC and VQ for security
applications,” 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA), vol. 2, 2017,
pp. 130-136, doi: 10.1109/ICECA.2017.8212779.
[7] W. Zunjing and C. Zhigang, “Improved MFCC-based feature for robust speaker identification,” Tsinghua Science and
Technology, vol. 10, no. 2, pp. 158-161, 2005, doi: 10.1016/S1007-0214(05)70048-1.
[8] W. Chen, M. Zhenjiang, and M. Xiao, “Differential MFCC and vector quantization used for real-time speaker recognition
system,” 2008 Congress on Image and Signal Processing, vol. 5, pp. 319-323, 2008, doi: 10.1109/CISP.2008.492.
 ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 376-382
382
[9] F. Z. Chelali and A. Djeradi, “Text dependant speaker recognition using MFCC, LPC and DWT,” International Journal of Speech
Technology, vol. 20, no. 3, pp. 725-740, 2017, doi: 10.1007/s10772-017-9441-1.
[10] M. Cutajar, E. Gatt, I. Grech, O. Casha, and J. Micallef, “Comparative study of automatic speech recognition techniques,” IET
Signal Processing, vol. 7, no. 1, pp. 25-46, 2013, doi: 10.1049/iet-spr.2012.0151.
[11] F. Abakarim and A. Abenaou, “Amazigh isolated word speech recognition system using the adaptive orthogonal transform
method,” 2020 International Conference on Intelligent Systems and Computer Vision (ISCV), 2020, pp. 1-6,
doi: 10.1109/ISCV49265.2020.9204291.
[12] P. V. Janse et al., “A comparative study between MFCC and DWT feature extraction technique,” International Journal of
Engineering Research and Technology, vol. 3, no. 1, pp. 3124-3127, 2014.
[13] A. Winursito, R. Hidayat, A. Bejo, and M. N. Y. Utomo, “Feature data reduction of MFCC using PCA and SVD in speech
recognition system,” 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE), pp. 1-6, 2018,
doi: 10.1109/ICSCEE.2018.8538414.
[14] Y. Wang and B. Lawlor, “Speaker recognition based on MFCC and BP neural networks,” 2017 28th Irish Signals and Systems
Conference (ISSC) , 2017, pp. 1-4, doi: 10.1109/ISSC.2017.7983644.
[15] M. Azergui, A. Abenaou, and H. Bouzahir, “Bearing fault classification based on the adaptive orthogonal transform method,”
International Journal of Advanced Computer Science and Applications (IJACSA), vol. 9, no. 1, pp. 375-380, 2018,
doi: 10.14569/IJACSA.2018.090151.
[16] D. S. Sheteb and P. S. B. Patil, “Zero crossing rate and energy of the speech signal of devanagari script,” IOSR Journal of VLSI
and Signal Processing, vol. 4, no. 1, pp. 1-5, 2014, doi: 10.9790/4200-04110105.
[17] R. G. Bachu, S. Kopparthi, B. Adapa, and B. D. Barkana, “Voiced/unvoiced decision for speech signals based on zero-crossing
rate and energy,” Advanced Techniques in Computing Sciences and Software Engineering, 2010, pp. 279-282, doi: 10.1007/978-
90-481-3660-5_47.
[18] H. Aouani and Y. Ben Ayed, “Speech emotion recognition with deep learning,” Procedia Computer Science, vol. 176,
pp. 251-260, 2020, doi: 10.1016/j.procs.2020.08.027.
[19] T. Ijitona, H. Yue, J. Soraghan, and A. Lowit, “Improved silence-unvoiced-voiced (SUV) segmentation for dysarthric speech
signals using linear prediction error variance,” 2020 5th International Conference on Computer and Communication Systems
(ICCCS), 2020, pp. 685-690, doi: 10.1109/ICCCS49078.2020.9118462.
[20] J. Luo, Z. Xie, and M. Xie, “Interpolated DFT algorithms with zero padding for classic windows,” Mechanical Systems and
Signal Processing, vol. 70, pp. 1011-1025, 2016, doi: 10.1016/j.ymssp.2015.09.045.
[21] B. Zhang, T. Xiao, and J. Zhong, “A simple determination approach for zero-padding of FFT method in focal spot calculation,”
Optics Communications, vol. 451, pp. 260-264, 2019, doi: 10.1016/j.optcom.2019.06.065.
[22] J. Obuchowski, A. Wyłomańska, and R. Zimroz, “Selection of informative frequency band in local damage detection in rotating
machinery,” Mechanical Systems and Signal Processing, vol. 48, no. 1-2, pp. 138-152, 2014, doi: 10.1016/j.ymssp.2014.03.011.
[23] R. Mankar, M. J. Walsh, R. Bhargava, S. Prasad, and D. Mayerich, “Selecting optimal features from fourier transform infrared
spectroscopy for discrete-frequency imaging,” The Analyst, vol. 143, no. 5, pp. 1147-1156, 2018, doi: 10.1039/c7an01888f.
[24] C. Charayaphan, A. E. Marble, S. T. Nugent, and D. Swingler, “Correlation algorithm and sampling techniques for estimating the
signal-to-noise ratio of the electrocardiogram,” Journal of biomedical engineering, vol. 14, no. 6, pp. 516-520, 1992,
doi: 10.1016/0141-5425(92)90106-U.
[25] J. Zhou and A. Qu, “Informative estimation and selection of correlation structure for longitudinal data,” Journal of the American
statistical Association, vol. 107, no. 498, pp. 701-710, 2012, doi: 10.1080/01621459.2012.682534.
[26] A. Abenaou, “Development of a method for the synthesis of adaptive spectral operators for the analysis of random,” International
Journal of Computer Theory and Engineering, vol. 9, no. 4, pp. 273-276, 2017, doi: 10.7763/IJCTE.2017.V9.1150.
[27] S. Wang, J. Liu, and N. Shroff, “Coded sparse matrix multiplication,” 2018 International Conference on Machine Learning
(ICML), 2018, pp. 5152-5160.
[28] I. J. Good, “The interaction algorithm and practical fourier analysis,” Journal of the Royal Statistical Society: Series B
(Methodological), vol. 20, no. 2, pp. 361-372, 1958, doi: 10.1111/j.2517-6161.1958.tb00300.x.
[29] P. Senin, “Dynamic time warping algorithm review,” Information and Computer Science Department University of Hawaii at
Manoa Honolulu USA, vol. 855, no. 1-23, pp. 40, 2008.
[30] A. Ismail, S. Abdlerazek, and I. M. El-Henawy, “Development of smart healthcare system based on speech recognition using
support vector machine and dynamic time warping,” Sustainability, vol. 12, no. 6, pp. 2403, 2020, doi: 10.3390/su12062403.
[31] I. D. G. Y. A. Wibawa and I. D. M. B. A. Darmawan, “Implementation of audio recognition using mel frequency cepstrum
coefficient and dynamic time warping in wirama praharsini,” Journal of Physics: Conference Series, vol. 1722, no. 1, 2021, doi:
10.1088/1742-6596/1722/1/012014.
[32] S. Kinkiri and S. Keates, “Speaker identification: Variations of a human voice,” in 2020 International Conference on Advances in
Computing and Communication Engineering (ICACCE), 2020, pp. 1-4, doi:10.1109/ICACCE49060.2020.9154998.

Weitere ähnliche Inhalte

Was ist angesagt?

Mixed approach for scheduling process in wimax for high qos
Mixed approach for scheduling process in wimax for high qosMixed approach for scheduling process in wimax for high qos
Mixed approach for scheduling process in wimax for high qoseSAT Journals
 
Motion compensation for hand held camera devices
Motion compensation for hand held camera devicesMotion compensation for hand held camera devices
Motion compensation for hand held camera deviceseSAT Journals
 
1.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-121.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-12Alexander Decker
 
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...ijcax
 
A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...
A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...
A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...IJECEIAES
 
A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...eSAT Journals
 
A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...eSAT Publishing House
 
IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application
 IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application
IRJET - Wavelet based Image Fusion using FPGA for Biomedical ApplicationIRJET Journal
 
IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...
IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...
IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...IRJET Journal
 
IRJET- A New Approach to Economic Load Dispatch by using Improved QEMA ba...
IRJET-  	  A New Approach to Economic Load Dispatch by using Improved QEMA ba...IRJET-  	  A New Approach to Economic Load Dispatch by using Improved QEMA ba...
IRJET- A New Approach to Economic Load Dispatch by using Improved QEMA ba...IRJET Journal
 
Power system transient stability margin estimation using artificial neural ne...
Power system transient stability margin estimation using artificial neural ne...Power system transient stability margin estimation using artificial neural ne...
Power system transient stability margin estimation using artificial neural ne...elelijjournal
 
Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...
Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...
Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...Cemal Ardil
 
Optimal neural network models for wind speed prediction
Optimal neural network models for wind speed predictionOptimal neural network models for wind speed prediction
Optimal neural network models for wind speed predictionIAEME Publication
 

Was ist angesagt? (17)

Mixed approach for scheduling process in wimax for high qos
Mixed approach for scheduling process in wimax for high qosMixed approach for scheduling process in wimax for high qos
Mixed approach for scheduling process in wimax for high qos
 
Motion compensation for hand held camera devices
Motion compensation for hand held camera devicesMotion compensation for hand held camera devices
Motion compensation for hand held camera devices
 
1.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-121.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-12
 
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...
 
A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...
A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...
A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...
 
Kv3419501953
Kv3419501953Kv3419501953
Kv3419501953
 
A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...
 
A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...
 
IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application
 IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application
IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application
 
A040101001006
A040101001006A040101001006
A040101001006
 
IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...
IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...
IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...
 
IRJET- A New Approach to Economic Load Dispatch by using Improved QEMA ba...
IRJET-  	  A New Approach to Economic Load Dispatch by using Improved QEMA ba...IRJET-  	  A New Approach to Economic Load Dispatch by using Improved QEMA ba...
IRJET- A New Approach to Economic Load Dispatch by using Improved QEMA ba...
 
Power system transient stability margin estimation using artificial neural ne...
Power system transient stability margin estimation using artificial neural ne...Power system transient stability margin estimation using artificial neural ne...
Power system transient stability margin estimation using artificial neural ne...
 
Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...
Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...
Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...
 
Optimal neural network models for wind speed prediction
Optimal neural network models for wind speed predictionOptimal neural network models for wind speed prediction
Optimal neural network models for wind speed prediction
 
V04507125128
V04507125128V04507125128
V04507125128
 
IMQA Paper
IMQA PaperIMQA Paper
IMQA Paper
 

Ähnlich wie Fast and efficient speaker recognition system using adaptive orthogonal transforms

Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...IJECEIAES
 
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...IJERA Editor
 
Compressive speech enhancement using semi-soft thresholding and improved thre...
Compressive speech enhancement using semi-soft thresholding and improved thre...Compressive speech enhancement using semi-soft thresholding and improved thre...
Compressive speech enhancement using semi-soft thresholding and improved thre...IJECEIAES
 
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...IRJET Journal
 
Comparative Study of Different Techniques in Speaker Recognition: Review
Comparative Study of Different Techniques in Speaker Recognition: ReviewComparative Study of Different Techniques in Speaker Recognition: Review
Comparative Study of Different Techniques in Speaker Recognition: ReviewIJAEMSJORNAL
 
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...IJCSEA Journal
 
FPGA-based implementation of speech recognition for robocar control using MFCC
FPGA-based implementation of speech recognition for robocar control using MFCCFPGA-based implementation of speech recognition for robocar control using MFCC
FPGA-based implementation of speech recognition for robocar control using MFCCTELKOMNIKA JOURNAL
 
Modeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationModeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationTELKOMNIKA JOURNAL
 
ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...
ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...
ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...IJCSEA Journal
 
Stereo matching algorithm using census transform and segment tree for depth e...
Stereo matching algorithm using census transform and segment tree for depth e...Stereo matching algorithm using census transform and segment tree for depth e...
Stereo matching algorithm using census transform and segment tree for depth e...TELKOMNIKA JOURNAL
 
An efficient hardware logarithm generator with modified quasi-symmetrical app...
An efficient hardware logarithm generator with modified quasi-symmetrical app...An efficient hardware logarithm generator with modified quasi-symmetrical app...
An efficient hardware logarithm generator with modified quasi-symmetrical app...IJECEIAES
 
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONSYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONcsandit
 
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONSYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONcscpconf
 
Automatic sound synthesis using the fly algorithm
Automatic sound synthesis using the fly algorithmAutomatic sound synthesis using the fly algorithm
Automatic sound synthesis using the fly algorithmTELKOMNIKA JOURNAL
 
Parallel Hardware Implementation of Convolution using Vedic Mathematics
Parallel Hardware Implementation of Convolution using Vedic MathematicsParallel Hardware Implementation of Convolution using Vedic Mathematics
Parallel Hardware Implementation of Convolution using Vedic MathematicsIOSR Journals
 
IRJET- Emotion recognition using Speech Signal: A Review
IRJET-  	  Emotion recognition using Speech Signal: A ReviewIRJET-  	  Emotion recognition using Speech Signal: A Review
IRJET- Emotion recognition using Speech Signal: A ReviewIRJET Journal
 
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...TELKOMNIKA JOURNAL
 
Improved autocorrelation method for time synchronization in filtered orthogon...
Improved autocorrelation method for time synchronization in filtered orthogon...Improved autocorrelation method for time synchronization in filtered orthogon...
Improved autocorrelation method for time synchronization in filtered orthogon...IJECEIAES
 

Ähnlich wie Fast and efficient speaker recognition system using adaptive orthogonal transforms (20)

Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...
 
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
 
Compressive speech enhancement using semi-soft thresholding and improved thre...
Compressive speech enhancement using semi-soft thresholding and improved thre...Compressive speech enhancement using semi-soft thresholding and improved thre...
Compressive speech enhancement using semi-soft thresholding and improved thre...
 
D04812125
D04812125D04812125
D04812125
 
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...
 
Comparative Study of Different Techniques in Speaker Recognition: Review
Comparative Study of Different Techniques in Speaker Recognition: ReviewComparative Study of Different Techniques in Speaker Recognition: Review
Comparative Study of Different Techniques in Speaker Recognition: Review
 
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
 
FPGA-based implementation of speech recognition for robocar control using MFCC
FPGA-based implementation of speech recognition for robocar control using MFCCFPGA-based implementation of speech recognition for robocar control using MFCC
FPGA-based implementation of speech recognition for robocar control using MFCC
 
Modeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationModeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector Quantization
 
ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...
ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...
ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...
 
Stereo matching algorithm using census transform and segment tree for depth e...
Stereo matching algorithm using census transform and segment tree for depth e...Stereo matching algorithm using census transform and segment tree for depth e...
Stereo matching algorithm using census transform and segment tree for depth e...
 
An efficient hardware logarithm generator with modified quasi-symmetrical app...
An efficient hardware logarithm generator with modified quasi-symmetrical app...An efficient hardware logarithm generator with modified quasi-symmetrical app...
An efficient hardware logarithm generator with modified quasi-symmetrical app...
 
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONSYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
 
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONSYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
 
Automatic sound synthesis using the fly algorithm
Automatic sound synthesis using the fly algorithmAutomatic sound synthesis using the fly algorithm
Automatic sound synthesis using the fly algorithm
 
Parallel Hardware Implementation of Convolution using Vedic Mathematics
Parallel Hardware Implementation of Convolution using Vedic MathematicsParallel Hardware Implementation of Convolution using Vedic Mathematics
Parallel Hardware Implementation of Convolution using Vedic Mathematics
 
F43063841
F43063841F43063841
F43063841
 
IRJET- Emotion recognition using Speech Signal: A Review
IRJET-  	  Emotion recognition using Speech Signal: A ReviewIRJET-  	  Emotion recognition using Speech Signal: A Review
IRJET- Emotion recognition using Speech Signal: A Review
 
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
 
Improved autocorrelation method for time synchronization in filtered orthogon...
Improved autocorrelation method for time synchronization in filtered orthogon...Improved autocorrelation method for time synchronization in filtered orthogon...
Improved autocorrelation method for time synchronization in filtered orthogon...
 

Mehr von IJECEIAES

Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...IJECEIAES
 
Prediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regressionPrediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regressionIJECEIAES
 
Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...IJECEIAES
 
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...IJECEIAES
 
Improving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learningImproving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learningIJECEIAES
 
Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...IJECEIAES
 
Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...IJECEIAES
 
Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...IJECEIAES
 
Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...IJECEIAES
 
A systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancyA systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancyIJECEIAES
 
Agriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimizationAgriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimizationIJECEIAES
 
Three layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performanceThree layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performanceIJECEIAES
 
Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...IJECEIAES
 
Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...IJECEIAES
 
Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...IJECEIAES
 
Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...IJECEIAES
 
On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...IJECEIAES
 
Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...IJECEIAES
 
A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...IJECEIAES
 
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...IJECEIAES
 

Mehr von IJECEIAES (20)

Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...
 
Prediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regressionPrediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regression
 
Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...
 
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...
 
Improving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learningImproving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learning
 
Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...
 
Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...
 
Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...
 
Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...
 
A systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancyA systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancy
 
Agriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimizationAgriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimization
 
Three layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performanceThree layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performance
 
Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...
 
Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...
 
Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...
 
Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...
 
On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...
 
Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...
 
A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...
 
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
 

Kürzlich hochgeladen

Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptJasonTagapanGulla
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgsaravananr517913
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 

Kürzlich hochgeladen (20)

Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.ppt
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 

Fast and efficient speaker recognition system using adaptive orthogonal transforms

  • 1. International Journal of Electrical and Computer Engineering (IJECE) Vol. 12, No. 1, February 2022, pp. 376~382 ISSN: 2088-8708, DOI: 10.11591/ijece.v12i1.pp376-382  376 Journal homepage: http://ijece.iaescore.com Comparative study to realize an automatic speaker recognition system Fadwa Abakarim, Abdenbi Abenaou Research Team of Applied Mathematics and Intelligent Systems Engineering, National School of Applied Sciences, Ibn Zohr University, Agadir, Morocco Article Info ABSTRACT Article history: Received Jan 15, 2021 Revised Jul 13, 2021 Accepted Jul 26, 2021 In this research, we present an automatic speaker recognition system based on adaptive orthogonal transformations. To obtain the informative features with a minimum dimension from the input signals, we created an adaptive operator, which helped to identify the speaker’s voice in a fast and efficient manner. We test the efficiency and the performance of our method by comparing it with another approach, mel-frequency cepstral coefficients (MFCCs), which is widely used by researchers as their feature extraction method. The experimental results show the importance of creating the adaptive operator, which gives added value to the proposed approach. The performance of the system achieved 96.8% accuracy using Fourier transform as a compression method and 98.1% using Correlation as a compression method. Keywords: Adaptive orthogonal transform Automatic speech recognition DTW MFCCs Speaker recognition system This is an open access article under the CC BY-SA license. Corresponding Author: Fadwa Abakarim Research Team of Applied Mathematics and Intelligent Systems Engineering, National School of Applied Sciences, Ibn Zohr University B.P 1136, Agadir 80000, Morocco Email: fadwa.abakarim@gmail.com 1. INTRODUCTION Automatic speech recognition is a computer technique that is commonly used in several systems. In car systems, speech recognition identifies the driver's voice to start responding to their commands such as playing music, activating global positioning system (GPS), launching phone calls, and selecting radio stations. In the field of language education, speech recognition can teach proper pronunciation and help people to develop their oral expression, and also it facilitates education for blind students. In this context, the research proposed a method to identify the speaker's voice using adaptive orthogonal transformations [1] and comparing it with the method of mel-frequency cepstral coefficients (MFCCs) [2]-[4]. In order to identify the speaker’s voice several methods are used to extract the special features of each voice, among them mel-frequency cepstral coefficients. Although numerous researchers chose it as their feature extraction method because of its several advantages [5], [6], it reaches its limit in the improvement of automatic speaker recognition system as described by references [7]-[9]. It needs a large voice training dataset and a long execution time to identify the voice of each speaker [10] and the same goes for other approaches such as principal component analysis (PCA), discrete wavelet transform (DWT) and empirical modal decomposition (EMD) as revealed by reference [11]. Janse et al. [12] presented a comparative study between mel-frequency cepstral coefficients and discrete wavelet transform, where it mentioned that MFCCs values are not very robust in the presence of additive noise and that DWT requires a longer compression time. Winursito et al. [13] combined MFCCs with data reduction methods with the aim of improving the accuracy and increasing the computational speed
  • 2. Int J Elec & Comp Eng ISSN: 2088-8708  Comparative study to realize an automatic speaker recognition system (Fadwa Abakarim) 377 of the classification process by decreasing the dimensions of the feature data. The data reduction process is designed in two versions: MFCC+SVD version 1 and MFCC+PCA version 2. The results showed a performance improvement for the proposed approach. Wang and Lawlor [14] proposed a method for a speaker recognition system by combining MFCCs with back-propagation neural networks. It revealed that this approach works successfully only when the number of unfamiliar speakers is not too large. From these research studies, we deduced that many authors have used MFCCs as a feature extraction approach and to strengthen their methodologies, they have used other approaches in the classification process to obtain the desired results. In addition, other authors have developed new methods by addressing the limitations of MFCCs in order to obtain an improved algorithm, which is not sensitive to noise, and has a fast execution time. The goal of this study is to solve the problems mentioned above by developing a fast algorithm based on adaptive orthogonal transformations for the extraction of the informative features from the voice signal using the smallest possible training dataset, inspired by references [1], [15]. This paper is organized as follows: Section 2 describes the new approach of orthogonal operators, then the comparison results obtained between MFCCs and the proposed method are discussed in section 3, and finally section 4 concludes the paper. 2. RESEARCH METHOD 2.1. Pre-processing Before starting to apply the proposed approach, it is first necessary to pre-process the signals as shown in Figure 1. This involves firstly removing silence, then secondly detecting the beginning and the end of the speech by using the zero-crossing rate (ZCR) [16]-[19]. The third step is making them equal in length by using zero padding [20], [21] because the training dataset can contain several signals that do not have the same length. The final step is compressing their size without losing quality to avoid the problem of system slowness by using Fourier transform [22], [23] or correlation [24], [25]. Figure 2 shows the input signal before and after removing silence with detection of the beginning and the end of speech. Figure 3 shows the speech signal after applying the Fourier transform method to detect the informative intervals. Figure 4 shows the speech signal after applying correlation to detect the informative intervals. Figure 1. The pre-processing part Figure 2. The input signal before and after removing silence with detection of the beginning and the end of speech
  • 3.  ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 376-382 378 Figure 3. The speech signal after applying the Fourier transform Figure 4. The speech signal after applying correlation 2.2. Theoretical background Our approach consists in searching the informative features of the signal by using the operator H, which is a matrix operator of the transform (dimension N×N) whose number of rows corresponds to the number of basic functions. To decompose the vector X, the calculation of the discrete spectrum Y with the numerical methods can be represented by the following matrix [1], [15], [26]: 𝑌 = 1 𝑁 𝐻𝑋 (1) where 𝑋 = [𝑥1, 𝑥2, … , 𝑥𝑁]𝑇 is the initial signal to be transformed (size Ν = 2𝑛 ). 𝑌 = [𝑦1, 𝑦2, … , 𝑦𝑁]𝑇 is the vector of the spectral coefficients, calculated by the operator orthogonal H. The calculation of the spectrum 𝑌 by using (1) requires 𝑁2 multiplication and addition operations. The most efficient way to reduce the number of operations is to use a sparse matrix [27] where most of its elements are zero, which will make the calculation and execution time of the algorithm faster. The method of Good [28] which is used in the construction of fast transformation algorithms consists in expressing the
  • 4. Int J Elec & Comp Eng ISSN: 2088-8708  Comparative study to realize an automatic speaker recognition system (Fadwa Abakarim) 379 orthogonal spectral operator 𝐻 as a product of sparse matrices 𝐺𝑖 composed by minimum dimensional matrices called spectral kernels, where these matrices take the following form: 𝑉𝑖,𝑗(𝛼𝑖,𝑗) = [ cos (𝛼𝑖,𝑗) sin (𝛼𝑖,𝑗) sin (𝛼𝑖,𝑗) −cos (𝛼𝑖,𝑗) ] (2) with 𝛼 𝜖 [0,2𝜋]. Then 𝐺𝑖 will be written as (3): 𝐺𝑖 = [ 𝑉𝑖,1 0 … 0 0 𝑉𝑖,2 0 0 ⋮ 0 ⋱ 0 0 0 0 𝑉𝑖,𝑁 2 ⁄ ] (3) with 𝑖 = 1 … 𝑛 , 𝑛 = log2 𝑁 which is the number of matrices 𝐺𝑖, then each matrix 𝐺𝑖 contains 𝑁 2 spectral kernels of 𝑉𝑖,𝑗(𝛼𝑖,𝑗) of dimension 2 × 2 . So, the formula of 𝑌 will be: 𝑌 = 1 𝑁 𝐻𝑋 = 1 𝑁 𝐺1𝐺2 … 𝐺𝑛𝑋 = 1 𝑁 ∏ 𝐺𝑖𝑋 𝑛 𝑖=1 (4) The algorithm goes through a procedure of adaptation of the operator 𝐻 to a class of input signals. It consists in calculating the average of the statistical features at the pre-processing part (section 2 part 1) to form the standard vector 𝑅 ̂𝑠𝑑. We can say that the operator 𝐻 is adapted to a class of signals represented by a standard vector 𝑅 ̂𝑠𝑑 if it verifies the following condition: 1 𝛮 𝐻𝑎𝑅 ̂𝑠𝑑 = 𝑌𝑡 = [𝑦𝑡1, 0, … ,0]𝑇 with 𝑦𝑡1 ≠ 0 (5) where 𝑌𝑡 is the target vector that constructs the adaptation criterion of the operator 𝐻𝑎 to 𝑅 ̂𝑠𝑑 . The target vector 𝑌𝑡 is calculated as (6): 𝑌𝑖 = 𝐺𝑖𝑌𝑖−1 (6) with 𝑖 = 1 … log2 𝑁 and 𝑌0 = 𝑅 ̂𝑠𝑑. In a simplified way, the synthesis procedure of the operator of orthogonal transformation is as follows: For 𝑖 = 1, 𝑌1 = 𝐺1𝑅 ̂𝑠𝑑 with 𝑌1 contains 𝑁 21 non-zero number of elements. For 𝑖 = 2, 𝑌2 = 𝐺2𝑌1 with 𝑌2 contains 𝑁 22 non-zero number of elements. For 𝑖 = 𝑛, 𝑌𝑛 = 𝑌𝑡 = 𝐺𝑛𝑌𝑛−1 with 𝑌𝑡 contains 𝑁 2𝑛 non-zero number of elements. Then the calculation of the orthogonal spectral operator is: 𝐻𝑎 = 𝐺𝑛𝐺𝑛−1 … 𝐺1 (7) Figure 5 shows the overall process to extract the informative features from the input signals by using our approach: As shown in Figure 5, the extraction of the informative features consists of 7 steps: − Step 1: Input signals go through a pre-processing process (section 2 part 1). − Step 2: We calculate the average of the statistical features obtained during the pre-processing part (using Fourier transform and correlation). − Step 3: The operator synthesis algorithm is applied to the average of the statistical features obtained from the previous step. − Step 4: The output of the algorithm is the adaptive operator H. − Step 5: The projection multiplication is applied between the operator H and the rest of the statistical features. − Step 6: The result of the previous operation is a set of informative features that characterize each signal of the class. − Step 7: The average of the feature vectors is calculated. The result is an informative feature vector with a minimum dimension that characterizes the whole class.
  • 5.  ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 376-382 380 Figure 5. The overall process to extract the informative features from the input signals by using the adaptive orthogonal transform method 2.3. Datasets The speech signals are recorded and filtered by the Audacity program. Each signal is recorded by default at 22 kHz with a duration between 1 s and 2 s. The training dataset contains 10 classes, where each class contains 100 voice recordings of the speaker (Class 1 of Speaker A contains 100 of his voice recordings, the same applies to Class 2 of Speaker B up to Class 10 of Speaker J). The test dataset contains 6000 voice recordings of speaker A, B, …, J and other unfamiliar speakers. To test the similarity between the speaker’s voice in the training dataset and the speaker’s voice in the test dataset, dynamic time wrapping (DTW) is used. Dynamic time wrapping or DTW [29]-[32] consists in comparing two voice signals by considering the Euclidean distance between the two vectors obtained by the applied method, which is defined by (8): 𝐷𝑖 = √∑ (𝑎𝑖 − 𝑏𝑖)2 𝑛 𝑖=1 (8) with 𝐷𝑖 : The distance between the 𝑖 vector of the spectrum 𝑎 and the 𝑖 vector of the spectrum 𝑏. 𝑛 : Dimension of 𝑎 and 𝑏 spectrum. Therefore, the vector 𝑎𝑖 will correspond to class i if 𝐷𝑖 = min (𝐷𝑖=1…𝐶) where 𝐶 is the number of classes. 3. RESULTS AND DISCUSSION The quality of recognition is measured by calculating the recognition rate which is defined as (9). 𝑅𝑎𝑡𝑒 = 𝑡ℎ𝑒 𝑟𝑒𝑐𝑜𝑔𝑛𝑖𝑧𝑒𝑑 𝑠𝑝𝑒𝑎𝑘𝑒𝑟𝑠 𝑐𝑜𝑢𝑛𝑡 𝑡ℎ𝑒 𝑡𝑒𝑠𝑡 𝑑𝑎𝑡𝑎𝑠𝑒𝑡 𝑠𝑖𝑧𝑒 ∗ 100 (9) Tables 1 and 2 show the voice recognition rate according to the size of the interval of the analysis. From these tables, we observe that the adaptive orthogonal transform method gives good results compared to the MFCCs approach. As mentioned in section 2 part 1, we used correlation and Fourier transform to work only with the informative intervals of the signal instead of working with the whole signal. As we can see there is a 47.5% difference in voice identification rates with Fourier transform intervals between using our approach (96.8%) and MFCCs (49.3%). On the other hand, we found a 45.0% difference in voice identification rates with correlation intervals between using our approach (98.1%) and the MFCCs (53.1%). Correlation intervals
  • 6. Int J Elec & Comp Eng ISSN: 2088-8708  Comparative study to realize an automatic speaker recognition system (Fadwa Abakarim) 381 give us better results than Fourier transform intervals, either for our approach or for the MFCCs. The proposed method has succeeded in identifying 5886 voice recordings among 6000 voice recordings of the test dataset (rate 98.1%) compared to MFCCs that identified only 3186 voice recordings (rate 53.1%), and these results show the efficiency of our algorithm. Table 1. The voice recognition rate according to the size of the interval using Fourier transform with MFCCs and the adaptive orthogonal transform method. Size of interval using Fourier Transform The voice recognition rate with MFCCs (%) The voice recognition rate with the adaptive orthogonal transform method (%) 128 256 512 1024 2048 4096 23.8 32.8 35.6 38.5 45.6 49.3 68.9 75.3 82.3 91.5 93.1 96.8 Table 2. The voice recognition rate according to the size of the interval using correlation with MFCCs and the adaptive orthogonal transform method Size of interval using Correlation The voice recognition rate with MFCCs (%) The voice recognition rate with the adaptive orthogonal transform method (%) 128 256 512 1024 2048 4096 25.6 30.3 37.3 43.2 49.1 53.1 65.8 73.2 81.7 90.2 95.6 98.1 4. CONCLUSION MFCCs is one of the most usable and well-known methods in the field of signal processing. However, it needs a large training dataset and a long execution time to extract the important features if the number of test dataset unfamiliar speakers is large, so for these reasons we developed a new method based on the creation of the operator H which is adaptable to any input signal. Even though its creation goes through several iterations log2 𝑁 iterations where N is the length of the signal, an advantage of working with a sparse matrix where most of its elements are zero is that it makes the calculation and execution time of the algorithm faster. Our future goal is to increase the number of voice recordings in the test dataset and to decrease the number of voice recordings in the training dataset to see if the method continues to give successful results or not. In addition, we will combine it with other methods that are commonly used as classification methods such as hidden markov model (HMM) and artificial neural networks (ANN). ACKNOWLEDGEMENTS A special thanks to Mr. T. Hobson from Anglosphere English Center for reviewing for spelling and grammatical mistakes, and to all the participants who recorded their voices for this research. REFERENCES [1] A. Abenaou, F. Ataa Allah, and B. Nsiri, “Towards an automatic speech recognition system in amazigh based on orthogonal transformations,” Asinag, pp. 133-145, 2014. [2] N. Easwari and P. Ponmuthuramalingam, “A comparative study on feature extraction technique for isolated word speech recognition,” International Journal of Engineering and Techniques, vol. 1, no. 6, pp. 108-115, 2015. [3] L. Muda, M. Begam, and I. Elamvazuthi, “Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques,” Journal of Computing, vol. 2, no. 3, 2010. [4] P. P. Singh and P. Rani., “An approach to extract feature using MFCC,” IOSR Journal of Engineering, vol. 4, no. 8, pp. 21-25, 2014, doi: 10.9790/3021-04812125. [5] X. Wu, H. Gong, P. Chen, Z. Zhong, and Y. Xu, “Surveillance robot utilizing video and audio information,” Journal of Intelligent and Robotic Systems, vol. 55, no. 4, pp. 403-421, 2009, doi: 10.1007/s10846-008-9297-3. [6] A. N. A. Kumar and S. A. Muthukumaraswamy, “Text dependent voice recognition system using MFCC and VQ for security applications,” 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA), vol. 2, 2017, pp. 130-136, doi: 10.1109/ICECA.2017.8212779. [7] W. Zunjing and C. Zhigang, “Improved MFCC-based feature for robust speaker identification,” Tsinghua Science and Technology, vol. 10, no. 2, pp. 158-161, 2005, doi: 10.1016/S1007-0214(05)70048-1. [8] W. Chen, M. Zhenjiang, and M. Xiao, “Differential MFCC and vector quantization used for real-time speaker recognition system,” 2008 Congress on Image and Signal Processing, vol. 5, pp. 319-323, 2008, doi: 10.1109/CISP.2008.492.
  • 7.  ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 376-382 382 [9] F. Z. Chelali and A. Djeradi, “Text dependant speaker recognition using MFCC, LPC and DWT,” International Journal of Speech Technology, vol. 20, no. 3, pp. 725-740, 2017, doi: 10.1007/s10772-017-9441-1. [10] M. Cutajar, E. Gatt, I. Grech, O. Casha, and J. Micallef, “Comparative study of automatic speech recognition techniques,” IET Signal Processing, vol. 7, no. 1, pp. 25-46, 2013, doi: 10.1049/iet-spr.2012.0151. [11] F. Abakarim and A. Abenaou, “Amazigh isolated word speech recognition system using the adaptive orthogonal transform method,” 2020 International Conference on Intelligent Systems and Computer Vision (ISCV), 2020, pp. 1-6, doi: 10.1109/ISCV49265.2020.9204291. [12] P. V. Janse et al., “A comparative study between MFCC and DWT feature extraction technique,” International Journal of Engineering Research and Technology, vol. 3, no. 1, pp. 3124-3127, 2014. [13] A. Winursito, R. Hidayat, A. Bejo, and M. N. Y. Utomo, “Feature data reduction of MFCC using PCA and SVD in speech recognition system,” 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE), pp. 1-6, 2018, doi: 10.1109/ICSCEE.2018.8538414. [14] Y. Wang and B. Lawlor, “Speaker recognition based on MFCC and BP neural networks,” 2017 28th Irish Signals and Systems Conference (ISSC) , 2017, pp. 1-4, doi: 10.1109/ISSC.2017.7983644. [15] M. Azergui, A. Abenaou, and H. Bouzahir, “Bearing fault classification based on the adaptive orthogonal transform method,” International Journal of Advanced Computer Science and Applications (IJACSA), vol. 9, no. 1, pp. 375-380, 2018, doi: 10.14569/IJACSA.2018.090151. [16] D. S. Sheteb and P. S. B. Patil, “Zero crossing rate and energy of the speech signal of devanagari script,” IOSR Journal of VLSI and Signal Processing, vol. 4, no. 1, pp. 1-5, 2014, doi: 10.9790/4200-04110105. [17] R. G. Bachu, S. Kopparthi, B. Adapa, and B. D. Barkana, “Voiced/unvoiced decision for speech signals based on zero-crossing rate and energy,” Advanced Techniques in Computing Sciences and Software Engineering, 2010, pp. 279-282, doi: 10.1007/978- 90-481-3660-5_47. [18] H. Aouani and Y. Ben Ayed, “Speech emotion recognition with deep learning,” Procedia Computer Science, vol. 176, pp. 251-260, 2020, doi: 10.1016/j.procs.2020.08.027. [19] T. Ijitona, H. Yue, J. Soraghan, and A. Lowit, “Improved silence-unvoiced-voiced (SUV) segmentation for dysarthric speech signals using linear prediction error variance,” 2020 5th International Conference on Computer and Communication Systems (ICCCS), 2020, pp. 685-690, doi: 10.1109/ICCCS49078.2020.9118462. [20] J. Luo, Z. Xie, and M. Xie, “Interpolated DFT algorithms with zero padding for classic windows,” Mechanical Systems and Signal Processing, vol. 70, pp. 1011-1025, 2016, doi: 10.1016/j.ymssp.2015.09.045. [21] B. Zhang, T. Xiao, and J. Zhong, “A simple determination approach for zero-padding of FFT method in focal spot calculation,” Optics Communications, vol. 451, pp. 260-264, 2019, doi: 10.1016/j.optcom.2019.06.065. [22] J. Obuchowski, A. Wyłomańska, and R. Zimroz, “Selection of informative frequency band in local damage detection in rotating machinery,” Mechanical Systems and Signal Processing, vol. 48, no. 1-2, pp. 138-152, 2014, doi: 10.1016/j.ymssp.2014.03.011. [23] R. Mankar, M. J. Walsh, R. Bhargava, S. Prasad, and D. Mayerich, “Selecting optimal features from fourier transform infrared spectroscopy for discrete-frequency imaging,” The Analyst, vol. 143, no. 5, pp. 1147-1156, 2018, doi: 10.1039/c7an01888f. [24] C. Charayaphan, A. E. Marble, S. T. Nugent, and D. Swingler, “Correlation algorithm and sampling techniques for estimating the signal-to-noise ratio of the electrocardiogram,” Journal of biomedical engineering, vol. 14, no. 6, pp. 516-520, 1992, doi: 10.1016/0141-5425(92)90106-U. [25] J. Zhou and A. Qu, “Informative estimation and selection of correlation structure for longitudinal data,” Journal of the American statistical Association, vol. 107, no. 498, pp. 701-710, 2012, doi: 10.1080/01621459.2012.682534. [26] A. Abenaou, “Development of a method for the synthesis of adaptive spectral operators for the analysis of random,” International Journal of Computer Theory and Engineering, vol. 9, no. 4, pp. 273-276, 2017, doi: 10.7763/IJCTE.2017.V9.1150. [27] S. Wang, J. Liu, and N. Shroff, “Coded sparse matrix multiplication,” 2018 International Conference on Machine Learning (ICML), 2018, pp. 5152-5160. [28] I. J. Good, “The interaction algorithm and practical fourier analysis,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 20, no. 2, pp. 361-372, 1958, doi: 10.1111/j.2517-6161.1958.tb00300.x. [29] P. Senin, “Dynamic time warping algorithm review,” Information and Computer Science Department University of Hawaii at Manoa Honolulu USA, vol. 855, no. 1-23, pp. 40, 2008. [30] A. Ismail, S. Abdlerazek, and I. M. El-Henawy, “Development of smart healthcare system based on speech recognition using support vector machine and dynamic time warping,” Sustainability, vol. 12, no. 6, pp. 2403, 2020, doi: 10.3390/su12062403. [31] I. D. G. Y. A. Wibawa and I. D. M. B. A. Darmawan, “Implementation of audio recognition using mel frequency cepstrum coefficient and dynamic time warping in wirama praharsini,” Journal of Physics: Conference Series, vol. 1722, no. 1, 2021, doi: 10.1088/1742-6596/1722/1/012014. [32] S. Kinkiri and S. Keates, “Speaker identification: Variations of a human voice,” in 2020 International Conference on Advances in Computing and Communication Engineering (ICACCE), 2020, pp. 1-4, doi:10.1109/ICACCE49060.2020.9154998.