SlideShare ist ein Scribd-Unternehmen logo
1 von 8
Downloaden Sie, um offline zu lesen
International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014
DOI:10.5121/ijcsa.2014.4104 35
Automatic Speech Emotion and Speaker
Recognition based on Hybrid GMM and FFBNN
J. Sirisha Devi1
, Y. Srinivas2
and Siva Prasad Nandyala3
1
Assitant Professor, Dept of IT, GRIET ,Hyderabad,501401, Andhra Pradesh, India
2
Professor, Dept of IT, GITAM University,Visakhapatnam, 530045,
Andhra Pradesh, India
3
Research Scholar, Dept of ECE, NIT Warangal,Warangal, 506004,
Andhra Pradesh, India
ABSTRACT
In this paper we present text dependent speaker recognition with an enhancement of detecting the emotion
of the speaker prior using the hybrid FFBN and GMM methods. The emotional state of the speaker
influences recognition system. Mel-frequency Cepstral Coefficient (MFCC) feature set is used for
experimentation. To recognize the emotional state of a speaker Gaussian Mixture Model (GMM) is used in
training phase and in testing phase Feed Forward Back Propagation Neural Network (FFBNN). Speech
database consisting of 25 speakers recorded in five different emotional states: happy, angry, sad, surprise
and neutral is used for experimentation. The results reveal that the emotional state of the speaker shows a
significant impact on the accuracy of speaker recognition.
KEYWORDS
Mel-frequency Cepstral Coefficient (MFCC), Feed Forward Back Propagation Neural Network (FFBNN),
Gaussian Mixture Model (GMM).
1. INTRODUCTION
Speaker recognition refers to recognize the person from their speech. Accuracy of recognition
increases if the system is text-independent [1]. The speech signal contains the message being
spoken, the emotional state of the speaker and the information of the speaker. Therefore we can
use the speech signal for both speech and speaker recognition [2]. Speaker recognition extracts
the underlying linguistic message in an utterance. It is the process of automatically recognizing
who is speaking on the basis of features present in the speech signal. Speaker recognition is
helpful in the areas such as voice dialling, banking by telephone, telephone shopping, database
access services, information services, and security control for confidential information areas,
voice mail and remote access to computers [9]. Influence of emotional state of human speech in
speaker recognition is very high [13]. The term “emotion” can refer to an extremely complex
state associated with a wide variety of mental, physiological and physical events. Emotional
speech database is valuable for this speaker recognition. In this paper a combination of speech
features is considered. The spectral feature Mel frequency cepstral coefficients best suited for
speaker recognition and the prosodic feature pitch, which is strongly dependent on the emotional
state of the speaker.
In section 2, feature extraction of speech signal is performed. In section 3, classification of the
feature vectors is explained. In section 4, experiment shows how training and testing are
performed and result analysis is done.
International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014
36
2. FEATURE EXTRACTION
2.1 Pre-processing
Speaker specific features will be present in the voiced part of the speech signals [9]. Pre-
processing is a process of separating the voiced and unvoiced speech signals.
2.1.1 Noise removal
While recording the speech signal, it consists of various unwanted piece of signals which are
considered to be noise/unvoiced part of speech. This noise part may be due to low quality of
source, loss of speech segments, channel fading, and presense of noise in the channel or echo or
reverberation.
A low-pass filter allows signal frequencies below the low cut-off frequency to pass and stops
frequencies above the cut-off frequency. Speaker specific information is available in the higher
frequencies (above 4 kHz), so a low pass filter is used to remove all the frequencies less than 4
KHz as unvoiced data.
2.1.2 Framing
Each speech signal is split into frame segments of size 30ms (milli second) where the sampling
rate is 22.05 KHz. Each segment is extracted at every 50ms interval. This implies that the overlap
between segments is 10ms.
2.1.3 Windowing
After framing, each and every frame will contain discontinuities at the beginning and end of each
frame. The concept here is to minimize the spectral distortion by using the window to taper the
signal to zero at the beginning and end of each frame. If we define the window as Eq.1
( ), 0 ≤ ≤ − 1 (1)
Where, N is the number of samples in each frame, then the result of windowing is the signal, Eq.2
( ) = ( ) ( ), 0 ≤ ≤ − 1 (2)
Typically the Hamming window is used, which has the form (Eq.3):
10,
1
2
cos46.054.0)( −≤≤





−
−= Nn
N
n
nw

(3)
2.1.4 Fast Fourier Transform (FFT
The next processing step is the Fast Fourier Transform, which converts each frame of N samples
from the time domain into the frequency domain. FFT gets log magnitude spectrum to determine
MFCC. We have used 1024 point to get better frequency resolution.
2.2 Mel- Frequency Cepstral Coefficients(MFCC)
Prosody is the suprasegmental phonology of speech sounds, involving (but not limited to) aspects
such as pitch, loudness, tempo, and stress [10]. Pitch, often referred to as fundamental frequency,
International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014
37
is the vibration rate of the vocal folds. Gold-Rabiner algorithm [3] illustrates pitch extraction
based on the fact that locating the position of the maximum point of excitation is not always
determinable from the time-waveform. Pitch is elaborated as the time difference between two
major peaks in the voiced signal. The method of pitch extraction from the waveforms is described
in the following steps:
Step 1: Speech signal is given as input to the system.
Step 2: Normalized Cross Correlation Function is computed periodically for all lags of f0 and
the locations of the all local maxima obtained in the first iteration are stored.
Step 3: Normalized cross correlation is performed for improved high sampled rate signals in
order to determine the accurate location of peaks, with estimations applied to amplitude.
Step 4: f0 for the particular frame is obtained by the normalized cross correlation related to the
particular frame.
Step 5: Dynamic programming is used in order to select the set of cross correlation peaks or
unvoiced hypothesis across all frames.
Spectral features characterize signal properties in the frequency domain, thus providing useful
additions to prosodic features. The frequency domain representation can be obtained by taking the
discrete Fourier transform (DFT) of the discrete-time signal. However, the DFT of a signal
usually results in a high-dimensional representation, in addition, with fine spectral details that
may turn out to impair recognition performance. Therefore, the DFT spectrum is usually
transformed into some more compact feature representations for speech related recognition tasks,
such as the mel-frequency cepstral coefficients (MFCCs).
The 20 Mel triangular filters are designed with 50% overlapping. From each filter the spectrum is
added to get one coefficient each, in this way we have considered the first 13 coefficients as our
features. These frequencies are converted to Mel scale using following conversion formula
(Eq.4).
( ) = 2595 ∗ log 1 + 700 (4)
We have considered 13 MFCC coefficients because; of the fact it gives better recognition
accuracy than other coefficients.
3. CLASSIFICATION TECHNIQUES
In both the training and testing phases, before speaker recognition the emotional state of speaker
is identified [11]. A new hybrid model is proposed in which GMM in training and FFBN in
testing is used for emotion recognition.
3.1 Emotion Recognition
3.1.1 Gaussian Mixture Model
The feature vector thus obtained is send to GMM classifier for emotion recognition in the training
phase. In the GMM based classifier [5] [7], probability density function of the feature space is
modeled with a diagonal covariance GMM for each emotion [12]. Probability density function is
a weighted combination of K component densities given by Eq.5.
( ) = ∑ ( ⎟ ) (5)
where f : observation feature vector
ѡk : mixture weight associated with the k-th Gaussian component.
International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014
38
The weights satisfy the constraints, 0 ≤ ≤ 1 and ∑ = 1
The conditional probability p(f |k) is modeled by Gaussian distribution with the component mean
vector µk, and the diagonal covariance matrix Σk. The GMM for a given emotion is extracted
through the expectation-maximization based iterative training process using a set of training
feature vectors representing the emotion.
In the emotion recognition phase [8], posterior probability of the features of a given speech
utterance is maximized over all emotion GMM densities. Given a sequence of feature vectors for
a speech utterance, F = {f1, f2, . . . , fT }, let’s define the loglikelihood of this utterance for emotion
class e with a GMM density model γe as given in Eq. (6),
= log ( ⎟ ) = ∑ log (f │γ ) (6)
Where, p(F|sγe) is the GMM probability density for the emotion class e as defined in eq. 5.
Then, the emotion GMM density that maximizes posterior probability of the utterance is set as the
recognized emotion class is given by Eq. (7).
= arg max (7)
Where, E is the set of emotions and ε is the recognized emotion.
3.1.2 Feed Forward Backpropogation Neural Network (FFBNN)
In testing phase, emotion recognition from the speech signals is done by FFBNN. The features
are extracted for the speech signal and given to the FFBNN to perform the testing process. The
speech signal’s corresponding MFCC features are taken as input to the FFBNN [4]. Here, we
have taken three inputs nodes as energy entropy (Ee), short-time energy (Se) and Zero crossing
Rate (ZCR), dN number of hidden layers and one output layer, which is a prosody effect of the
given input signal. The proposed emotion classification FFBNN structure is shown in Figure 1.
Initially, the input feature values are transmitted to the hidden layer and then, to the output layer.
Each node in the hidden layer gets input from the input layer, which are multiplexed with suitable
weights and summed. The hidden layer input value calculation function is called as bias function,
which is described below by Eq. (8),
)(
1
ihCRhihehiheh
N
h
ZwSwEwH
d
i +++= ∑=

(8)
In Eq. (8), ieE , ieS and iCRZ are the feature values of the ith
person speech signal. The activation
function in the output layer is given in Eq. (9). The output values from the output layer are
compared with target values and the learning error rate for the neural network is computed, which
is given in Eq. (10).
iH
e
−
+
=
1
1
 (9)
∑
−
=
−=
1
0
1 d
i
N
h
i
hh
d
OD
N
 (10)
International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014
39
In eqn. (10),
 - is the learning error rate
i
hD - is the desired output
i
hO - is the actual output.
The error between the nodes is transmitted back to the hidden layer and this process is called the
backward pass of the back propagation algorithm. The reduction of error by back propagation
algorithm is described in the subsequent steps.
(i) Initially, the weights are assigned to hidden layer neurons. The input layer has a constant
weight, whereas the weights for output layer neurons are chosen arbitrarily. Subsequently, the
bias function and output layer activation function are computed by using the Eq. (8) and (9).
(ii) Next, the back propagation error is computed for each node and the weights are updated
by using the Eq. (11).
ihihih www ∆+= (11)
Where, the weight ihw∆ is changed, which is given by Eq.(12),
)(
.. 
 ENw ihih =∆ (12)
Where,  is the learning rate that normally ranges from 0.2 to 0.5, and )(
E is the BP error. The
bias function, activation function, and BP error calculation process are continued till the BP error
gets reduced i.e. 1.0)(
<
E . If the BP error reaches a minimum value, then the FFBNN is well
trained by the speech signals feature values for performing the emotion classification. The well
trained FFBNN provides an accurate classification results for the input emotion speech signals.
The well trained FFBNN classify the input speech signals based on the prosody which the speech
signal belongs to.
3.1.3 Hybrid GMM/FFBN Approach
Therefore, we followed a different approach, namely the extension of a state-of-the-art system
that achieves extremely good recognition rates with a GMM that is trained. For recognition
purpose used the FFBN method.
3.2 Speaker Recognition using HMM
Hidden Markov Model is used to recognize the speaker after emotion is recognized by hybrid
GMM/FFBN. HMM is a statistical tool for modeling generative sequence to generate observable
sequence when characterized by the primary process. The observation is turned to be a
probabilistic function (discrete or continuous) of a state instead of a one-to-one correspondence of
a state. Each state randomly generates one of M observations (or visible states). To define hidden
Markov model, the following probabilities have to be specified:
Matrix of transition probabilities, A=(aij), aij= P(si | sj)
International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014
40
Matrix of observation probabilities, B=(bi (vm )), bi(vm ) = P(vm | si)
Vector of initial probabilities, π=(πi), πi = P(si)
Model is represented by M=(A, B, π)
4. EXPERIMENTS AND RESULTS
MATLAB R2013a has been used for experimentation.
4.1 Training phase
In training phase, the extracted MFCC vector is sent to both the classification techniques. First
emotion recognition is done using GMM classifier then the same feature vector is passed to
HMM classifier for speaker recognition (SR). For each and every emotion a unique record is
generated. Figure 2 explains the block diagram of the entire procedure.
4.2 Testing phase
After feature extraction the feature vector is passed to FFBN classifier for emotion recognition.
Ones the emotion recognition is recognized, based on the emotional state the corresponding
record will be supplied for further recognition part which is speaker recognition using HMM.
4.3 Results
When the emotional state of speaker differs in the testing phase the recognition rate decreased
significantly. The results shows that the accuracy rate of speaker recognition has been
considerably increased when compared to the recognition rate where emotional state of the
speaker was not considered. The accuracy rate of speaker indirectly depends on the accuracy of
emotion recognition. As well as when we compared the accuracy rate of emotion recognition
using GMM with the proposed system, the accuracy rates have been considerably increased. The
recognition rate obtained is 93 % with the hybrid model compared to 91% using the same method
for training and testing for emotion recognition. Table 1shows the comparison of speaker
recognition results.
The Figure 3 gives the simulation results of speaker recognition for the given emotions.
Table 1: Comparison of speaker recognition rate for different emotions
Emotional
State Happy Angry Sad Surprise Neutral
Recognition
Rate (%)
Emotion
(GMM)
89 94 89 86 94
Speaker
(HMM)
88 97 92 90 96
Recognition
Rate (%) of
Hybrid
Model
Emotion
(Hybrid
GMM/FFBN)
90 95 89 85 95
Speaker
(HMM)
90 97 92 92 97
International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014
41
Figure 1: Structure of FFBNN in Emotion Classification
Figure 2: Process Flow
Figure 3: Simulation Results
International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014
42
REFERENCES
[1] Gish, H., Schmidt, M., 1994. Text-independent speaker recognition. IEEE Signal Process. Magazine
(October), 18–32.
[2] Furui, S., 1981. Cepstral analysis technique for automatic speaker verification. IEEE Trans. Acoust.
Speech Signal Process. 29 (2), 254–272.
[3] Gold, B. and Rabiner, L.R, Parallel processing techniques for estimating pitch periods of speech in
time-domain.
[4] Raul Fernandez, Rosalind Picard, “Recognizing affect from speech prosody using hierarchical
graphical models”, Journal Speech Communication, Vol. 53, No. 9-10, pp. 1088-1103, 2011.
[5] Lukas Burget, Petr Schwarz, Mohit Agarwal, Pinar Akyazi, Kai Feng, Arnab Ghoshal, Ondrej
Glembek, Nagendra Goel, Martin Karafiat, Daniel Povey, Ariya Rastrow, Richard C. Rose, Samuel
Thomas, "Multilingual Acoustic Modeling for Speech Recognition Based on Subspace Gaussian
Mixture Models", In Proceedings of IEEE International Conference on Acoustics Speech and Signal
Processing, Dallas, TX, pp. 4334 - 4337, 2010.
[6] Lawrence Rabiner, Biing-Hwang Juang and B.Yegnanarayana, ”Fundamental of Speech
Recognition”, Prentice-Hall, Englewood Cliffs, 2009.
[7] B. Schuller, S. Steidl, and A. Batliner, “The interspeech 2009 emotion challenge,” in Interspeech
(2009), ISCA, Brighton, UK, 2009.
[8] K. R. Scherer, “How emotion is expressed in speech and singing,” in Proceedings of XIIIth
International Congress of Phonetic Sciences., pp. 90-96. 1995
[9] M. Kockmann, L. Ferrer, L. Burget, E. Shriberg, and J. H. Cernocký, "Recent progress in prosodic
speaker verification," in Proc. IEEE ICASSP, (Prague), pp. 4556--4559, May 2011.
[10] Oliver Gauci, Carl J. Debono, Paul Micallef, "A Reproducing Kernel Hilbert Space Approach for
Speech Enhancement" ISCCSP 2008, Malta, 12-14 March 2008
[11] Justin, J. and I. Vennila,” A Hybrid Speech Recognition System with Hidden Markov Model and
Radial Basis Function Neural Network” American Journal of Applied Sciences, Volume10, Issue 10,
Pages 1148-1153.
[12] Aditya Bihar Kandali, Aurobinda Routray, Tapan Kumar Basu,” Emotion recognition from Assamese
speeches using MFCC features and GMM classifier”, Tencon 2008 - 2008 IEEE Region 10
Conference.
[13] Marius Vasile Ghiurcau , Corneliu Rusu,Jaakko Astola, “A Study Of The Effect Of Emotional State
Upon Text-Independent Speaker Identification”, ICASSP 2011

Weitere ähnliche Inhalte

Was ist angesagt?

Text-Independent Speaker Verification Report
Text-Independent Speaker Verification ReportText-Independent Speaker Verification Report
Text-Independent Speaker Verification ReportCody Ray
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approachijsrd.com
 
Speaker recognition systems
Speaker recognition systemsSpeaker recognition systems
Speaker recognition systemsNamratha Dcruz
 
Human Emotion Recognition From Speech
Human Emotion Recognition From SpeechHuman Emotion Recognition From Speech
Human Emotion Recognition From SpeechIJERA Editor
 
Environmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC techniqueEnvironmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC techniquePankaj Kumar
 
Speaker and Speech Recognition for Secured Smart Home Applications
Speaker and Speech Recognition for Secured Smart Home ApplicationsSpeaker and Speech Recognition for Secured Smart Home Applications
Speaker and Speech Recognition for Secured Smart Home ApplicationsRoger Gomes
 
Improvement of minimum tracking in Minimum Statistics noise estimation method
Improvement of minimum tracking in Minimum Statistics noise estimation methodImprovement of minimum tracking in Minimum Statistics noise estimation method
Improvement of minimum tracking in Minimum Statistics noise estimation methodCSCJournals
 
Speaker identification using mel frequency
Speaker identification using mel frequency Speaker identification using mel frequency
Speaker identification using mel frequency Phan Duy
 
Isolated words recognition using mfcc, lpc and neural network
Isolated words recognition using mfcc, lpc and neural networkIsolated words recognition using mfcc, lpc and neural network
Isolated words recognition using mfcc, lpc and neural networkeSAT Journals
 
SPEKER RECOGNITION UNDER LIMITED DATA CODITION
SPEKER RECOGNITION UNDER LIMITED DATA CODITIONSPEKER RECOGNITION UNDER LIMITED DATA CODITION
SPEKER RECOGNITION UNDER LIMITED DATA CODITIONniranjan kumar
 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognitionphyuhsan
 
Speaker Identification based on GFCC using GMM-UBM
Speaker Identification based on GFCC using GMM-UBMSpeaker Identification based on GFCC using GMM-UBM
Speaker Identification based on GFCC using GMM-UBMinventionjournals
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCCHira Shaukat
 
Broad phoneme classification using signal based features
Broad phoneme classification using signal based featuresBroad phoneme classification using signal based features
Broad phoneme classification using signal based featuresijsc
 
Text-Independent Speaker Verification
Text-Independent Speaker VerificationText-Independent Speaker Verification
Text-Independent Speaker VerificationCody Ray
 
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...TELKOMNIKA JOURNAL
 
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...cscpconf
 

Was ist angesagt? (19)

Text-Independent Speaker Verification Report
Text-Independent Speaker Verification ReportText-Independent Speaker Verification Report
Text-Independent Speaker Verification Report
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approach
 
Speaker recognition systems
Speaker recognition systemsSpeaker recognition systems
Speaker recognition systems
 
Human Emotion Recognition From Speech
Human Emotion Recognition From SpeechHuman Emotion Recognition From Speech
Human Emotion Recognition From Speech
 
Environmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC techniqueEnvironmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC technique
 
Speaker and Speech Recognition for Secured Smart Home Applications
Speaker and Speech Recognition for Secured Smart Home ApplicationsSpeaker and Speech Recognition for Secured Smart Home Applications
Speaker and Speech Recognition for Secured Smart Home Applications
 
Improvement of minimum tracking in Minimum Statistics noise estimation method
Improvement of minimum tracking in Minimum Statistics noise estimation methodImprovement of minimum tracking in Minimum Statistics noise estimation method
Improvement of minimum tracking in Minimum Statistics noise estimation method
 
Speaker identification using mel frequency
Speaker identification using mel frequency Speaker identification using mel frequency
Speaker identification using mel frequency
 
SPEAKER VERIFICATION
SPEAKER VERIFICATIONSPEAKER VERIFICATION
SPEAKER VERIFICATION
 
Isolated words recognition using mfcc, lpc and neural network
Isolated words recognition using mfcc, lpc and neural networkIsolated words recognition using mfcc, lpc and neural network
Isolated words recognition using mfcc, lpc and neural network
 
SPEKER RECOGNITION UNDER LIMITED DATA CODITION
SPEKER RECOGNITION UNDER LIMITED DATA CODITIONSPEKER RECOGNITION UNDER LIMITED DATA CODITION
SPEKER RECOGNITION UNDER LIMITED DATA CODITION
 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognition
 
Speaker Identification based on GFCC using GMM-UBM
Speaker Identification based on GFCC using GMM-UBMSpeaker Identification based on GFCC using GMM-UBM
Speaker Identification based on GFCC using GMM-UBM
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCC
 
Broad phoneme classification using signal based features
Broad phoneme classification using signal based featuresBroad phoneme classification using signal based features
Broad phoneme classification using signal based features
 
Text-Independent Speaker Verification
Text-Independent Speaker VerificationText-Independent Speaker Verification
Text-Independent Speaker Verification
 
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
 
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
 
Speaker recognition.
Speaker recognition.Speaker recognition.
Speaker recognition.
 

Andere mochten auch

Applications of Emotions Recognition
Applications of Emotions RecognitionApplications of Emotions Recognition
Applications of Emotions RecognitionFrancesco Bonadiman
 
Emotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechEmotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechIOSR Journals
 
Vulnerabilities and attacks targeting social networks and industrial control ...
Vulnerabilities and attacks targeting social networks and industrial control ...Vulnerabilities and attacks targeting social networks and industrial control ...
Vulnerabilities and attacks targeting social networks and industrial control ...ijcsa
 
Nocs performance improvement using parallel transmission through wireless links
Nocs performance improvement using parallel transmission through wireless linksNocs performance improvement using parallel transmission through wireless links
Nocs performance improvement using parallel transmission through wireless linksijcsa
 
Vulnerability scanners a proactive approach to assess web application security
Vulnerability scanners a proactive approach to assess web application securityVulnerability scanners a proactive approach to assess web application security
Vulnerability scanners a proactive approach to assess web application securityijcsa
 
A survey on cloud security issues and techniques
A survey on cloud security issues and techniquesA survey on cloud security issues and techniques
A survey on cloud security issues and techniquesijcsa
 
Impact of HeartBleed Bug in Android and Counter Measures
Impact of HeartBleed Bug in Android and Counter  Measures Impact of HeartBleed Bug in Android and Counter  Measures
Impact of HeartBleed Bug in Android and Counter Measures ijcsa
 
Persian arabic document segmentation based on hybrid approach
Persian arabic document segmentation based on hybrid approachPersian arabic document segmentation based on hybrid approach
Persian arabic document segmentation based on hybrid approachijcsa
 
Accelerating the ant colony optimization by
Accelerating the ant colony optimization byAccelerating the ant colony optimization by
Accelerating the ant colony optimization byijcsa
 
Some alternative ways to find m ambiguous binary words corresponding to a par...
Some alternative ways to find m ambiguous binary words corresponding to a par...Some alternative ways to find m ambiguous binary words corresponding to a par...
Some alternative ways to find m ambiguous binary words corresponding to a par...ijcsa
 
A survey early detection of
A survey early detection ofA survey early detection of
A survey early detection ofijcsa
 
Implementation and performance evaluation of
Implementation and performance evaluation ofImplementation and performance evaluation of
Implementation and performance evaluation ofijcsa
 
Modeling of manufacturing of a field effect transistor to determine condition...
Modeling of manufacturing of a field effect transistor to determine condition...Modeling of manufacturing of a field effect transistor to determine condition...
Modeling of manufacturing of a field effect transistor to determine condition...ijcsa
 
IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...
IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...
IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...ijcsa
 
Web log data analysis by enhanced fuzzy c
Web log data analysis by enhanced fuzzy cWeb log data analysis by enhanced fuzzy c
Web log data analysis by enhanced fuzzy cijcsa
 
Comparative study of augmented reality
Comparative study of augmented realityComparative study of augmented reality
Comparative study of augmented realityijcsa
 
Labview with dwt for denoising the blurred biometric images
Labview with dwt for denoising the blurred biometric imagesLabview with dwt for denoising the blurred biometric images
Labview with dwt for denoising the blurred biometric imagesijcsa
 
Robust face recognition by applying partitioning around medoids over eigen fa...
Robust face recognition by applying partitioning around medoids over eigen fa...Robust face recognition by applying partitioning around medoids over eigen fa...
Robust face recognition by applying partitioning around medoids over eigen fa...ijcsa
 
An ahp (analytic hierarchy process)fce (fuzzy comprehensive evaluation) based...
An ahp (analytic hierarchy process)fce (fuzzy comprehensive evaluation) based...An ahp (analytic hierarchy process)fce (fuzzy comprehensive evaluation) based...
An ahp (analytic hierarchy process)fce (fuzzy comprehensive evaluation) based...ijcsa
 

Andere mochten auch (20)

Emotion recognition
Emotion recognitionEmotion recognition
Emotion recognition
 
Applications of Emotions Recognition
Applications of Emotions RecognitionApplications of Emotions Recognition
Applications of Emotions Recognition
 
Emotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechEmotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio Speech
 
Vulnerabilities and attacks targeting social networks and industrial control ...
Vulnerabilities and attacks targeting social networks and industrial control ...Vulnerabilities and attacks targeting social networks and industrial control ...
Vulnerabilities and attacks targeting social networks and industrial control ...
 
Nocs performance improvement using parallel transmission through wireless links
Nocs performance improvement using parallel transmission through wireless linksNocs performance improvement using parallel transmission through wireless links
Nocs performance improvement using parallel transmission through wireless links
 
Vulnerability scanners a proactive approach to assess web application security
Vulnerability scanners a proactive approach to assess web application securityVulnerability scanners a proactive approach to assess web application security
Vulnerability scanners a proactive approach to assess web application security
 
A survey on cloud security issues and techniques
A survey on cloud security issues and techniquesA survey on cloud security issues and techniques
A survey on cloud security issues and techniques
 
Impact of HeartBleed Bug in Android and Counter Measures
Impact of HeartBleed Bug in Android and Counter  Measures Impact of HeartBleed Bug in Android and Counter  Measures
Impact of HeartBleed Bug in Android and Counter Measures
 
Persian arabic document segmentation based on hybrid approach
Persian arabic document segmentation based on hybrid approachPersian arabic document segmentation based on hybrid approach
Persian arabic document segmentation based on hybrid approach
 
Accelerating the ant colony optimization by
Accelerating the ant colony optimization byAccelerating the ant colony optimization by
Accelerating the ant colony optimization by
 
Some alternative ways to find m ambiguous binary words corresponding to a par...
Some alternative ways to find m ambiguous binary words corresponding to a par...Some alternative ways to find m ambiguous binary words corresponding to a par...
Some alternative ways to find m ambiguous binary words corresponding to a par...
 
A survey early detection of
A survey early detection ofA survey early detection of
A survey early detection of
 
Implementation and performance evaluation of
Implementation and performance evaluation ofImplementation and performance evaluation of
Implementation and performance evaluation of
 
Modeling of manufacturing of a field effect transistor to determine condition...
Modeling of manufacturing of a field effect transistor to determine condition...Modeling of manufacturing of a field effect transistor to determine condition...
Modeling of manufacturing of a field effect transistor to determine condition...
 
IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...
IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...
IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...
 
Web log data analysis by enhanced fuzzy c
Web log data analysis by enhanced fuzzy cWeb log data analysis by enhanced fuzzy c
Web log data analysis by enhanced fuzzy c
 
Comparative study of augmented reality
Comparative study of augmented realityComparative study of augmented reality
Comparative study of augmented reality
 
Labview with dwt for denoising the blurred biometric images
Labview with dwt for denoising the blurred biometric imagesLabview with dwt for denoising the blurred biometric images
Labview with dwt for denoising the blurred biometric images
 
Robust face recognition by applying partitioning around medoids over eigen fa...
Robust face recognition by applying partitioning around medoids over eigen fa...Robust face recognition by applying partitioning around medoids over eigen fa...
Robust face recognition by applying partitioning around medoids over eigen fa...
 
An ahp (analytic hierarchy process)fce (fuzzy comprehensive evaluation) based...
An ahp (analytic hierarchy process)fce (fuzzy comprehensive evaluation) based...An ahp (analytic hierarchy process)fce (fuzzy comprehensive evaluation) based...
An ahp (analytic hierarchy process)fce (fuzzy comprehensive evaluation) based...
 

Ähnlich wie Automatic speech emotion and speaker recognition based on hybrid gmm and ffbnn

Speaker recognition on matlab
Speaker recognition on matlabSpeaker recognition on matlab
Speaker recognition on matlabArcanjo Salazaku
 
Broad Phoneme Classification Using Signal Based Features
Broad Phoneme Classification Using Signal Based Features  Broad Phoneme Classification Using Signal Based Features
Broad Phoneme Classification Using Signal Based Features ijsc
 
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueA Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueCSCJournals
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By MatlabAnkit Gujrati
 
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALGENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALIJCSEIT Journal
 
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...IDES Editor
 
Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderIJTET Journal
 
On the use of voice activity detection in speech emotion recognition
On the use of voice activity detection in speech emotion recognitionOn the use of voice activity detection in speech emotion recognition
On the use of voice activity detection in speech emotion recognitionjournalBEEI
 
F EATURE S ELECTION USING F ISHER ’ S R ATIO T ECHNIQUE FOR A UTOMATIC ...
F EATURE  S ELECTION USING  F ISHER ’ S  R ATIO  T ECHNIQUE FOR  A UTOMATIC  ...F EATURE  S ELECTION USING  F ISHER ’ S  R ATIO  T ECHNIQUE FOR  A UTOMATIC  ...
F EATURE S ELECTION USING F ISHER ’ S R ATIO T ECHNIQUE FOR A UTOMATIC ...IJCI JOURNAL
 
Wavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker RecognitionWavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker RecognitionCSCJournals
 
Mfcc based enlargement of the training set for emotion recognition in speech
Mfcc based enlargement of the training set for emotion recognition in speechMfcc based enlargement of the training set for emotion recognition in speech
Mfcc based enlargement of the training set for emotion recognition in speechsipij
 
IRJET- Emotion recognition using Speech Signal: A Review
IRJET-  	  Emotion recognition using Speech Signal: A ReviewIRJET-  	  Emotion recognition using Speech Signal: A Review
IRJET- Emotion recognition using Speech Signal: A ReviewIRJET Journal
 
A Text-Independent Speaker Identification System based on The Zak Transform
A Text-Independent Speaker Identification System based on The Zak TransformA Text-Independent Speaker Identification System based on The Zak Transform
A Text-Independent Speaker Identification System based on The Zak TransformCSCJournals
 
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTSEFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTSijnlc
 
Effect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal AlignmentsEffect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal Alignmentskevig
 
General Kalman Filter & Speech Enhancement for Speaker Identification
General Kalman Filter & Speech Enhancement for Speaker IdentificationGeneral Kalman Filter & Speech Enhancement for Speaker Identification
General Kalman Filter & Speech Enhancement for Speaker Identificationijcisjournal
 
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...IJCSEA Journal
 

Ähnlich wie Automatic speech emotion and speaker recognition based on hybrid gmm and ffbnn (20)

Speaker recognition on matlab
Speaker recognition on matlabSpeaker recognition on matlab
Speaker recognition on matlab
 
A017410108
A017410108A017410108
A017410108
 
Kf2517971799
Kf2517971799Kf2517971799
Kf2517971799
 
Broad Phoneme Classification Using Signal Based Features
Broad Phoneme Classification Using Signal Based Features  Broad Phoneme Classification Using Signal Based Features
Broad Phoneme Classification Using Signal Based Features
 
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueA Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALGENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
 
Speaker Recognition Using Vocal Tract Features
Speaker Recognition Using Vocal Tract FeaturesSpeaker Recognition Using Vocal Tract Features
Speaker Recognition Using Vocal Tract Features
 
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
 
Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using Vocoder
 
On the use of voice activity detection in speech emotion recognition
On the use of voice activity detection in speech emotion recognitionOn the use of voice activity detection in speech emotion recognition
On the use of voice activity detection in speech emotion recognition
 
F EATURE S ELECTION USING F ISHER ’ S R ATIO T ECHNIQUE FOR A UTOMATIC ...
F EATURE  S ELECTION USING  F ISHER ’ S  R ATIO  T ECHNIQUE FOR  A UTOMATIC  ...F EATURE  S ELECTION USING  F ISHER ’ S  R ATIO  T ECHNIQUE FOR  A UTOMATIC  ...
F EATURE S ELECTION USING F ISHER ’ S R ATIO T ECHNIQUE FOR A UTOMATIC ...
 
Wavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker RecognitionWavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker Recognition
 
Mfcc based enlargement of the training set for emotion recognition in speech
Mfcc based enlargement of the training set for emotion recognition in speechMfcc based enlargement of the training set for emotion recognition in speech
Mfcc based enlargement of the training set for emotion recognition in speech
 
IRJET- Emotion recognition using Speech Signal: A Review
IRJET-  	  Emotion recognition using Speech Signal: A ReviewIRJET-  	  Emotion recognition using Speech Signal: A Review
IRJET- Emotion recognition using Speech Signal: A Review
 
A Text-Independent Speaker Identification System based on The Zak Transform
A Text-Independent Speaker Identification System based on The Zak TransformA Text-Independent Speaker Identification System based on The Zak Transform
A Text-Independent Speaker Identification System based on The Zak Transform
 
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTSEFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
 
Effect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal AlignmentsEffect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal Alignments
 
General Kalman Filter & Speech Enhancement for Speaker Identification
General Kalman Filter & Speech Enhancement for Speaker IdentificationGeneral Kalman Filter & Speech Enhancement for Speaker Identification
General Kalman Filter & Speech Enhancement for Speaker Identification
 
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
 

Kürzlich hochgeladen

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Kürzlich hochgeladen (20)

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Automatic speech emotion and speaker recognition based on hybrid gmm and ffbnn

  • 1. International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014 DOI:10.5121/ijcsa.2014.4104 35 Automatic Speech Emotion and Speaker Recognition based on Hybrid GMM and FFBNN J. Sirisha Devi1 , Y. Srinivas2 and Siva Prasad Nandyala3 1 Assitant Professor, Dept of IT, GRIET ,Hyderabad,501401, Andhra Pradesh, India 2 Professor, Dept of IT, GITAM University,Visakhapatnam, 530045, Andhra Pradesh, India 3 Research Scholar, Dept of ECE, NIT Warangal,Warangal, 506004, Andhra Pradesh, India ABSTRACT In this paper we present text dependent speaker recognition with an enhancement of detecting the emotion of the speaker prior using the hybrid FFBN and GMM methods. The emotional state of the speaker influences recognition system. Mel-frequency Cepstral Coefficient (MFCC) feature set is used for experimentation. To recognize the emotional state of a speaker Gaussian Mixture Model (GMM) is used in training phase and in testing phase Feed Forward Back Propagation Neural Network (FFBNN). Speech database consisting of 25 speakers recorded in five different emotional states: happy, angry, sad, surprise and neutral is used for experimentation. The results reveal that the emotional state of the speaker shows a significant impact on the accuracy of speaker recognition. KEYWORDS Mel-frequency Cepstral Coefficient (MFCC), Feed Forward Back Propagation Neural Network (FFBNN), Gaussian Mixture Model (GMM). 1. INTRODUCTION Speaker recognition refers to recognize the person from their speech. Accuracy of recognition increases if the system is text-independent [1]. The speech signal contains the message being spoken, the emotional state of the speaker and the information of the speaker. Therefore we can use the speech signal for both speech and speaker recognition [2]. Speaker recognition extracts the underlying linguistic message in an utterance. It is the process of automatically recognizing who is speaking on the basis of features present in the speech signal. Speaker recognition is helpful in the areas such as voice dialling, banking by telephone, telephone shopping, database access services, information services, and security control for confidential information areas, voice mail and remote access to computers [9]. Influence of emotional state of human speech in speaker recognition is very high [13]. The term “emotion” can refer to an extremely complex state associated with a wide variety of mental, physiological and physical events. Emotional speech database is valuable for this speaker recognition. In this paper a combination of speech features is considered. The spectral feature Mel frequency cepstral coefficients best suited for speaker recognition and the prosodic feature pitch, which is strongly dependent on the emotional state of the speaker. In section 2, feature extraction of speech signal is performed. In section 3, classification of the feature vectors is explained. In section 4, experiment shows how training and testing are performed and result analysis is done.
  • 2. International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014 36 2. FEATURE EXTRACTION 2.1 Pre-processing Speaker specific features will be present in the voiced part of the speech signals [9]. Pre- processing is a process of separating the voiced and unvoiced speech signals. 2.1.1 Noise removal While recording the speech signal, it consists of various unwanted piece of signals which are considered to be noise/unvoiced part of speech. This noise part may be due to low quality of source, loss of speech segments, channel fading, and presense of noise in the channel or echo or reverberation. A low-pass filter allows signal frequencies below the low cut-off frequency to pass and stops frequencies above the cut-off frequency. Speaker specific information is available in the higher frequencies (above 4 kHz), so a low pass filter is used to remove all the frequencies less than 4 KHz as unvoiced data. 2.1.2 Framing Each speech signal is split into frame segments of size 30ms (milli second) where the sampling rate is 22.05 KHz. Each segment is extracted at every 50ms interval. This implies that the overlap between segments is 10ms. 2.1.3 Windowing After framing, each and every frame will contain discontinuities at the beginning and end of each frame. The concept here is to minimize the spectral distortion by using the window to taper the signal to zero at the beginning and end of each frame. If we define the window as Eq.1 ( ), 0 ≤ ≤ − 1 (1) Where, N is the number of samples in each frame, then the result of windowing is the signal, Eq.2 ( ) = ( ) ( ), 0 ≤ ≤ − 1 (2) Typically the Hamming window is used, which has the form (Eq.3): 10, 1 2 cos46.054.0)( −≤≤      − −= Nn N n nw  (3) 2.1.4 Fast Fourier Transform (FFT The next processing step is the Fast Fourier Transform, which converts each frame of N samples from the time domain into the frequency domain. FFT gets log magnitude spectrum to determine MFCC. We have used 1024 point to get better frequency resolution. 2.2 Mel- Frequency Cepstral Coefficients(MFCC) Prosody is the suprasegmental phonology of speech sounds, involving (but not limited to) aspects such as pitch, loudness, tempo, and stress [10]. Pitch, often referred to as fundamental frequency,
  • 3. International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014 37 is the vibration rate of the vocal folds. Gold-Rabiner algorithm [3] illustrates pitch extraction based on the fact that locating the position of the maximum point of excitation is not always determinable from the time-waveform. Pitch is elaborated as the time difference between two major peaks in the voiced signal. The method of pitch extraction from the waveforms is described in the following steps: Step 1: Speech signal is given as input to the system. Step 2: Normalized Cross Correlation Function is computed periodically for all lags of f0 and the locations of the all local maxima obtained in the first iteration are stored. Step 3: Normalized cross correlation is performed for improved high sampled rate signals in order to determine the accurate location of peaks, with estimations applied to amplitude. Step 4: f0 for the particular frame is obtained by the normalized cross correlation related to the particular frame. Step 5: Dynamic programming is used in order to select the set of cross correlation peaks or unvoiced hypothesis across all frames. Spectral features characterize signal properties in the frequency domain, thus providing useful additions to prosodic features. The frequency domain representation can be obtained by taking the discrete Fourier transform (DFT) of the discrete-time signal. However, the DFT of a signal usually results in a high-dimensional representation, in addition, with fine spectral details that may turn out to impair recognition performance. Therefore, the DFT spectrum is usually transformed into some more compact feature representations for speech related recognition tasks, such as the mel-frequency cepstral coefficients (MFCCs). The 20 Mel triangular filters are designed with 50% overlapping. From each filter the spectrum is added to get one coefficient each, in this way we have considered the first 13 coefficients as our features. These frequencies are converted to Mel scale using following conversion formula (Eq.4). ( ) = 2595 ∗ log 1 + 700 (4) We have considered 13 MFCC coefficients because; of the fact it gives better recognition accuracy than other coefficients. 3. CLASSIFICATION TECHNIQUES In both the training and testing phases, before speaker recognition the emotional state of speaker is identified [11]. A new hybrid model is proposed in which GMM in training and FFBN in testing is used for emotion recognition. 3.1 Emotion Recognition 3.1.1 Gaussian Mixture Model The feature vector thus obtained is send to GMM classifier for emotion recognition in the training phase. In the GMM based classifier [5] [7], probability density function of the feature space is modeled with a diagonal covariance GMM for each emotion [12]. Probability density function is a weighted combination of K component densities given by Eq.5. ( ) = ∑ ( ⎟ ) (5) where f : observation feature vector ѡk : mixture weight associated with the k-th Gaussian component.
  • 4. International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014 38 The weights satisfy the constraints, 0 ≤ ≤ 1 and ∑ = 1 The conditional probability p(f |k) is modeled by Gaussian distribution with the component mean vector µk, and the diagonal covariance matrix Σk. The GMM for a given emotion is extracted through the expectation-maximization based iterative training process using a set of training feature vectors representing the emotion. In the emotion recognition phase [8], posterior probability of the features of a given speech utterance is maximized over all emotion GMM densities. Given a sequence of feature vectors for a speech utterance, F = {f1, f2, . . . , fT }, let’s define the loglikelihood of this utterance for emotion class e with a GMM density model γe as given in Eq. (6), = log ( ⎟ ) = ∑ log (f │γ ) (6) Where, p(F|sγe) is the GMM probability density for the emotion class e as defined in eq. 5. Then, the emotion GMM density that maximizes posterior probability of the utterance is set as the recognized emotion class is given by Eq. (7). = arg max (7) Where, E is the set of emotions and ε is the recognized emotion. 3.1.2 Feed Forward Backpropogation Neural Network (FFBNN) In testing phase, emotion recognition from the speech signals is done by FFBNN. The features are extracted for the speech signal and given to the FFBNN to perform the testing process. The speech signal’s corresponding MFCC features are taken as input to the FFBNN [4]. Here, we have taken three inputs nodes as energy entropy (Ee), short-time energy (Se) and Zero crossing Rate (ZCR), dN number of hidden layers and one output layer, which is a prosody effect of the given input signal. The proposed emotion classification FFBNN structure is shown in Figure 1. Initially, the input feature values are transmitted to the hidden layer and then, to the output layer. Each node in the hidden layer gets input from the input layer, which are multiplexed with suitable weights and summed. The hidden layer input value calculation function is called as bias function, which is described below by Eq. (8), )( 1 ihCRhihehiheh N h ZwSwEwH d i +++= ∑=  (8) In Eq. (8), ieE , ieS and iCRZ are the feature values of the ith person speech signal. The activation function in the output layer is given in Eq. (9). The output values from the output layer are compared with target values and the learning error rate for the neural network is computed, which is given in Eq. (10). iH e − + = 1 1  (9) ∑ − = −= 1 0 1 d i N h i hh d OD N  (10)
  • 5. International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014 39 In eqn. (10),  - is the learning error rate i hD - is the desired output i hO - is the actual output. The error between the nodes is transmitted back to the hidden layer and this process is called the backward pass of the back propagation algorithm. The reduction of error by back propagation algorithm is described in the subsequent steps. (i) Initially, the weights are assigned to hidden layer neurons. The input layer has a constant weight, whereas the weights for output layer neurons are chosen arbitrarily. Subsequently, the bias function and output layer activation function are computed by using the Eq. (8) and (9). (ii) Next, the back propagation error is computed for each node and the weights are updated by using the Eq. (11). ihihih www ∆+= (11) Where, the weight ihw∆ is changed, which is given by Eq.(12), )( ..   ENw ihih =∆ (12) Where,  is the learning rate that normally ranges from 0.2 to 0.5, and )( E is the BP error. The bias function, activation function, and BP error calculation process are continued till the BP error gets reduced i.e. 1.0)( < E . If the BP error reaches a minimum value, then the FFBNN is well trained by the speech signals feature values for performing the emotion classification. The well trained FFBNN provides an accurate classification results for the input emotion speech signals. The well trained FFBNN classify the input speech signals based on the prosody which the speech signal belongs to. 3.1.3 Hybrid GMM/FFBN Approach Therefore, we followed a different approach, namely the extension of a state-of-the-art system that achieves extremely good recognition rates with a GMM that is trained. For recognition purpose used the FFBN method. 3.2 Speaker Recognition using HMM Hidden Markov Model is used to recognize the speaker after emotion is recognized by hybrid GMM/FFBN. HMM is a statistical tool for modeling generative sequence to generate observable sequence when characterized by the primary process. The observation is turned to be a probabilistic function (discrete or continuous) of a state instead of a one-to-one correspondence of a state. Each state randomly generates one of M observations (or visible states). To define hidden Markov model, the following probabilities have to be specified: Matrix of transition probabilities, A=(aij), aij= P(si | sj)
  • 6. International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014 40 Matrix of observation probabilities, B=(bi (vm )), bi(vm ) = P(vm | si) Vector of initial probabilities, π=(πi), πi = P(si) Model is represented by M=(A, B, π) 4. EXPERIMENTS AND RESULTS MATLAB R2013a has been used for experimentation. 4.1 Training phase In training phase, the extracted MFCC vector is sent to both the classification techniques. First emotion recognition is done using GMM classifier then the same feature vector is passed to HMM classifier for speaker recognition (SR). For each and every emotion a unique record is generated. Figure 2 explains the block diagram of the entire procedure. 4.2 Testing phase After feature extraction the feature vector is passed to FFBN classifier for emotion recognition. Ones the emotion recognition is recognized, based on the emotional state the corresponding record will be supplied for further recognition part which is speaker recognition using HMM. 4.3 Results When the emotional state of speaker differs in the testing phase the recognition rate decreased significantly. The results shows that the accuracy rate of speaker recognition has been considerably increased when compared to the recognition rate where emotional state of the speaker was not considered. The accuracy rate of speaker indirectly depends on the accuracy of emotion recognition. As well as when we compared the accuracy rate of emotion recognition using GMM with the proposed system, the accuracy rates have been considerably increased. The recognition rate obtained is 93 % with the hybrid model compared to 91% using the same method for training and testing for emotion recognition. Table 1shows the comparison of speaker recognition results. The Figure 3 gives the simulation results of speaker recognition for the given emotions. Table 1: Comparison of speaker recognition rate for different emotions Emotional State Happy Angry Sad Surprise Neutral Recognition Rate (%) Emotion (GMM) 89 94 89 86 94 Speaker (HMM) 88 97 92 90 96 Recognition Rate (%) of Hybrid Model Emotion (Hybrid GMM/FFBN) 90 95 89 85 95 Speaker (HMM) 90 97 92 92 97
  • 7. International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014 41 Figure 1: Structure of FFBNN in Emotion Classification Figure 2: Process Flow Figure 3: Simulation Results
  • 8. International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.1, February 2014 42 REFERENCES [1] Gish, H., Schmidt, M., 1994. Text-independent speaker recognition. IEEE Signal Process. Magazine (October), 18–32. [2] Furui, S., 1981. Cepstral analysis technique for automatic speaker verification. IEEE Trans. Acoust. Speech Signal Process. 29 (2), 254–272. [3] Gold, B. and Rabiner, L.R, Parallel processing techniques for estimating pitch periods of speech in time-domain. [4] Raul Fernandez, Rosalind Picard, “Recognizing affect from speech prosody using hierarchical graphical models”, Journal Speech Communication, Vol. 53, No. 9-10, pp. 1088-1103, 2011. [5] Lukas Burget, Petr Schwarz, Mohit Agarwal, Pinar Akyazi, Kai Feng, Arnab Ghoshal, Ondrej Glembek, Nagendra Goel, Martin Karafiat, Daniel Povey, Ariya Rastrow, Richard C. Rose, Samuel Thomas, "Multilingual Acoustic Modeling for Speech Recognition Based on Subspace Gaussian Mixture Models", In Proceedings of IEEE International Conference on Acoustics Speech and Signal Processing, Dallas, TX, pp. 4334 - 4337, 2010. [6] Lawrence Rabiner, Biing-Hwang Juang and B.Yegnanarayana, ”Fundamental of Speech Recognition”, Prentice-Hall, Englewood Cliffs, 2009. [7] B. Schuller, S. Steidl, and A. Batliner, “The interspeech 2009 emotion challenge,” in Interspeech (2009), ISCA, Brighton, UK, 2009. [8] K. R. Scherer, “How emotion is expressed in speech and singing,” in Proceedings of XIIIth International Congress of Phonetic Sciences., pp. 90-96. 1995 [9] M. Kockmann, L. Ferrer, L. Burget, E. Shriberg, and J. H. Cernocký, "Recent progress in prosodic speaker verification," in Proc. IEEE ICASSP, (Prague), pp. 4556--4559, May 2011. [10] Oliver Gauci, Carl J. Debono, Paul Micallef, "A Reproducing Kernel Hilbert Space Approach for Speech Enhancement" ISCCSP 2008, Malta, 12-14 March 2008 [11] Justin, J. and I. Vennila,” A Hybrid Speech Recognition System with Hidden Markov Model and Radial Basis Function Neural Network” American Journal of Applied Sciences, Volume10, Issue 10, Pages 1148-1153. [12] Aditya Bihar Kandali, Aurobinda Routray, Tapan Kumar Basu,” Emotion recognition from Assamese speeches using MFCC features and GMM classifier”, Tencon 2008 - 2008 IEEE Region 10 Conference. [13] Marius Vasile Ghiurcau , Corneliu Rusu,Jaakko Astola, “A Study Of The Effect Of Emotional State Upon Text-Independent Speaker Identification”, ICASSP 2011