SlideShare a Scribd company logo
1 of 14
Application of Fisher Linear Discriminant Analysis
to Speech/Music Classification
Enrique Alexandre, Manuel Rosa, Lucas Cuadra, and Roberto Gil-Pita
Departamento de Teor´ıa de la Se˜nal y Comunicaciones
Universidad de Alcal´a. 28805 - Alcal´a de Henares,
Madrid, Spain
Presented By:
S. Lushanthan
Agenda
 Objective
 Time Frequency Decomposition
 Feature Extraction
 Classification Algorithms
 Data Collection
 Results and Discussion
Objective
 The well-known K-N-N algorithm has been widely used in many sound
classification applications. The Objective here is to,
“Demonstrate the superior behavior of the Fishers Linear Discriminant
algorithm compared to the K-Nearest-Neighbor algorithm”
Why Speech/ Music Classification?
 Fisher LDA Classifier has not been tried much in the domain of speech/ audio
classification
 If this succeeds, this would be a first-step in many Music-Genre Classification
systems
Signal Processing – Time frequency
decomposition
Feature Extraction
 Literature says that features can be classified in to 3 different classes,
1. Timbre-related
2. Rhythm related
3. Pitch-related
 For simplification purposes only “timbre-related” features are used
 A 512-samples window is used, with no overlap between adjacent frames
 The time-frequency decomposition is performed using either a Modified
Discrete Cosine Transform (MDCT), or a Discrete Fourier Transform (DFT)
 All the features are calculated and their mean and standard deviation are
computed every 43 frames (1.85 seconds at our sampling rate). Thus a 2-
dimensional vector, containing the mean and standard deviation
computed every 43 frames
Feature Description Mathematical Equation
Spectral Centroid Measure of brightness of a sound
Spectral Roll-off Shape of the spectrum
Zero Crossing Rate (ZCR) How noisy a signal is
High Zero Crossing Rate Ratio # of frames whose ZCR is 1.5x above
the mean ZCR
Short-Time Energy (STE) Mean energy of the signal within each
analysis frame
Low Short-Time Energy Ratio Ratio of frames whose STE is 0.5x below
the mean STE
Mel-frequency Cepstral
Coefficients (MFCC)
Provide a compact representation of
the spectral envelope
Voice2White measure of the energy inside the
typical speech band (300-4000 Hz)
respect to the whole energy of the
signal
Activity Level calculated using method for the
objective measurement of active
speech
Classification Algorithms
K- Nearest- Neighbor
 Classification Rule
Assume that we have a training set with L vectors grouped into C different classes. To
obtain the class corresponding to a new observed vector X, the algorithm has simply
to look for the K nearest neighbors to the test vector X, and weigh their class
numbers they belong to, usually using a majority rule.
Fisher LDA
 Data are projected onto a line, and the classification is performed in this one-
dimensional space
 The class separability function in a direction w є Rn is defined as:
 Find an analytic expression for w which maximizes J(w):
SB and SW are the between-class and
within class scatter matrixes respectively
Data Collection
 Corpus for speech/music classification provided by Dan Ellis originally recorded by
Eric Scheirer during his internship at Interval Research Corporation
“Music-Speech” Corpus
Training
Data
music
(60 files)
speech
(60 files)
m + s
(60 files)
Test
Data
speech
(without bgm)
(120 files)
music with
no vocals
(126 files)
music with
vocals
(120 files)
45 minutes,
15 seconds
each
15.25 minutes,
2.5 seconds
each
Results and Discussion
 Fisher LDA, 1-N-N, 3-N-N for all features individually
 Probability of Error
Feature Fisher 1-NN 3-NN
Centroid (MDCT) 8.74% 17.48% 21.85%
Centroid (DFT) 16.66% 29.23% 30.60%
Roll-off (MDCT) 14.48% 25.40% 21.85%
Roll-off (DFT) 8.19% 13.11% 13.11%
ZCR 9.83% 19.67% 18.03%
HZCRR 25.13% 39.89% 36.33%
STE 48.63% 22.40% 22.67%
LSTER 11.74% 33.87% 23.77%
MFCC 4.09% 22.13% 26.50%
Voice2White 4.91% 6.28% 6.01%
Activity level 12.84% 18.03% 18.85%
Combination of two or more
of these features does not
seem to improve the results.
e.g:
MFCC and the Voice2White
features with a Fisher linear
discriminant classifier, leads to a
probability of error equal to
4.09%,the same with MFCC alone
Confusion matrixes using the Voice2White
feature
Classifier Speech Music
Fisher
Speech 104 16
Music 2 244
1-N-N
Speech 114 6
Music 17 229
3-N-N Speech 116 4
Music 18 228
Fisher LDA has
high probability
of error when the
input is Speech
K-N-N has high
probability of
error when the
input is Music
So Why not combine classifiers using
Majority Rule for better results?
Probability of Error drops to 4.5%
Conclusion
 Fisher linear discriminant analysis can provide very promising results using
only one feature for the classification
 Better results may be obtained combining the results obtained from two or
more classifiers
Thank you!

More Related Content

What's hot

CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: MixturesCVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
zukun
 
Spectral Clustering Report
Spectral Clustering ReportSpectral Clustering Report
Spectral Clustering Report
Miaolan Xie
 
Refining Bayesian Data Analysis Methods for Use with Longer Waveforms
Refining Bayesian Data Analysis Methods for Use with Longer WaveformsRefining Bayesian Data Analysis Methods for Use with Longer Waveforms
Refining Bayesian Data Analysis Methods for Use with Longer Waveforms
James Bell
 
Smooth entropies a tutorial
Smooth entropies a tutorialSmooth entropies a tutorial
Smooth entropies a tutorial
wtyru1989
 
icml2004 tutorial on spectral clustering part II
icml2004 tutorial on spectral clustering part IIicml2004 tutorial on spectral clustering part II
icml2004 tutorial on spectral clustering part II
zukun
 
icml2004 tutorial on spectral clustering part I
icml2004 tutorial on spectral clustering part Iicml2004 tutorial on spectral clustering part I
icml2004 tutorial on spectral clustering part I
zukun
 
Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsProbabilistic information retrieval models & systems
Probabilistic information retrieval models & systems
Selman Bozkır
 
presentation
presentationpresentation
presentation
jie ren
 

What's hot (20)

NICE Implementations of Variational Inference
NICE Implementations of Variational Inference NICE Implementations of Variational Inference
NICE Implementations of Variational Inference
 
Density Based Clustering
Density Based ClusteringDensity Based Clustering
Density Based Clustering
 
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: MixturesCVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
CVPR2010: Advanced ITinCVPR in a Nutshell: part 6: Mixtures
 
Instance Based Learning in Machine Learning
Instance Based Learning in Machine LearningInstance Based Learning in Machine Learning
Instance Based Learning in Machine Learning
 
Optics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structureOptics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structure
 
Interactive Latent Dirichlet Allocation
Interactive Latent Dirichlet AllocationInteractive Latent Dirichlet Allocation
Interactive Latent Dirichlet Allocation
 
Error Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source ConditionError Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source Condition
 
Db Scan
Db ScanDb Scan
Db Scan
 
Spectral Clustering Report
Spectral Clustering ReportSpectral Clustering Report
Spectral Clustering Report
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 
Admission in india 2015
Admission in india 2015Admission in india 2015
Admission in india 2015
 
Refining Bayesian Data Analysis Methods for Use with Longer Waveforms
Refining Bayesian Data Analysis Methods for Use with Longer WaveformsRefining Bayesian Data Analysis Methods for Use with Longer Waveforms
Refining Bayesian Data Analysis Methods for Use with Longer Waveforms
 
Smooth entropies a tutorial
Smooth entropies a tutorialSmooth entropies a tutorial
Smooth entropies a tutorial
 
Combinatorial Optimization
Combinatorial OptimizationCombinatorial Optimization
Combinatorial Optimization
 
icml2004 tutorial on spectral clustering part II
icml2004 tutorial on spectral clustering part IIicml2004 tutorial on spectral clustering part II
icml2004 tutorial on spectral clustering part II
 
icml2004 tutorial on spectral clustering part I
icml2004 tutorial on spectral clustering part Iicml2004 tutorial on spectral clustering part I
icml2004 tutorial on spectral clustering part I
 
Dft
DftDft
Dft
 
Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsProbabilistic information retrieval models & systems
Probabilistic information retrieval models & systems
 
presentation
presentationpresentation
presentation
 
main
mainmain
main
 

Viewers also liked

face recognition system
face recognition systemface recognition system
face recognition system
Anil Kumar
 
Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15
Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15
Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15
MLconf
 
Kernel fisher discriminant
Kernel fisher discriminantKernel fisher discriminant
Kernel fisher discriminant
Đỗ Hợp
 
Face recognition using laplacian faces
Face recognition using laplacian facesFace recognition using laplacian faces
Face recognition using laplacian faces
Pulkiŧ Sharma
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
Bhasker Rajan
 
T18 discriminant analysis
T18 discriminant analysisT18 discriminant analysis
T18 discriminant analysis
kompellark
 

Viewers also liked (20)

Introduction to Functional Data Analysis
Introduction to Functional Data AnalysisIntroduction to Functional Data Analysis
Introduction to Functional Data Analysis
 
face recognition system
face recognition systemface recognition system
face recognition system
 
Morphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageMorphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil Language
 
LDA presentation
LDA presentationLDA presentation
LDA presentation
 
4 new-patch-agggregation.pptx
4 new-patch-agggregation.pptx4 new-patch-agggregation.pptx
4 new-patch-agggregation.pptx
 
Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15
Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15
Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15
 
Kernel fisher discriminant
Kernel fisher discriminantKernel fisher discriminant
Kernel fisher discriminant
 
Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super Vector
Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super VectorLec-08 Feature Aggregation II: Fisher Vector, AKULA and Super Vector
Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super Vector
 
Microsoft Web Technology Stack
Microsoft Web Technology StackMicrosoft Web Technology Stack
Microsoft Web Technology Stack
 
Face recognition using LDA
Face recognition using LDAFace recognition using LDA
Face recognition using LDA
 
Face recognition using laplacian faces
Face recognition using laplacian facesFace recognition using laplacian faces
Face recognition using laplacian faces
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
 
LDA
LDALDA
LDA
 
PCA vs LDA
PCA vs LDAPCA vs LDA
PCA vs LDA
 
T18 discriminant analysis
T18 discriminant analysisT18 discriminant analysis
T18 discriminant analysis
 
Understandig PCA and LDA
Understandig PCA and LDAUnderstandig PCA and LDA
Understandig PCA and LDA
 
Lda
LdaLda
Lda
 
discriminant analysis
discriminant analysisdiscriminant analysis
discriminant analysis
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
 

Similar to Application of Fisher Linear Discriminant Analysis to Speech/Music Classification

129966863723746268[1]
129966863723746268[1]129966863723746268[1]
129966863723746268[1]
威華 王
 
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
CSCJournals
 
129966864160453838[1]
129966864160453838[1]129966864160453838[1]
129966864160453838[1]
威華 王
 

Similar to Application of Fisher Linear Discriminant Analysis to Speech/Music Classification (20)

sound level meter octave band ananlyser.pptx
sound level meter octave band ananlyser.pptxsound level meter octave band ananlyser.pptx
sound level meter octave band ananlyser.pptx
 
A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...
A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...
A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...
 
F010334548
F010334548F010334548
F010334548
 
129966863723746268[1]
129966863723746268[1]129966863723746268[1]
129966863723746268[1]
 
Acoustic fMRI noise reduction: a perceived loudness approach
Acoustic fMRI noise reduction: a perceived loudness approachAcoustic fMRI noise reduction: a perceived loudness approach
Acoustic fMRI noise reduction: a perceived loudness approach
 
A Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive InstrumentsA Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
 
Speech Signal Processing
Speech Signal ProcessingSpeech Signal Processing
Speech Signal Processing
 
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
 
example based audio editing
example based audio editingexample based audio editing
example based audio editing
 
T26123129
T26123129T26123129
T26123129
 
HUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATION
HUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATIONHUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATION
HUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATION
 
Final presentation
Final presentationFinal presentation
Final presentation
 
Analysis of PEAQ Model using Wavelet Decomposition Techniques
Analysis of PEAQ Model using Wavelet Decomposition TechniquesAnalysis of PEAQ Model using Wavelet Decomposition Techniques
Analysis of PEAQ Model using Wavelet Decomposition Techniques
 
An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...
 
AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...
AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...
AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...
 
An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...
 
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
 
Graphical visualization of musical emotions
Graphical visualization of musical emotionsGraphical visualization of musical emotions
Graphical visualization of musical emotions
 
S@P Noise.pptx
S@P Noise.pptxS@P Noise.pptx
S@P Noise.pptx
 
129966864160453838[1]
129966864160453838[1]129966864160453838[1]
129966864160453838[1]
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 

Application of Fisher Linear Discriminant Analysis to Speech/Music Classification

  • 1. Application of Fisher Linear Discriminant Analysis to Speech/Music Classification Enrique Alexandre, Manuel Rosa, Lucas Cuadra, and Roberto Gil-Pita Departamento de Teor´ıa de la Se˜nal y Comunicaciones Universidad de Alcal´a. 28805 - Alcal´a de Henares, Madrid, Spain Presented By: S. Lushanthan
  • 2. Agenda  Objective  Time Frequency Decomposition  Feature Extraction  Classification Algorithms  Data Collection  Results and Discussion
  • 3. Objective  The well-known K-N-N algorithm has been widely used in many sound classification applications. The Objective here is to, “Demonstrate the superior behavior of the Fishers Linear Discriminant algorithm compared to the K-Nearest-Neighbor algorithm” Why Speech/ Music Classification?  Fisher LDA Classifier has not been tried much in the domain of speech/ audio classification  If this succeeds, this would be a first-step in many Music-Genre Classification systems
  • 4. Signal Processing – Time frequency decomposition
  • 5. Feature Extraction  Literature says that features can be classified in to 3 different classes, 1. Timbre-related 2. Rhythm related 3. Pitch-related  For simplification purposes only “timbre-related” features are used  A 512-samples window is used, with no overlap between adjacent frames  The time-frequency decomposition is performed using either a Modified Discrete Cosine Transform (MDCT), or a Discrete Fourier Transform (DFT)  All the features are calculated and their mean and standard deviation are computed every 43 frames (1.85 seconds at our sampling rate). Thus a 2- dimensional vector, containing the mean and standard deviation computed every 43 frames
  • 6. Feature Description Mathematical Equation Spectral Centroid Measure of brightness of a sound Spectral Roll-off Shape of the spectrum Zero Crossing Rate (ZCR) How noisy a signal is High Zero Crossing Rate Ratio # of frames whose ZCR is 1.5x above the mean ZCR Short-Time Energy (STE) Mean energy of the signal within each analysis frame Low Short-Time Energy Ratio Ratio of frames whose STE is 0.5x below the mean STE Mel-frequency Cepstral Coefficients (MFCC) Provide a compact representation of the spectral envelope Voice2White measure of the energy inside the typical speech band (300-4000 Hz) respect to the whole energy of the signal Activity Level calculated using method for the objective measurement of active speech
  • 7. Classification Algorithms K- Nearest- Neighbor  Classification Rule Assume that we have a training set with L vectors grouped into C different classes. To obtain the class corresponding to a new observed vector X, the algorithm has simply to look for the K nearest neighbors to the test vector X, and weigh their class numbers they belong to, usually using a majority rule.
  • 8. Fisher LDA  Data are projected onto a line, and the classification is performed in this one- dimensional space  The class separability function in a direction w є Rn is defined as:  Find an analytic expression for w which maximizes J(w): SB and SW are the between-class and within class scatter matrixes respectively
  • 9. Data Collection  Corpus for speech/music classification provided by Dan Ellis originally recorded by Eric Scheirer during his internship at Interval Research Corporation
  • 10. “Music-Speech” Corpus Training Data music (60 files) speech (60 files) m + s (60 files) Test Data speech (without bgm) (120 files) music with no vocals (126 files) music with vocals (120 files) 45 minutes, 15 seconds each 15.25 minutes, 2.5 seconds each
  • 11. Results and Discussion  Fisher LDA, 1-N-N, 3-N-N for all features individually  Probability of Error Feature Fisher 1-NN 3-NN Centroid (MDCT) 8.74% 17.48% 21.85% Centroid (DFT) 16.66% 29.23% 30.60% Roll-off (MDCT) 14.48% 25.40% 21.85% Roll-off (DFT) 8.19% 13.11% 13.11% ZCR 9.83% 19.67% 18.03% HZCRR 25.13% 39.89% 36.33% STE 48.63% 22.40% 22.67% LSTER 11.74% 33.87% 23.77% MFCC 4.09% 22.13% 26.50% Voice2White 4.91% 6.28% 6.01% Activity level 12.84% 18.03% 18.85% Combination of two or more of these features does not seem to improve the results. e.g: MFCC and the Voice2White features with a Fisher linear discriminant classifier, leads to a probability of error equal to 4.09%,the same with MFCC alone
  • 12. Confusion matrixes using the Voice2White feature Classifier Speech Music Fisher Speech 104 16 Music 2 244 1-N-N Speech 114 6 Music 17 229 3-N-N Speech 116 4 Music 18 228 Fisher LDA has high probability of error when the input is Speech K-N-N has high probability of error when the input is Music So Why not combine classifiers using Majority Rule for better results? Probability of Error drops to 4.5%
  • 13. Conclusion  Fisher linear discriminant analysis can provide very promising results using only one feature for the classification  Better results may be obtained combining the results obtained from two or more classifiers

Editor's Notes

  1. Discreate Fourier Transformation
  2. J(w) Equation- Rayleigh quotient