High Performance
Computing & Systems LAB
Unsupervised feature learning for audio classification using convolutional deep belief networks
Honglak Lee, Yan Largman, Peter Pham, Andrew Y. Ng
Presenter: Chung il Kim
Computer Science Department, Stanford University
Stanford, CA 94305
Advances in Neural Information Processing Systems 22 (NIPS 2009)
Contents
 Abstract & Introduction
 Theory & Algorithm
 Convolutional Deep Belief Networks (CDBN)
 Shift-Invariant Sparse Coding (SISC)
 Unsupervised Feature Learning
 Application to Audio Recognition Tasks
 Speech Recognition
 Music Classification
 Discussion and Conclusion
31st Aug 2017, Paper Seminar
1. Abstract & Introduction (1)
 Abstract
 Deep learning approaches
 Build hierarchical representations from unlabeled data
 Focusing on unlabeled auditory data
 Using a convolutional deep belief network (CDBN)
 Evaluate auditory data on various audio classification tasks
 RAW
 MFCC
 CDBN(L1, L2)
1. Abstract & Introduction (2)
 Introduction
 Issue of Audio data recognition
 Audio data is increasingly high-dimensional and complex
 Previous work[1, 2]
 Sparse coding learns filters that correspond to cochlear filters
 Related work[3]
 Efficient sparse coding algorithm for audio classification tasks
– Feature-sign search algorithm (FS-EXACT, FS-Window)
– Lagrangian of DFT
[1] E. C. Smith and M. S. Lewicki. Efficient auditory coding. Nature, 439:978–982, 2006.
[2] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.
[3] R. Grosse, R. Raina, H. Kwong, and A.Y. Ng. Shift-invariant sparse coding for audio classification. In UAI, 2007.
1. Abstract & Introduction (3)
 Introduction
 The limits of those methods
 They learn relatively shallow,
 1-layer representations
 Many promising approaches [4, 5, 6, 7, 8], usually applied to images
 Fast
 With energy-based models
 Greedy
 Empirical evaluation
 But deep learning had not yet been applied to auditory data
[4]G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
[5]M. Ranzato, C. Poultney, S. Chopra, and Y. LeCun. Efficient learning of sparse representations with an energy-based model. In NIPS, 2006.
[6]Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In NIPS, 2006.
[7]H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In ICML, 2007.
[8]H. Lee, C. Ekanadham, and A. Y. Ng. Sparse deep belief network model for visual area V2. In NIPS, 2008.
1. Abstract & Introduction (4)
 Introduction
 Deep belief networks
 Generative probabilistic models
– composed of one visible layer and many hidden layers
 Well trained using ‘greedy layerwise training’
 Convolutional deep belief networks (CDBN) [9]
 Also trained in a greedy, bottom-up fashion
 Good performance on several visual recognition tasks
 CDBN on unlabeled audio data
 Evaluate the learned feature representations
– on several audio classification tasks
[9]H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In ICML, 2009.
2. Convolutional Deep Belief Network (1)
 Convolutional Restricted Boltzmann Machines (CRBMs)
 A CDBN consists of stacked CRBM blocks
<Figure 1> Illustration of convolutional deep belief networks
1. Set a partial (local) receptive area
2. Get detections through filters
(highly overcomplete; sparsity needed)
3. Pooling (usually max-pooling)
4. Greedy layerwise training
- of more than one layer
5. Get the pattern of the visible data.
2. Convolutional Deep Belief Network (2)
 Convolutional Restricted Boltzmann Machines (CRBMs)
 An extension of ‘regular’ Restricted Boltzmann Machines (RBMs)
 Weight sharing and pooling reduce the dimensionality
 but the overcomplete representation creates a sparsity problem
<Figure 2> Dimensionality reduction and sparsity
2. Convolutional Deep Belief Network (3)
 CDBNs
 Energy function
 The CRBMs’ probability distribution is defined via this energy (next page)
<Formula 1> Energy function of CRBMs with binary (top) and real-valued (bottom) visible units
nV : size of the visible unit array
nW : size of each filter
K : number of filters
nH : size of each hidden group’s array
(nH = nV − nW + 1)
bk : shared bias of each hidden group
c : shared bias of the visible units
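Written out in 1-D form (reconstructed from the CDBN paper [9]; symbols as in the glossary above), the binary-unit energy is:

```latex
E(\mathbf{v},\mathbf{h}) =
  -\sum_{k=1}^{K}\sum_{j=1}^{n_H}\sum_{r=1}^{n_W} h_j^k\, W_r^k\, v_{j+r-1}
  \;-\; \sum_{k=1}^{K} b_k \sum_{j=1}^{n_H} h_j^k
  \;-\; c \sum_{i=1}^{n_V} v_i
```

For real-valued visible units, a quadratic term \(\tfrac{1}{2}\sum_i v_i^2\) is added to the energy.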
2. Convolutional Deep Belief Network (4)
 CDBNs
 Probability distribution
 The CRBMs’ joint and conditional probability distributions are defined via the energy
<Formula 2> joint and conditional probability distributions
*v : valid convolution
*f : full convolution
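Written out (following [9]; \(\tilde W^k\) denotes the filter \(W^k\) flipped, \(\sigma\) the logistic sigmoid), the conditionals used for block Gibbs sampling are:

```latex
P(h_j^k = 1 \mid \mathbf{v}) = \sigma\!\big((\tilde W^k *_v \mathbf{v})_j + b_k\big), \qquad
P(v_i = 1 \mid \mathbf{h}) = \sigma\!\Big(\big(\textstyle\sum_k W^k *_f h^k\big)_i + c\Big)
```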
2. Convolutional Deep Belief Network (5)
 Pooling layer
 Shrinks the feature map
 In classification, max-pooling is usually used
Input (4×4):          Max-pooled output (3×3; 2×2 window, stride 1):
0    0.5  0.5  0.4    0.7  0.5  0.5
0.7  0.1  0.2  0.4    0.9  0.7  0.7
0.9  0.3  0.7  0.5    0.9  0.8  0.7
0.5  0.8  0.2  0
<Picture 3> Example of max-pooling
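The pooling example above can be reproduced with a short sketch (a 2×2 max-pooling window with stride 1; the function name and layout are illustrative, not the authors' code):

```python
import numpy as np

def max_pool(x: np.ndarray, size: int = 2, stride: int = 1) -> np.ndarray:
    """Max-pool a 2-D array with a square window."""
    h = (x.shape[0] - size) // stride + 1
    w = (x.shape[1] - size) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # maximum over the size x size window anchored at (i*stride, j*stride)
            out[i, j] = x[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

x = np.array([[0.0, 0.5, 0.5, 0.4],
              [0.7, 0.1, 0.2, 0.4],
              [0.9, 0.3, 0.7, 0.5],
              [0.5, 0.8, 0.2, 0.0]])
pooled = max_pool(x)
```

Applied to the 4×4 input above, this yields the 3×3 output shown in the figure.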
2. Convolutional Deep Belief Network (6)
 Process of CDBNs
 Set a partial (local) receptive area
 Get detections through filters (highly overcomplete; sparsity needed)
 Pooling (usually max-pooling)
 Greedy layerwise training of more than one layer
 Get the pattern of the visible data.
<Picture4> Process of CDBNs
https://deeplearning4j.org/kr/convolutionnets
3. Shift-Invariant Sparse Coding (1)
 Sparsity
 A typical CRBM is highly overcomplete
 A sparsity penalty term is added to the log-likelihood
 Helps avoid overfitting in deep neural networks
 Avoids effectively full connectivity
 The algorithm uses LASSO (Least Absolute Shrinkage and Selection Operator)-style regularization
<Formula 4> the objective of sparsity
<Formula 3> the training objective
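The combination in <Formula 3>/<Formula 4> has the general shape of a sparsity-regularized maximum-likelihood objective (a sketch following the sparse-DBN regularizer of [8]; λ and the target activation p are hyperparameters):

```latex
\min_{W,b,c}\; -\sum_{\text{examples}} \log P(\mathbf{v})
  \;+\; \lambda \sum_{k=1}^{K}\Big|\, p - \frac{1}{n_H}\sum_{j=1}^{n_H} \mathbb{E}\big[h_j^k \mid \mathbf{v}\big] \Big|^2
```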
3. Shift-Invariant Sparse Coding (2)
 Two algorithms to solve SISC for audio data
 Coefficients: feature-sign search algorithm
 Efficient for short signals (low-dimensional x)
 Impractical for signals longer than about one minute
<Pseudo 1> Feature-sign search algorithm 1
R. Grosse, R. Raina, H. Kwong, and A.Y. Ng. Shift-invariant sparse coding for audio classification. In UAI, 2007
3. Shift-Invariant Sparse Coding (3)
 Two algorithms to solve SISC for audio data
 Bases: Lagrangian dual in the DFT domain
 1st: Discrete Fourier Transform
– decompose the signal into frequency components
 2nd: set up the Lagrangian
– to solve the constrained optimization
 3rd: solve it using Newton’s method (in this paper)
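In outline, the basis-update subproblem behind these steps is (a hedged paraphrase of Grosse et al. [3]; c is the norm bound, λ_d the multipliers): minimize reconstruction error subject to a norm constraint on each basis, where by Parseval's theorem the objective decouples per DFT frequency once the multipliers are fixed:

```latex
\min_{\{b_d\}} \Big\| x - \sum_d b_d * s_d \Big\|^2
  \quad \text{s.t.} \quad \|b_d\|^2 \le c \;\; \forall d
```

```latex
\mathcal{L}(\hat{B}, \lambda) =
  \sum_f \Big| \hat{x}(f) - \sum_d \hat{b}_d(f)\, \hat{s}_d(f) \Big|^2
  + \sum_d \lambda_d \big( \|\hat{b}_d\|^2 / n - c \big)
```

The dual in λ is then maximized with Newton's method.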
3. Shift-Invariant Sparse Coding (4)
 Approach
 Uses LASSO-style L1-regularized least squares
 Optimality conditions via partial derivatives
 Bias ↑, variance ↓ (trade-off)
Liang Sun Arizona State University, Efficient Sparse Coding Algorithms, http://slideplayer.com/slide/4953202/
<Pseudo 2> Feature-sign search algorithm 2
3. Shift-Invariant Sparse Coding (5)
 Approach
 The active set yields an unconstrained QP
 Compute its analytical solution
 over the subvector of x with nonzero coefficients
 Using a discrete line search (LS), update x toward that point
 Check the points where a coefficient changes sign, and keep the one with the lowest objective
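Feature-sign search itself maintains an active set and a sign vector; as a minimal baseline, the same L1-regularized least-squares (LASSO) problem it solves can be sketched with cyclic coordinate descent and soft thresholding (names illustrative; this is not the paper's algorithm, only the objective it optimizes):

```python
import numpy as np

def lasso_cd(A: np.ndarray, x: np.ndarray, lam: float, iters: int = 200) -> np.ndarray:
    """Minimize 0.5*||x - A s||^2 + lam*||s||_1 by cyclic coordinate descent."""
    s = np.zeros(A.shape[1])
    col_sq = (A ** 2).sum(axis=0)                 # squared column norms
    for _ in range(iters):
        for j in range(A.shape[1]):
            # residual with coordinate j's contribution removed
            r = x - A @ s + A[:, j] * s[j]
            rho = A[:, j] @ r
            # soft-thresholding update for coordinate j
            s[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return s

# With orthonormal columns, the solution is plain soft thresholding of A^T x.
s = lasso_cd(np.eye(3), np.array([3.0, -0.5, 1.0]), lam=1.0)
```

Coefficients whose correlation with the residual falls below λ are driven exactly to zero, which is the sparsity the CRBM training relies on.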
Liang Sun Arizona State University, Efficient Sparse Coding Algorithms, http://slideplayer.com/slide/4953202/
<Pseudo 3> Feature-sign search algorithm 2
3. Shift-Invariant Sparse Coding (6)
 Approach
 Finally, check the optimality conditions, and repeat until they hold
Liang Sun Arizona State University, Efficient Sparse Coding Algorithms, http://slideplayer.com/slide/4953202/
<Pseudo 4> Feature-sign search algorithm 2
3. Shift-Invariant Sparse Coding (7)
 Result of FS search (learning speed)
3. Shift-Invariant Sparse Coding (8)
 Result of FS search (speech)
 Speech data (TIMIT)
 32 basis functions learned from 1-second speech signals
 Compared features
 SISC (with FS), MFCC (Mel-Frequency Cepstral Coefficients), RAW
3. Shift-Invariant Sparse Coding (9)
 Result of FS search (musical genre)
 2-second segments, 5-way musical genre classification
 Compared features
 SISC (with FS), TC (Tzanetakis & Cook),
 MFCC (Mel-Frequency Cepstral Coefficients), RAW
4. Unsupervised Feature Learning (1)
 Description of TIMIT Data
 For researching speech recognition systems
 American English
 In This Research
 Spectrogram form
 Window size : 20ms
 Overlaps : 10ms
 Using PCA whitening (with 80 components)
– to reduce the dimensionality
 Research Contents
 Phonemes
 Speaker gender
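The PCA-whitening step above can be sketched as follows (a generic SVD-based whitening, not the authors' code; the component count k plays the role of the 80 components):

```python
import numpy as np

def pca_whiten(X: np.ndarray, k: int) -> np.ndarray:
    """Project rows of X onto the top-k principal components, rescaled to unit variance."""
    Xc = X - X.mean(axis=0)                       # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                             # top-k principal component scores
    return Z / (S[:k] / np.sqrt(len(X) - 1))      # divide by per-component std

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated features
Z = pca_whiten(X, k=3)
```

After whitening, the retained components are decorrelated with unit variance, which keeps the CRBM inputs well conditioned.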
4. Unsupervised Feature Learning (2)
 Layer and Training Setting
 1st layer
 300 bases
 Filter length(nw) : 6
 Max-pooling ratio : 3
 2nd layer
 300 bases (output of 1st layer)
 Filter length : 6
 Max-pooling ratio : 3
4. Unsupervised Feature Learning (3)
 Phonemes and the CDBN features
 Analysis
 Vowels (“ah”, “oy”)
 Prominent horizontal bands
 Lower freq.
 “oy”
 Upward slanting pattern
4. Unsupervised Feature Learning (4)
 Phonemes and the CDBN features
 Analysis
 Fricatives (“s”)
 Energy in the high freq.
 “el”
 High intensity in low freq.
 Low intensity follows in high freq.
4. Unsupervised Feature Learning (5)
 Speaker gender information & CDBN features
 Female voices show a finer horizontal banding pattern in low freq.
 L1 and L2 features correspond to learned bases.
5. Speech Recognition (Speaker ID) (1)
 About the base data
 No. of speakers : 168
 Sentences per speaker : 10
 Total sentences : 1680
 1. Speaker Identification Test
 10 random trials
 Training : TIMIT data
 All data expressed as spectrograms
 RAW, MFCC, CDBN L1, CDBN L2, CDBN L1+L2
 Simple summary statistics for each channel
 Evaluate features using standard supervised classifiers
 SVM (Support Vector Machine), GDA (Gaussian Discriminant Analysis), KNN (K-Nearest Neighbor classification)
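As a minimal illustration of the pipeline above (per-channel summary statistics followed by a standard classifier), a 1-NN classifier over mean/std features might look like this (purely a sketch with toy data; the paper used SVM, GDA, and KNN on such features):

```python
import numpy as np

def summary_features(spec: np.ndarray) -> np.ndarray:
    """Per-channel summary statistics of a (channels x frames) spectrogram: mean and std."""
    return np.concatenate([spec.mean(axis=1), spec.std(axis=1)])

def nn_predict(train_X: np.ndarray, train_y: np.ndarray, x: np.ndarray):
    """1-nearest-neighbor prediction under Euclidean distance."""
    d = ((train_X - x) ** 2).sum(axis=1)
    return train_y[int(np.argmin(d))]

rng = np.random.default_rng(1)
# Two toy 'speakers': one with energy in the low channels, one in the high channels.
specs, labels = [], []
for _ in range(10):
    low = rng.random((8, 20)); low[:4] += 2.0
    high = rng.random((8, 20)); high[4:] += 2.0
    specs += [low, high]; labels += [0, 1]
X = np.stack([summary_features(s) for s in specs])
y = np.array(labels)
probe = rng.random((8, 20)); probe[:4] += 2.0   # low-channel energy
pred = nn_predict(X, y, summary_features(probe))
```

The same feature vectors (here 8 means plus 8 stds per clip) would be fed to any of the three classifiers in the slide.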
5. Speech Recognition(Speaker ID) (2)
 Speaker Identification
5. Speech Recognition(Speaker ID) (3)
 2. Speaker Gender classification
 Randomly sampled training examples
 200 testing examples
 20 trials
5. Speech Recognition(Speaker ID) (4)
 3. Phone Classification
 39-way phone classification accuracy
 Over 5 random trials
6. Music Classification (1)
 1. Genre classification
 1st and 2nd layer
 Music data from: ISMIR
 Bases : 300
 Filter length : 10
 Max-pooling ratio : 3
 Randomly sampled 3-second segments (training or testing samples)
 Genres : 5-way (classical, electric, jazz, pop, and rock)
 20 random trials for each training-set size
6. Music Classification (2)
 2. Artist classification
 1st and 2nd layer (same as genre classification)
 Music data from: ISMIR
 Bases : 300
 Filter length : 10
 Max-pooling ratio : 3
 Randomly sampled 3-second segments (training or testing samples)
 Genre : classical music only
 4-way artist classification
 Averaged over 20 random trials
6. Music Classification (3)
 2. Artist classification
7. Discussion
 Not directly suitable for modern speech corpora
 These are much larger than the TIMIT data set
 This research’s target
 Settings with a restricted amount of labeled data
 Remaining interesting problems
 Applying deep learning to larger datasets
 More challenging tasks
8. Conclusion
 Applied CDBNs to audio data
 Evaluated on various audio classification tasks
 without using a large amount of labeled data
 The learned features often equaled or surpassed MFCC
 (MFCC is hand-tailored to audio data)
 Combining both achieves higher classification accuracy
 L1 CDBN features perform well on multiple audio recognition tasks
 Hope to inspire automatic learning of deep features
 from audio data
Thank you

Weitere ähnliche Inhalte

Was ist angesagt?

Hybrid neural networks for time series learning by Tian Guo, EPFL, Switzerland
Hybrid neural networks for time series learning by Tian Guo,  EPFL, SwitzerlandHybrid neural networks for time series learning by Tian Guo,  EPFL, Switzerland
Hybrid neural networks for time series learning by Tian Guo, EPFL, Switzerland
EuroIoTa
 
NeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximateProgramsNeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximatePrograms
Mohid Nabil
 
Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...
Dmytro Mishkin
 

Was ist angesagt? (17)

20 26
20 26 20 26
20 26
 
A Review of Comparison Techniques of Image Steganography
A Review of Comparison Techniques of Image SteganographyA Review of Comparison Techniques of Image Steganography
A Review of Comparison Techniques of Image Steganography
 
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
 
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
 
Deep Learning - 인공지능 기계학습의 새로운 트랜드 :김인중
Deep Learning - 인공지능 기계학습의 새로운 트랜드 :김인중Deep Learning - 인공지능 기계학습의 새로운 트랜드 :김인중
Deep Learning - 인공지능 기계학습의 새로운 트랜드 :김인중
 
G0210032039
G0210032039G0210032039
G0210032039
 
AI&BigData Lab 2016. Александр Баев: Transfer learning - зачем, как и где.
AI&BigData Lab 2016. Александр Баев: Transfer learning - зачем, как и где.AI&BigData Lab 2016. Александр Баев: Transfer learning - зачем, как и где.
AI&BigData Lab 2016. Александр Баев: Transfer learning - зачем, как и где.
 
Deep Learning for Speech Recognition in Cortana at AI NEXT Conference
Deep Learning for Speech Recognition in Cortana at AI NEXT ConferenceDeep Learning for Speech Recognition in Cortana at AI NEXT Conference
Deep Learning for Speech Recognition in Cortana at AI NEXT Conference
 
Convolutional Neural Network
Convolutional Neural NetworkConvolutional Neural Network
Convolutional Neural Network
 
Hybrid neural networks for time series learning by Tian Guo, EPFL, Switzerland
Hybrid neural networks for time series learning by Tian Guo,  EPFL, SwitzerlandHybrid neural networks for time series learning by Tian Guo,  EPFL, Switzerland
Hybrid neural networks for time series learning by Tian Guo, EPFL, Switzerland
 
Deep Learning Tutorial
Deep Learning Tutorial Deep Learning Tutorial
Deep Learning Tutorial
 
Improved LSB Steganograhy Technique for grayscale and RGB images
Improved LSB Steganograhy Technique for grayscale and RGB imagesImproved LSB Steganograhy Technique for grayscale and RGB images
Improved LSB Steganograhy Technique for grayscale and RGB images
 
Hk3312911294
Hk3312911294Hk3312911294
Hk3312911294
 
NeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximateProgramsNeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximatePrograms
 
Convolutional neural networks deepa
Convolutional neural networks deepaConvolutional neural networks deepa
Convolutional neural networks deepa
 
Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...
 
AN ADAPTIVE PSEUDORANDOM STEGO-CRYPTO TECHNIQUE FOR DATA COMMUNICATION
AN ADAPTIVE PSEUDORANDOM STEGO-CRYPTO TECHNIQUE FOR DATA COMMUNICATIONAN ADAPTIVE PSEUDORANDOM STEGO-CRYPTO TECHNIQUE FOR DATA COMMUNICATION
AN ADAPTIVE PSEUDORANDOM STEGO-CRYPTO TECHNIQUE FOR DATA COMMUNICATION
 

Ähnlich wie [Chung il kim] 0829 thesis

International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...
International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...
International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...
CSCJournals
 
deeplearningpresentation-180625071236.pptx
deeplearningpresentation-180625071236.pptxdeeplearningpresentation-180625071236.pptx
deeplearningpresentation-180625071236.pptx
JeetDesai14
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
mathsjournal
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
mathsjournal
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
mathsjournal
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
butest
 
Modeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networksModeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networks
NAVER Engineering
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..
butest
 

Ähnlich wie [Chung il kim] 0829 thesis (20)

Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...
Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...
Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource Provisioning
 
파이콘 한국 2019 튜토리얼 - 설명가능인공지능이란? (Part 1)
파이콘 한국 2019 튜토리얼 - 설명가능인공지능이란? (Part 1)파이콘 한국 2019 튜토리얼 - 설명가능인공지능이란? (Part 1)
파이콘 한국 2019 튜토리얼 - 설명가능인공지능이란? (Part 1)
 
International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...
International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...
International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...
 
deeplearningpresentation-180625071236.pptx
deeplearningpresentation-180625071236.pptxdeeplearningpresentation-180625071236.pptx
deeplearningpresentation-180625071236.pptx
 
Improving Isolated Bangla Compound Character Recognition Through Feature-map ...
Improving Isolated Bangla Compound Character Recognition Through Feature-map ...Improving Isolated Bangla Compound Character Recognition Through Feature-map ...
Improving Isolated Bangla Compound Character Recognition Through Feature-map ...
 
Ijricit 01-002 enhanced replica detection in short time for large data sets
Ijricit 01-002 enhanced replica detection in  short time for large data setsIjricit 01-002 enhanced replica detection in  short time for large data sets
Ijricit 01-002 enhanced replica detection in short time for large data sets
 
Data reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological dataData reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological data
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overview
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...
IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep Learning
 
Machine learning in science and industry — day 4
Machine learning in science and industry — day 4Machine learning in science and industry — day 4
Machine learning in science and industry — day 4
 
Modeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networksModeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networks
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..
 
Distribution Modelling and Analytics of Large Spectrum Data: Spectrum Occupan...
Distribution Modelling and Analytics of Large Spectrum Data: Spectrum Occupan...Distribution Modelling and Analytics of Large Spectrum Data: Spectrum Occupan...
Distribution Modelling and Analytics of Large Spectrum Data: Spectrum Occupan...
 

Kürzlich hochgeladen

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 

Kürzlich hochgeladen (20)

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 

[Chung il kim] 0829 thesis

  • 1. HIgh Performance Computing & Systems LAB Unsupervised feature learning for audio classif- ication using convolutional deep belief networks Honglak Lee Yan Largman Peter Pham Andrew Y. Ng Thesis Presenter Chung il Kim Computer Science Departement, Stanford University Stanford, CA 94305 Advances in Neural Information Processing Systems 22 (NIPS 2009)
  • 2. Contents
     Abstract & Introduction
     Theory & Algorithm
       Convolutional Deep Belief Networks (CDBN)
       on Shift-Invariant Sparse Coding (SISC)
     Unsupervised Feature Learning
     Application to Audio Recognition Tasks
       Speech Recognition
       Music Classification
     Discussion and Conclusion
    31st Aug 2017, Paper Seminar
  • 3. 1. Abstract & Introduction (1)
     Abstract
       Deep learning approaches build hierarchical representations from unlabeled data
       Focus on unlabeled auditory data, using a convolutional deep belief network (CDBN)
       Evaluate the learned features on various audio classification tasks
         RAW, MFCC, CDBN (L1, L2)
  • 4. 1. Abstract & Introduction (2)
     Introduction
       Issue of audio data recognition: increasingly high-dimensional and complex
       Previous work [1, 2]: sparse coding learns filters that correspond to cochlear filters
       Related work [3]: efficient sparse coding algorithms for audio classification tasks
        – Feature-sign search algorithm (FS-EXACT, FS-WINDOW)
        – Lagrangian of the DFT
    [1] E. C. Smith and M. S. Lewicki. Efficient auditory coding. Nature, 439:978–982, 2006.
    [2] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.
    [3] R. Grosse, R. Raina, H. Kwong, and A. Y. Ng. Shift-invariant sparse coding for audio classification. In UAI, 2007.
  • 5. 1. Abstract & Introduction (3)
     Introduction
       The limit of those methods: applied to learn relatively shallow, 1-layer representations
       Many promising approaches [4, 5, 6, 7, 8], usually on images
         Fast, with energy-based models, greedy training, empirical evaluation
       But deep learning had not yet been applied to auditory data
    [4] G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
    [5] M. Ranzato, C. Poultney, S. Chopra, and Y. LeCun. Efficient learning of sparse representations with an energy-based model. In NIPS, 2006.
    [6] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In NIPS, 2006.
    [7] H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In ICML, 2007.
    [8] H. Lee, C. Ekanadham, and A. Y. Ng. Sparse deep belief network model for visual area V2. In NIPS, 2008.
  • 6. 1. Abstract & Introduction (4)
     Introduction
       Deep belief network: a generative probabilistic model
        – Composed of one visible layer and many hidden layers
        – Trained well using greedy layerwise training
       Convolutional deep belief network (CDBN) [9]
        – Also trained in a greedy, bottom-up fashion
        – Good performance on several visual recognition tasks
       CDBN on unlabeled audio data
        – Evaluate the learned feature representations on several audio classification tasks
    [9] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In ICML, 2009.
  • 7. 2. Convolutional Deep Belief Network (1)
     Convolutional Restricted Boltzmann Machines (CRBMs)
       A CDBN consists of stacked CRBM blocks
        1. Set a partial area
        2. Get detections through filters (highly overcomplete, so sparsity is needed)
        3. Pooling (usually max-pooling)
        4. Greedy layerwise training, more than one layer
        5. Get the pattern of the visible data
    <Figure 1> Image of Convolutional Deep Belief Networks
  • 8. 2. Convolutional Deep Belief Network (2)
     Convolutional Restricted Boltzmann Machines (CRBMs)
       Extension of 'regular' Restricted Boltzmann Machines (RBMs)
       Decreases the relevant dimensionality  creates a sparsity problem
    <Figure 2> Dimensionality reduction and sparsity
  • 9. 2. Convolutional Deep Belief Network (3)
     CDBNs: energy function
       The CRBMs' probability distribution is defined via the energy (next page)
    <Formula 1> Energy function of CRBMs, binary (top) and real-valued (bottom)
      nV: size of the array of binary visible units
      nW: size of the filter array
      K: number of filters
      nH: size of the array of hidden units (nV – nW + 1)
      bk: shared bias for each group
      c: shared bias for the visible units
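The slide's <Formula 1> is only an image here; as a reconstruction from the cited CDBN paper [9], written for the 1D case used with audio (my transcription, not the figure itself):

```latex
% Binary visible units:
E(\mathbf{v},\mathbf{h}) =
  -\sum_{k=1}^{K}\sum_{j=1}^{n_H}\sum_{r=1}^{n_W} h_j^k\, W_r^k\, v_{j+r-1}
  -\sum_{k=1}^{K} b_k \sum_{j=1}^{n_H} h_j^k
  - c \sum_{i=1}^{n_V} v_i
% Real-valued (Gaussian) visible units add a quadratic term:
E(\mathbf{v},\mathbf{h}) =
  \frac{1}{2}\sum_{i=1}^{n_V} v_i^2
  -\sum_{k=1}^{K}\sum_{j=1}^{n_H}\sum_{r=1}^{n_W} h_j^k\, W_r^k\, v_{j+r-1}
  -\sum_{k=1}^{K} b_k \sum_{j=1}^{n_H} h_j^k
  - c \sum_{i=1}^{n_V} v_i
```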
  • 10. 2. Convolutional Deep Belief Network (4)
     CDBNs: probability distribution
       The CRBMs' probability distribution is defined via the energy
    <Formula 2> Joint and conditional probability distributions
      *v: valid convolution
      *f: full convolution
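Likewise for <Formula 2>: the conditional distributions implied by the CRBM energy, reconstructed from [9] for binary units ($\tilde{W}^k$ is $W^k$ flipped end-to-end; my transcription):

```latex
P\big(h_j^k = 1 \mid \mathbf{v}\big) = \sigma\big((\tilde{W}^k *_v \mathbf{v})_j + b_k\big)
P\big(v_i = 1 \mid \mathbf{h}\big) = \sigma\Big(\big(\textstyle\sum_k W^k *_f \mathbf{h}^k\big)_i + c\Big)
% where \sigma(x) = 1/(1 + e^{-x}), *_v is a valid and *_f a full convolution.
```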
  • 11. 2. Convolutional Deep Belief Network (5)
     Pooling layer
       Shrinks the data map
       In classification, max-pooling is usually used
    <Picture 3> Image of max-pooling: a 2×2 max window over a 4×4 detection map
      input:   0    0.5  0.5  0.4
               0.7  0.1  0.2  0.4
               0.9  0.3  0.7  0.5
               0.5  0.8  0.2  0
      output:  0.7  0.5  0.5
               0.9  0.7  0.7
               0.9  0.8  0.7
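The pooling grid in <Picture 3> can be reproduced in a few lines of NumPy; a minimal sketch assuming a 2×2 max window with stride 1 (the figure's apparent setting; note the paper's actual pooling layer is non-overlapping, with a pooling ratio C):

```python
import numpy as np

def max_pool(x, size=2, stride=1):
    """Max-pool a 2D detection map with a size x size window."""
    h = (x.shape[0] - size) // stride + 1
    w = (x.shape[1] - size) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

# The 4x4 detection map shown in the slide's figure
detection = np.array([[0.0, 0.5, 0.5, 0.4],
                      [0.7, 0.1, 0.2, 0.4],
                      [0.9, 0.3, 0.7, 0.5],
                      [0.5, 0.8, 0.2, 0.0]])
pooled = max_pool(detection)  # 3x3 pooled map, matching the figure's output
```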
  • 12. 2. Convolutional Deep Belief Network (6)
     Process of CDBNs
       1. Set a partial area
       2. Get detections through filters (highly overcomplete, so sparsity is needed)
       3. Pooling (usually max-pooling)
       4. Greedy layerwise training, more than one layer
       5. Get the pattern of the visible data
    <Picture 4> Process of CDBNs, https://deeplearning4j.org/kr/convolutionnets
  • 13. 3. on Shift Invariant Sparse Coding (1)
     Sparsity
       A typical CRBM is highly overcomplete
       A sparsity penalty term is added to the log-likelihood
         To address the overfitting problem in deep neural networks
         Avoids full connectivity
       This algorithm uses the LASSO (Least Absolute Shrinkage and Selection Operator)
    <Formula 3> The training objective
    <Formula 4> The objective of sparsity
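As a small illustration of why an L1 (LASSO-style) penalty produces sparsity: minimizing it amounts to soft-thresholding, which drives small coefficients exactly to zero. A minimal NumPy sketch (illustrative only, not the paper's solver):

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam*||x||_1: shrink every coefficient
    toward zero by lam, zeroing those with magnitude below lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

a = np.array([1.5, -0.3, 0.05, -2.0])
shrunk = soft_threshold(a, 0.5)  # small entries become exactly zero
```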
  • 14. 3. on Shift Invariant Sparse Coding (2)
     Two algorithms to solve SISC on audio data
       Coefficients: feature-sign search algorithm
         Efficient for short signals (x is low-dimensional)
         Not good for signals over 1 minute
    <Pseudo 1> Feature-sign search algorithm 1
    R. Grosse, R. Raina, H. Kwong, and A. Y. Ng. Shift-invariant sparse coding for audio classification. In UAI, 2007.
  • 15. 3. on Shift Invariant Sparse Coding (3)
     Two algorithms to solve SISC on audio data
       Bases: using the Lagrangian and the DFT
        1st, Discrete Fourier Transform – decompose the signal
        2nd, set up the Lagrangian – to solve the optimization
        3rd, use Newton's method (in this paper)
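The three steps above can be sketched as follows, reconstructed from Grosse et al. [3]; the symbols $b_j$ (bases), $s_j^{(i)}$ (coefficients), and $c$ (norm bound) are my notation, not the slide's:

```latex
% Bases step: with coefficients s fixed, solve the constrained least squares
\min_{b}\; \sum_i \Big\| x^{(i)} - \sum_j b_j * s_j^{(i)} \Big\|_2^2
\qquad \text{s.t.}\quad \|b_j\|_2^2 \le c \;\; \forall j .
% 1st: by Parseval's theorem the DFT turns convolutions into products,
% so the objective decouples across frequency components:
\sum_i \Big\| x^{(i)} - \sum_j b_j * s_j^{(i)} \Big\|_2^2
  = \frac{1}{n} \sum_i \Big\| \hat{x}^{(i)} - \sum_j \hat{b}_j \odot \hat{s}_j^{(i)} \Big\|_2^2 .
% 2nd: form the Lagrangian of the constrained problem;
% 3rd: maximize the resulting dual with Newton's method.
```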
  • 16. 3. on Shift Invariant Sparse Coding (4)
     Approach to the tasks
       Using the LASSO
       Partial differential equations
       Bias ↑, variance ↓ (trade-off)
    Liang Sun, Arizona State University, Efficient Sparse Coding Algorithms, http://slideplayer.com/slide/4953202/
    <Pseudo 2> Feature-sign search algorithm 2
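For the LASSO objective itself, a hedged sketch using coordinate descent, a standard solver chosen here for brevity (the paper uses feature-sign search instead); the dictionary `A` and data are hypothetical:

```python
import numpy as np

def lasso_cd(A, y, lam, n_iter=200):
    """Coordinate descent for min_x 0.5*||y - A x||^2 + lam*||x||_1."""
    x = np.zeros(A.shape[1])
    col_sq = (A ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(A.shape[1]):
            # residual with coordinate j's contribution removed
            r = y - A @ x + A[:, j] * x[j]
            rho = A[:, j] @ r
            # closed-form 1D update: soft-threshold then rescale
            x[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))   # hypothetical dictionary
x_true = np.zeros(10)
x_true[2], x_true[7] = 1.5, -2.0    # sparse ground truth
y = A @ x_true
x_hat = lasso_cd(A, y, lam=0.1)     # recovers a sparse, slightly shrunk x
```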
  • 17. 3. on Shift Invariant Sparse Coding (5)
     Approach to the tasks
       The resulting 'unconstrained QP' admits an analytical solution
         This is a subvector of x
       Using a discrete line search (LS), update x toward that point
       Collect the values at which a coefficient changes sign, and update with the lowest one
    Liang Sun, Arizona State University, Efficient Sparse Coding Algorithms, http://slideplayer.com/slide/4953202/
    <Pseudo 3> Feature-sign search algorithm 2
  • 18. 3. on Shift Invariant Sparse Coding (6)
     Approach to the tasks
       Finally, check those conditions and repeat
    Liang Sun, Arizona State University, Efficient Sparse Coding Algorithms, http://slideplayer.com/slide/4953202/
    <Pseudo 2> Feature-sign search algorithm 2
  • 19. 3. on Shift Invariant Sparse Coding (7)
     Result of FS search (learning speed)
  • 20. 3. on Shift Invariant Sparse Coding (8)
     Result of FS search (speech)
       Speech data (TIMIT)
         1-second-long, 32 speech signals with basis functions
       Filters
         SISC (with FS), MFCC (Mel Frequency Cepstral Coefficients), RAW
  • 21. 3. on Shift Invariant Sparse Coding (9)
     Result of FS search (musical genre)
       2-second, 5-way musical genre songs
       Filters
         SISC (with FS), TC (Tzanetakis & Cook), MFCC (Mel Frequency Cepstral Coefficients), RAW
  • 22. 4. Unsupervised Feature Learning (1)
     Description of the TIMIT data
       For research on speech recognition systems
       American English
     In this research
       Spectrogram form
         Window size: 20 ms
         Overlap: 10 ms
       Using PCA whitening (with 80 components)
        – To reduce the dimensionality
     Research contents
       Phonemes
       Speaker gender
  • 23. 4. Unsupervised Feature Learning (2)
     Layer and training settings
       1st layer: 300 bases, filter length (nW): 6, max-pooling ratio: 3
       2nd layer: 300 bases (on the output of the 1st layer), filter length: 6, max-pooling ratio: 3
  • 24. 4. Unsupervised Feature Learning (3)
     Phonemes and the CDBN features
     Analysis
       Vowels ("ah", "oy"): prominent horizontal bands in the lower frequencies
       "oy": upward slanting pattern
  • 25. 4. Unsupervised Feature Learning (4)
     Phonemes and the CDBN features
     Analysis
       Fricatives ("s"): energy in the high frequencies
       "el": high intensity in the low frequencies, followed by low intensity in the high frequencies
  • 26. 4. Unsupervised Feature Learning (5)
     Speaker gender information & CDBN features
       Female: finer horizontal banding pattern in the low frequencies
       L1 and L2 correspond to the bases
  • 27. 5. Speech Recognition (Speaker ID) (1)
     About the base data
       No. of speakers: 168
       Sentences per speaker: 10
       Total sentences: 1680
     1. Speaker identification test
       10 random trials
       Training: TIMIT data
       All data expressed as spectrograms
         RAW, MFCC, CDBN L1, CDBN L2, CDBN L1+L2
       Simple summary statistics for each channel
       Evaluate the features using standard supervised classifiers
         SVM (Support Vector Machine), GDA (Gaussian Discriminant Analysis), KNN (K-Nearest Neighbor classification)
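Of the three classifiers, KNN is the simplest to sketch; a minimal NumPy version on hypothetical toy features (not the TIMIT summary statistics):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test point by majority vote of its k nearest
    training points under squared Euclidean distance."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])

# Hypothetical toy features: two well-separated classes
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.2, 0.1], [4.9, 5.0]])
pred = knn_predict(X_train, y_train, X_test)  # one label per test point
```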
  • 28. 5. Speech Recognition (Speaker ID) (2)
     Speaker identification
  • 29. 5. Speech Recognition (Speaker ID) (3)
     2. Speaker gender classification
       Randomly sampled training examples
       200 testing examples
       20 trials
  • 30. 5. Speech Recognition (Speaker ID) (4)
     3. Phone classification
       39-way phone classification accuracy
       Over 5 random trials
  • 31. 6. Music Classification (1)
     1. Genre classification
       1st and 2nd layers
         Music data from ISMIR
         Bases: 300, filter length: 10, max-pooling ratio: 3
       Randomly sampled 3-second segments (training or testing samples)
       Genres: 5-way (classical, electric, jazz, pop, and rock)
       20 random trials on each training sample
  • 32. 6. Music Classification (2)
     2. Artist classification
       1st and 2nd layers (same as genre classification)
         Music data from ISMIR
         Bases: 300, filter length: 10, max-pooling ratio: 3
       Randomly sampled 3-second segments (training or testing samples)
       Genre: only classical music
       Only 4-way artist classification
       Over 20 random trials (on average)
  • 33. 6. Music Classification (3)
     2. Artist classification
  • 34. 7. Discussion
     Not directly suited to modern speech corpora
       These are much larger than the TIMIT data set
     This research's target
       A restricted amount of labeled data
     Remaining interesting problems
       Deep learning on larger datasets
       More challenging tasks
  • 35. 8. Conclusion
     Applied CDBNs to audio data
       Evaluated on various audio classification tasks
       Without using a large amount of data
     The learned features often equaled or surpassed MFCC
       (MFCC is hand-tailored to audio data)
       Combining both achieves higher classification accuracy
     L1 CDBN: high performance on multiple audio recognition tasks
     Hope: inspiring automatic learning of deep features on audio data
  • 36. Thank you