GRAPHICAL VISUALIZATION
OF MUSICAL EMOTIONS
Presented by:
Pranay Prasoon
MT/SC/10002/2012
M. TECH. SCIENTIFIC COMPUTING
Under the guidance of:
Dr. Saubhik Chakraborty
Associate Professor, Dept. of Applied Mathematics
Hindustani Classical Music (ICM)
 It is rich in both emotional and musical content.
 There are seven notes in ICM, i.e. Sa Re Ga Ma Pa Dha Ni.
 The base of Indian classical music is the raga.
RAGA
 A raga is, simply, a group of notes.
 Each raga invokes a certain mood/emotion.
 Different sequences of notes represent different ragas.
Example:
Bageshri (sad): ni sa ga ma dha ni sa
Bhupali (happy): sa re ga pa dha sa
Music relation to mathematics
 Mathematics is "the basis of sound", and sound is the basis of music.
 Some basic terms relate music to mathematics:
1. Sound
 Music is sound organized in a meaningful way with rhythm, melody, and harmony; these are considered the three dimensions of music.
 Sound is a form of energy.
Music and mathematics(contd.)
2. Frequency
 The number of times the sound wave completes a cycle of
oscillation in one second is called its frequency. Frequency is
measured in cycles per second or Hertz (Hz).
 The higher the frequency, the higher the pitch, and vice versa.
3. Amplitude
 Amplitude is the size of the vibration, and it determines how loud the sound is. Loudness is measured in decibels (dB); the range for the human ear is roughly 2-130 dB.
Music and mathematics(contd.)
4. Pitch Scale
 In Indian classical music, each note's pitch value depends on the previous note's pitch value:
S = 1            P = 1.5
R = 1.125        D = 1.6875
G = 1.265625     N = 1.8984375
M = 1.423828
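The scheme above can be reproduced with a short sketch, under the assumption that each listed ratio (apart from Pa, fixed at the perfect fifth 3/2) is 9/8 = 1.125 times the previous one:

```python
# Sketch: reconstructing the pitch ratios on the previous slide, assuming each
# note (apart from Pa, fixed at the perfect fifth 3/2) is 9/8 times the previous.
ratios = {"S": 1.0}
ratios["R"] = ratios["S"] * 9 / 8   # 1.125
ratios["G"] = ratios["R"] * 9 / 8   # 1.265625
ratios["M"] = ratios["G"] * 9 / 8   # 1.423828125
ratios["P"] = 3 / 2                 # 1.5
ratios["D"] = ratios["P"] * 9 / 8   # 1.6875
ratios["N"] = ratios["D"] * 9 / 8   # 1.8984375
for note, r in ratios.items():
    print(note, r)
```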
Music and mathematics(contd.)
 Time
 Tempo: the speed of the beats.
 Rhythm: the pattern of relative time durations of notes is, in musical language, called rhythm.
Literature survey
 We carefully studied papers from international publishers to understand the concepts of Indian classical music, an in-depth study of its ragas, neural-network approaches in music, pattern recognition, emotion recognition, and features of musical clips.
 Soltani, K. & Ainon, R.N. (2007) and Yongjin Wang & Ling Guan (2008) discuss emotion recognition in speech signals in depth.
 Coutinho, E. & Cangelosi, A. (2010): in this book the authors discuss a model for predicting human emotion while listening to music, combining both psychoacoustic and physiological features.
 Keshi Dai, Harriet J. Fell, and Joel MacAuslan (2012): the authors show the effective use of audio features to recognize emotion, testing different combinations of emotions and reporting the accuracy of each combination.
Literature survey(contd.)
 Zhen-Guo Che, Tzu-An Chiang and Zhen-Hua Che (2010): in this paper the authors discuss the advantages and characteristics of the genetic algorithm and back-propagation for training a feed-forward neural network to cope with the weight-adjustment problem.
 We found that very few models have been developed for raga identification in Indian classical music.
 From our survey we find that neural networks give high accuracy for identification when used with multiple features.
Characterization of the Problem
 Our aim is to better explain and explore the relationship
between musical features and emotion.
 The main problem initially identified was to extract musical
features for each audio file.
 Another problem was to develop a model for recognition of
emotion from the musical clips.
 Then to graphically visualize the performance of the process
of recognition.
Objective of work
1. The main objective of the work is to shed light on emotion recognition in Indian classical music.
2. To understand the dependencies of musical features in the recognition process.
3. To understand the ANN concepts used in the recognition process.
4. To analyze the error in each recognition process.
5. To visualize the performance of the model (for training, validation and testing).
Research methodology
 Ground-truth data and their features will drive the recognition process with an ANN.
 Artificial neural network: ANNs are computational models, inspired by the brain's nervous system, that are capable of machine learning as well as pattern recognition.
 An ANN, used with the features of the audio clips, will perform the emotion classification.
Model Formulation
Data Selection
Serial No.  Target emotion  Ragas in audio clips  Number of audio clips  Total
1           Happy           Bhupali               20
                            Bihag                 28
                            Desh                  30
                            Marwa                 20                     98
2           Sad             Bageshree             24
                            Bhairavi              19
                            Bhimpalashi           20
                            Deskar                15
                            Todi                  20                     98
Total       2 emotions      9 ragas               196                    196
Pre-processing
 Manually divide all samples into two classes, happy and sad.
 Convert the dataset to standard WAV format at a 44,100 Hz sampling rate.
 Each audio clip we take is 30 seconds long.
Feature selection
 Feature extraction involves analysis of the audio signal.
 We extracted 13 features for our work:
1. Roll off
2. Spread
3. Zero cross
4. Centroid
5. RMS energy
6. Low energy
7. Event density
8. Pulse clarity
9. Mode
10. Entropy
11. Brightness
12. Probability of increment of two successive pitches
13. Probability of decrement of two successive pitches
Root Mean Square Energy
 Energy of the signal x can be computed simply by taking the root of the average of the squared amplitude, called the root mean square (RMS):
Formula: x_RMS = sqrt( (1/N) Σ_{n=1}^{N} x(n)² )
 Happy-labelled songs have more RMS energy than sad-labelled songs.
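An illustrative sketch of the RMS computation (Python/numpy here; the thesis itself used MATLAB for feature extraction):

```python
import numpy as np

# Illustrative sketch (not the thesis' MATLAB code): RMS energy of a signal x.
def rms_energy(x):
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))

print(rms_energy([3.0, -3.0, 3.0, -3.0]))  # 3.0
```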
Root Mean Square Energy(contd.)
S.N Happy Sad
1 .20566 .095603
2 .16613 .07754
Low energy
 It is defined as the percentage of analysis windows that have less RMS energy than the average RMS energy across the texture window.
 As an example, vocal music with silences will have a large low-energy value, while continuous strings will have a small low-energy value.
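A minimal sketch of the low-energy feature; the 1024-sample window and 512-sample hop are illustrative choices, not the thesis settings:

```python
import numpy as np

# Sketch of the low-energy feature: the fraction of analysis windows whose RMS
# energy falls below the mean window RMS. Window/hop sizes are illustrative.
def low_energy(x, win=1024, hop=512):
    x = np.asarray(x, dtype=float)
    frames = [x[i:i + win] for i in range(0, len(x) - win + 1, hop)]
    rms = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    return float(np.mean(rms < rms.mean()))

# A loud half followed by a silent half yields a large low-energy value.
x = np.concatenate([np.ones(4096), np.zeros(4096)])
print(low_energy(x))
```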
Low energy(Contd.)
S.N HAPPY S.N SAD
1 .341212 1 .51122
2 .59215 2 .51104
3 .46183 3 .48924
4 .48755 4 .51524
Entropy
 Entropy relates to the emotion of surprise.
 If the probability of an event occurring is low, then the surprise when it does occur is high.
 The entropy measure is based on the Shannon entropy equation:
H = -Σ_i p_i log(p_i)
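A sketch of an entropy feature of this kind: Shannon entropy of a normalized magnitude spectrum, scaled to [0, 1] by dividing by log N (the exact normalization used by the extraction toolbox may differ):

```python
import numpy as np

# Sketch: Shannon entropy of a normalized magnitude spectrum, scaled to [0, 1]
# by log(N). The exact normalization in the extraction toolbox may differ.
def spectral_entropy(p):
    p = np.asarray(p, dtype=float)
    n = len(p)
    p = p / p.sum()        # normalize to a probability distribution
    nz = p[p > 0]          # drop zero bins (0 * log 0 is taken as 0)
    return float(-np.sum(nz * np.log(nz)) / np.log(n))

print(spectral_entropy([1, 1, 1, 1]))  # flat spectrum: maximal entropy, 1.0
print(spectral_entropy([1, 0, 0, 0]))  # single spike: fully predictable, entropy 0
```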
Entropy(contd.)
S.N Happy S.N sad
1 .85721 1 .83682
2 .85596 2 .83643
3 .85319 3 .83185
4 .85982 4 .83892
Zero Cross Rating
 The zero-crossing rate is the rate of sign changes along a signal:
zcr = (1 / (T - 1)) Σ_{t=1}^{T-1} 1{ s_t · s_{t-1} < 0 }
 where s is a signal of length T and the indicator function 1{} is 1 if its argument is true and 0 otherwise.
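The definition above can be sketched as:

```python
import numpy as np

# Sketch: zero-crossing rate as the fraction of adjacent sample pairs whose
# signs differ.
def zero_crossing_rate(x):
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.signbit(x[:-1]) != np.signbit(x[1:])))

print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))  # 1.0
print(zero_crossing_rate([1.0, 2.0, 3.0]))         # 0.0
```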
Zero cross Rating(contd.)
S.N Happy Sad
1 789.1929 878.5048
2 736.8308 873.3451
3 945.9796 776.6267
4 1045.694 855.9022
Pitch
 Pitch is a perceptual property that allows the ordering of sounds on a frequency-related scale.
 Pitches are compared as "higher" and "lower".
 Pitches are usually quantified as frequencies in cycles per second.
 Using the pitch contour we derive two landmark features for our work: the probability of increment between two successive pitches, and the probability of decrement between two successive pitches.
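A sketch of the two proposed landmark features, assuming one pitch estimate per analysis frame:

```python
import numpy as np

# Sketch of the two proposed landmark features: the fraction of successive
# pitch pairs that move up, and the fraction that move down.
def pitch_change_probs(pitch):
    pitch = np.asarray(pitch, dtype=float)
    diff = np.diff(pitch)               # change between successive pitches
    return float(np.mean(diff > 0)), float(np.mean(diff < 0))

print(pitch_change_probs([100, 110, 105, 105, 120]))  # (0.5, 0.25)
```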
Pitch(contd.)
S.N  Happy_Prob_inc  Happy_Prob_dec  Sad_Prob_inc  Sad_Prob_dec
1    0.4403          0.4283          0.4137        0.3673
2    0.4295          0.4815          0.4133        0.391
3    0.406           0.4233          0.3317        0.3627
4    0.4147          0.4487          0.3881        0.3848
Event Density
 It estimates the number of note onsets per second.
S.N Happy Sad
1 2.3038 1.5359
2 2.1369 2.0367
3 3.3723 1.5359
4 2.6711 0.70117
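Given onset times from some onset detector (outside this sketch), event density reduces to a one-liner:

```python
# Sketch: event density from a list of detected onset times in seconds
# (the onset detector itself is outside this sketch).
def event_density(onset_times, duration_s):
    return len(onset_times) / duration_s

# Hypothetical onsets over a 30-second clip:
print(event_density([0.5, 1.2, 2.0, 2.8, 4.1, 5.0], 30.0))  # 0.2
```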
Centroid
 The centroid is defined as the center of gravity of the spectrum.
 It is calculated as the mean of the frequencies present in the signal, with their magnitudes as the weights:
Centroid = Σ_n f(n) x(n) / Σ_n x(n)
 where x(n) is the magnitude (weight) of bin n and f(n) is the centre frequency of that bin.
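A sketch of the centroid for a single frame, using an FFT magnitude spectrum:

```python
import numpy as np

# Sketch: spectral centroid of one frame from its FFT magnitude spectrum.
def spectral_centroid(frame, sr):
    frame = np.asarray(frame, dtype=float)
    mag = np.abs(np.fft.rfft(frame))                 # x(n): bin magnitudes
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)  # f(n): bin centre frequencies
    return float(np.sum(freqs * mag) / np.sum(mag))

# A pure 1000 Hz tone has its centroid at (about) 1000 Hz.
tone = np.sin(2 * np.pi * 1000 * np.arange(800) / 8000)
print(spectral_centroid(tone, 8000))
```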
Centroid(contd.)
S.N  Happy       Sad
1    1907.3599   1857.3502
2    2353.4706   1936.1145
3    2307.5883   1748.8305
4    1995.0818   1756.3829
Mode
 It estimates the modality, i.e. major vs. minor.
 Mode returns a value between -1 and +1.
 The closer the value is to +1, the more major the given excerpt is predicted to be; the closer to -1, the more minor.
Mode(contd.)
S.N  Happy      Sad
1    -0.18676   0.14946
2    -0.06964   -0.0049425
3    -0.16293   -0.051353
4    -0.14412   0.061022
Pulse clarity
 Pulse clarity is considered a high-level musical dimension that conveys how easily listeners can perceive the underlying rhythmic pulsation in a given musical piece.
S.N Happy S.N Sad
1 .40913 1 .11393
2 .30407 2 .31914
3 .18223 3 .18962
4 .14884 4 .18083
Roll Off
 Roll-off here denotes the spectral roll-off: the frequency below which a fixed fraction (typically 85%) of the total spectral energy is contained.
 (In filter design, roll-off instead refers to the rate at which a filter attenuates the input past the cut-off frequency.)
S.N Happy S.N Sad
1 4091.8974 1 3543.9783
2 4236.4054 2 3168.4931
3 4525.5054 3 4038.4008
4 4271.0604 4 4007.0263
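A sketch of the spectral roll-off, assuming the common 85% energy threshold:

```python
import numpy as np

# Sketch: spectral roll-off as the frequency below which a threshold fraction
# (85% is the common default) of the spectral energy is contained.
def spectral_rolloff(frame, sr, threshold=0.85):
    mag = np.abs(np.fft.rfft(np.asarray(frame, dtype=float)))
    cum = np.cumsum(mag ** 2)                        # cumulative spectral energy
    idx = int(np.searchsorted(cum, threshold * cum[-1]))
    return float(np.fft.rfftfreq(len(frame), d=1.0 / sr)[idx])

# For a pure 1000 Hz tone, all the energy sits at 1000 Hz.
tone = np.sin(2 * np.pi * 1000 * np.arange(800) / 8000)
print(spectral_rolloff(tone, 8000))  # 1000.0
```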
Classification
 Classify the data into two categories.
 As input we take the list of features.
 Classifier used: ANN
 Three processes:
1. Training 2. Validation 3. Testing
Artificial Neural Network
 It is a computational model inspired by the human nervous system.
 An ANN is generally presented as interconnected neurons that compute values from inputs.
 An ANN can be defined by three characteristics:
1. Architecture 2. Learning mechanism 3. Activation function
 Every such system basically has three layers:
1. Input 2. Hidden 3. Output
ANN(contd.)
 Architecture:
 Directed graph: each edge is assigned an orientation.
 Classification using: multilayer feed-forward NN.
 Learning method: supervised learning.
 Algorithm used: back-propagation.
ANN(contd.)
 Steps of the back-propagation algorithm:
1. Normalize all input values to between 0 and 1.
2. Number of hidden nodes = (number of input nodes × number of output nodes) / 2
3. [V] = weights between input and hidden nodes;
[W] = weights between hidden and output nodes
(weights are initialized to random values between -1 and +1).
4. Input and output of the input layer:
{O}_I = {I}_I
ANN(contd.)
 Input to the hidden layer is computed by multiplying the input values by their corresponding weights:
{I}_H = [V] {O}_I
 Output of the hidden layer is computed using the sigmoid function (element-wise):
{O}_H = 1 / (1 + e^(-{I}_H))
ANN(contd.)
 Input to the output layer is computed by multiplying by the corresponding weights:
{I}_O = [W] {O}_H
 Output of the output layer is calculated as:
{O}_O = 1 / (1 + e^(-{I}_O))
ANN(contd.)
 Error is calculated as:
E = ½ Σ_j (T_j - O_Oj)², where T_j is the target output of output node j.
 {d}, the local gradient of each output node, is calculated as:
d_j = (T_j - O_Oj) · O_Oj · (1 - O_Oj)
ANN(Contd.)
 The [Y] matrix is calculated as:
[Y] = {O}_H × {d}^T
 Change in hidden-to-output weights:
[ΔW] = α [Y], where α is the learning rate.
ANN(contd.)
 Error propagated back to the hidden layer:
{e} = [W] {d}
 And the new local gradient {d*} for each hidden node is:
d*_i = e_i · O_Hi · (1 - O_Hi)
 Calculate the [X] matrix:
[X] = {O}_I × {d*}^T
ANN(contd.)
 Change in input-to-hidden weights:
[ΔV] = α [X]
 Updated weights for the next training step:
[V] ← [V] + [ΔV], [W] ← [W] + [ΔW]
 The process is repeated until the error falls below a small threshold.
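The training steps above can be sketched in numpy; the layer sizes, learning rate and toy data here are assumptions for illustration, not the thesis configuration (which used MATLAB's Neural Network Toolbox):

```python
import numpy as np

# Numpy sketch of the training loop described above. Layer sizes, learning
# rate and the toy data are illustrative assumptions, not the thesis settings.
rng = np.random.default_rng(0)

n_in, n_hid, n_out = 13, 6, 1           # 13 features -> one happy/sad score
V = rng.uniform(-1, 1, (n_in, n_hid))   # input-to-hidden weights
W = rng.uniform(-1, 1, (n_hid, n_out))  # hidden-to-output weights
alpha = 0.1                             # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.uniform(0, 1, (20, n_in))                        # toy normalized inputs
T = (X.mean(axis=1, keepdims=True) > 0.5).astype(float)  # toy binary targets

O_o = sigmoid(sigmoid(X @ V) @ W)
err0 = float(np.mean((T - O_o) ** 2))    # error before training

for epoch in range(3000):
    O_h = sigmoid(X @ V)                 # hidden-layer output
    O_o = sigmoid(O_h @ W)               # output-layer output
    d_o = (T - O_o) * O_o * (1 - O_o)    # output local gradient {d}
    d_h = (d_o @ W.T) * O_h * (1 - O_h)  # hidden local gradient {d*}
    W += alpha * O_h.T @ d_o             # hidden-to-output weight update
    V += alpha * X.T @ d_h               # input-to-hidden weight update

err = float(np.mean((T - O_o) ** 2))
print(err0, "->", err)                   # the error shrinks over training
```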
RESULT and STUDY
 Training data: 70% of the total (138)
 Validation data: 15% of the total (39)
 Testing data: 15% of the total (39)
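A sketch of a random 70/15/15 split of this kind (the exact per-split counts depend on rounding):

```python
import numpy as np

# Sketch: a random 70/15/15 train/validation/test split over n clips,
# mirroring the proportions quoted above.
def split_indices(n, seed=0):
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(0.70 * n))
    n_val = int(round(0.15 * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(196)
print(len(train), len(val), len(test))  # 137 29 30
```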
Recognition of two emotional states
 1st experiment:
 13 features taken as input.
 Testing data: 39
 Correctly classified for happy = 15 out of 17
 Correctly classified for sad = 21 out of 22
Confusion matrix (rows: input/true label; columns: predicted output):
        Output
Input   Happy  Sad
Happy    15     2
Sad       1    21
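Overall accuracy follows from the confusion matrix; here rows are true labels and columns predicted labels, per the counts quoted above:

```python
import numpy as np

# Overall accuracy from the 2x2 confusion matrix quoted above
# (rows: true label, columns: predicted label).
conf = np.array([[15, 2],    # happy: 15 correct, 2 misclassified as sad
                 [1, 21]])   # sad: 1 misclassified as happy, 21 correct
accuracy = np.trace(conf) / conf.sum()
print(round(accuracy * 100, 1))  # 92.3
```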
 Error Histogram:
 Performance graph:
Performance analysis
 Performance over 10 training and testing runs; we see only little variation in accuracy in our model (LF = landmark features).
Test Number  Performance accuracy with LF  Performance accuracy without LF
1. 96.687 62.646
2. 96.947 73.454
3. 98.159 65.09
4. 97.31 62.112
5. 98.054 63.878
6. 96.004 63.003
7. 96.808 68.433
8. 98.691 65.06
9. 82.258 70.112
10. 97.243 64.123
Conclusion
 We proposed a new model for automatic recognition of musical emotion based on an artificial neural network.
 We used a multilayer feed-forward neural network for classification; the training algorithm used is back-propagation.
 A total of 13 features were extracted from each audio sample. We proposed two new features (the probability of increment and the probability of decrement between two successive pitches). The classification process was carried out using the Neural Network Toolbox in MATLAB.
 A total of 10 experiments were done, and each time the accuracy of emotion recognition was more than 90%. When we run the model without our landmark features, the accuracy drops by about 30 percentage points. The average accuracy we achieved for our model is 95.8161%.
Features   Without landmark features   With landmark features
Accuracy   65.7911                     95.8161
Scope
 This work is useful for those who have difficulty understanding ragas in Indian classical music.
 The study will be helpful in psychological science, for studying the changes in the brain while listening to Indian classical music.
 Our study is useful in medical science.
Future work
 We plan to develop an automatic emotion recognizer for Indian classical music with more emotion categories, for people who have difficulty understanding and identifying emotion in Indian classical music.
 We also plan a model that adds physiological features such as heart rate, skin temperature and brain signals; we expect that including physiological features will further increase the system's accuracy.
Publication
 P. Prasoon and S. Chakraborty, “Raga Analysis
using Artificial Neural Network” - Communicated to
Computational Music Science (Book Series), Springer
as a research monograph.
Reference
[1] A. Srinivasan (2011). "Speech Recognition Using Hidden Markov Model". Applied Mathematical Sciences, Vol. 5, no. 79, pp. 3943-3948.
[2] Björn Schuller, Manfred Lang, Gerhard Rigoll (2002). "Multimodal Emotion Recognition in Audiovisual Communication". Proc. ICME 2002, 3rd International Conference on Multimedia and Expo, IEEE, vol. 1, pp. 745-748, Lausanne, Switzerland.
[3] Coutinho, E. & Cangelosi, A. (2010). "A Neural Network Model for the Prediction of Musical Emotions". In S. Nefti-Meziani & J.G. Grey (Eds.), Advances in Cognitive Systems (pp. 331-368). London: IET Publisher. ISBN: 978-1849190756.
Reference(contd.)
[4] Daniela and Bernd Willimek (2013). Music and Emotions: Research on the Theory of Musical Equilibration (die Strebetendenz-Theorie).
[5] Derya Ozkan, Stefan Scherer and Louis-Philippe Morency (2013). "Step-wise emotion recognition using concatenated-HMM". IEEE Transactions on Multimedia 15(2): 326-338.
[6] Gaurav Pandey, Chaitanya Mishra and Paul Ipe (2003). "TANSEN: A System for Automatic Raga Identification". Indian International Conference on AI, pp. 1350-1363.
[7] Jack H. David Jr. (1995). "The Mathematics of Music". Spring, Math 1513.5097.
Reference(contd.)
[8] Keshi Dai, Harriet J. Fell, and Joel MacAuslan (2012). "Recognizing emotion in speech using neural networks".
[9] Mohammad Abd-Alrahman Mahmaoud Abushariah, Raja Noor Ainon, Roziati Zainuddin, Moustafa Elshafei, Othman Omran Khalifa (2012). "Arabic speaker-independent continuous automatic speech recognition based on a phonetically rich and balanced speech corpus". Int. Arab J. Inf. Technol. 9(1): 84-93.
[10] O. Lartillot and P. Toiviainen (2007). "A MATLAB toolbox for musical feature extraction from audio". In Proc. Digital Audio Effects (DAFx-07), Bordeaux, France, Sep. 10-15.
[11] Sandeep Bagchee (1998). Nad: Understanding Raga Music. Eshwar, 1st edition. ISBN-13: 978-8186982075.
Reference(contd.)
[12] www.wekepedia.org
[13] www.paragchordia.com
[14] www.swarganga.org
[15] www.mathworks.in
[16] www.shadjamadhyam.com
[17] www.22shruti.com
[18] www.knowyourraga.com
[19]www.skeptic.skepticgeek.com
Reference(contd.)
[20] Yading Song, Simon Dixon, Marcus Pearce (2012). "Evaluation of Musical Features for Emotion Classification". 13th International Society for Music Information Retrieval Conference (ISMIR).
[21] Yongjin Wang, Ling Guan (2008). "Recognizing Human Emotional State from Audiovisual Signals". IEEE Transactions on Multimedia 10(4): 659-668.
[22] Zhen-Guo Che, Tzu-An Chiang and Zhen-Hua Che (2010). "Feed forward neural networks training: A comparison between genetic algorithm and back propagation learning algorithm". International Journal of Innovative Computing, Information and Control, volume 7.
Thank you.....

More Related Content

Similar to Graphical visualization of musical emotions

Joel Yancey Poster (Buonomano-Blair).compressed
Joel Yancey Poster (Buonomano-Blair).compressedJoel Yancey Poster (Buonomano-Blair).compressed
Joel Yancey Poster (Buonomano-Blair).compressed
Joel Yancey
 
Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...
Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...
Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...
Cemal Ardil
 
The kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key findingThe kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key finding
ijma
 

Similar to Graphical visualization of musical emotions (20)

Robot navigation in unknown environment with obstacle recognition using laser...
Robot navigation in unknown environment with obstacle recognition using laser...Robot navigation in unknown environment with obstacle recognition using laser...
Robot navigation in unknown environment with obstacle recognition using laser...
 
Mood Detection
Mood DetectionMood Detection
Mood Detection
 
Joel Yancey Poster (Buonomano-Blair).compressed
Joel Yancey Poster (Buonomano-Blair).compressedJoel Yancey Poster (Buonomano-Blair).compressed
Joel Yancey Poster (Buonomano-Blair).compressed
 
Application of Fisher Linear Discriminant Analysis to Speech/Music Classifica...
Application of Fisher Linear Discriminant Analysis to Speech/Music Classifica...Application of Fisher Linear Discriminant Analysis to Speech/Music Classifica...
Application of Fisher Linear Discriminant Analysis to Speech/Music Classifica...
 
A Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive InstrumentsA Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
 
Journal Club - "Intermediate acoustic-to-semantic representations link behavi...
Journal Club - "Intermediate acoustic-to-semantic representations link behavi...Journal Club - "Intermediate acoustic-to-semantic representations link behavi...
Journal Club - "Intermediate acoustic-to-semantic representations link behavi...
 
IRJET - EMO-MUSIC(Emotion based Music Player)
IRJET - EMO-MUSIC(Emotion based Music Player)IRJET - EMO-MUSIC(Emotion based Music Player)
IRJET - EMO-MUSIC(Emotion based Music Player)
 
Impact of Different Genre of Tamil Cine Music Using Combined Disjoined Block...
Impact of Different Genre of Tamil Cine Music  Using Combined Disjoined Block...Impact of Different Genre of Tamil Cine Music  Using Combined Disjoined Block...
Impact of Different Genre of Tamil Cine Music Using Combined Disjoined Block...
 
IRJET- Musical Therapy using Facial Expressions
IRJET- Musical Therapy using Facial ExpressionsIRJET- Musical Therapy using Facial Expressions
IRJET- Musical Therapy using Facial Expressions
 
auto_playlist
auto_playlistauto_playlist
auto_playlist
 
Emotion Recognition from Speech with Acoustic, Non-Linear and Wavelet-based F...
Emotion Recognition from Speech with Acoustic, Non-Linear and Wavelet-based F...Emotion Recognition from Speech with Acoustic, Non-Linear and Wavelet-based F...
Emotion Recognition from Speech with Acoustic, Non-Linear and Wavelet-based F...
 
Emotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechEmotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio Speech
 
BTP Second Phase
BTP Second PhaseBTP Second Phase
BTP Second Phase
 
Emotion analysis of songs based on lyrical
Emotion analysis of songs based on lyricalEmotion analysis of songs based on lyrical
Emotion analysis of songs based on lyrical
 
Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...
Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...
Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...
 
A new parallel bat algorithm for musical note recognition
A new parallel bat algorithm for musical note recognition  A new parallel bat algorithm for musical note recognition
A new parallel bat algorithm for musical note recognition
 
Sparse and Low Rank Representations in Music Signal Analysis
 Sparse and Low Rank Representations in Music Signal  Analysis Sparse and Low Rank Representations in Music Signal  Analysis
Sparse and Low Rank Representations in Music Signal Analysis
 
AN INTERESTING APPLICATION OF SIMPLE EXPONENTIAL SMOOTHING IN MUSIC ANALYSIS
AN INTERESTING APPLICATION OF SIMPLE EXPONENTIAL SMOOTHING IN MUSIC ANALYSISAN INTERESTING APPLICATION OF SIMPLE EXPONENTIAL SMOOTHING IN MUSIC ANALYSIS
AN INTERESTING APPLICATION OF SIMPLE EXPONENTIAL SMOOTHING IN MUSIC ANALYSIS
 
The kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key findingThe kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key finding
 
Audio Art Authentication and Classification with Wavelet Statistics
Audio Art Authentication and Classification with Wavelet StatisticsAudio Art Authentication and Classification with Wavelet Statistics
Audio Art Authentication and Classification with Wavelet Statistics
 

Recently uploaded

Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
Areesha Ahmad
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
seri bangash
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 

Recently uploaded (20)

Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Introduction to Viruses
Introduction to VirusesIntroduction to Viruses
Introduction to Viruses
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
chemical bonding Essentials of Physical Chemistry2.pdf
chemical bonding Essentials of Physical Chemistry2.pdfchemical bonding Essentials of Physical Chemistry2.pdf
chemical bonding Essentials of Physical Chemistry2.pdf
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 

Graphical visualization of musical emotions

  • 1. GRAPHICAL VISUALIZATION OF MUSICAL EMOTIONS Presented by: Pranay Prasoon MT/SC/10002/2012 M. TECH. SCIENTIFIC COMPUTING Under the guidance of: Dr. Saubhik Chakraborty Associate Professor, Dept. of Applied Mathematics
  • 2. Hindustani Classical Music(ICM)  It is rich in both emotional content and musical content.  There are seven notes in ICM i.e. Sa re ga ma pa dha ni.  The base if Indian classical music is Raga. RAGA  A raga is simply a group of notes .  Each raga invokes a certain mood/ emotion.  Different sequence of notes represent different raga. Example:- Bageshri(sad) : ni sa ga ma dha ni sa Bhupali(happy): sa re ga pa dha sa
  • 3. Music relation to mathematics  Mathematics is "the basis of sound" and sound is the basic of musical aspects.  Some basic terms with which we can relate music to the mathematics 1. Sound  Music is sound that is organized in a meaningful way with rhythm, melody, and harmony. These are consider as the three dimensions of music.  Sound is the form of energy
  • 4. Music and mathematics(contd.) 2. Frequency  The number of times the sound wave completes a cycle of oscillation in one second is called its frequency. Frequency is measured in cycles per second or Hertz (Hz).  Higher the frequency , higher is the pitch value. And vice versa. 3. Amplitude  Amplitude is the size of the vibration, and this determines how loud the sound is.Measured in decibels and the range for human ear is (2-130 db)
  • 5. Music and mathematics(contd.) 4. Pitch Scale  In India Classical Music each note pitch value is dependent upon the previous note pitch value. S=1 P=1.5 R=1.125 D=1.6875 G=1.1265625 N=1.8584375 M=1.423828
  • 6. Music and mathematics(contd.)  Time  Tempo: speed of Beats  Rhythm : relative time durations of notes determine in musical language is called rhythm.
  • 7. Literature survey  We carefully studied the papers from international publishers to understand the concept of India classical music, depth study of ragas of Indian classical music, neural network approach in music, pattern recognition, emotion recognition and features of musical clips.  Soltani . K, Ainon R.N (2007) and Yongjin Wang, Ling Guan(2008) author discussed deeply about the emotion recognition in speech signals.  Coutinho, E. &Cangelosi, A. (2010) : In this Book the author discusses about a model capacity of prediction of human emotion while listening to music. In this he added both psychoacoustic and physiological features for the prediction of emotion.  Keshi Dai, Harriet J. Fell, and Joel MacAuslan. (2012) : The author shows the good use of audio features to recognize emotion and the different combination of emotion to recognize and find the accuracy of each combination of emotions.
  • 8. Literature survey(contd.)  Zhen-GuoChe, Tzu-An Chiang and Zhen-Hua Che. (2010) : In this paper the author discusses the advantage and characteristic of the genetic algorithm and back propagation neural network to train a feed forward neural network to cope with weight adjustment problem.  We found that very less models have been developed for raga identification in Indian classical music.  From our survey we find that neural networks gives high accuracy for identification when uses with multiple features.
  • 9. Characterization of the Problem  Our aim is to better explain and explore the relationship between musical features and emotion.  The main problem initially identified was to extract musical features for each audio file.  Another problem was to develop a model for recognition of emotion from the musical clips.  Then to graphically visualize the performance of the process of recognition.
  • 10. Objective of work 1. The main objective of the work is to spread the light on the emotion recognition in Indian classical music. 2. To understand the dependencies of features of music in recognition process. 3. To understand the ANN concept for recognition process. 4. To analyze the error in each recognition process. 5. Visualize the performance of model. (for training, validation and testing)
  • 11. Research methodology  Ground-truth data with their features drive the recognition process with an ANN.  Artificial neural network: ANNs are computational models inspired by the nervous system of the brain, capable of machine learning as well as pattern recognition.  An ANN, used with the features extracted from the audio clips, performs the emotion classification.
  • 13. Data Selection

 S.N.  Target emotion  Raga in audio clips  Number of audio clips  Total
 1     HAPPY           Bhupali              20
                       Bihag                28
                       Desh                 30
                       Marwa                20                     98
 2     SAD             Bageshree            24
                       Bhairavi             19
                       Bhimpalashi          20
                       Deskar               15
                       Todi                 20                     98
 Total 2 emotions      9 ragas                                     196
  • 14. Pre-processing  Manually divide all samples into two classes, named happy and sad.  Convert the dataset to standard WAV format at 44100 Hz.  Each audio clip we take is 30 seconds long.
  • 16.  Feature extraction involves analysis of the audio signal.  We extracted 13 features for our work: 1. Roll-off 2. Spread 3. Zero-cross rate 4. Centroid 5. RMS energy 6. Low energy 7. Event density 8. Pulse clarity 9. Mode 10. Entropy 11. Brightness 12. Probability of increment of two successive pitches 13. Probability of decrement of two successive pitches
  • 17. Root Mean Square Energy  The energy of a signal x can be computed simply by taking the root of the average of the squared amplitudes, called the root mean square (RMS): Formula: xrms = sqrt( (1/n) Σi xi² )  Happy-labelled songs have more RMS energy than sad-labelled songs.
  • 18. Root Mean Square Energy(contd.)

 S.N  Happy    Sad
 1    .20566   .095603
 2    .16613   .07754
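The RMS computation above can be sketched in a few lines of NumPy (an illustrative version, not the MATLAB toolbox code used in the work; the function name is our own):

```python
import numpy as np

def rms_energy(x):
    """Root-mean-square energy: square root of the mean squared amplitude."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))

# A louder (larger-amplitude) signal yields a larger RMS value.
quiet = 0.1 * np.sin(np.linspace(0, 8 * np.pi, 1000))
loud = 0.5 * np.sin(np.linspace(0, 8 * np.pi, 1000))
```

This matches the intuition behind the table: louder, more energetic happy clips score higher than sad ones.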
  • 19. Low energy  It is defined as the percentage of analysis windows that have less RMS energy than the average RMS energy across the texture window.  As an example, vocal music with silences will have a large low-energy value while continuous strings will have a small low-energy value.
  • 20. Low energy(Contd.)

 S.N  HAPPY     S.N  SAD
 1    .341212   1    .51122
 2    .59215    2    .51104
 3    .46183    3    .48924
 4    .48755    4    .51524
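The low-energy definition above can be sketched as follows (an illustrative NumPy version with a hypothetical fixed frame length; the actual work used the MIRtoolbox implementation):

```python
import numpy as np

def low_energy_rate(x, frame_len=1024):
    """Fraction of analysis frames whose RMS energy is below the mean frame RMS."""
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_len
    frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
    frame_rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return float(np.mean(frame_rms < frame_rms.mean()))

# Half loud, half silent: the silent frames fall below the mean frame RMS.
signal = np.concatenate([np.ones(2048), np.zeros(2048)])
```

For the toy signal above, half of the frames are silent, so the low-energy rate is 0.5, mirroring how vocal music with pauses scores higher than continuous strings.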
  • 21. Entropy  Entropy is related to the emotional element of surprise.  The lower the probability of an event, the greater the surprise when it occurs.  The entropy measure is based on Shannon's equation: H = -Σi pi log2 pi, normalized by log2 n so that it lies between 0 and 1.
  • 22. Entropy(contd.)

 S.N  Happy    S.N  Sad
 1    .85721   1    .83682
 2    .85596   2    .83643
 3    .85319   3    .83185
 4    .85982   4    .83892
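The normalized Shannon entropy can be sketched as below (assuming normalization by log2 of the number of bins, which keeps the values between 0 and 1 as in the table; the function name is ours):

```python
import numpy as np

def norm_entropy(p):
    """Normalized Shannon entropy: 0 for a single certain event, 1 for a flat distribution."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()            # normalize to a probability distribution
    n = len(p)
    nz = p[p > 0]              # 0 * log(0) is taken as 0, so drop zero bins
    return float(-np.sum(nz * np.log2(nz)) / np.log2(n))
```

A perfectly flat distribution (maximal surprise) gives 1.0; a single certain event gives 0.0.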
  • 23. Zero Cross Rating  The zero-crossing rate is the rate of sign changes along a signal: zcr = (1/(T-1)) Σt=1..T-1 1{ st · st-1 < 0 }  where s is a signal of length T and the indicator function 1{·} is 1 if its argument is true and 0 otherwise.
  • 24. Zero cross Rating(contd.)

 S.N  Happy      Sad
 1    789.1929   878.5048
 2    736.8308   873.3451
 3    945.9796   776.6267
 4    1045.694   855.9022
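The zero-crossing formula can be sketched directly in NumPy (this returns the per-sample rate; multiplying by the sampling rate gives crossings per second, the scale of the table above):

```python
import numpy as np

def zero_crossing_rate(s):
    """Fraction of successive sample pairs whose product is negative (a sign change)."""
    s = np.asarray(s, dtype=float)
    return float(np.mean(s[1:] * s[:-1] < 0))
```

An alternating signal crosses zero at every step (rate 1.0); a constant-sign signal never does (rate 0.0).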
  • 25. Pitch  Pitch is a perceptual property that allows the ordering of sounds on a frequency-related scale.  Pitches are compared as "higher" and "lower".  Pitches are usually quantified as frequencies in cycles per second.  Using the pitch contour we derive two landmark features for our work: the probability of increment between two successive pitches and the probability of decrement between two successive pitches.
  • 26. Pitch(contd.)

 S.N  Happy_Prob_inc  Happy_Prob_dec  Sad_Prob_inc  Sad_Prob_dec
 1    0.4403          0.4283          0.4137        0.3673
 2    0.4295          0.4815          0.4133        0.391
 3    0.406           0.4233          0.3317        0.3627
 4    0.4147          0.4487          0.3881        0.3848
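Given a pitch contour, the two landmark features can be sketched with a single difference operation (illustrative NumPy; the pitch contour itself was extracted with the MIRtoolbox):

```python
import numpy as np

def pitch_motion_probs(pitch_contour):
    """Probability of increment / decrement between successive pitch values."""
    d = np.diff(np.asarray(pitch_contour, dtype=float))
    prob_inc = float(np.mean(d > 0))   # fraction of rising steps
    prob_dec = float(np.mean(d < 0))   # fraction of falling steps
    return prob_inc, prob_dec
```

Note that the two probabilities need not sum to 1, since repeated pitch values count as neither an increment nor a decrement.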
  • 27. Event Density  It estimates the number of note onsets per second.

 S.N  Happy    Sad
 1    2.3038   1.5359
 2    2.1369   2.0367
 3    3.3723   1.5359
 4    2.6711   0.70117
  • 28. Centroid  The centroid is defined as the center of gravity of the spectrum.  It is calculated as the mean of the frequencies present in the signal, with their magnitudes as the weights: centroid = Σn f(n)·x(n) / Σn x(n)  Here x(n) represents the weight (magnitude) of bin n, and f(n) represents its centre frequency.
  • 29. Centroid(contd.)

 S.N  Happy      Sad
 1    1907.3599  1857.3502
 2    2353.4706  1936.1145
 3    2307.5883  1748.8305
 4    1995.0818  1756.3829
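The centroid computation can be sketched as a weighted mean over the magnitude spectrum (a minimal illustration; bin frequencies and magnitudes would come from an FFT in practice):

```python
import numpy as np

def spectral_centroid(magnitude, freqs):
    """Magnitude-weighted mean of the frequencies present in the spectrum."""
    magnitude = np.asarray(magnitude, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    return float(np.sum(freqs * magnitude) / np.sum(magnitude))
```

Shifting spectral weight toward higher bins raises the centroid, which is why brighter (happy) clips tend to score higher in the table.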
  • 30. Mode  It estimates the modality, i.e. major vs. minor.  Mode returns a value between -1 and +1.  The closer the value is to +1, the more major the given excerpt is predicted to be; the closer it is to -1, the more minor the excerpt might be.
  • 31. Mode(contd.)

 S.N  Happy      Sad
 1    -0.18676   0.14946
 2    -0.06964   -0.0049425
 3    -0.16293   -0.051353
 4    -0.14412   0.061022
  • 32. Pulse clarity  Pulse clarity is considered a high-level musical dimension that conveys how easily listeners can perceive the underlying rhythm in a given musical piece.

 S.N  Happy    S.N  Sad
 1    .40913   1    .11393
 2    .30407   2    .31914
 3    .18223   3    .18962
 4    .14884   4    .18083
  • 34. Roll Off  In a filter, roll-off is the steepness of the transmission function with frequency: the rate at which the filter attenuates the input after the cut-off point.  As an audio feature, the roll-off is the frequency below which a fixed fraction (typically 85%) of the total spectral energy is contained.

 S.N  Happy      S.N  Sad
 1    4091.8974  1    3543.9783
 2    4236.4054  2    3168.4931
 3    4525.5054  3    4038.4008
 4    4271.0604  4    4007.0263
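The roll-off feature can be sketched as follows (assuming the common 85%-of-energy definition; the function name and inputs are illustrative):

```python
import numpy as np

def spectral_rolloff(magnitude, freqs, fraction=0.85):
    """Frequency below which `fraction` of the total spectral energy is contained."""
    energy = np.asarray(magnitude, dtype=float) ** 2
    cum = np.cumsum(energy)                            # cumulative energy per bin
    idx = int(np.searchsorted(cum, fraction * cum[-1]))
    return float(np.asarray(freqs, dtype=float)[idx])
```

A spectrum dominated by a low-frequency bin rolls off early; a flat spectrum rolls off near its top bin.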
  • 36. Classification  Classify the data into two categories.  As input we take the list of features.  Classifier used: ANN  Three phases: 1. Training 2. Validation 3. Testing
  • 37. Artificial Neural Network  It is a computational model inspired by the human nervous system.  An ANN is generally presented as interconnected neurons that compute values from inputs.  An ANN can be defined by three characteristics: 1. Architecture 2. Learning mechanism 3. Activation function  Every such system basically has 3 layers: 1. Input layer 2. Hidden layer 3. Output layer
  • 38. ANN(contd.)  Architecture:  Directed graph: each edge is assigned an orientation.  Classification using: multilayer feed-forward NN.  Learning method: supervised learning.  Algorithm used: back-propagation.
  • 39. ANN(contd.)  Steps of the back-propagation algorithm: 1. Normalize all input values to between 0 and 1. 2. Number of hidden nodes = (number of input nodes × number of output nodes) / 2 3. V = weights between input and hidden nodes; W = weights between hidden and output nodes (weights are initialized to random values between -1 and +1). 4. Input and output of the input layer: {O}I = {I}I
  • 40. ANN(contd.)  Input to the hidden layer is computed by multiplying the input values by their corresponding weights: {I}H = [V] {O}I  Output of the hidden layer is computed using the sigmoidal function (applied element-wise): {O}H = 1 / (1 + e^-{I}H)
  • 41. ANN(contd.)  Input to the output layer is computed by multiplying by the corresponding weights: {I}O = [W] {O}H  Output of the output layer is calculated the same way: {O}O = 1 / (1 + e^-{I}O)
  • 42. ANN(contd.)  Error can be calculated as: E = ½ Σk (Tk - OOk)²  {d}, the local gradient at each output node, is calculated as: dk = (Tk - OOk) · OOk · (1 - OOk)
  • 43. ANN(Contd.)  The [Y] matrix is calculated as: [Y] = {O}H × ⟨d⟩  Change in weight: [ΔW]t+1 = α [ΔW]t + η [Y]
  • 44. ANN(contd.)  Error back-propagated to the hidden layer: {e} = [W] {d}  And the new d* is: d*i = ei · OHi · (1 - OHi)  Calculate the [X] matrix: [X] = {O}I × ⟨d*⟩
  • 45. ANN(contd.)  Change in weight of the input-hidden layer: [ΔV]t+1 = α [ΔV]t + η [X]  Updated weights for the next iteration: [V]t+1 = [V]t + [ΔV]t+1 and [W]t+1 = [W]t + [ΔW]t+1  The process is repeated until the error reduces to a very small value.
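The back-propagation steps above can be sketched end-to-end in NumPy (a toy one-hidden-layer network, not the MATLAB toolbox used in the work; the hidden-layer size, learning rate, momentum value and the logical-OR toy targets are our own choices for illustration, and bias terms are omitted as in the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, T, n_hidden=4, eta=0.3, alpha=0.2, epochs=5000, seed=0):
    """One-hidden-layer feed-forward net trained by back-propagation with momentum."""
    rng = np.random.default_rng(seed)
    V = rng.uniform(-1, 1, (X.shape[1], n_hidden))    # input-to-hidden weights
    W = rng.uniform(-1, 1, (n_hidden, T.shape[1]))    # hidden-to-output weights
    dV, dW = np.zeros_like(V), np.zeros_like(W)
    errors = []
    for _ in range(epochs):
        OH = sigmoid(X @ V)                  # hidden-layer output {O}H
        OO = sigmoid(OH @ W)                 # network output {O}O
        errors.append(0.5 * np.sum((T - OO) ** 2))
        d = (T - OO) * OO * (1 - OO)         # output local gradient {d}
        d_star = (d @ W.T) * OH * (1 - OH)   # hidden local gradient {d*}
        dW = alpha * dW + eta * (OH.T @ d)   # momentum + learning-rate update
        dV = alpha * dV + eta * (X.T @ d_star)
        W, V = W + dW, V + dV
    return V, W, errors

# Toy targets (logical OR), illustrative only.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [1.]])
V, W, errors = train(X, T)
```

Note that step 2's heuristic would give only (2 × 1) / 2 = 1 hidden node for this toy problem; we use 4 so the example trains comfortably. The recorded error sequence drops monotonically overall, which is the behaviour the performance plots in the work visualize.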
  • 46. RESULT and STUDY  Training data: 70% of the total (138)  Validation data: 15% of the total (39)  Testing data: 15% of the total (39)
  • 47. Recognition of two emotional states  1st experiment:  13 features taken as input.  Testing data: 39  Correctly classified for happy = 15 out of 17  Correctly classified for sad = 21 out of 22

              Output
 Input    Happy   Sad
 Happy    15      2
 Sad      1       21
  • 50. Performance analysis  Performance over 10 training and testing runs. We see that there is little variation in the accuracy of our model.

 Test number  Performance accuracy with LF  Performance accuracy without LF
 1.           96.687                        62.646
 2.           96.947                        73.454
 3.           98.159                        65.09
 4.           97.31                         62.112
 5.           98.054                        63.878
 6.           96.004                        63.003
 7.           96.808                        68.433
 8.           98.691                        65.06
 9.           82.258                        70.112
 10.          97.243                        64.123
  • 51. Conclusion  We proposed a new model for automatic recognition of musical emotion based on an artificial neural network.  We used a multilayer feed-forward neural network for classification, trained with the back-propagation algorithm.  A total of 13 features were extracted from each audio sample. We proposed two new features (probability of increment between two successive pitches and probability of decrement between two successive pitches). The classification process was carried out using the Neural Network Toolbox in MATLAB.  A total of 10 experiments were run and each time the accuracy of emotion recognition was more than 90%. When the model is run without our landmark features, the accuracy is reduced by about 30 percentage points. The average accuracy we achieved for our model is 95.8161.

           Without landmark features  With landmark features
 Accuracy  65.7911                    95.8161
  • 52. Scope  This study is useful for those who have difficulty understanding ragas in Indian classical music.  The study will be helpful in psychological science for studying the changes in the brain when listening to Indian classical music.  Our study is also useful in medical science.
  • 53. Future work  We plan to develop an automatic emotion recognizer for Indian classical music with more emotion categories, for people who have difficulty understanding and identifying emotion in Indian classical music.  We also plan to develop a model that adds physiological features such as heart-beat rate, skin temperature and brain signals. We feel that including physiological features will further increase the accuracy of the system.
  • 54. Publication  P. Prasoon and S. Chakraborty, “Raga Analysis using Artificial Neural Network” - Communicated to Computational Music Science (Book Series), Springer as a research monograph.
  • 55. Reference [1] A. Srinivasan (2011). "Speech Recognition Using Hidden Markov Model". Applied Mathematical Sciences, Vol. 5, 2011, no. 79, 3943-3948 [2] Björn Schuller, Manfred Lang, Gerhard Rigoll (2002): "Multimodal Emotion Recognition in Audiovisual Communication", Proc. ICME 2002, 3rd International Conference on Multimedia and Expo, IEEE, vol. 1, pp. 745-748, Lausanne, Switzerland [3] Coutinho, E. & Cangelosi, A. (2010). "A Neural Network Model for the Prediction of Musical Emotions." In S. Nefti-Meziani & J.G. Grey (Ed.), Advances in Cognitive Systems (pp. 331-368). London: IET Publisher. ISBN: 978-1849190756
  • 56. Reference(contd.) [4] Daniela and Bernd Willimek (2013). Music and Emotions: Research on the Theory of Musical Equilibration (die Strebetendenz-Theorie). [5] Derya Ozkan, Stefan Scherer and Louis-Philippe Morency (2013). "Step-wise emotion recognition using concatenated-HMM", IEEE Transactions on Multimedia 15(2): 326-338 [6] Gaurav Pandey, Chaitanya Mishra and Paul Ipe (2003). "TANSEN: A System for Automatic Raga Identification", pp. 1350-1363, Indian International Conference on AI. [7] Jack H. David Jr. (1995). "The Mathematics of Music". Spring, Math 1513.5097
  • 57. Reference(contd.) [8] Keshi Dai, Harriet J. Fell, and Joel MacAuslan (2012). "Recognizing emotion in speech using neural networks" [9] Mohammad Abd-Alrahman Mahmoud Abushariah, Raja Noor Ainon, Roziati Zainuddin, Moustafa Elshafei, Othman Omran Khalifa (2012). "Arabic speaker-independent continuous automatic speech recognition based on a phonetically rich and balanced speech corpus". Int. Arab J. Inf. Technol. 9(1): 84-93 [10] O. Lartillot and P. Toiviainen, "A Matlab toolbox for musical feature extraction from audio," in Proc. Digital Audio Effects (DAFx-07), Bordeaux, France, Sep. 10-15, 2007 [11] Sandeep Bagchee (1998). "Nad: Understanding Raga Music". Eshwar, 1st edition. ISBN-13: 978-8186982075
  • 58. Reference(contd.) [12] www.wikipedia.org [13] www.paragchordia.com [14] www.swarganga.org [15] www.mathworks.in [16] www.shadjamadhyam.com [17] www.22shruti.com [18] www.knowyourraga.com [19] www.skeptic.skepticgeek.com
  • 59. Reference(contd.) [20] Yading Song, Simon Dixon, Marcus Pearce (2012). "Evaluation of Musical Features for Emotion Classification". 13th International Society for Music Information Retrieval Conference (ISMIR). [21] Yongjin Wang, Ling Guan (2008). "Recognizing Human Emotional State From Audiovisual Signals". IEEE Transactions on Multimedia 10(4): 659-668 [22] Zhen-Guo Che, Tzu-An Chiang and Zhen-Hua Che (2010). "Feedforward neural networks training: a comparison between genetic algorithm and back-propagation learning algorithm". International Journal of Innovative Computing, Information and Control, volume 7