SlideShare ist ein Scribd-Unternehmen logo
1 von 47
Downloaden Sie, um offline zu lesen
Landmark Detection in Hindustani
Music Melodies
Sankalp Gulati1, Joan Serrà2, Kaustuv K. Ganguli3 and Xavier Serra1
sankalp.gulati@upf.edu, jserra@iiia.csic.es, kaustuvkanti@ee.iitb.ac.in, xavier.serra@upf.edu
1Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
2Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Barcelona, Spain
3Indian Institute of Technology Bombay, Mumbai, India
SMC-ICMC 2014, Athens, Greece
5
Indian Art Music
§  Hindustani music (North Indian music)
§  Carnatic music
Melodies in Hindustani Music
§  Rāg: melodic framework of Indian art music
Melodies in Hindustani Music
§  Rāg: melodic framework of Indian art music
S	 r	 R	 g	 G	 m	 M	 P	 d	 D	 n	 N	
Do	 Re	 Mi	 Fa	 Sol	 La	 Ti	
Svaras
Melodies in Hindustani Music
§  Rāg: melodic framework of Indian art music
S	 r	 R	 g	 G	 m	 M	 P	 d	 D	 n	 N	
Do	 Re	 Mi	 Fa	 Sol	 La	 Ti	
Svaras
Bhairavi Thaat
S	 r	 g	 m	 P	 d	 n
Melodies in Hindustani Music
§  Rāg: melodic framework of Indian art music
S	 r	 R	 g	 G	 m	 M	 P	 d	 D	 n	 N	
Do	 Re	 Mi	 Fa	 Sol	 La	 Ti	
Svaras
Bhairavi Thaat
*Nyās Svar (Rāg Bilaskhani todi)
S	 r*	 g*	 m*	 P	 d*	 n	
S	 r	 g	 m	 P	 d	 n	
Nyās translates to home/residence
§  Example
Melodic Landmark: Nyās Occurrences
178 180 182 184 186 188 190
−200
0
200
400
600
800
Time (s)
F0frequency(cents)
N1 N5N4N3N2
A.	K.	Dey,	Nyāsa	in		rāga:	the	pleasant	pause	in	Hindustani	music.	Kanishka	Publishers,	Distributors,	
2008.
Goal and Motivation
§  Methodology for detecting nyās occurrences
Goal and Motivation
§  Methodology for detecting nyās occurrences
N	 N	 N	 N	 N
Goal and Motivation
§  Methodology for detecting nyās occurrences
§  Motivation
§  Melodic motif discovery [Ross and Rao 2012]
§  Melodic segmentation
§  Music transcription
N	 N	 N	 N	 N	
J.	C.	Ross	and	P.	Rao,“DetecKon	of	raga-characterisKc	phrases	from	Hindustani	classical	music	
audio,”	in	Proc.	of	2nd	CompMusic	Workshop,	2012,	pp.	133–	138.
Methodology: Block Diagram
Predominant pitch
estimation
Tonic identification
Audio
Nyās svars
Pred. pitch est.
and
representation
Histogram
computation
SegmentationSvar
identification
Segmentation
Local
Feature
extractionContextual Local + Contextual
Segment classification Segment fusion
Segment
classification and
fusion
Block	diagram	of	the	proposed	methodology
Methodology: Pred. Pitch Estimation
Predominant pitch
estimation
Tonic identification
Audio
Nyās svars
Pred. pitch est.
and
representation
Histogram
computation
SegmentationSvar
identification
Segmentation
Local
Feature
extractionContextual Local + Contextual
Segment classification Segment fusion
Segment
classification and
fusion
Methodology: Segmentation
Predominant pitch
estimation
Tonic identification
Audio
Nyās svars
Pred. pitch est.
and
representation
Histogram
computation
SegmentationSvar
identification
Segmentation
Local
Feature
extractionContextual Local + Contextual
Segment classification Segment fusion
Segment
classification and
fusion
Methodology: Feature Extraction
Predominant pitch
estimation
Tonic identification
Audio
Nyās svars
Pred. pitch est.
and
representation
Histogram
computation
SegmentationSvar
identification
Segmentation
Local
Feature
extractionContextual Local + Contextual
Segment classification Segment fusion
Segment
classification and
fusion
0.12, 0.34, 0.59, 0.23, 0.54
0.21, 0.24, 0.54, 0.54, 0.42
0.32, 0.23, 0.34, 0.41, 0.63
0.66, 0.98, 0.74, 0.33, 0.12
0.90, 0.42, 0.14, 0.83, 0.76
Methodology: Segment Classification
Predominant pitch
estimation
Tonic identification
Audio
Nyās svars
Pred. pitch est.
and
representation
Histogram
computation
SegmentationSvar
identification
Segmentation
Local
Feature
extractionContextual Local + Contextual
Segment classification Segment fusion
Segment
classification and
fusion
0.12, 0.34, 0.59, 0.23, 0.54
0.21, 0.24, 0.54, 0.54, 0.42
0.32, 0.23, 0.34, 0.41, 0.63
0.66, 0.98, 0.74, 0.33, 0.12
0.90, 0.42, 0.14, 0.83, 0.76
Methodology: Segment Classification
Predominant pitch
estimation
Tonic identification
Audio
Nyās svars
Pred. pitch est.
and
representation
Histogram
computation
SegmentationSvar
identification
Segmentation
Local
Feature
extractionContextual Local + Contextual
Segment classification Segment fusion
Segment
classification and
fusion
Nyās
Non nyās
Nyās
Non nyās
Non nyās
Pred. Pitch Estimation and Representation
§  Predominant pitch estimation
§  Method by Salamon and Gómez (2012)
§  Favorable results in MIREX’11
§  Tonic Normalization
§  Pitch values converted from Hertz to Cents
§  Multi-pitch approach by Gulati et al. (2014)
J.	Salamon	and	E.	Go	́mez,	“Melody	extracKon	from	polyphonic	music	signals	using	pitch	contour	charac-	terisKcs,”	IEEE	TransacKons	on	
Audio,	Speech,	and	Language	Processing,	vol.	20,	no.	6,	pp.	1759–1770,	2012.	
S.	GulaK,	A.	Bellur,	J.	Salamon,	H.	Ranjani,	V.	Ishwar,	H.	A.	Murthy,	and	X.	Serra,	“AutomaKc	Tonic	IdenK-	ficaKon	in	Indian	Art	Music:	
Approaches	and	Evalua-	Kon,”	Journal	of	New	Music	Research,	vol.	43,	no.	01,	pp.	55–73,	2014.
Melody Segmentation
1 2 3 4 5 6 7 8 9 10
850
900
950
1000
1050
1100
1150
T1
Sn
Sn-1
Sn+1
ε
ρ1T2 T3
ρ2
T4 T7 T8 T9
T5 T6
Nyās segment
Time (s)
FundamentalFrequency(cents)
§  Baseline: Piecewise linear
segmentation (PLS)
E.	Keogh,	S.	Chu,	D.	Hart,	and	M.	Pazzani,	“SegmenKng	Kme	series:	A	survey	and	novel	approach,”	Data	
Mining	in	Time	Series	Databases,	vol.	57,	pp.	1–22,	2004.
Feature Extraction
Feature Extraction
§  Local (9 features)
Feature Extraction
§  Local (9 features)
§  Contextual (24 features)
Feature Extraction
§  Local (9 features)
§  Contextual (24 features)
§  Local + Contextual (33 features)
Feature Extraction: Local
§  Segment Length
l
Feature Extraction: Local
§  Segment Length
§  μ and σ of pitch values
μ	σ
Feature Extraction: Local
§  Segment Length
§  μ and σ of pitch values
§  μ and σ of difference in adjacent peaks and valley
locations
d1	 d2	 d3	 d4	 d5
Feature Extraction: Local
§  Segment Length
§  μ and σ of pitch values
§  μ and σ of difference in adjacent peaks and valley
locations
§  μ and σ of the peak and valley amplitudes
p1	
p2	
p3	
p4	
p5	
p6
Feature Extraction: Local
§  Segment Length
§  μ and σ of pitch values
§  μ and σ of difference in adjacent peaks and valley
locations
§  μ and σ of the peak and valley amplitudes
§  Temporal centroid (length normalized)
Feature Extraction: Local
§  Segment Length
§  μ and σ of pitch values
§  μ and σ of difference in adjacent peaks and valley
locations
§  μ and σ of the peak and valley amplitudes
§  Temporal centroid (length normalized)
§  Binary flatness measure
Flat	or	non-flat
Feature Extraction: Contextual
§  4 different normalized segment lengths
L1	
L2	
Breath	
Pause	
Breath	
Pause
Feature Extraction: Contextual
§  4 different normalized segment lengths
L1	
L2	
Breath	
Pause	
Breath	
Pause
Feature Extraction: Contextual
§  4 different normalized segment lengths
L1	
L2	
Breath	
Pause	
Breath	
Pause
Feature Extraction: Contextual
§  4 different normalized segment lengths
L1	
L2	
Breath	
Pause	
Breath	
Pause
Feature Extraction: Contextual
§  4 different normalized segment lengths
§  Time difference from the succeeding and preceding
breath pauses
d1	 d2	
Breath	
Pause	
Breath	
Pause
Feature Extraction: Contextual
§  4 different normalized segment lengths
§  Time difference from the succeeding and
proceeding unvoiced regions
§  Local features of neighboring segments (9*2=18)
Breath	
Pause	
Breath	
Pause
Segment Classification
§  Class: Nyās and Non-nyās
§  Classifiers:
§  Trees (min_sample_split=10)
§  K nearest neighbors (n_neighbors=5)
§  Naive bayes (fit_prior=False)
§  Logistic regression (class_weight=‘auto’)
§  Support vector machines (RBF)(class weight=‘auto’)
§  Testing methodology
§  Cross-fold validation
§  Software: Scikit-learn, version 0.14.1
F.	Pedregosa,	G.	Varoquaux,	A.	Gramfort,	V.	Michel,	B.	Thirion,	O.	Grisel,	M.	Blondel,	P.	Prejenhofer,	R.	Weiss,	V.	Dubourg,	J.	Vanderplas,	A.	
Passos,	D.	Cournapeau,	M.	Brucher,	M.	Perrot,	and	E.	Duches-	nay,	“Scikit-learn:	machine	learning	in	Python,”	Journal	of	Machine	Learning	
Research,	vol.	12,	pp.	2825–	2830,	2011
Evaluation: Dataset
§  Audio:
§  Number of recordings: 20 Ālap vocal pieces
§  Duration of recordings: 1.5 hours
§  Number of artists: 8
§  Number of rāgs: 16
§  Type of audio: 15 polyphonic commercial recordings, 5
in-house monophonic recordings**
§  Annotations: Musician with > 15 years of training
§  Statistics:
§  1257 nyās segments
§  150 ms to16.7 s
§  mean 2.46 s, median 1.47 s.
**Openly	available	under	CC	license	in	freesound.org
Evaluation: Measures
§  Boundary annotations (F-scores)
§  Hit rate
§  Allowed deviation: 100 ms
§  Label annotations (F-scores):
§  Pairwise frame clustering method [Levy and Sandler
2008]
§  Statistical significance: Mann-Whitney U test
(p=0.05)
§  Multiple comparison: Holm-Bonferroni method
M.	Levy	and	M.	Sandler,	“Structural	segmentaKon	of	musical	audio	by	constrained	clustering,”	IEEE	TransacKons	on	
Audio,	Speech,	and	Language	Processing,	vol.	16,	no.	2,	pp.	318–326,	2008.	
H.	B.	Mann	and	D.	R.	Whitney,	“On	a	test	of	whether	one	of	two	random	variables	is	stochasKcally	larger	than	the	
other,”	The	annals	of	mathemaKcal	staKsKcs,	vol.	18,	no.	1,	pp.	50–60,	1947.	
S.	Holm,	“A	simple	sequenKally	rejecKve	mulKple	test	procedure,”	Scandinavian	journal	of	staKsKcs,	pp.	65–	70,	1979.
Evaluation: Baseline Approach
§  DTW based kNN classification (k=5)
§  Frequently used for time series classification
§  Random baselines
§  Randomly planting boundaries
§  Evenly planting boundaries at every 100 ms *
§  Ground truth boundaries, randomly assign class labels
X.	Xi,	E.	J.	Keogh,	C.	R.	Shelton,	L.	Wei,	and	C.	A.	Ratanamahatana,	“Fast	Kme	series	classificaKon	using	numerosity	
reducKon,”	in	Proc.	of	the	Int.	Conf.	on	Ma-	chine	Learning,	2006,	pp.	1033–1040.	
X.	Wang,	A.	Mueen,	H.	Ding,	G.	Trajcevski,	P.	Scheuermann,	and	E.	J.	Keogh,	“Experimental	com-	parison	of	
representaKon	methods	and	distance	mea-	sures	for	Kme	series	data,”	Data	Mining	and	Knowl-	edge	Discovery,	
vol.	26,	no.	2,	pp.	275–309,	2013.
Results: Nyās Boundary Annotation
Feat. DTW Tree KNN NB LR SVM
A
L 0.356 0.407 0.447 0.248 0.449 0.453
C 0.284 0.394 0.387 0.383 0.389 0.406
L+C 0.289 0.414 0.426 0.409 0.432 0.437
B
L 0.524 0.672 0.719 0.491 0.736 0.749
C 0.436 0.629 0.615 0.641 0.621 0.673
L+C 0.446 0.682 0.708 0.591 0.725 0.735
Table 1. F-scores for ny¯as boundary detection using PLS
method (A) and the proposed segmentation method (B).
Results are shown for different classifiers (Tree, KNN,
NB, LR, SVM) and local (L), contextual (C) and local to-
gether with contextual (L+C) features. DTW is the base-
line method used for comparison. F-score for the random
baseline obtained using RB2 is 0.184.
Feat. DTW Tree KNN NB LR SVM
L 0.553 0.685 0.723 0.621 0.727 0.722
5. CONCLUSI
We proposed a metho
odies of Hindustani m
broad steps: melody
cation. For melody s
which incorporates d
boundary annotations
cal, contextual and th
that the performance
cantly better compar
dard dynamic time w
est neighbor classifie
proposed segmentatio
proach based on piece
set that includes only
form best. However,
textual information w
curacy. This indicate
melodic context whic
the future we plan to
mances, which is a co
Results: Nyās Boundary Annotation
Feat. DTW Tree KNN NB LR SVM
A
L 0.356 0.407 0.447 0.248 0.449 0.453
C 0.284 0.394 0.387 0.383 0.389 0.406
L+C 0.289 0.414 0.426 0.409 0.432 0.437
B
L 0.524 0.672 0.719 0.491 0.736 0.749
C 0.436 0.629 0.615 0.641 0.621 0.673
L+C 0.446 0.682 0.708 0.591 0.725 0.735
Table 1. F-scores for ny¯as boundary detection using PLS
method (A) and the proposed segmentation method (B).
Results are shown for different classifiers (Tree, KNN,
NB, LR, SVM) and local (L), contextual (C) and local to-
gether with contextual (L+C) features. DTW is the base-
line method used for comparison. F-score for the random
baseline obtained using RB2 is 0.184.
Feat. DTW Tree KNN NB LR SVM
L 0.553 0.685 0.723 0.621 0.727 0.722
5. CONCLUSI
We proposed a metho
odies of Hindustani m
broad steps: melody
cation. For melody s
which incorporates d
boundary annotations
cal, contextual and th
that the performance
cantly better compar
dard dynamic time w
est neighbor classifie
proposed segmentatio
proach based on piece
set that includes only
form best. However,
textual information w
curacy. This indicate
melodic context whic
the future we plan to
mances, which is a co
Results: Nyās Boundary Annotation
Feat. DTW Tree KNN NB LR SVM
A
L 0.356 0.407 0.447 0.248 0.449 0.453
C 0.284 0.394 0.387 0.383 0.389 0.406
L+C 0.289 0.414 0.426 0.409 0.432 0.437
B
L 0.524 0.672 0.719 0.491 0.736 0.749
C 0.436 0.629 0.615 0.641 0.621 0.673
L+C 0.446 0.682 0.708 0.591 0.725 0.735
Table 1. F-scores for ny¯as boundary detection using PLS
method (A) and the proposed segmentation method (B).
Results are shown for different classifiers (Tree, KNN,
NB, LR, SVM) and local (L), contextual (C) and local to-
gether with contextual (L+C) features. DTW is the base-
line method used for comparison. F-score for the random
baseline obtained using RB2 is 0.184.
Feat. DTW Tree KNN NB LR SVM
L 0.553 0.685 0.723 0.621 0.727 0.722
5. CONCLUSI
We proposed a metho
odies of Hindustani m
broad steps: melody
cation. For melody s
which incorporates d
boundary annotations
cal, contextual and th
that the performance
cantly better compar
dard dynamic time w
est neighbor classifie
proposed segmentatio
proach based on piece
set that includes only
form best. However,
textual information w
curacy. This indicate
melodic context whic
the future we plan to
mances, which is a co
Results: Nyās Boundary Annotation
Feat. DTW Tree KNN NB LR SVM
A
L 0.356 0.407 0.447 0.248 0.449 0.453
C 0.284 0.394 0.387 0.383 0.389 0.406
L+C 0.289 0.414 0.426 0.409 0.432 0.437
B
L 0.524 0.672 0.719 0.491 0.736 0.749
C 0.436 0.629 0.615 0.641 0.621 0.673
L+C 0.446 0.682 0.708 0.591 0.725 0.735
Table 1. F-scores for ny¯as boundary detection using PLS
method (A) and the proposed segmentation method (B).
Results are shown for different classifiers (Tree, KNN,
NB, LR, SVM) and local (L), contextual (C) and local to-
gether with contextual (L+C) features. DTW is the base-
line method used for comparison. F-score for the random
baseline obtained using RB2 is 0.184.
Feat. DTW Tree KNN NB LR SVM
L 0.553 0.685 0.723 0.621 0.727 0.722
5. CONCLUSI
We proposed a metho
odies of Hindustani m
broad steps: melody
cation. For melody s
which incorporates d
boundary annotations
cal, contextual and th
that the performance
cantly better compar
dard dynamic time w
est neighbor classifie
proposed segmentatio
proach based on piece
set that includes only
form best. However,
textual information w
curacy. This indicate
melodic context whic
the future we plan to
mances, which is a co
Results: Nyās Boundary Annotation
Feat. DTW Tree KNN NB LR SVM
A
L 0.356 0.407 0.447 0.248 0.449 0.453
C 0.284 0.394 0.387 0.383 0.389 0.406
L+C 0.289 0.414 0.426 0.409 0.432 0.437
B
L 0.524 0.672 0.719 0.491 0.736 0.749
C 0.436 0.629 0.615 0.641 0.621 0.673
L+C 0.446 0.682 0.708 0.591 0.725 0.735
Table 1. F-scores for ny¯as boundary detection using PLS
method (A) and the proposed segmentation method (B).
Results are shown for different classifiers (Tree, KNN,
NB, LR, SVM) and local (L), contextual (C) and local to-
gether with contextual (L+C) features. DTW is the base-
line method used for comparison. F-score for the random
baseline obtained using RB2 is 0.184.
Feat. DTW Tree KNN NB LR SVM
L 0.553 0.685 0.723 0.621 0.727 0.722
5. CONCLUSI
We proposed a metho
odies of Hindustani m
broad steps: melody
cation. For melody s
which incorporates d
boundary annotations
cal, contextual and th
that the performance
cantly better compar
dard dynamic time w
est neighbor classifie
proposed segmentatio
proach based on piece
set that includes only
form best. However,
textual information w
curacy. This indicate
melodic context whic
the future we plan to
mances, which is a co
Results: Nyās Boundary Annotation
Feat. DTW Tree KNN NB LR SVM
A
L 0.356 0.407 0.447 0.248 0.449 0.453
C 0.284 0.394 0.387 0.383 0.389 0.406
L+C 0.289 0.414 0.426 0.409 0.432 0.437
B
L 0.524 0.672 0.719 0.491 0.736 0.749
C 0.436 0.629 0.615 0.641 0.621 0.673
L+C 0.446 0.682 0.708 0.591 0.725 0.735
Table 1. F-scores for ny¯as boundary detection using PLS
method (A) and the proposed segmentation method (B).
Results are shown for different classifiers (Tree, KNN,
NB, LR, SVM) and local (L), contextual (C) and local to-
gether with contextual (L+C) features. DTW is the base-
line method used for comparison. F-score for the random
baseline obtained using RB2 is 0.184.
Feat. DTW Tree KNN NB LR SVM
L 0.553 0.685 0.723 0.621 0.727 0.722
5. CONCLUSI
We proposed a metho
odies of Hindustani m
broad steps: melody
cation. For melody s
which incorporates d
boundary annotations
cal, contextual and th
that the performance
cantly better compar
dard dynamic time w
est neighbor classifie
proposed segmentatio
proach based on piece
set that includes only
form best. However,
textual information w
curacy. This indicate
melodic context whic
the future we plan to
mances, which is a co
Results: Nyās Label Annotation
NB, LR, SVM) and local (L), contextual (C) and local to-
gether with contextual (L+C) features. DTW is the base-
line method used for comparison. F-score for the random
baseline obtained using RB2 is 0.184.
Feat. DTW Tree KNN NB LR SVM
A
L 0.553 0.685 0.723 0.621 0.727 0.722
C 0.251 0.639 0.631 0.690 0.688 0.674
L+C 0.389 0.694 0.693 0.708 0.722 0.706
B
L 0.546 0.708 0.754 0.714 0.749 0.758
C 0.281 0.671 0.611 0.697 0.689 0.697
L+C 0.332 0.672 0.710 0.730 0.743 0.731
Table 2. F-scores for ny¯as and non-ny¯as label annotations
task using PLS method (A) and the proposed segmenta-
tion method (B). Results are shown for different classifiers
(Tree, KNN, NB, LR, SVM) and local (L), contextual (C)
and local together with contextual (L+C) features. DTW is
the baseline method used for comparison. The best random
baseline F-score is 0.153 obtained using RB2.
mentation method consistently performs better than PLS.
However, the differences are not statistically significant.
proposed segmentatio
proach based on piece
set that includes only
form best. However,
textual information w
curacy. This indicate
melodic context whi
the future we plan to
mances, which is a co
sic. We also plan to
and different evaluati
Acknowledgments
This work is partly s
Council under the Eu
Program, as part of
agreement 267583).
from Generalitat de C
the European Commi
and European Social
6.
[1] A. D. Patel, Musi
UK: Oxford Univ
[2] W. S. Rockstro,
ers, and J. Rushto
Conclusions and Future work
§  Proposed segmentation better than PLS method
§  Proposed methodology better than standard DTW based
kNN classification
§  Local features yield highest accuracy
§  Contextual features are also important (maybe not
complementary to local features)
§  Future work
§  Perform similar analysis on Bandish performances
§  Incorporate Rāga specific knowledge
Landmark Detection in Hindustani
Music Melodies
Sankalp Gulati1, Joan Serrà2, Kaustuv K. Ganguli3 and Xavier Serra1
sankalp.gulati@upf.edu, jserra@iiia.csic.es, kaustuvkanti@ee.iitb.ac.in, xavier.serra@upf.edu
1Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
2Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Barcelona, Spain
3Indian Institute of Technology Bombay, Mumbai, India
SMC-ICMC 2014, Athens, Greece
5

Weitere ähnliche Inhalte

Ähnlich wie Landmark Detection in Hindustani Music Melodies

Text classification using Text kernels
Text classification using Text kernelsText classification using Text kernels
Text classification using Text kernelsDev Nath
 
Automatic Key Term Extraction from Spoken Course Lectures
Automatic Key Term Extraction from Spoken Course LecturesAutomatic Key Term Extraction from Spoken Course Lectures
Automatic Key Term Extraction from Spoken Course LecturesYun-Nung (Vivian) Chen
 
Sequence Learning with CTC technique
Sequence Learning with CTC techniqueSequence Learning with CTC technique
Sequence Learning with CTC techniqueChun Hao Wang
 
Phrase-based Rāga Recognition Using Vector Space Modeling
Phrase-based Rāga Recognition Using Vector Space ModelingPhrase-based Rāga Recognition Using Vector Space Modeling
Phrase-based Rāga Recognition Using Vector Space ModelingSankalp Gulati
 
Metric Learning for Music Discovery with Source and Target Playlists
Metric Learning for Music Discovery with Source and Target PlaylistsMetric Learning for Music Discovery with Source and Target Playlists
Metric Learning for Music Discovery with Source and Target PlaylistsYing-Shu Kuo
 
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...IJERA Editor
 
Transferring Singing Expressions from One Voice to Another for a Given Song
Transferring Singing Expressions from One Voice to Another for a Given SongTransferring Singing Expressions from One Voice to Another for a Given Song
Transferring Singing Expressions from One Voice to Another for a Given SongNAVER Engineering
 
Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachEnrico Daga
 
Interval Hashing Based Ranking
Interval Hashing Based RankingInterval Hashing Based Ranking
Interval Hashing Based RankingAndrea Gazzarini
 
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based RankingMusical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based RankingSease
 
Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...Ana Marasović
 
is2015_poster
is2015_posteris2015_poster
is2015_posterJan Svec
 
Inductive learning of long-distance dissimilation as a problem for phonology
Inductive learning of long-distance dissimilation as a problem for phonologyInductive learning of long-distance dissimilation as a problem for phonology
Inductive learning of long-distance dissimilation as a problem for phonologyKevin McMullin
 
AlgoAlignementGenomicSequences.ppt
AlgoAlignementGenomicSequences.pptAlgoAlignementGenomicSequences.ppt
AlgoAlignementGenomicSequences.pptSkanderBena
 
Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature surveyAkshay Hegde
 
Query Translation for Ontology-extended Data Sources
Query Translation for Ontology-extended Data SourcesQuery Translation for Ontology-extended Data Sources
Query Translation for Ontology-extended Data SourcesJie Bao
 

Ähnlich wie Landmark Detection in Hindustani Music Melodies (20)

Text classification using Text kernels
Text classification using Text kernelsText classification using Text kernels
Text classification using Text kernels
 
Automatic Key Term Extraction from Spoken Course Lectures
Automatic Key Term Extraction from Spoken Course LecturesAutomatic Key Term Extraction from Spoken Course Lectures
Automatic Key Term Extraction from Spoken Course Lectures
 
Sequence Learning with CTC technique
Sequence Learning with CTC techniqueSequence Learning with CTC technique
Sequence Learning with CTC technique
 
Phrase-based Rāga Recognition Using Vector Space Modeling
Phrase-based Rāga Recognition Using Vector Space ModelingPhrase-based Rāga Recognition Using Vector Space Modeling
Phrase-based Rāga Recognition Using Vector Space Modeling
 
Metric Learning for Music Discovery with Source and Target Playlists
Metric Learning for Music Discovery with Source and Target PlaylistsMetric Learning for Music Discovery with Source and Target Playlists
Metric Learning for Music Discovery with Source and Target Playlists
 
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
 
Transferring Singing Expressions from One Voice to Another for a Given Song
Transferring Singing Expressions from One Voice to Another for a Given SongTransferring Singing Expressions from One Voice to Another for a Given Song
Transferring Singing Expressions from One Voice to Another for a Given Song
 
Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid Approach
 
Interval Hashing Based Ranking
Interval Hashing Based RankingInterval Hashing Based Ranking
Interval Hashing Based Ranking
 
LSH
LSHLSH
LSH
 
More than Words: Advancing Prosodic Analysis
More than Words: Advancing Prosodic AnalysisMore than Words: Advancing Prosodic Analysis
More than Words: Advancing Prosodic Analysis
 
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based RankingMusical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
 
Rna seq
Rna seqRna seq
Rna seq
 
Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...
 
haenelt.ppt
haenelt.ppthaenelt.ppt
haenelt.ppt
 
is2015_poster
is2015_posteris2015_poster
is2015_poster
 
Inductive learning of long-distance dissimilation as a problem for phonology
Inductive learning of long-distance dissimilation as a problem for phonologyInductive learning of long-distance dissimilation as a problem for phonology
Inductive learning of long-distance dissimilation as a problem for phonology
 
AlgoAlignementGenomicSequences.ppt
AlgoAlignementGenomicSequences.pptAlgoAlignementGenomicSequences.ppt
AlgoAlignementGenomicSequences.ppt
 
Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature survey
 
Query Translation for Ontology-extended Data Sources
Query Translation for Ontology-extended Data SourcesQuery Translation for Ontology-extended Data Sources
Query Translation for Ontology-extended Data Sources
 

Mehr von Sankalp Gulati

[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art MusicSankalp Gulati
 
Computational Melodic Analysis of Indian Art Music
Computational Melodic Analysis of Indian Art MusicComputational Melodic Analysis of Indian Art Music
Computational Melodic Analysis of Indian Art MusicSankalp Gulati
 
Computational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music CorporaComputational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music CorporaSankalp Gulati
 
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...Sankalp Gulati
 
Tonic Identification System for Hindustani and Carnatic Music
Tonic Identification System for Hindustani and Carnatic MusicTonic Identification System for Hindustani and Carnatic Music
Tonic Identification System for Hindustani and Carnatic MusicSankalp Gulati
 
Tonic Identification System for Indian Art Music
Tonic Identification System for Indian Art MusicTonic Identification System for Indian Art Music
Tonic Identification System for Indian Art MusicSankalp Gulati
 

Mehr von Sankalp Gulati (7)

[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
 
Computational Melodic Analysis of Indian Art Music
Computational Melodic Analysis of Indian Art MusicComputational Melodic Analysis of Indian Art Music
Computational Melodic Analysis of Indian Art Music
 
Computational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music CorporaComputational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music Corpora
 
Hindify
HindifyHindify
Hindify
 
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
 
Tonic Identification System for Hindustani and Carnatic Music
Tonic Identification System for Hindustani and Carnatic MusicTonic Identification System for Hindustani and Carnatic Music
Tonic Identification System for Hindustani and Carnatic Music
 
Tonic Identification System for Indian Art Music
Tonic Identification System for Indian Art MusicTonic Identification System for Indian Art Music
Tonic Identification System for Indian Art Music
 

Kürzlich hochgeladen

Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 

Kürzlich hochgeladen (20)

Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 

Landmark Detection in Hindustani Music Melodies

  • 1. Landmark Detection in Hindustani Music Melodies Sankalp Gulati1, Joan Serrà2, Kaustuv K. Ganguli3 and Xavier Serra1 sankalp.gulati@upf.edu, jserra@iiia.csic.es, kaustuvkanti@ee.iitb.ac.in, xavier.serra@upf.edu 1Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain 2Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Barcelona, Spain 3Indian Institute of Technology Bombay, Mumbai, India SMC-ICMC 2014, Athens, Greece 5
  • 2. Indian Art Music §  Hindustani music (North Indian music) §  Carnatic music
  • 3. Melodies in Hindustani Music §  Rāg: melodic framework of Indian art music
  • 4. Melodies in Hindustani Music §  Rāg: melodic framework of Indian art music S r R g G m M P d D n N Do Re Mi Fa Sol La Ti Svaras
  • 5. Melodies in Hindustani Music §  Rāg: melodic framework of Indian art music S r R g G m M P d D n N Do Re Mi Fa Sol La Ti Svaras Bhairavi Thaat S r g m P d n
  • 6. Melodies in Hindustani Music §  Rāg: melodic framework of Indian art music S r R g G m M P d D n N Do Re Mi Fa Sol La Ti Svaras Bhairavi Thaat *Nyās Svar (Rāg Bilaskhani todi) S r* g* m* P d* n S r g m P d n Nyās translates to home/residence
  • 7. §  Example Melodic Landmark: Nyās Occurrences 178 180 182 184 186 188 190 −200 0 200 400 600 800 Time (s) F0frequency(cents) N1 N5N4N3N2 A. K. Dey, Nyāsa in rāga: the pleasant pause in Hindustani music. Kanishka Publishers, Distributors, 2008.
  • 8. Goal and Motivation §  Methodology for detecting nyās occurrences
  • 9. Goal and Motivation §  Methodology for detecting nyās occurrences N N N N N
  • 10. Goal and Motivation §  Methodology for detecting nyās occurrences §  Motivation §  Melodic motif discovery [Ross and Rao 2012] §  Melodic segmentation §  Music transcription N N N N N J. C. Ross and P. Rao,“DetecKon of raga-characterisKc phrases from Hindustani classical music audio,” in Proc. of 2nd CompMusic Workshop, 2012, pp. 133– 138.
  • 11. Methodology: Block Diagram Predominant pitch estimation Tonic identification Audio Nyās svars Pred. pitch est. and representation Histogram computation SegmentationSvar identification Segmentation Local Feature extractionContextual Local + Contextual Segment classification Segment fusion Segment classification and fusion Block diagram of the proposed methodology
  • 12. Methodology: Pred. Pitch Estimation Predominant pitch estimation Tonic identification Audio Nyās svars Pred. pitch est. and representation Histogram computation SegmentationSvar identification Segmentation Local Feature extractionContextual Local + Contextual Segment classification Segment fusion Segment classification and fusion
  • 13. Methodology: Segmentation Predominant pitch estimation Tonic identification Audio Nyās svars Pred. pitch est. and representation Histogram computation SegmentationSvar identification Segmentation Local Feature extractionContextual Local + Contextual Segment classification Segment fusion Segment classification and fusion
  • 14. Methodology: Feature Extraction Predominant pitch estimation Tonic identification Audio Nyās svars Pred. pitch est. and representation Histogram computation SegmentationSvar identification Segmentation Local Feature extractionContextual Local + Contextual Segment classification Segment fusion Segment classification and fusion 0.12, 0.34, 0.59, 0.23, 0.54 0.21, 0.24, 0.54, 0.54, 0.42 0.32, 0.23, 0.34, 0.41, 0.63 0.66, 0.98, 0.74, 0.33, 0.12 0.90, 0.42, 0.14, 0.83, 0.76
  • 15. Methodology: Segment Classification Predominant pitch estimation Tonic identification Audio Nyās svars Pred. pitch est. and representation Histogram computation SegmentationSvar identification Segmentation Local Feature extractionContextual Local + Contextual Segment classification Segment fusion Segment classification and fusion 0.12, 0.34, 0.59, 0.23, 0.54 0.21, 0.24, 0.54, 0.54, 0.42 0.32, 0.23, 0.34, 0.41, 0.63 0.66, 0.98, 0.74, 0.33, 0.12 0.90, 0.42, 0.14, 0.83, 0.76
  • 16. Methodology: Segment Classification Predominant pitch estimation Tonic identification Audio Nyās svars Pred. pitch est. and representation Histogram computation SegmentationSvar identification Segmentation Local Feature extractionContextual Local + Contextual Segment classification Segment fusion Segment classification and fusion Nyās Non nyās Nyās Non nyās Non nyās
  • 17. Pred. Pitch Estimation and Representation §  Predominant pitch estimation §  Method by Salamon and Gómez (2012) §  Favorable results in MIREX’11 §  Tonic Normalization §  Pitch values converted from Hertz to Cents §  Multi-pitch approach by Gulati et al. (2014) J. Salamon and E. Go ́mez, “Melody extracKon from polyphonic music signals using pitch contour charac- terisKcs,” IEEE TransacKons on Audio, Speech, and Language Processing, vol. 20, no. 6, pp. 1759–1770, 2012. S. GulaK, A. Bellur, J. Salamon, H. Ranjani, V. Ishwar, H. A. Murthy, and X. Serra, “AutomaKc Tonic IdenK- ficaKon in Indian Art Music: Approaches and Evalua- Kon,” Journal of New Music Research, vol. 43, no. 01, pp. 55–73, 2014.
  • 18. Melody Segmentation 1 2 3 4 5 6 7 8 9 10 850 900 950 1000 1050 1100 1150 T1 Sn Sn-1 Sn+1 ε ρ1T2 T3 ρ2 T4 T7 T8 T9 T5 T6 Nyās segment Time (s) FundamentalFrequency(cents) §  Baseline: Piecewise linear segmentation (PLS) E. Keogh, S. Chu, D. Hart, and M. Pazzani, “SegmenKng Kme series: A survey and novel approach,” Data Mining in Time Series Databases, vol. 57, pp. 1–22, 2004.
  • 21. Feature Extraction §  Local (9 features) §  Contextual (24 features)
  • 22. Feature Extraction §  Local (9 features) §  Contextual (24 features) §  Local + Contextual (33 features)
  • 24. Feature Extraction: Local §  Segment Length §  μ and σ of pitch values μ σ
  • 25. Feature Extraction: Local §  Segment Length §  μ and σ of pitch values §  μ and σ of difference in adjacent peaks and valley locations d1 d2 d3 d4 d5
  • 26. Feature Extraction: Local §  Segment Length §  μ and σ of pitch values §  μ and σ of difference in adjacent peaks and valley locations §  μ and σ of the peak and valley amplitudes p1 p2 p3 p4 p5 p6
  • 27. Feature Extraction: Local §  Segment Length §  μ and σ of pitch values §  μ and σ of difference in adjacent peaks and valley locations §  μ and σ of the peak and valley amplitudes §  Temporal centroid (length normalized)
  • 28. Feature Extraction: Local §  Segment Length §  μ and σ of pitch values §  μ and σ of difference in adjacent peaks and valley locations §  μ and σ of the peak and valley amplitudes §  Temporal centroid (length normalized) §  Binary flatness measure Flat or non-flat
  • 29. Feature Extraction: Contextual §  4 different normalized segment lengths L1 L2 Breath Pause Breath Pause
  • 30. Feature Extraction: Contextual §  4 different normalized segment lengths L1 L2 Breath Pause Breath Pause
  • 31. Feature Extraction: Contextual §  4 different normalized segment lengths L1 L2 Breath Pause Breath Pause
  • 32. Feature Extraction: Contextual §  4 different normalized segment lengths L1 L2 Breath Pause Breath Pause
  • 33. Feature Extraction: Contextual §  4 different normalized segment lengths §  Time difference from the succeeding and preceding breath pauses d1 d2 Breath Pause Breath Pause
  • 34. Feature Extraction: Contextual §  4 different normalized segment lengths §  Time difference from the succeeding and proceeding unvoiced regions §  Local features of neighboring segments (9*2=18) Breath Pause Breath Pause
  • 35. Segment Classification §  Class: Nyās and Non-nyās §  Classifiers: §  Trees (min_sample_split=10) §  K nearest neighbors (n_neighbors=5) §  Naive bayes (fit_prior=False) §  Logistic regression (class_weight=‘auto’) §  Support vector machines (RBF)(class weight=‘auto’) §  Testing methodology §  Cross-fold validation §  Software: Scikit-learn, version 0.14.1 F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prejenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duches- nay, “Scikit-learn: machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825– 2830, 2011
  • 36. Evaluation: Dataset §  Audio: §  Number of recordings: 20 Ālap vocal pieces §  Duration of recordings: 1.5 hours §  Number of artists: 8 §  Number of rāgs: 16 §  Type of audio: 15 polyphonic commercial recordings, 5 in-house monophonic recordings** §  Annotations: Musician with > 15 years of training §  Statistics: §  1257 nyās segments §  150 ms to16.7 s §  mean 2.46 s, median 1.47 s. **Openly available under CC license in freesound.org
  • 37. Evaluation: Measures §  Boundary annotations (F-scores) §  Hit rate §  Allowed deviation: 100 ms §  Label annotations (F-scores): §  Pairwise frame clustering method [Levy and Sandler 2008] §  Statistical significance: Mann-Whitney U test (p=0.05) §  Multiple comparison: Holm-Bonferroni method M. Levy and M. Sandler, “Structural segmentaKon of musical audio by constrained clustering,” IEEE TransacKons on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 318–326, 2008. H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochasKcally larger than the other,” The annals of mathemaKcal staKsKcs, vol. 18, no. 1, pp. 50–60, 1947. S. Holm, “A simple sequenKally rejecKve mulKple test procedure,” Scandinavian journal of staKsKcs, pp. 65– 70, 1979.
  • 38. Evaluation: Baseline Approach §  DTW based kNN classification (k=5) §  Frequently used for time series classification §  Random baselines §  Randomly planting boundaries §  Evenly planting boundaries at every 100 ms * §  Ground truth boundaries, randomly assign class labels X. Xi, E. J. Keogh, C. R. Shelton, L. Wei, and C. A. Ratanamahatana, “Fast Kme series classificaKon using numerosity reducKon,” in Proc. of the Int. Conf. on Ma- chine Learning, 2006, pp. 1033–1040. X. Wang, A. Mueen, H. Ding, G. Trajcevski, P. Scheuermann, and E. J. Keogh, “Experimental com- parison of representaKon methods and distance mea- sures for Kme series data,” Data Mining and Knowl- edge Discovery, vol. 26, no. 2, pp. 275–309, 2013.
  • 39. Results: Nyās Boundary Annotation Feat. DTW Tree KNN NB LR SVM A L 0.356 0.407 0.447 0.248 0.449 0.453 C 0.284 0.394 0.387 0.383 0.389 0.406 L+C 0.289 0.414 0.426 0.409 0.432 0.437 B L 0.524 0.672 0.719 0.491 0.736 0.749 C 0.436 0.629 0.615 0.641 0.621 0.673 L+C 0.446 0.682 0.708 0.591 0.725 0.735 Table 1. F-scores for ny¯as boundary detection using PLS method (A) and the proposed segmentation method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C) and local to- gether with contextual (L+C) features. DTW is the base- line method used for comparison. F-score for the random baseline obtained using RB2 is 0.184. Feat. DTW Tree KNN NB LR SVM L 0.553 0.685 0.723 0.621 0.727 0.722 5. CONCLUSI We proposed a metho odies of Hindustani m broad steps: melody cation. For melody s which incorporates d boundary annotations cal, contextual and th that the performance cantly better compar dard dynamic time w est neighbor classifie proposed segmentatio proach based on piece set that includes only form best. However, textual information w curacy. This indicate melodic context whic the future we plan to mances, which is a co
  • 40. Results: Nyās Boundary Annotation Feat. DTW Tree KNN NB LR SVM A L 0.356 0.407 0.447 0.248 0.449 0.453 C 0.284 0.394 0.387 0.383 0.389 0.406 L+C 0.289 0.414 0.426 0.409 0.432 0.437 B L 0.524 0.672 0.719 0.491 0.736 0.749 C 0.436 0.629 0.615 0.641 0.621 0.673 L+C 0.446 0.682 0.708 0.591 0.725 0.735 Table 1. F-scores for ny¯as boundary detection using PLS method (A) and the proposed segmentation method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C) and local to- gether with contextual (L+C) features. DTW is the base- line method used for comparison. F-score for the random baseline obtained using RB2 is 0.184. Feat. DTW Tree KNN NB LR SVM L 0.553 0.685 0.723 0.621 0.727 0.722 5. CONCLUSI We proposed a metho odies of Hindustani m broad steps: melody cation. For melody s which incorporates d boundary annotations cal, contextual and th that the performance cantly better compar dard dynamic time w est neighbor classifie proposed segmentatio proach based on piece set that includes only form best. However, textual information w curacy. This indicate melodic context whic the future we plan to mances, which is a co
  • 41. Results: Nyās Boundary Annotation Feat. DTW Tree KNN NB LR SVM A L 0.356 0.407 0.447 0.248 0.449 0.453 C 0.284 0.394 0.387 0.383 0.389 0.406 L+C 0.289 0.414 0.426 0.409 0.432 0.437 B L 0.524 0.672 0.719 0.491 0.736 0.749 C 0.436 0.629 0.615 0.641 0.621 0.673 L+C 0.446 0.682 0.708 0.591 0.725 0.735 Table 1. F-scores for ny¯as boundary detection using PLS method (A) and the proposed segmentation method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C) and local to- gether with contextual (L+C) features. DTW is the base- line method used for comparison. F-score for the random baseline obtained using RB2 is 0.184. Feat. DTW Tree KNN NB LR SVM L 0.553 0.685 0.723 0.621 0.727 0.722 5. CONCLUSI We proposed a metho odies of Hindustani m broad steps: melody cation. For melody s which incorporates d boundary annotations cal, contextual and th that the performance cantly better compar dard dynamic time w est neighbor classifie proposed segmentatio proach based on piece set that includes only form best. However, textual information w curacy. This indicate melodic context whic the future we plan to mances, which is a co
  • 42. Results: Nyās Boundary Annotation Feat. DTW Tree KNN NB LR SVM A L 0.356 0.407 0.447 0.248 0.449 0.453 C 0.284 0.394 0.387 0.383 0.389 0.406 L+C 0.289 0.414 0.426 0.409 0.432 0.437 B L 0.524 0.672 0.719 0.491 0.736 0.749 C 0.436 0.629 0.615 0.641 0.621 0.673 L+C 0.446 0.682 0.708 0.591 0.725 0.735 Table 1. F-scores for ny¯as boundary detection using PLS method (A) and the proposed segmentation method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C) and local to- gether with contextual (L+C) features. DTW is the base- line method used for comparison. F-score for the random baseline obtained using RB2 is 0.184. Feat. DTW Tree KNN NB LR SVM L 0.553 0.685 0.723 0.621 0.727 0.722 5. CONCLUSI We proposed a metho odies of Hindustani m broad steps: melody cation. For melody s which incorporates d boundary annotations cal, contextual and th that the performance cantly better compar dard dynamic time w est neighbor classifie proposed segmentatio proach based on piece set that includes only form best. However, textual information w curacy. This indicate melodic context whic the future we plan to mances, which is a co
  • 43. Results: Nyās Boundary Annotation Feat. DTW Tree KNN NB LR SVM A L 0.356 0.407 0.447 0.248 0.449 0.453 C 0.284 0.394 0.387 0.383 0.389 0.406 L+C 0.289 0.414 0.426 0.409 0.432 0.437 B L 0.524 0.672 0.719 0.491 0.736 0.749 C 0.436 0.629 0.615 0.641 0.621 0.673 L+C 0.446 0.682 0.708 0.591 0.725 0.735 Table 1. F-scores for ny¯as boundary detection using PLS method (A) and the proposed segmentation method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C) and local to- gether with contextual (L+C) features. DTW is the base- line method used for comparison. F-score for the random baseline obtained using RB2 is 0.184. Feat. DTW Tree KNN NB LR SVM L 0.553 0.685 0.723 0.621 0.727 0.722 5. CONCLUSI We proposed a metho odies of Hindustani m broad steps: melody cation. For melody s which incorporates d boundary annotations cal, contextual and th that the performance cantly better compar dard dynamic time w est neighbor classifie proposed segmentatio proach based on piece set that includes only form best. However, textual information w curacy. This indicate melodic context whic the future we plan to mances, which is a co
  • 44. Results: Nyās Boundary Annotation Feat. DTW Tree KNN NB LR SVM A L 0.356 0.407 0.447 0.248 0.449 0.453 C 0.284 0.394 0.387 0.383 0.389 0.406 L+C 0.289 0.414 0.426 0.409 0.432 0.437 B L 0.524 0.672 0.719 0.491 0.736 0.749 C 0.436 0.629 0.615 0.641 0.621 0.673 L+C 0.446 0.682 0.708 0.591 0.725 0.735 Table 1. F-scores for ny¯as boundary detection using PLS method (A) and the proposed segmentation method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C) and local to- gether with contextual (L+C) features. DTW is the base- line method used for comparison. F-score for the random baseline obtained using RB2 is 0.184. Feat. DTW Tree KNN NB LR SVM L 0.553 0.685 0.723 0.621 0.727 0.722 5. CONCLUSI We proposed a metho odies of Hindustani m broad steps: melody cation. For melody s which incorporates d boundary annotations cal, contextual and th that the performance cantly better compar dard dynamic time w est neighbor classifie proposed segmentatio proach based on piece set that includes only form best. However, textual information w curacy. This indicate melodic context whic the future we plan to mances, which is a co
  • 45. Results: Nyās Label Annotation NB, LR, SVM) and local (L), contextual (C) and local to- gether with contextual (L+C) features. DTW is the base- line method used for comparison. F-score for the random baseline obtained using RB2 is 0.184. Feat. DTW Tree KNN NB LR SVM A L 0.553 0.685 0.723 0.621 0.727 0.722 C 0.251 0.639 0.631 0.690 0.688 0.674 L+C 0.389 0.694 0.693 0.708 0.722 0.706 B L 0.546 0.708 0.754 0.714 0.749 0.758 C 0.281 0.671 0.611 0.697 0.689 0.697 L+C 0.332 0.672 0.710 0.730 0.743 0.731 Table 2. F-scores for ny¯as and non-ny¯as label annotations task using PLS method (A) and the proposed segmenta- tion method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C) and local together with contextual (L+C) features. DTW is the baseline method used for comparison. The best random baseline F-score is 0.153 obtained using RB2. mentation method consistently performs better than PLS. However, the differences are not statistically significant. proposed segmentatio proach based on piece set that includes only form best. However, textual information w curacy. This indicate melodic context whi the future we plan to mances, which is a co sic. We also plan to and different evaluati Acknowledgments This work is partly s Council under the Eu Program, as part of agreement 267583). from Generalitat de C the European Commi and European Social 6. [1] A. D. Patel, Musi UK: Oxford Univ [2] W. S. Rockstro, ers, and J. Rushto
  • 46. Conclusions and Future work §  Proposed segmentation better than PLS method §  Proposed methodology better than standard DTW based kNN classification §  Local features yield highest accuracy §  Contextual features are also important (maybe not complementary to local features) §  Future work §  Perform similar analysis on Bandish performances §  Incorporate Rāga specific knowledge
  • 47. Landmark Detection in Hindustani Music Melodies Sankalp Gulati1, Joan Serrà2, Kaustuv K. Ganguli3 and Xavier Serra1 sankalp.gulati@upf.edu, jserra@iiia.csic.es, kaustuvkanti@ee.iitb.ac.in, xavier.serra@upf.edu 1Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain 2Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Barcelona, Spain 3Indian Institute of Technology Bombay, Mumbai, India SMC-ICMC 2014, Athens, Greece 5