



                                                        SemanticHIFI
                                                            IST-507913



                      Public Report of WP2 “Indexing”
                       Covering period: December 2003 – October 2005
Report version: 1.0
Report preparation date:
Writers: Geoffroy Peeters, Jean-Julien Aucouturier, Florian Plenge, Matthias Gruhne,
Christian Sailer, Etan Fisher
Control: Hugues Vinet, Francis Rousseaux, IRCAM
Classification: Public
Contract start date: December, 1st 2003
Duration: 36 months
Project co-ordinator: Hugues Vinet, IRCAM
Involved Partners: Ircam-AS (WP Co-ordinator), SonyCSL, Fraunhofer IDMT, BGU, Native
Instruments


                                              Project funded by the European Community
                                              under the "Information Society Technology"
                                              Program





                                                                 Table of contents

1 WP2 Overview
    1.1 Objectives
    1.2 Partners' roles
    1.3 WP2 contribution to the project
    1.4 Synthesis of main achievements
2 WP2 Results and Achievements
    2.1 Segmentation/Audio Summary
    2.2 Rhythm description
    2.3 Tonality description
    2.4 Extractor Discovery System
    2.5 Tempo and phase detection
    2.6 Browsing by Lyrics
    2.7 Sound Source Separation / Score Alignment
    2.8 AudioID
3 Methodologies Employed
    3.1 WP Management and Co-ordination
    3.2 Market following
    3.3 Scientific research methodologies
    3.4 Software development practices
4 Dissemination
    4.1 Scientific publications
    4.2 Other scientific dissemination actions
    4.3 Contribution to clustering and standardisation
    4.4 Professional communications
    4.5 Press articles and interviews
5 Outlook
    5.1 Information produced by the WP
    5.2 New methods
    5.3 Scientific breakthroughs
6 Conclusion






        1 WP2 Overview

        1.1 Objectives

The goal of the WP2 indexing work-package is to develop algorithms and techniques, and to
provide modules, for the automatic extraction of signal features enabling a set of specific
functionalities. The targeted functionalities are: music segmentation/summarization/
visualization; browsing and searching music using high-level music descriptors (rhythm/
tonality/timbre features) or using a generic scheme for extracting high-level audio descriptors
from the audio signal; music remixing (by source separation using score alignment); browsing
by lyrics (using lyrics-to-score-to-audio alignment); automated identification of audio signals;
and tempo and phase detection.


        1.2 Partners’ roles

Ircam AS: WP leader
•   Ircam AS: music segmentation/summarization/visualization; browsing/searching music
    using high-level music descriptors (rhythm/tonality),
•   Sony CSL: generic scheme for extracting high-level audio descriptors (EDS),
•   BGU: music remixing (by source separation using score alignment), browsing by lyrics
    (using lyrics-to-score-to-audio alignment),
•   Fraunhofer IDMT: automated identification of audio signals,
•   Native Instruments: tempo and phase detection.


        1.3 WP2 contribution to the project

Modules developed in WP2 provide the necessary indexing information for WP3 Browsing,
the necessary segment information for WP5 Performing, the audio identification module for
WP7 Sharing, and indexing information for the two applications developed (WP6 Authoring
Tools and WP8 HIFI system). The priority order for the development of WP2 modules is
therefore set according to the interest of the application providers in specific functionalities,
and according to the dependence of modules on other modules.
Development of modules for:
•   music segmentation / summarization / visualization,
•   high-level features for browsing/searching (rhythm / tonal / timbre description), generic
    description inside EDS,
•   browsing by lyrics (based on score alignment),
•   music remixing (source separation based on score alignment),
•   automated identification of audio signal,



•   beat and phase detection.


        1.4 Synthesis of main achievements

Research has been accomplished and final functional prototypes have been developed.
Modules are available either as executables or as libraries. Integration into the
applications is underway.



        2 WP2 Results and Achievements


        2.1 Segmentation/Audio Summary

Responsible partner : IrcamAS

        2.1.1    Functional description
Automatic music structure (segmentation) discovery aims at providing insights into the
temporal organization of a music track by analyzing its acoustical content in terms of
repetitions. It then represents a music track either as a set of states or as a set of sequences.
A state is defined as a set of contiguous times which contain similar acoustical information.
Examples of this are the musical background of a verse segment or of a chorus segment,
which is usually constant during the segment.
A sequence is defined as a set of successive times which is similar to another set of successive
times, but the times inside a set are not necessarily identical to each other. It is therefore a
specific case of a state. Examples of this are the various melodies repeated in a music track.
Once extracted, the structure can be used for intra-document browsing (skipping
forward/backward inside a music track by verse/chorus or by melody) and for visualization.
Automatic audio summary generation aims at creating a short audio extract (usually 30
seconds) which is representative of the various contents of a music track. It uses a beat-
synchronous concatenation of various parts of a music track, chosen according to the parts
estimated during the structure analysis.
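
As an illustration of this last step, the sketch below concatenates chosen parts of a track with
boundaries snapped to the estimated beat markers and short crossfades at the joins, under a
total-duration constraint. It is a minimal sketch assuming NumPy and hypothetical inputs
(segment list, beat times); it is not the project module itself.

```python
import numpy as np

def beat_synchronous_summary(audio, sr, segments, beats, target_dur=30.0, fade=0.05):
    """Concatenate selected segments of `audio`, snapping segment boundaries to
    the nearest estimated beat markers and crossfading at the joins.

    audio      : 1-D numpy array of samples
    sr         : sample rate in Hz
    segments   : list of (start_sec, end_sec) pairs chosen from the structure analysis
    beats      : sorted 1-D array of beat times in seconds
    target_dur : approximate total duration constraint of the summary (seconds)
    fade       : crossfade length in seconds
    """
    def snap(t):
        return beats[np.argmin(np.abs(beats - t))]  # nearest beat marker

    out = np.zeros(0)
    n_fade = int(fade * sr)
    ramp_in, ramp_out = np.linspace(0, 1, n_fade), np.linspace(1, 0, n_fade)
    for start, end in segments:
        s = int(snap(start) * sr)
        e = int(snap(end) * sr)
        part = audio[s:e].copy()
        if len(out) >= n_fade and len(part) >= n_fade:
            # beat-aligned crossfade between the previous part and the new one
            out[-n_fade:] = out[-n_fade:] * ramp_out + part[:n_fade] * ramp_in
            part = part[n_fade:]
        out = np.concatenate([out, part])
        if len(out) / sr >= target_dur:          # total-duration constraint
            break
    return out[: int(target_dur * sr)]
```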
The module developed in SHF performs the three following tasks:
1) estimates the large-scale structure of a music track,
2) provides a visual map of a music track,
3) provides an audio summary of a music track.

        2.1.2    Scientific and technological breakthroughs
An algorithm for automatic structure extraction and automatic summary generation had been
previously developed in the framework of the European IST Project CUIDADO. This
technology has been further developed in the framework of the SemanticHIFI project.








We describe here the improvements of the technology made in the Semantic HIFI project (see
figure).
The feature extraction front-end has been extended to represent harmonic features and
combinations of timbre and harmonic features. This is essential for classical and jazz music.
For the structure extraction based on the state representation, a time-constrained hierarchical
agglomerative clustering is now used, which allows the noisy (non-repeated) frames to be
discarded. For the structure extraction based on the sequence representation, the sequence
detection and connection are now made by a likelihood approach. Each time segment is
considered as a mother sequence, and its likelihood to represent the track duration is
computed by comparing its logical occurrences in the track to the observed occurrences in the
track. A negative weighting is applied in order to prevent two segments from overlapping. The
audio summary is now computed directly from the original audio signal (stereo, 44.1 kHz);
estimated beat markers (see the rhythm description module) are used for the beat-synchronous
overlap of the various parts. The algorithm now takes a constraint on the total duration of the
summary into account.
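
To make the state-representation step concrete, here is a minimal sketch of a time-constrained
agglomerative clustering: frames start as individual segments and only temporally adjacent
segments may be merged, always merging the closest pair of neighbours first. This is an
illustrative simplification (NumPy assumed), not the actual SHF implementation.

```python
import numpy as np

def time_constrained_clustering(features, n_states):
    """Greedy, time-constrained agglomerative clustering: each frame starts as
    its own segment and only temporally adjacent segments may be merged,
    always choosing the pair whose mean feature vectors are closest.

    features : (n_frames, n_dims) array of frame features (e.g. timbre + harmony)
    n_states : number of segments to keep
    Returns a list of (start_frame, end_frame_exclusive) segments.
    """
    segments = [(i, i + 1) for i in range(len(features))]
    means = [features[i] for i in range(len(features))]

    while len(segments) > n_states:
        # distance between every pair of temporally adjacent segments
        dists = [np.linalg.norm(means[i] - means[i + 1])
                 for i in range(len(segments) - 1)]
        i = int(np.argmin(dists))
        # merge segments i and i+1, updating the weighted mean
        (s0, e0), (s1, e1) = segments[i], segments[i + 1]
        w0, w1 = e0 - s0, e1 - s1
        means[i] = (w0 * means[i] + w1 * means[i + 1]) / (w0 + w1)
        segments[i] = (s0, e1)
        del segments[i + 1], means[i + 1]
    return segments
```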

        2.1.3    Hardware and software outlook
The module consists of two executables.
1) An extraction module, which extracts the multi-level structure of a music track (stored as
   an XML file) and creates an audio summary (stored as an audio file and as an XML file
   containing the structure of the audio summary).
2) A music player (see the mockup below), which allows the user to interactively listen to the
   various parts of a music track based on its estimated structure. The player allows the user
   to skip from one chorus to the next, or from the beginning of the verse to the beginning of
   the chorus, etc.








                     Media Player mockup for music structure browsing
                     (segments labelled chorus, bridge and verse)


        2.2 Rhythm description

Responsible partner : IrcamAS

        2.2.1    Functional description
This module aims at providing high-level features, derived directly from audio signal analysis
and related to rhythm characteristics, in order to allow browsing in a music track database. The
module (tempo and phase extraction) developed by Native Instruments specifically targets
percussion-based music (dance and electronic music). The module developed here by
IrcamAS targets the description of rhythm for the general class of music, including non-
percussion-based music (jazz, classical, variety music). In this case, note onset detection is a
crucial factor, as is the detection of quick variations of the tempo. The algorithm developed
first extracts a time-variable tempo, estimates beat marker positions, and then extracts a set of
high-level features for browsing: global tempo, global meter, percussivity features, periodicity
features, and a rhythmical pattern that can be used for search by similar rhythm in a database.




        2.2.2    Scientific and technological breakthroughs




                                Rhythm features extraction module


The technology has been entirely developed in the framework of the Semantic HIFI project. It
has been designed considering the usual drawbacks of other techniques:
•   weak onset detection
•   tempo ambiguities
The module first estimates the onsets of the signal over time. Onsets are defined here as any
meaningful start of a music event (percussion, note, note transition, periodic variation of a
note). The onset detection is based on a spectral flux measure of a time- and frequency-
reassigned spectrogram. The use of the latter allows a more precise detection of onsets, even
for non-percussive instruments. Most other methods are based either on sub-band energy
functions or on the spectral flux of a standard spectrogram.
The periodicity of the signal is then measured by a proposed combination of the Discrete
Fourier Transform and the Auto-Correlation Function. Since both functions have inverse
octave ambiguities, their combination reduces octave ambiguities. Most other methods are
based either on the DFT, on the ACF, or on an inter-onset-interval histogram.
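
The following sketch illustrates this combination on an onset-energy function: the DFT
magnitude is resampled onto the lag axis (lag = 1/frequency) and multiplied with the
autocorrelation, so that the periodicity both functions agree on is reinforced. A minimal sketch
assuming NumPy; not the project code.

```python
import numpy as np

def combined_periodicity(onset_env, sr_env, max_lag_s=2.0):
    """Combine the DFT and the autocorrelation of an onset-energy function.
    The DFT magnitude is sampled on the lag axis (1/frequency) so that the
    product of the two functions reinforces the common periodicity and
    attenuates the octave errors each function makes in opposite directions.

    onset_env : 1-D onset/energy function (one value per analysis frame)
    sr_env    : frame rate of onset_env in Hz
    Returns (lags_in_seconds, combined_periodicity_function).
    """
    x = onset_env - np.mean(onset_env)
    n = len(x)

    # autocorrelation function, normalised, positive lags only
    acf = np.correlate(x, x, mode="full")[n - 1:]
    acf /= acf[0] + 1e-12

    # magnitude spectrum of the onset function
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(n, d=1.0 / sr_env)

    lags = np.arange(1, min(n, int(max_lag_s * sr_env)))   # lags in frames
    lag_s = lags / sr_env
    # sample the spectrum at frequency = 1 / lag
    dft_on_lags = np.interp(1.0 / lag_s, freqs, spec)
    combined = acf[lags] * dft_on_lags
    return lag_s, combined
```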
The tempo is related to the periodicity through rhythm templates. So far, three templates are
considered, which represent the 2/2, 2/3 and 3/2 meter/beat-subdivision characteristics. The
probability of observing a specific tempo in a specific rhythm template is computed based on
the observed periodicities. Most other methods directly take the main periodicity as the
tempo, which does not allow distinguishing between 2/4, 3/4 and 6/8 meters.
Tracking of the tempo changes over time is done by formulating the problem as a Viterbi
decoding problem. The tempo and meter/beat subdivision are then estimated simultaneously
as the best temporal path through the observations. Most other methods are based on a short-
time memory of the past detected periodicities.
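
A minimal sketch of such a Viterbi decoding over discretized tempo candidates is given below;
the observation scores and the transition penalty are stand-ins for the template-based
probabilities described above (NumPy assumed), not the project implementation.

```python
import numpy as np

def track_tempo(obs, tempi, transition_penalty=0.5):
    """Viterbi decoding of a tempo trajectory.

    obs   : (n_frames, n_tempi) array; obs[t, i] is the observation score of
            tempo candidate tempi[i] at frame t (e.g. periodicity strength
            weighted by a rhythm template).
    tempi : array of candidate tempi in BPM.
    transition_penalty : cost per octave of tempo change between frames,
            which favours smooth tempo trajectories.
    Returns the best tempo (in BPM) for every frame.
    """
    tempi = np.asarray(tempi, dtype=float)
    n_frames, n_states = obs.shape
    log_obs = np.log(obs + 1e-12)
    # transition cost proportional to |log2(tempo_j / tempo_i)|
    trans = -transition_penalty * np.abs(np.log2(tempi[None, :] / tempi[:, None]))

    delta = log_obs[0].copy()
    psi = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + trans                 # (from_state, to_state)
        psi[t] = np.argmax(scores, axis=0)              # best predecessor per state
        delta = scores[psi[t], np.arange(n_states)] + log_obs[t]

    # backtrack the best path
    path = np.zeros(n_frames, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(n_frames - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return tempi[path]
```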


The rhythm characterization for search by similarity is based on rhythm templates in the
spectral domain, which avoids the usual drawbacks of other techniques: difficulty in obtaining
a robust estimation for non-percussion-based music, difficulty in averaging the description
over the full length of a file, length of the description, and computation time of the comparison.

        2.2.3    Hardware and software outlook
The module consists of a single executable which performs both tempo/phase estimation and
rhythm characterization. The module outputs two XML files:
1) the time-variable tempo and beat position estimates (which can be used later for
   performing),
2) the global tempo/meter estimates and rhythm characteristics (which can be used later for
   browsing/searching).


        2.3 Tonality description

Responsible partner : IrcamAS

        2.3.1    Functional description
This module aims at providing high-level features, derived directly from audio signal analysis
and related to tonality characteristics, in order to allow browsing in a music track database.
These features are especially important for music based on tonal information (classical music).
Tonality extraction from detected notes (transcription) requires a prior multi-pitch estimation
and a prior segmentation step. While this technology can be used for small polyphony, it still
requires a large computation time, which makes it difficult to use in a real application on a
large music collection. The module developed is based on a less computationally demanding
technology: HPCP/chroma vector estimation. While this technology does not allow the
extraction of exact notes (melody), it is sufficient for the extraction of the global key
(C, C#, D, …), mode (major, minor) and chords of a music track. We add to this the extraction
of a harmonic pattern which can be used for search by similarity.
For each music track, the module provides the following descriptions:
1) Global key (C, C#, D, …) and mode (major, minor) of the music track,
2) Harmonic pattern, which can be used for search by similarity.




        2.3.2    Scientific and technological breakthroughs




                                Tonality features extraction module
The technology has been entirely developed in the framework of the Semantic HIFI project.
The algorithm operates in two separate stages: one off-line (learning) and one on-line
(evaluation).
In the off-line stage, templates are learnt for each possible pair of key (C, Db, D, …) and mode
(major/minor). For this, we follow an approach similar to the one proposed by Gomez. The
key/mode templates are based on the Krumhansl profiles, the polyphony (chords) is modeled
using the three main triads (Gomez), and the harmonic structure of the instrument pitches is
modeled as k^h for h = 0, …, H-1 with k < 1 (Gomez).
In the on-line stage, for an unknown music track, the audio signal is first smoothed in the
time and frequency plane in order to remove noise and transient parts. Its spectrum is then
computed and converted to the chroma scale (Wakefield), also called HPCP (Fujishima). For
this, the energy of the spectral peaks is summed inside frequency bands corresponding to the
chroma scale. Median filtering is applied to smooth the chroma over time. The resulting
time/frequency representation is called a chromagram. The key/mode estimation is performed
by finding the most likely key/mode template given the observed chroma over time. An
approach similar to the one of Izmirli is used. The chroma vectors are progressively
accumulated over time in a forward way. At each time, the most likely key/mode template is
estimated from the accumulated chroma vector, and a salience is assigned to it based on its
distance to the second most likely template. The global key/mode assigned to the music track
is the most salient key/mode template overall.
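
A minimal sketch of the on-line key/mode decision is given below: the accumulated chroma
vector is correlated with the 24 rotated Krumhansl profiles (here without the Gomez triad and
harmonic weighting). NumPy assumed; illustrative only, not the SHF module.

```python
import numpy as np

# Krumhansl-Kessler key profiles (major and minor), one value per pitch class C..B
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
KEYS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def estimate_key_mode(chromagram):
    """Estimate the global key and mode from a chromagram by correlating the
    time-accumulated chroma vector with the 24 rotated key/mode profiles.

    chromagram : (n_frames, 12) array of chroma (HPCP) vectors.
    Returns (key_name, mode, salience); salience is the score margin to the
    second most likely template.
    """
    accumulated = chromagram.sum(axis=0)
    scores = []
    for shift in range(12):
        for mode, profile in (("Major", MAJOR), ("minor", MINOR)):
            template = np.roll(profile, shift)
            score = np.corrcoef(accumulated, template)[0, 1]
            scores.append((score, KEYS[shift], mode))
    scores.sort(reverse=True)
    best, second = scores[0], scores[1]
    return best[1], best[2], best[0] - second[0]
```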




        2.3.3    Hardware and software outlook
The module consists of a single executable which performs both tonality/mode estimation and
harmonic pattern estimation. The module outputs the results in a single XML file.


        2.4 Extractor Discovery System

Responsible partner : SONY CSL

        2.4.1    Functional description
EDS (Extractor Discovery System) is a generic scheme for extracting arbitrary high-level
audio descriptors from audio signals. It is able to automatically produce a fully-fledged audio
extractor (an executable) from a database of labeled audio examples. It uses a supervised
learning approach. Its main characteristic is that it automatically finds optimal audio features
adapted to the problem at hand. Descriptors are traditionally designed by combining Low-
Level Descriptors (LLDs) using machine-learning algorithms. The key idea of EDS is to
substitute the basic LLDs with arbitrary complex compositions of signal processing operators:
EDS composes automatically operators to build features as signal processing functions that
are optimal for a given descriptor extraction task. The search for specific features is based on
genetic programming, a well-known technique for exploring search spaces of function
compositions. Resulting features are then fed to a learning model such as a GMM or SVM to
produce a fully-fledged extractor program.




                                Screenshot of the EDS system v1



The global architecture of EDS, illustrated in the figure below, consists of two parts: modeling
of the descriptor and synthesis of the extractor. Both parts are fully automatic and eventually
lead to an extractor for the descriptor.
The modeling of the descriptor is the main part of EDS. It consists in automatically searching
for a set of relevant features using the genetic search algorithm, and then automatically
searching for the optimal model for the descriptor that combines these features. The search for
specific features is based on genetic programming, a well-known technique for exploring
search spaces of function compositions. The genetic programming engine automatically
composes signal processing operators to build arbitrarily complex functions. Each built
function is given a fitness value which represents how well the function performs in extracting
a given descriptor on a given learning database. The evaluation of a function is very costly, as
it involves complex signal processing on whole audio databases. Therefore, to limit the search,
a set of heuristics is introduced to improve the a priori relevance of the created functions, as
well as rewriting rules to simplify functions before their evaluation. Once the system has
found relevant features, it combines them by feeding them into various machine learning
models, and then optimizes the model parameters.
The synthesis part consists in generating an executable file to compute the best model on any
audio signal. This program allows the model to be computed on arbitrary audio signals, in
order to predict their value for the modeled descriptor.




                                Architecture of the EDS system
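
To make the modeling stage more concrete, here is a deliberately tiny, hedged sketch: features
are random compositions of a few signal-processing operators, scored by how well they
separate two labeled classes. It replaces the genetic search (crossover, mutation, heuristics,
rewriting rules) with plain random search and is not the EDS implementation; the operator set
and the fitness measure are illustrative assumptions (NumPy assumed).

```python
import random
import numpy as np

# A small palette of signal-processing operators (the real EDS palette is much larger).
OPERATORS = {
    "abs":  np.abs,
    "diff": lambda x: np.diff(x) if len(x) > 1 else x,
    "fft":  lambda x: np.abs(np.fft.rfft(x)),
    "log":  lambda x: np.log(np.abs(x) + 1e-9),
    "norm": lambda x: x / (np.max(np.abs(x)) + 1e-9),
}
REDUCERS = {"mean": np.mean, "std": np.std, "max": np.max}

def random_feature(depth=3):
    """Randomly compose operators into a scalar feature function."""
    ops = [random.choice(list(OPERATORS)) for _ in range(depth)]
    red = random.choice(list(REDUCERS))
    def feature(signal):
        x = np.asarray(signal, dtype=float)
        for name in ops:
            x = OPERATORS[name](x)
        return float(REDUCERS[red](x))
    # human-readable description of the composed function
    feature.description = red + "(" + "(".join(ops) + "(x" + ")" * (depth + 1)
    return feature

def fitness(feature, signals, labels):
    """Fitness of a feature = separation between the two label classes
    (absolute difference of class means over pooled standard deviation)."""
    labels = np.asarray(labels)
    values = np.array([feature(s) for s in signals])
    a, b = values[labels == 0], values[labels == 1]
    return abs(a.mean() - b.mean()) / (a.std() + b.std() + 1e-9)

def search_features(signals, labels, population=30, generations=5):
    """Very small random-search stand-in for the genetic search: keep the best
    feature found over a few generations of random candidates."""
    best, best_fit = None, -1.0
    for _ in range(generations):
        for _ in range(population):
            f = random_feature()
            fit = fitness(f, signals, labels)
            if fit > best_fit:
                best, best_fit = f, fit
    return best, best_fit
```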




        2.4.2    Scientific and technological breakthroughs
EDS is a very novel concept. Genetic algorithms have been employed for algorithm
discovery, e.g. for efficient compilation of Signal Processing transform, or radar-image
analysis, however, EDS is the first application to audio signal processing and metadata
extraction. The idea of incorporating expert music signal processing knowledge is also novel.
The EDS technique originated in the Cuidado project, however it has been much researched
and improved in the scope of the Semantic Hifi project :
• All machine learning agglomeration models can now be parameterized, and a complete
   search now allows to optimize automatically these parameters, in order to improve the
   descriptor performance, for a given set of features.
• New models have been added : Gaussian Mixture Models and Hidden Markov Models.
• Automatization of the whole process : given a description problem defined by a database
   of labeled signals, the system is now able, automatically, to solve the problem and
   produce a program to compute the descriptor for an external signals, by:
   - building a set of relevant features,
   - selecting and building a model by using optimized machine learning techniques,
   - generating a model file and an executable file for its computation.
• Specific study of the new following descriptors : percussivity, pitchness,
   Happiness/sadness, Danceability, Density, Complexity. Each of these are difficult
   problems that have received less satisfactory solutions in the research communities.

        2.4.3    Hardware and software outlook
The GUI of the EDS system has undergone a complete redesign to make it more easily
extensible and more user-friendly. The system now includes visualization possibilities to
monitor the increasing fitness of the features and the precision of classification tasks. Features
and extractors can now be edited, copied/pasted, saved in a readable XML format, etc.




 Screenshot of EDS v2.1, which notably features visualization of classification precision
We have abstracted the EDS system into a non-GUI API, which can be called from within
MCM (see WP3.1). The EDS feature is integrated into the final WP3 prototype. A “generalize”

button allows the user to:
• create a new field (e.g. the happiness of a song),
• input a few values manually,
•   have EDS generalize these values into an algorithm, which can be used to describe
    unlabelled examples.


        2.5 Tempo and phase detection

Responsible partner : Native Instruments
Native Instruments compiled a collection of non-copyrighted music representative of Native
Instruments customers, and evaluated various state-of-the-art tempo detection algorithms with
the help of this collection.

        2.5.1    Functional description

        2.5.2    Scientific and technological breakthroughs
•   Compilation of a collection of non-copyrighted music representative of our customers
•   Evaluation of various state-of-the-art tempo detection algorithms with the help of this collection

        2.5.3    Hardware and software outlook


        2.6 Browsing by Lyrics

Responsible partner : BGU

        2.6.1    Functional description
The user loads the audio file and the attached metadata into the HIFI system. The HIFI system
plays the song and at the same time shows the user the lyrics. The user must have the XML
file containing the synchronized lyrics data to use this tool. The file may be available through
the sharing system. A simple use case of the browsing feature would be to use the traditional
skip button. The user will have the option to use the skip button in the traditional way. The
user will also have the option to do a 'Semantic Skip': while pressing the skip button, the user
sees the lyrics of the sound at the location of the skip. A more sophisticated use case is the
search option: the user enters text and the song skips to the location of the words within the song.

        2.6.2    Scientific and technological breakthroughs
The Semantic HIFI browsing-by-lyrics option aims to be on par with other commercial and
state-of-the-art lyrics synchronization tools.

        2.6.3    Hardware and software outlook
The algorithm has several steps:
Lyrics Alignment
1) Find a MIDI file that contains the lyrics as MIDI text events




2) Use score alignment (currently we use the score alignment algorithm that we have from
   CUIDADO) to align the MIDI to the audio. Usually, the MIDI contains a track that
   corresponds to the singer; we use this track to align the sound file.
3) Output the result of the previous step to a file (XML metadata file) that contains time and
   text information.
Lyrics Display
1) When the player plays the track, it reads the file, gets the appropriate time-to-text
   information and displays the appropriate text.
The Windows Media Player plug-in was built using the Windows Media Player visualization
SDK. We implemented the COM interface of the visualization object around the algorithm.
The figure below shows the plug-in interface. While the song plays, the player displays the
appropriate lyrics as a visualization.




                           Browsing by Lyrics in Windows Media Player
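
As an illustration of how a player front-end could consume the synchronized-lyrics metadata,
here is a minimal sketch covering the display step ("which word is being sung now?") and the
search-driven skip. The XML schema, element and attribute names are hypothetical
assumptions, not the actual SHF metadata format.

```python
import xml.etree.ElementTree as ET
import bisect

def load_synchronized_lyrics(path):
    """Parse a synchronized-lyrics metadata file into (times, words).
    The schema is hypothetical; each entry is assumed to look like
    <word time="12.34">text</word> somewhere under the document root."""
    root = ET.parse(path).getroot()
    entries = sorted(
        (float(w.get("time")), (w.text or "").strip()) for w in root.iter("word"))
    times = [t for t, _ in entries]
    words = [w for _, w in entries]
    return times, words

def lyric_at(times, words, position_s):
    """Return the word being sung at playback position `position_s` (seconds),
    i.e. the last word whose timestamp is not after the position."""
    i = bisect.bisect_right(times, position_s) - 1
    return words[i] if i >= 0 else ""

def semantic_search(times, words, query):
    """Return the playback times at which the query text occurs,
    so the player can skip directly to those locations."""
    query = query.lower().split()
    hits = []
    for i in range(len(words) - len(query) + 1):
        window = [w.lower() for w in words[i:i + len(query)]]
        if window == query:
            hits.append(times[i])
    return hits
```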


        2.7 Sound Source Separation / Score Alignment

Responsible partner : BGU

        2.7.1    Functional description
This advanced audio processing tool performs the separation of parts or instruments from
within a multi-track audio recording. The main goal of this tool is to give the listener the
ability to manipulate audio in ways not available previously and to enable artistic liberty
normally available only in the studio.
Originally, source separation was based upon alignment to the existing score of the recording
available as a MIDI file. This enabled the direct harmonic separation of the instruments based
on the information appearing in the score. A more advanced approach is now available which


does not require the use of score alignment. Even so, the score alignment may be used for
melody or instrument identification. This tool is developed in collaboration with IRCAM
Room Acoustics for the purpose of spatially re-mixing multi-track audio.

        2.7.2     Scientific and technological breakthroughs
The source separation tool presents a high-level challenge, requiring the use of complex
statistical sound processing techniques. These techniques, to be published in the coming year,
employ a model-based pitch and voicing (or harmonicity) tracking algorithm. The problem of
multi-track recording separation has been approached from many directions since the turn of
the century but has not yet been applied in a working user-based system such as Semantic
HiFi.

[Figure: harmonic log-likelihood as a function of frequency (100-500 Hz) and analysis frames,
with regions labelled Bass, Pizzicato(A), Pizzicato(B) and Pizzicato(A) harmonic]

The above figure presents the harmonic log-likelihood function of one second of multi-track
audio (the intro to Stan Getz’ Round Midnight). This function gives clear information as to
the existence of harmonic instruments, in this case – bass and violins. The non-harmonic
information contains the sound of a cymbal.
The source separation algorithm gathers this information and performs a harmonic projection
for extracting the basic sound of each instrument. Then, source quality is improved using
harmonic sharing techniques.
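
The harmonic projection step can be sketched as a masking operation on the mixture
spectrogram: given a pitch track for one part, only the bins around its harmonics are kept. This
is a simplified illustration (NumPy assumed); the pitch/voicing tracking and the harmonic-
sharing refinement mentioned above are not included.

```python
import numpy as np

def harmonic_projection(stft, freqs, f0_track, n_harmonics=10, width_hz=30.0):
    """Extract the harmonic content of one instrument from a mixture STFT by
    keeping only the bins that lie close to the harmonics of its pitch track.

    stft     : (n_bins, n_frames) complex STFT of the mixture
    freqs    : (n_bins,) centre frequency of every STFT bin in Hz
    f0_track : (n_frames,) estimated fundamental frequency of the target part
               (0 for unvoiced/silent frames)
    Returns a masked STFT of the same shape; inverting it (e.g. with an
    overlap-add inverse STFT) gives the separated part.
    """
    mask = np.zeros(stft.shape, dtype=float)
    for t, f0 in enumerate(f0_track):
        if f0 <= 0:
            continue  # unvoiced frame: nothing to project
        for h in range(1, n_harmonics + 1):
            sel = np.abs(freqs - h * f0) <= width_hz / 2.0
            mask[sel, t] = 1.0
    return stft * mask
```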

        2.7.3     Hardware and software outlook
The software is currently written in MATLAB and may be optimized for specific use and
compiled as a plug-in. It is currently automatic and separates up to four intelligible and
audibly pleasing parts from a given recording. These tracks can be re-mixed to mono, stereo
or surround.





        2.8 AudioID

Responsible partner : Fraunhofer IDMT

        2.8.1    Functional description
AudioID is a system that performs automated identification of audio signals. The essential
property of AudioID is that it does not rely on the availability of metadata attached to the
audio signal itself. Instead, it identifies incoming audio signals by means of a database of
works that are known to the system. This functionality can be considered the algorithmic
equivalent of a person recognizing a song from memory. Before querying for audio items,
fingerprints must be extracted from all songs that are to be recognized. A fingerprint contains
the "essence" of an audio item, and its extraction algorithm has been standardized within the
MPEG-7 standard. To identify music, a fingerprint is extracted from the query item and
compared to the database. On a positive match, the stored metadata are returned and further
processed.

        2.8.2    Scientific and technological breakthroughs
During the project, Fraunhofer IDMT achieved a major improvement in the recognition of
extremely distorted signals. It is therefore now possible to recognize extremely distorted
music signals (for example GSM-coded signals). Furthermore, there has been a significant
improvement in classification speed, so that broadcast monitoring with 100 or more
channels is possible in real time.

        2.8.3    Hardware and software outlook
There are already several available applications for identifying music. The most important
software tool is the classification server. This program contains a database of all previously
extracted audio fingerprints. When identifying music, a client extracts the fingerprint from the
actual music data and sends it to the server. The server looks up the database and returns the
classification result. The client then processes the result and works with it.
There are several clients for different kinds of application. Among them are a multi-channel
broadcast monitoring system, a tool for cleaning up the user's hard disk at home, and a client
which records music from cell phones and identifies it.
Furthermore, a Java plug-in for an AudioID client has been developed during the project in
order to integrate the system into the existing Semantic Hifi demonstrator.
The system runs in real time, and the most performant tool is the classification server. An
average PC today can hold up to 100,000 fingerprints.


        3 Methodologies Employed

        3.1 WP Management and Co-ordination

WP management consists of:
•   Detailed planning of each sub-WP and coherence with integration constraints


•   Ensuring coherence between the sub-WPs
•   Ensuring coherence between the sub-WPs' targeted functionalities and the targeted applications
•   Ensuring that deliverables, modules and documentation are provided for the milestones
•   Initiating corrective actions


        3.2 Market following

        3.2.1    User scenarios
Audio segmentation:
Interaction – Intra-document browsing
•   The user listens to a song and wants to skip to the next part of the song, or to the next
    chorus of the song
•   The user wants to have a visual representation of the structure of the song
Creation - Remix
•   The user wants to remix a piece of music by its structure (exchanging verse, chorus
    positions).
•   The user wants a specific visual or automatic accompaniment during verse / chorus, …
Audio summary
Inter-document browsing
•   The user wants a quick preview of a song item in a play-list, in a catalogue, or in a
    database.
Rhythm description
•   The user wants to query a database according to the tempo of the songs, the meter, the
    percussivity/periodicity features
•   The user wants music titles with a tempo and rhythm pattern similar to a target music title
•   The user wants a play-list with tracks of increasing or constant tempo
Creation
•   The user wants to synchronize two tracks according to their tempo, wants to align them
    according to their beat location.
•   The user wants to remix a track using segments defined as beats.
Tonality description
Inter-document browsing
•   The user wants to query a database according to the key or mode
•   The user wants music titles with a similar key, mode or harmonic pattern





        3.3 Scientific research methodologies

For a specific functionality:
1) proof of feasibility within a given research time
2) study of the state of the art and of protected technologies
3) development of a starting technology
4) development of an evaluation database (which must be representative of the market targeted
   by the application) and development of the corresponding annotations (according to the
   targeted functionality)
5) test of the starting technology on the evaluation corpus
6) improvement of the starting technology according to the observed failures
7) increase of the database size in order to reach realistic conditions
8) test of the technology, computation time evaluation, reduction of algorithm complexity
9) development of prototype and module


        3.4 Software development practices

Re-implementation of Matlab(c) source code into C/C++ source code:
1) Implementation of unit tests for each part of the Matlab source code.
2) Validation of the C++ implementation with the unit testing suite.
3) Usage of valgrind (http://www.valgrind.org) in order to avoid memory leaks and memory
   access violations.
FUTURE:
4) Integration of the C++ modules within the J2EE server application.
5) Packaging of the C++ modules for easy deployment on each partner's platform.
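
A minimal sketch of the validation step (2) is given below: the re-implemented module is run
on a test signal and its output compared, within a numerical tolerance, against a reference
output previously produced by the Matlab prototype. File names and the module's command
line are hypothetical placeholders, not the project's actual test suite.

```python
import subprocess
import unittest
import numpy as np

class TestReimplementation(unittest.TestCase):
    """Sketch of the validation step: the C++ module is run on a test input and
    its output is compared, within a numerical tolerance, to a reference output
    previously produced by the Matlab prototype. Paths and the command-line
    interface of the module are placeholders."""

    def test_tempo_module_matches_matlab_reference(self):
        # run the re-implemented C++ executable on the test signal (hypothetical CLI)
        subprocess.run(["./tempo_module", "tests/test_signal.wav",
                        "-o", "tests/cpp_output.txt"], check=True)
        cpp = np.loadtxt("tests/cpp_output.txt")
        reference = np.loadtxt("tests/matlab_reference.txt")
        # same shape and values within a small relative/absolute tolerance
        self.assertEqual(cpp.shape, reference.shape)
        self.assertTrue(np.allclose(cpp, reference, rtol=1e-4, atol=1e-6))

if __name__ == "__main__":
    unittest.main()
```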


        4 Dissemination

        4.1 Scientific publications

Peeters, G. (2004). Deriving Musical Structures from Signal Analysis for Music Audio
Summary Generation: "Sequence" and "State" Approach. CMMR 2003 (LNCS 2771). U. K.
Wiil, Springer-Verlag Berlin Heidelberg 2004: 142-165
Peeters, G. (2005). Time Variable Tempo Detection and Beat Marking. ICMC, Barcelona,
Spain.


Peeters, G. (2005). Rhythm Classification using spectral rhythm patterns. ISMIR, London,
UK.
Peeters, G. (2005). «Indexation et accès au contenu musical.» Les Nouveaux Dossiers de
l'Audiovisuel 3.
Peeters, G. (2005). MIREX 2005: Tempo detection and beat marking for perceptual tempo
induction. ISMIR, London, UK.


Monceaux, Jérôme; Pachet, François; Amadu, Frédéric; Roy, Pierre and Aymeric Zils.
Descriptor-based spatialization. Proceedings of AES Conference 2005, 2005.
Giordano Cabral, François Pachet, and Jean-Pierre Briot. Recognizing chords with EDS: Part
one. Proceedings of the 6th Computer Music Modeling and Retrieval conference
(CMMR'2005), Pisa, Italy, September 2005.
Giordano Cabral, François Pachet, and Jean-Pierre Briot. Automatic X traditional descriptor
extraction: The case of chord recognition. Proceedings of the 6th International Conference on
Music Information Retrieval (ISMIR'2005), London, U.K., September 2005.
Zils, A. and Pachet, F. Automatic Extraction of Music Descriptors from Acoustic Signals
using EDS. Proceedings of the 116th AES Convention, May 2004.
Pachet, F. and Zils, A. Automatic Extraction of Music Descriptors from Acoustic Signals.
Proceedings of ISMIR 2004, 2004.


        4.2 Other scientific dissemination actions

Peeters, G. (2005). Naviguer dans sa discothèque. La semaine du son, Paris, France.


        4.3 Contribution to clustering and standardisation

Organization of a half-day workshop on the practical use of MPEG-7 audio in audio
applications at the AES 25th Int. Conf. on Metadata for Audio, London, UK. Invited speakers:
G. Peeters, M. Jacob, E. Gomez (UPF), J. Herre (FHG), M. Casey (King's College).
Peeters, G. (2004). Workshop on MPEG-7 audio. AES 25th Int. Conf. Metadata for audio,
London, UK.


        4.4 Professional communications

Peeters, G. (2005). Description automatique et classification des sons instrumentaux. Journées
de la SFA (Description automatique et perception de la musique), ENST, Paris.





        4.5 Press articles and interviews

Vinet, H., V. Puig, et al. (2005). Le magazine du multimedia, RFI. Paris.
Peeters, G. (2005). "SemanticHIFI, chaîne fidèle", Télérama.fr.
Vinet, H., G. Peeters, et al. (2005). "Je t'MP3 moi non plus" /" Le futur fait ses gammes",
Télérama.fr.
Vinet, H., G. Peeters, et al. (2004). L'ircam recherche la chaine HIFI du futur. 01Net.


        5 Outlook

        5.1 Information produced by the WP

Reusable use cases? Yes, for all developed modules.
R&D organisation? Target specific functionalities according to the application providers'
markets; knowledge exchange and library exchange (when possible) among partners.


        5.2 New methods

•   For music structure extraction and summary generation
•   For beat detection and rhythm characterization


        5.3 Scientific breakthroughs

New features for structure detection:       dynamic features based on tempo
New algorithm for sequence detection:       maximum likelihood approach
New algorithm for onset detection:          reassigned spectral energy flux
New algorithm for periodicity estimation:   combined DFT/FM-ACF
New algorithm for tempo tracking:           Viterbi decoding based on rhythm templates


        6 Conclusion

The Semantic HIFI work-package on indexing provides research results and modules for the
two applications of the project: the authoring tools and the HIFI system. It therefore covers
research on all the major indexing fields of music information retrieval: from the
identification of the audio track (audio identification), in order to allow linking the audio to


textual metadata, to the indexing of the content (rhythm, tonality, timbre), in order to allow
search by content or by similarity, to the system learning of user-defined descriptions (EDS),
and to innovative listening/performing paradigms enabled by music structure extraction, lyrics
synchronization, and re-mixing by source separation.




SHF-IST-507913                                                                         page 21

Weitere ähnliche Inhalte

Ähnlich wie SHFpublicReportfinal_WP2.doc

PATHS Functional specification first prototype
PATHS Functional specification first prototypePATHS Functional specification first prototype
PATHS Functional specification first prototype
pathsproject
 
Urd dioscuri kbna_v1_1_en_2
Urd dioscuri kbna_v1_1_en_2Urd dioscuri kbna_v1_1_en_2
Urd dioscuri kbna_v1_1_en_2
seakquechchhan
 
ICT-GroupProject-Report2-NguyenDangHoa_2
ICT-GroupProject-Report2-NguyenDangHoa_2ICT-GroupProject-Report2-NguyenDangHoa_2
ICT-GroupProject-Report2-NguyenDangHoa_2
Minh Tuan Nguyen
 
EMPOWERING_D4.1_update-1
EMPOWERING_D4.1_update-1EMPOWERING_D4.1_update-1
EMPOWERING_D4.1_update-1
Giovanni Pede
 
D7.2 Data Deployment Stage 1
D7.2 Data Deployment Stage 1D7.2 Data Deployment Stage 1
D7.2 Data Deployment Stage 1
plan4all
 
D6.2 pan european_plan4all_platform
D6.2 pan european_plan4all_platformD6.2 pan european_plan4all_platform
D6.2 pan european_plan4all_platform
Karel Charvat
 
D6.2 Pan European Plan4all Platform
D6.2 Pan European Plan4all PlatformD6.2 Pan European Plan4all Platform
D6.2 Pan European Plan4all Platform
plan4all
 
PATHS system architecture
PATHS system architecturePATHS system architecture
PATHS system architecture
pathsproject
 

Ähnlich wie SHFpublicReportfinal_WP2.doc (20)

D5.1. LinkedTV Platform and Architecture
D5.1. LinkedTV Platform and ArchitectureD5.1. LinkedTV Platform and Architecture
D5.1. LinkedTV Platform and Architecture
 
PATHS Functional specification first prototype
PATHS Functional specification first prototypePATHS Functional specification first prototype
PATHS Functional specification first prototype
 
Urd dioscuri kbna_v1_1_en_2
Urd dioscuri kbna_v1_1_en_2Urd dioscuri kbna_v1_1_en_2
Urd dioscuri kbna_v1_1_en_2
 
ICT-GroupProject-Report2-NguyenDangHoa_2
ICT-GroupProject-Report2-NguyenDangHoa_2ICT-GroupProject-Report2-NguyenDangHoa_2
ICT-GroupProject-Report2-NguyenDangHoa_2
 
D4.2. User Profile Schema and Profile Capturing
D4.2. User Profile Schema and Profile CapturingD4.2. User Profile Schema and Profile Capturing
D4.2. User Profile Schema and Profile Capturing
 
Video compression techniques & standards lama mahmoud_report#1
Video compression techniques & standards lama mahmoud_report#1Video compression techniques & standards lama mahmoud_report#1
Video compression techniques & standards lama mahmoud_report#1
 
Ft Benning EDP.PDF
Ft Benning EDP.PDFFt Benning EDP.PDF
Ft Benning EDP.PDF
 
Part 6
Part 6Part 6
Part 6
 
Scope
ScopeScope
Scope
 
D1.1. State of The Art and Requirements Analysis for Hypervideo
D1.1. State of The Art and Requirements Analysis for HypervideoD1.1. State of The Art and Requirements Analysis for Hypervideo
D1.1. State of The Art and Requirements Analysis for Hypervideo
 
EMPOWERING_D4.1_update-1
EMPOWERING_D4.1_update-1EMPOWERING_D4.1_update-1
EMPOWERING_D4.1_update-1
 
D7.2 Data Deployment Stage 1
D7.2 Data Deployment Stage 1D7.2 Data Deployment Stage 1
D7.2 Data Deployment Stage 1
 
iXblue - DELPH Sonar advanced notes
iXblue - DELPH Sonar advanced notesiXblue - DELPH Sonar advanced notes
iXblue - DELPH Sonar advanced notes
 
D6.2 pan european_plan4all_platform
D6.2 pan european_plan4all_platformD6.2 pan european_plan4all_platform
D6.2 pan european_plan4all_platform
 
D6.2 Pan European Plan4all Platform
D6.2 Pan European Plan4all PlatformD6.2 Pan European Plan4all Platform
D6.2 Pan European Plan4all Platform
 
Automatic Subtitle Generation for Sound in Videos
Automatic Subtitle Generation for Sound in VideosAutomatic Subtitle Generation for Sound in Videos
Automatic Subtitle Generation for Sound in Videos
 
PATHS system architecture
PATHS system architecturePATHS system architecture
PATHS system architecture
 
Automatic Subtitle Generation For Sound In Videos
Automatic Subtitle Generation For Sound In VideosAutomatic Subtitle Generation For Sound In Videos
Automatic Subtitle Generation For Sound In Videos
 
X41-TUF-Audit-2022-Final-Report-PUBLIC.pdf
X41-TUF-Audit-2022-Final-Report-PUBLIC.pdfX41-TUF-Audit-2022-Final-Report-PUBLIC.pdf
X41-TUF-Audit-2022-Final-Report-PUBLIC.pdf
 
PEPPOL Test Guidelines
PEPPOL Test GuidelinesPEPPOL Test Guidelines
PEPPOL Test Guidelines
 

Mehr von butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
butest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
butest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
butest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
butest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
butest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
butest
 
Facebook
Facebook Facebook
Facebook
butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
butest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
butest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
butest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
butest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
butest
 

Mehr von butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

SHFpublicReportfinal_WP2.doc

  • 1. SHF Public Report Version 1.0 WP2 SemanticHIFI IST-507913 Public Report of WP2 “Indexing” Covering period: December 2003 – October 2005 Report version: 1.0 Report preparation date: Writers: Geoffroy Peeters, Jean-Julien Aucouturier, Florian Plenge, Matthias Gruhne, Christian Sailer, Etan Fisher Control: Hugues Vinet, Francis Rousseaux, IRCAM Classification: Public Contract start date: December, 1st 2003 Duration: 36 months Project co-ordinator: Hugues Vinet, IRCAM Involved Partners: Ircam-AS (WP Co-ordinator), SonyCSL, Fraunhofer IDMT, BGU, Native Instrument Project funded by the European Community under the "Information Society Technology" Program SHF-IST-507913 page 1
Table of contents

1 WP2 Overview
  1.1 Objectives
  1.2 Partners' roles
  1.3 WP2 contribution to the project
  1.4 Synthesis of main achievements
2 WP2 Results and Achievements
  2.1 Segmentation/Audio Summary
  2.2 Rhythm description
  2.3 Tonality description
  2.4 Extractor Discovery System
  2.5 Tempo and phase detection
  2.6 Browsing by Lyrics
  2.7 Sound Source Separation / Score Alignment
  2.8 AudioID
3 Methodologies Employed
  3.1 WP Management and Co-ordination
  3.2 Market following
  3.3 Scientific research methodologies
  3.4 Software development practices
4 Dissemination
  4.1 Scientific publications
  4.2 Other scientific dissemination actions
  4.3 Contribution to clustering and standardisation
  4.4 Professional communications
  4.5 Press articles and interviews
5 Outlook
  5.1 Information produced by the WP
  5.2 New methods
  5.3 Scientific breakthroughs
6 Conclusion
1 WP2 Overview

1.1 Objectives
The goal of the WP2 "Indexing" work-package is to develop algorithms and techniques, and to provide modules, for the automatic extraction of signal features that enable a set of specific functionalities. The targeted functionalities are: music segmentation/summarization/visualization; browsing and searching music using high-level music descriptors (rhythm, tonality and timbre features) or using a generic scheme for extracting high-level audio descriptors from the audio signal; music remixing (by source separation using score alignment); browsing by lyrics (using lyrics-to-score-to-audio alignment); automated identification of audio signals; and tempo and phase detection.

1.2 Partners' roles
Ircam AS: WP leader
• Ircam AS: music segmentation/summarization/visualization; browsing/searching music using high-level music descriptors (rhythm, tonality),
• Sony CSL: generic scheme for extracting high-level audio descriptors (EDS),
• BGU: music remixing (by source separation using score alignment), browsing by lyrics (using lyrics-to-score-to-audio alignment),
• Fraunhofer IDMT: automated identification of audio signals,
• Native Instruments: tempo and phase detection.

1.3 WP2 contribution to the project
The modules developed in WP2 provide the indexing information needed by WP3 (Browsing), the segment information needed by WP5 (Performing), the audio identification module for WP7 (Sharing), and indexing information for the two applications developed in the project (WP6 Authoring Tools and WP8 HIFI system). The development priority of the WP2 modules is therefore set according to the interest of the application providers in specific functionalities, and according to the dependencies between modules. Modules are developed for:
• music segmentation / summarization / visualization,
• high-level features for browsing/searching (rhythm / tonality / timbre description) and generic description inside EDS,
• browsing by lyrics (based on score alignment),
• music remixing (source separation based on score alignment),
• automated identification of audio signals,
• beat and phase detection.

1.4 Synthesis of main achievements
The research has been carried out and final functional prototypes have been developed. The modules are available either as executables or as libraries. Integration into the applications is under way.

2 WP2 Results and Achievements

2.1 Segmentation/Audio Summary
Responsible partner: Ircam AS

2.1.1 Functional description
Automatic music structure (segmentation) discovery aims at providing insight into the temporal organization of a music track by analyzing its acoustical content in terms of repetitions. It represents a music track either as a set of states or as a set of sequences. A state is defined as a set of contiguous times containing similar acoustical information; an example is the musical background of a verse or chorus segment, which is usually constant during the segment. A sequence is defined as a set of successive times that is similar to another set of successive times, although the times inside a set are not necessarily identical to each other; it is therefore a specific case of a state. Examples are the various melodies repeated in a music track. Once extracted, the structure can be used for intra-document browsing (skipping forward/backward inside a music track by verse/chorus or by melody) and for visualization.
Automatic audio summary generation aims at creating a short audio extract (usually 30 seconds) which is representative of the various contents of a music track. It uses a beat-synchronous concatenation of various parts of the track, chosen according to the parts estimated during the structure analysis.
The module developed in SHF performs the following three tasks: 1) it estimates the large-scale structure of a music track, 2) it provides a visual map of the track, 3) it provides an audio summary of the track.

2.1.2 Scientific and technological breakthroughs
An algorithm for automatic structure extraction and automatic summary generation had previously been developed in the framework of the European IST project CUIDADO. This technology has been further developed in the framework of the SemanticHIFI project.
We describe here the improvements made to this technology in the SemanticHIFI project (see figure). The feature extraction front-end has been extended to represent harmonic features and combinations of timbre and harmonic features; this is essential for classical and jazz music. For the structure extraction based on the state representation, a time-constrained hierarchical agglomerative clustering is now used, which allows the noisy (non-repeated) frames to be discarded. For the structure extraction based on the sequence representation, sequence detection and connection are now performed with a likelihood approach: each time segment is considered as a mother sequence, and its likelihood of representing the track is computed by comparing its logical occurrences in the track to the observed occurrences; a negative weighting is applied to prevent two segments from overlapping. The audio summary is now computed directly from the original audio signal (stereo, 44.1 kHz); the estimated beat markers (from the rhythm description module, Section 2.2) are used for the beat-synchronous overlap of the various parts. The algorithm now also takes a constraint on the total duration of the summary into account.

2.1.3 Hardware and software outlook
The module consists of two executables:
1) an extraction module, which extracts the multi-level structure of a music track (stored as an XML file) and creates an audio summary (stored as an audio file, together with an XML file describing the structure of the summary);
2) a music player [see below], which allows the user to interactively listen to the various parts of a music track based on its estimated structure. The player allows the user to skip from one chorus to the next, from the beginning of a verse to the beginning of the chorus, and so on.

Media Player mockup for music structure browsing (showing chorus, bridge and verse segments)
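As an aside, the repetition analysis underlying this structure discovery can be illustrated with a minimal sketch in Python (using numpy): frame-level features are compared pairwise to form a self-similarity matrix, in which repeated "states" show up as blocks and repeated "sequences" as stripes parallel to the diagonal. The feature choice, the similarity measure and all names below are illustrative assumptions; the SHF module itself uses the timbre/harmonic features, time-constrained clustering and likelihood methods described above.

    # Illustrative repetition analysis: build a self-similarity matrix from
    # crude per-frame features. Repeated "states" appear as blocks, repeated
    # "sequences" as stripes parallel to the main diagonal.
    import numpy as np

    def frame_features(x, sr, frame=2048, hop=1024, n_bands=20):
        # Very crude per-frame features: log energies in linearly spaced bands.
        feats = []
        for start in range(0, len(x) - frame, hop):
            spec = np.abs(np.fft.rfft(x[start:start + frame] * np.hanning(frame)))
            bands = np.array_split(spec, n_bands)
            feats.append(np.log(1e-10 + np.array([b.sum() for b in bands])))
        return np.array(feats)                      # shape (n_frames, n_bands)

    def self_similarity(feats):
        # Cosine similarity between every pair of frames.
        norm = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-10)
        return norm @ norm.T

    # Toy usage: alternating sine "verse" and noise "chorus" segments.
    sr = 22050
    t = np.arange(2 * sr) / sr
    verse = 0.5 * np.sin(2 * np.pi * 220 * t)
    chorus = 0.1 * np.random.randn(2 * sr)
    x = np.concatenate([verse, chorus, verse, chorus])
    S = self_similarity(frame_features(x, sr))
    print(S.shape)                                  # square matrix, frames x frames

A real structure module would then cluster or trace this matrix over time; the sketch stops at the similarity matrix itself.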
2.2 Rhythm description
Responsible partner: Ircam AS

2.2.1 Functional description
This module aims at providing high-level features, derived directly from audio signal analysis, that describe rhythm characteristics, in order to allow browsing in a music track database. The tempo and phase extraction module developed by Native Instruments specifically targets percussion-based music (dance and electronic music). The module developed here by Ircam AS targets the description of rhythm for the general class of music, including non-percussive music (jazz, classical, popular music). In this case, note onset detection is a crucial factor, as is the detection of quick tempo variations. The algorithms developed first extract a time-variable tempo and estimate the beat marker positions, and then extract a set of high-level features for browsing: global tempo, global meter, percussivity features, periodicity features, and a rhythmical pattern that can be used for searching a database by rhythmic similarity.
2.2.2 Scientific and technological breakthroughs
Rhythm features extraction module
The technology has been entirely developed in the framework of the SemanticHIFI project. It has been designed with the usual drawbacks of other techniques in mind:
• weak onset detection,
• tempo ambiguities.
The module first estimates the onsets of the signal over time. Onsets are defined here as any meaningful start of a musical event (percussion hits, notes, note transitions, periodic variations of notes). Onset detection is based on a spectral flux measure computed on a time- and frequency-reassigned spectrogram; using the latter allows more precise onset detection even for non-percussive instruments. Most other methods are based either on sub-band energy functions or on the spectral flux of a standard spectrogram.
The periodicity of the signal is then measured by a proposed combination of the Discrete Fourier Transform (DFT) and the Auto-Correlation Function (ACF). Since the two functions have inverse octave ambiguities, combining them reduces octave errors. Most other methods are based on the DFT alone, on the ACF alone, or on an inter-onset-interval histogram.
The tempo is related to the periodicities through rhythm templates. So far, three templates are considered, representing the 2/2, 2/3 and 3/2 meter/beat-subdivision characteristics. The probability of observing a specific tempo under a specific rhythm template is computed from the observed periodicities. Most other methods directly take the main periodicity as the tempo, which does not allow distinguishing between 2/4, 3/4 and 6/8 meters.
Tracking of tempo changes over time is formulated as a Viterbi decoding problem: the tempo and the meter/beat subdivision are estimated simultaneously as the best temporal path through the observations. Most other methods rely on a short-term memory of the previously detected periodicities.
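As a rough illustration of the DFT/ACF combination described above, the following Python sketch (using numpy) scores candidate tempi by the product of the two periodicity measures computed on a precomputed onset-energy curve; because the DFT and the ACF favour opposite octaves, the product attenuates octave errors. This is a simplification: the reassigned-spectrogram onset detection, the rhythm templates and the Viterbi tracking are not shown, and all names are illustrative.

    # Illustrative tempo estimation from an onset-energy curve: candidate tempi
    # are scored by the product of a DFT periodicity measure and an ACF
    # periodicity measure, which attenuates octave errors.
    import numpy as np

    def tempo_from_onset_energy(onset_energy, fps, bpm_range=(40, 200)):
        o = onset_energy - onset_energy.mean()
        n = len(o)
        freqs = np.fft.rfftfreq(n, d=1.0 / fps)          # Hz
        dft_mag = np.abs(np.fft.rfft(o))
        acf = np.correlate(o, o, mode="full")[n - 1:]
        acf = acf / (acf[0] + 1e-10)
        scores = {}
        for bpm in range(bpm_range[0], bpm_range[1] + 1):
            f = bpm / 60.0                               # beat frequency in Hz
            lag = int(round(fps / f))                    # beat period in frames
            if lag >= n:
                continue
            dft_score = np.interp(f, freqs, dft_mag)
            scores[bpm] = dft_score * max(float(acf[lag]), 0.0)
        return max(scores, key=scores.get)

    # Toy usage: an onset impulse train at 120 BPM, sampled at 100 frames/s.
    fps = 100
    env = np.zeros(10 * fps)
    env[::fps // 2] = 1.0                                # one onset every 0.5 s
    print(tempo_from_onset_energy(env, fps))             # expected close to 120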
The rhythm characterization for search by similarity is based on rhythm templates in the spectral domain, which avoids the usual drawbacks of other techniques: the difficulty of obtaining a robust estimation for non-percussive music, the difficulty of averaging the description over the full length of a file, the length of the description, and the computation time of the comparison.

2.2.3 Hardware and software outlook
The module consists of a single executable which performs both tempo/phase estimation and rhythm characterization. The module outputs two XML files:
1) the time-variable tempo and beat position estimates (which can later be used for performing),
2) the global tempo/meter estimates and the rhythm characteristics (which can later be used for browsing/searching).

2.3 Tonality description
Responsible partner: Ircam AS

2.3.1 Functional description
This module aims at providing high-level features, derived directly from audio signal analysis, that describe tonality characteristics, in order to allow browsing in a music track database. These features are especially important for music based on tonal information (classical music). Tonality extraction from detected notes (transcription) requires a prior multi-pitch estimation and a prior segmentation step. While such technology can be used for small polyphonies, it still requires a large computation time, which makes it difficult to use in a real application on a large music collection. The module developed here is based on a less computationally demanding technology: HPCP/chroma vector estimation. While this technology does not allow the extraction of exact notes (the melody), it is sufficient for extracting the global key (C, C#, D, …), the mode (Major, minor) and the chords of a music track. We add to this the extraction of a harmonic pattern which can be used for search by similarity.
For each music track, the module provides the following descriptions:
1) the global key (C, C#, D, …) and mode (Major, minor) of the music track,
2) a harmonic pattern, which can be used for search by similarity.
2.3.2 Scientific and technological breakthroughs
Tonality features extraction module
The technology has been entirely developed in the framework of the SemanticHIFI project. The algorithm operates in two separate stages: an off-line stage (learning) and an on-line stage (evaluation).
In the off-line stage, templates are learnt for each possible pair of key (C, Db, D, …) and mode (Major/minor). For this, we follow an approach similar to the one proposed by Gomez. The key/mode templates are based on the Krumhansl profiles; the polyphony (chords) is modeled using the three main triads (Gomez), and the harmonic structure of the instrument pitches is modeled as k^[0:H-1] with k < 1 (Gomez).
In the on-line stage, for an unknown music track, the audio signal is first smoothed in the time and frequency planes in order to remove noise and transient parts. Its spectrum is then computed and converted to the chroma scale (Wakefield), also called HPCP (Fujishima): the energies of the spectral peaks are summed inside frequency bands corresponding to the chroma scale. Median filtering is applied to smooth the chroma over time. The resulting time/frequency representation is called a chromagram.
The key/mode estimation is performed by finding the most likely key/mode template given the observed chroma over time, using an approach similar to the one of Izmirli. The chroma vectors are progressively accumulated over time in a forward manner. At each time, the most likely key/mode template is estimated from the accumulated chroma vector and a salience is assigned to it, based on its distance to the second most likely template. The global key/mode assigned to the music track is the globally most salient key/mode template.
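The template-matching step can be illustrated by the following Python sketch (using numpy), which correlates an accumulated chroma vector with the 24 key/mode candidates obtained by circular rotation of the Krumhansl-Kessler profiles (profile values as commonly quoted in the literature). The SHF module additionally models chords and the harmonic structure of instrument partials, and accumulates chroma progressively with a salience measure; none of that is reproduced here, and all names are illustrative.

    # Illustrative key/mode estimation: correlate an accumulated chroma vector
    # with 24 templates obtained by rotating the Krumhansl-Kessler profiles.
    import numpy as np

    PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
    MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

    def correlation(a, b):
        a = a - a.mean()
        b = b - b.mean()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-10))

    def estimate_key(chroma):
        # chroma: 12-dimensional vector (C..B) accumulated over the track.
        best = None
        for tonic in range(12):
            for mode, profile in (("Major", MAJOR), ("minor", MINOR)):
                score = correlation(chroma, np.roll(profile, tonic))
                if best is None or score > best[0]:
                    best = (score, PITCH_CLASSES[tonic], mode)
        return best                                   # (score, key, mode)

    # Toy usage: a chroma vector dominated by the C major triad (C, E, G).
    chroma = np.zeros(12)
    chroma[[0, 4, 7]] = [1.0, 0.8, 0.9]
    print(estimate_key(chroma))                       # expected: C Major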
2.3.3 Hardware and software outlook
The module consists of a single executable which performs both key/mode estimation and harmonic pattern estimation. The module outputs the results in a single XML file.

2.4 Extractor Discovery System
Responsible partner: Sony CSL

2.4.1 Functional description
EDS (Extractor Discovery System) is a generic scheme for extracting arbitrary high-level audio descriptors from audio signals. It is able to automatically produce a fully-fledged audio extractor (an executable) from a database of labeled audio examples, using a supervised learning approach. Its main characteristic is that it automatically finds optimal audio features adapted to the problem at hand. Descriptors are traditionally designed by combining Low-Level Descriptors (LLDs) using machine-learning algorithms. The key idea of EDS is to substitute these basic LLDs with arbitrarily complex compositions of signal processing operators: EDS automatically composes operators to build features, i.e. signal processing functions that are optimal for a given descriptor extraction task. The search for specific features is based on genetic programming, a well-known technique for exploring search spaces of function compositions. The resulting features are then fed to a learning model, such as a GMM or an SVM, to produce a fully-fledged extractor program.

Screenshot of the EDS system v1
The global architecture of EDS, illustrated in Figure 2, consists of two parts: modeling of the descriptor and synthesis of the extractor. Both parts are fully automatic and eventually lead to an extractor for the descriptor.
The modeling of the descriptor is the main part of EDS. It consists in automatically searching for a set of relevant features using the genetic search algorithm, and then automatically searching for the optimal model of the descriptor that combines these features. The search for specific features is based on genetic programming, a well-known technique for exploring search spaces of function compositions. The genetic programming engine automatically composes signal processing operators to build arbitrarily complex functions. Each built function is given a fitness value which represents how well the function extracts the given descriptor on a given learning database. The evaluation of a function is very costly, as it involves complex signal processing on whole audio databases. Therefore, to limit the search, a set of heuristics is introduced to improve the a priori relevance of the created functions, together with rewriting rules that simplify functions before their evaluation. Once the system has found relevant features, it combines them by feeding them into various machine learning models, and then optimizes the model parameters.
The synthesis part consists in generating an executable file that computes the best model on any audio signal. This program makes it possible to compute the model on arbitrary audio signals and to predict their value for the modeled descriptor.

Architecture of the EDS system
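To make the feature-composition idea concrete, here is a deliberately small Python sketch (using numpy): a "feature" is a chain of generic signal processing operators ending in a scalar reducer, candidate features are scored by how well their value separates two classes of labeled signals, and a naive random-mutation loop stands in for the genetic programming engine. The operator set, the fitness measure and all names are illustrative assumptions; they are not the EDS operator library or its actual fitness function.

    # Illustrative "EDS-style" search: a feature is a chain of generic operators
    # ending in a scalar reducer; a naive random-mutation loop stands in for the
    # genetic programming engine.
    import random
    import numpy as np

    OPERATORS = {
        "abs":  np.abs,
        "diff": lambda v: np.diff(v) if v.size > 1 else v,
        "fft":  lambda v: np.abs(np.fft.rfft(v)),
        "log":  lambda v: np.log(1e-10 + np.abs(v)),
    }
    REDUCERS = {"mean": np.mean, "std": np.std, "max": np.max}

    def random_feature(depth=3):
        # A candidate feature: operator chain plus one reducer.
        return [random.choice(list(OPERATORS)) for _ in range(depth)] + \
               [random.choice(list(REDUCERS))]

    def evaluate(feature, signal):
        v = signal
        for name in feature[:-1]:
            v = OPERATORS[name](v)
        return float(REDUCERS[feature[-1]](v))

    def fitness(feature, signals, labels):
        # Fisher-style ratio: between-class distance over within-class spread.
        vals = np.array([evaluate(feature, s) for s in signals])
        a, b = vals[labels == 0], vals[labels == 1]
        return abs(a.mean() - b.mean()) / (a.std() + b.std() + 1e-10)

    def mutate(feature):
        f = list(feature)
        i = random.randrange(len(f))
        f[i] = random.choice(list(REDUCERS)) if i == len(f) - 1 else random.choice(list(OPERATORS))
        return f

    # Toy "learning database": noisy sines (class 0) versus white noise (class 1).
    random.seed(0)
    rng = np.random.default_rng(0)
    signals = [np.sin(2 * np.pi * 5 * np.linspace(0, 1, 512)) + 0.3 * rng.standard_normal(512)
               for _ in range(10)] + [rng.standard_normal(512) for _ in range(10)]
    labels = np.array([0] * 10 + [1] * 10)

    best = random_feature()
    for _ in range(200):                              # stand-in for the genetic search
        candidate = mutate(best)
        if fitness(candidate, signals, labels) > fitness(best, signals, labels):
            best = candidate
    print(best, fitness(best, signals, labels))

A real genetic programming engine would also use crossover, typed operators, and the heuristics and rewriting rules mentioned above.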
2.4.2 Scientific and technological breakthroughs
EDS is a very novel concept. Genetic algorithms have been employed for algorithm discovery before, e.g. for the efficient compilation of signal processing transforms or for radar-image analysis; EDS, however, is the first application of this idea to audio signal processing and metadata extraction. The idea of incorporating expert music signal processing knowledge is also novel.
The EDS technique originated in the CUIDADO project, but it has been substantially researched and improved in the scope of the SemanticHIFI project:
• All the machine learning (aggregation) models can now be parameterized, and a complete search now allows these parameters to be optimized automatically, in order to improve the descriptor performance for a given set of features.
• New models have been added: Gaussian Mixture Models and Hidden Markov Models.
• The whole process has been automated: given a description problem defined by a database of labeled signals, the system is now able to solve the problem automatically and produce a program that computes the descriptor for an external signal, by:
- building a set of relevant features,
- selecting and building a model using optimized machine learning techniques,
- generating a model file and an executable file for its computation.
• The following new descriptors have been studied specifically: percussivity, pitchness, happiness/sadness, danceability, density, complexity. Each of these is a difficult problem that has so far received less satisfactory solutions in the research community.

2.4.3 Hardware and software outlook
The GUI of the EDS system has undergone a complete redesign to make it more easily extensible and more user-friendly. The system now includes visualization facilities to monitor the increasing fitness of the features and the precision of classification tasks. The EDS system is now more user-friendly: features and extractors can be edited, copied/pasted, saved in a readable XML format, etc.

Screenshot of EDS v2.1, which notably features visualization of classification precision

We have abstracted the EDS system into a non-GUI API, which can be called from within MCM (see WP3.1). The EDS feature is integrated in the final WP3 prototype.
A "generalize" button allows the user to:
• create a new field (e.g. the happiness of a song),
• input a few values manually,
• have EDS generalize these values into an algorithm, which can then be used to describe unlabelled examples.

2.5 Tempo and phase detection
Responsible partner: Native Instruments
Native Instruments compiled a collection of non-copyrighted music representative of Native Instruments' customers, and evaluated various state-of-the-art tempo detection algorithms with the help of this collection.

2.5.1 Functional description

2.5.2 Scientific and technological breakthroughs
• Compilation of a collection of non-copyrighted music representative of our customers.
• Evaluation of various state-of-the-art tempo detection algorithms with the help of this collection.

2.5.3 Hardware and software outlook

2.6 Browsing by Lyrics
Responsible partner: BGU

2.6.1 Functional description
The user loads the audio file and the attached metadata into the HIFI system. The HIFI system plays the song and at the same time shows the user the lyrics. The user must have the XML file containing the synchronized lyrics data in order to use this tool; the file may be available through the sharing system.
A simple use case of the browsing feature is the traditional skip button: the user can still use the skip button in the traditional way, but can also perform a "semantic skip": while pressing the skip button, the user sees the lyrics of the audio at the location of the skip. A more sophisticated use case is the search option: the user enters text and the song skips to the location of those words within the song.

2.6.2 Scientific and technological breakthroughs
The SemanticHIFI browsing-by-lyrics option is intended to be on a par with other commercial and state-of-the-art lyrics synchronization tools.

2.6.3 Hardware and software outlook
The algorithm has several steps.
Lyrics alignment:
1) Find a MIDI file that contains the lyrics as MIDI text events.
2) Use score alignment (currently the score alignment algorithm available from CUIDADO) to align the MIDI file to the audio. Usually, the MIDI file contains a track that corresponds to the singer; we use this track to align the sound file.
3) Output the result of the previous step to an XML metadata file that contains the time and text information.
Lyrics display:
1) When the player plays the track, it reads this file, retrieves the appropriate time-to-text information and displays the corresponding text.
The Windows Media Player plug-in was built using the Windows Media Player visualization SDK. We implemented the COM interface of the visualization object around the algorithm. The figure below shows the plug-in interface: while the song plays, the player displays the appropriate lyrics as a visualization.

Browsing by Lyrics in Windows Media Player
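The following Python sketch illustrates how a player could use such a time/text metadata file for display and for the "semantic skip" described in Section 2.6.1: look up the current line for a playback position, or the time of the first line matching a search string. The XML layout shown is a hypothetical example for illustration only, not the actual SHF metadata schema, and all names are illustrative.

    # Illustrative use of a synchronized-lyrics file: current line for a playback
    # position, and "semantic skip" to the first line matching a search string.
    # The XML layout is a hypothetical example, not the SHF metadata schema.
    import bisect
    import xml.etree.ElementTree as ET

    EXAMPLE = """<lyrics>
      <line time="12.5">first verse line</line>
      <line time="18.0">second verse line</line>
      <line time="45.2">chorus line to search for</line>
    </lyrics>"""

    def load_lyrics(xml_text):
        root = ET.fromstring(xml_text)
        return sorted((float(e.get("time")), e.text) for e in root.iter("line"))

    def lyric_at(entries, position_s):
        # Current lyric line for a playback position given in seconds.
        times = [t for t, _ in entries]
        i = bisect.bisect_right(times, position_s) - 1
        return entries[i][1] if i >= 0 else None

    def skip_to(entries, query):
        # Time of the first line containing the query text, if any.
        for t, text in entries:
            if query.lower() in text.lower():
                return t
        return None

    entries = load_lyrics(EXAMPLE)
    print(lyric_at(entries, 20.0))      # -> "second verse line"
    print(skip_to(entries, "chorus"))   # -> 45.2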
2.7 Sound Source Separation / Score Alignment
Responsible partner: BGU

2.7.1 Functional description
This advanced audio processing tool performs the separation of parts or instruments from within a multi-track audio recording. The main goal of this tool is to give the listener the ability to manipulate audio in ways not previously available, and to enable the kind of artistic liberty normally available only in the studio.
Originally, source separation was based upon alignment to the existing score of the recording, available as a MIDI file. This enabled the direct harmonic separation of the instruments based on the information appearing in the score. A more advanced approach is now available which does not require the use of score alignment; even so, score alignment may still be used for melody or instrument identification. This tool is developed in collaboration with the IRCAM Room Acoustics team for the purpose of spatially re-mixing multi-track audio.

2.7.2 Scientific and technological breakthroughs
The source separation tool presents a high-level challenge, requiring the use of complex statistical sound processing techniques. These techniques, to be published in the coming year, employ a model-based pitch and voicing (or harmonicity) tracking algorithm. The problem of multi-track recording separation has been approached from many directions since the turn of the century, but it had not previously been applied in a working user-oriented system such as SemanticHIFI.

[Figure: harmonic log-likelihood map, frequency (100-500 Hz) versus frame index, with bass and pizzicato components labelled]

The figure above presents the harmonic log-likelihood function of one second of multi-track audio (the intro to Stan Getz's Round Midnight). This function gives clear information as to the presence of harmonic instruments, in this case bass and violins; the non-harmonic information contains the sound of a cymbal. The source separation algorithm gathers this information and performs a harmonic projection to extract the basic sound of each instrument. Source quality is then improved using harmonic sharing techniques.

2.7.3 Hardware and software outlook
The software is currently written in MATLAB and may be optimized for specific uses and compiled as a plug-in. It is currently automatic and separates up to four intelligible and audibly pleasing parts from a given recording. These tracks can be re-mixed to mono, stereo or surround.
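As a minimal illustration of the harmonic projection mentioned in Section 2.7.2 (and only of that step), the following Python sketch (using numpy) masks the spectrum of one analysis frame around the harmonics of a known fundamental frequency and resynthesizes the harmonic part and the residual. The actual tool estimates pitch and voicing statistically over time and refines the sources with harmonic sharing; here the fundamental is simply assumed known, and all names are illustrative.

    # Illustrative harmonic projection for one analysis frame: keep only the
    # spectral bins close to the harmonics of a known fundamental frequency.
    import numpy as np

    def separate_harmonic(frame, sr, f0, n_harmonics=10, bandwidth_hz=30.0):
        n = len(frame)
        spec = np.fft.rfft(frame * np.hanning(n))
        freqs = np.fft.rfftfreq(n, d=1.0 / sr)
        mask = np.zeros_like(freqs)
        for h in range(1, n_harmonics + 1):
            mask[np.abs(freqs - h * f0) < bandwidth_hz] = 1.0
        harmonic = np.fft.irfft(spec * mask, n)       # projected harmonic part
        residual = np.fft.irfft(spec * (1.0 - mask), n)
        return harmonic, residual

    # Toy usage: a 220 Hz harmonic tone mixed with noise.
    sr, n = 44100, 4096
    t = np.arange(n) / sr
    mix = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 5))
    mix = mix + 0.2 * np.random.randn(n)
    tone, rest = separate_harmonic(mix, sr, f0=220.0)
    print(tone.shape, rest.shape)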
2.8 AudioID
Responsible partner: Fraunhofer IDMT

2.8.1 Functional description
AudioID is a system that performs automated identification of audio signals. The essential property of AudioID is that it does not rely on the availability of metadata attached to the audio signal itself. Instead, it identifies incoming audio signals by means of a database of works that are known to the system. This functionality can be considered the algorithmic equivalent of a person recognizing a song from memory.
Before querying for audio items, fingerprints must be extracted from all songs that are to be recognized. A fingerprint contains the "essence" of an audio item, and its extraction algorithm has been standardized within the MPEG-7 standard. To identify music, a fingerprint is extracted from the query item and compared to the database. On a positive comparison result, the associated metadata are returned and further processed.

2.8.2 Scientific and technological breakthroughs
During the project, Fraunhofer IDMT achieved a major improvement in the recognition of extremely distorted signals: it is now possible to recognize extremely distorted music signals (for example GSM-coded signals). Furthermore, there has been a significant improvement in classification speed, so that broadcast monitoring of 100 or more channels is possible in real time.

2.8.3 Hardware and software outlook
Several applications for identifying music are already available. The most important software tool is the classification server. This program contains a database of all previously extracted audio fingerprints. When identifying music, a client extracts the fingerprint from the actual music data and sends it to the server; the server looks it up in the database and returns the classification result, which the client then processes and works with. There are several clients for different kinds of application, among them a multi-channel broadcast monitoring system, a tool for cleaning up the user's hard disk at home, and a client which records music from cell phones and identifies it. During the project, a Java plug-in for an AudioID client was furthermore developed in order to integrate the system into the existing SemanticHIFI demonstrator. The system runs in real time; the most performance-intensive tool is the classification server, and an average present-day PC can hold up to 100,000 fingerprints.
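The client/server matching flow can be illustrated with the following Python sketch (using numpy). The fingerprint used here is a deliberately naive band-energy profile chosen only to keep the example self-contained; the real AudioID system uses the MPEG-7 standardized fingerprint extraction, which is not reproduced here, and all class and function names are illustrative.

    # Illustrative client/server identification flow with a deliberately naive
    # fingerprint (normalized log band-energy profile), not the MPEG-7 fingerprint.
    import numpy as np

    def naive_fingerprint(x, n_bands=16):
        spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
        bands = np.array([b.sum() for b in np.array_split(spec, n_bands)])
        bands = np.log(1e-10 + bands)
        return (bands - bands.mean()) / (bands.std() + 1e-10)

    class ClassificationServer:
        # Holds fingerprints of known works and answers lookup queries.
        def __init__(self):
            self.db = {}                               # title -> fingerprint
        def register(self, title, fingerprint):
            self.db[title] = fingerprint
        def identify(self, query, threshold=0.9):
            best_title, best_score = None, -1.0
            for title, fp in self.db.items():
                score = float(np.dot(fp, query)) / len(query)   # correlation-like
                if score > best_score:
                    best_title, best_score = title, score
            return best_title if best_score >= threshold else None

    # Toy usage: register two synthetic "works", then query a distorted copy.
    rng = np.random.default_rng(1)
    sr = 8000
    work_a = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
    work_b = rng.standard_normal(sr)
    server = ClassificationServer()
    server.register("work A", naive_fingerprint(work_a))
    server.register("work B", naive_fingerprint(work_b))
    query = naive_fingerprint(work_a + 0.1 * rng.standard_normal(sr))
    print(server.identify(query))                      # expected: "work A"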
3 Methodologies Employed

3.1 WP Management and Co-ordination
WP management consists of:
• detailed planning of each sub-WP and coherence with the integration constraints,
• ensuring coherence between the sub-WPs,
• ensuring coherence between the functionalities targeted by the sub-WPs and the targeted applications,
• ensuring that deliverables, modules and documentation are provided for the milestones,
• initiating corrective actions.

3.2 Market following

3.2.1 User scenarios
Audio segmentation
Interaction / intra-document browsing:
• The user listens to a song and wants to skip to the next part of the song, or to the next chorus.
• The user wants a visual representation of the structure of the song.
Creation / remix:
• The user wants to remix a piece of music by its structure (exchanging verse and chorus positions).
• The user wants a specific visual or automatic accompaniment during the verse, the chorus, etc.
Audio summary
Inter-document browsing:
• The user wants a quick preview of a song in a play-list, in a catalogue, or in a database.
Rhythm description
• The user wants to query a database according to the tempo of the songs, the meter, or the percussivity/periodicity features.
• The user wants music titles with a tempo and rhythm pattern similar to a target music title.
• The user wants a play-list with tracks of increasing or constant tempo.
Creation:
• The user wants to synchronize two tracks according to their tempo and to align them according to their beat locations.
• The user wants to remix a track using segments defined by beats.
Tonality description
Inter-document browsing:
• The user wants to query a database according to the key and mode.
• The user wants music titles with a similar key, mode or harmonic pattern.
3.3 Scientific research methodologies
For a specific functionality:
1) proof of feasibility within a given research time,
2) study of the state of the art and of protected technologies,
3) development of a starting technology,
4) development of an evaluation database (which must be representative of the market targeted by the application) and of the corresponding annotations (according to the targeted functionality),
5) test of the starting technology on the evaluation corpus,
6) improvement of the starting technology according to the observed failures,
7) increase of the database size in order to reach realistic conditions,
8) test of the technology, evaluation of the computation time, reduction of the algorithmic complexity,
9) development of the prototype and module.

3.4 Software development practices
Re-implementation of Matlab source code as C/C++ source code:
1) implementation of unit tests for each part of the Matlab source code,
2) validation of the C++ implementation with the unit-testing suite,
3) usage of valgrind (http://www.valgrind.org) in order to avoid memory leaks and memory access violations.
Future work:
4) integration of the C++ modules within the J2EE server application,
5) packaging of the C++ modules for easy deployment on each partner's platform.

4 Dissemination

4.1 Scientific publications
Peeters, G. (2004). "Deriving Musical Structures from Signal Analysis for Music Audio Summary Generation: 'Sequence' and 'State' Approach." CMMR 2003 (LNCS 2771), U. K. Wiil (ed.), Springer-Verlag Berlin Heidelberg, 2004: 142-165.
Peeters, G. (2005). "Time Variable Tempo Detection and Beat Marking." ICMC, Barcelona, Spain.
Peeters, G. (2005). "Rhythm Classification Using Spectral Rhythm Patterns." ISMIR, London, UK.
Peeters, G. (2005). "Indexation et accès au contenu musical." Les Nouveaux Dossiers de l'Audiovisuel, 3.
Peeters, G. (2005). "MIREX 2005: Tempo Detection and Beat Marking for Perceptual Tempo Induction." ISMIR, London, UK.
Monceaux, J., Pachet, F., Amadu, F., Roy, P. and Zils, A. (2005). "Descriptor-Based Spatialization." Proceedings of the AES Conference, 2005.
Cabral, G., Pachet, F. and Briot, J.-P. (2005). "Recognizing Chords with EDS: Part One." Proceedings of the 6th Computer Music Modeling and Retrieval conference (CMMR 2005), Pisa, Italy, September 2005.
Cabral, G., Pachet, F. and Briot, J.-P. (2005). "Automatic X Traditional Descriptor Extraction: The Case of Chord Recognition." Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 2005), London, UK, September 2005.
Zils, A. and Pachet, F. (2004). "Automatic Extraction of Music Descriptors from Acoustic Signals Using EDS." Proceedings of the 116th AES Convention, May 2004.
Pachet, F. and Zils, A. (2004). "Automatic Extraction of Music Descriptors from Acoustic Signals." Proceedings of ISMIR 2004, 2004.

4.2 Other scientific dissemination actions
Peeters, G. (2005). "Naviguer dans sa discothèque." La Semaine du Son, Paris, France.

4.3 Contribution to clustering and standardisation
Organization of a half-day workshop on the practical use of MPEG-7 audio in audio applications at the AES 25th International Conference on Metadata for Audio, London, UK. Invited speakers: G. Peeters, M. Jacob, E. Gomez (UPF), J. Herre (FHG), M. Casey (King's College).
Peeters, G. (2004). "Workshop on MPEG-7 Audio." AES 25th International Conference on Metadata for Audio, London, UK.

4.4 Professional communications
Peeters, G. (2005). "Description automatique et classification des sons instrumentaux." Journées de la SFA (Description automatique et perception de la musique), ENST, Paris.
4.5 Press articles and interviews
Vinet, H., Puig, V., et al. (2005). Le magazine du multimedia, RFI, Paris.
Peeters, G. (2005). "SemanticHIFI, chaîne fidèle." Télérama.fr.
Vinet, H., Peeters, G., et al. (2005). "Je t'MP3 moi non plus" / "Le futur fait ses gammes." Télérama.fr.
Vinet, H., Peeters, G., et al. (2004). "L'Ircam recherche la chaine HIFI du futur." 01Net.

5 Outlook

5.1 Information produced by the WP
Reusable use cases: yes, for all developed modules.
R&D organisation: targeting of specific functionalities according to the application providers' markets; knowledge exchange and library exchange (when possible) among partners.

5.2 New methods
• For music structure extraction and summary generation.
• For beat detection and rhythm characterization.

5.3 Scientific breakthroughs
• New features for structure detection: dynamic features based on tempo.
• New algorithm for sequence detection: maximum likelihood approach.
• New algorithm for onset detection: reassigned spectral energy flux.
• New algorithm for periodicity estimation: combined DFT/FM-ACF.
• New algorithm for tempo tracking: Viterbi decoding based on rhythm templates.

6 Conclusion
The SemanticHIFI work-package on indexing provides research results and modules for the two applications of the project: the authoring tools and the HIFI system. It therefore covers research on all the major indexing fields of music information retrieval: from the identification of the audio track (audio identification), which allows linking the audio to
textual metadata; to the indexing of the content (rhythm, tonality, timbre), which allows search by content or by similarity; to the system's learning of user-defined descriptions (EDS); and to the innovative listening and performing paradigms made possible by music structure extraction, lyrics synchronization and re-mixing by source separation.