Audio Processing and Music Recognition

(Autonomous Institution, approved by UGC and Accredited by NAAC with ‘A’ Grade)
TECHNICAL SEMINAR
Presented by…
Mrinmoy Dalal
CSE A (13311A0506)
16 February 2016


WHAT IS SOUND?
DEFINITION
• Physical: sound as a disturbance in the air
• Psychophysical: sound as perceived by the ear
• Sound as a stimulus (physical event) and sound as a sensation
• Pressure changes in the audible band from 20 Hz to 20 kHz
• ACOUSTICS is the study of sound.
PHYSICAL TERMS
• Amplitude
• Frequency
• Spectrum

HOW DO WE HEAR?
• The ear is connected to the brain
  – Left brain: speech
  – Right brain: music
• The ear's sensitivity to frequency is logarithmic
• The frequency response is not flat: sensitivity varies with frequency
• The dynamic range is about 120 dB (at 3-4 kHz)
• Frequency discrimination is about 2 Hz (at 1 kHz)
• An intensity change of 1 dB can be detected
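The decibel figures above are logarithmic. As a rough sketch, assuming the usual conventions that a level difference in dB is 20·log10 of a sound-pressure ratio (or 10·log10 of an intensity ratio), they convert to linear ratios as follows:

```python
# Convert the ear's decibel figures into linear ratios.
# Assumes the standard conventions: dB = 20*log10(pressure ratio)
# and dB = 10*log10(intensity ratio).


def db_to_pressure_ratio(db: float) -> float:
    """Sound-pressure (amplitude) ratio for a level difference in dB."""
    return 10 ** (db / 20)


def db_to_intensity_ratio(db: float) -> float:
    """Intensity (power) ratio for a level difference in dB."""
    return 10 ** (db / 10)


# Dynamic range of about 120 dB
print(f"120 dB: pressure x{db_to_pressure_ratio(120):.0e}, "
      f"intensity x{db_to_intensity_ratio(120):.0e}")
# Smallest detectable change of about 1 dB
print(f"1 dB: pressure x{db_to_pressure_ratio(1):.3f}, "
      f"intensity x{db_to_intensity_ratio(1):.3f}")
# Frequency discrimination of 2 Hz at 1 kHz, as a relative change
print(f"2 Hz at 1 kHz: {2 / 1000:.1%}")
```

A 120 dB range therefore spans a factor of about a million in sound pressure. A just-detectable 1 dB step is a change of only about 12% in pressure.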

DIGITAL AUDIO

FUNDAMENTALS
• Digital audio is sound reproduction using pulse-code modulation and digital
signals
• Digital audio systems include analog-to-digital conversion (ADC), digital-to-
analog conversion (DAC), digital storage, processing and transmission
components
• A primary benefit of digital audio is the convenience of its storage,
transmission and retrieval
• Digital audio is useful in the recording, manipulation, mass-production, and
distribution of sound
• Modern distribution of music across the Internet via on-line stores depends
on digital recording and digital compression algorithms

SOUND: PHYSICAL TO DIGITAL

PULSE CODE MODULATION
PCM digitizes an analog signal in three steps:
1. Sampling
2. Quantization
3. Binary encoding
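The three PCM steps can be sketched in a few lines of Python. This is a minimal illustration, not a description of a real converter: it assumes a signal normalized to [-1.0, 1.0], signed 16-bit little-endian output, and a hypothetical 440 Hz test tone standing in for the analog input. In a real ADC, sampling and quantization happen in hardware.

```python
import math
import struct

BITS = 16
MAX_LEVEL = 2 ** (BITS - 1) - 1  # 32767 for 16-bit samples


def sample(signal, sample_rate: int, duration_s: float) -> list[float]:
    """Step 1, sampling: read the continuous signal at t = n / sample_rate."""
    n_samples = int(sample_rate * duration_s)
    return [signal(n / sample_rate) for n in range(n_samples)]


def quantize(values: list[float]) -> list[int]:
    """Step 2, quantization: round values in [-1.0, 1.0] to the nearest integer level."""
    return [max(-MAX_LEVEL - 1, min(MAX_LEVEL, round(v * MAX_LEVEL))) for v in values]


def encode(levels: list[int]) -> bytes:
    """Step 3, binary encoding: write each level as a little-endian 16-bit integer."""
    return struct.pack(f"<{len(levels)}h", *levels)


def tone_440(t: float) -> float:
    """Hypothetical input: a 440 Hz sine wave at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)


pcm = encode(quantize(sample(tone_440, 8000, 0.01)))
print(len(pcm))  # 80 samples x 2 bytes = 160
```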


1 song = 27.2 MB
A 1 GB hard drive ($899 in 1995) would hold 35 songs
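The arithmetic behind this slide can be checked under an assumption the slide does not state: that the 27.2 MB song is uncompressed CD-quality PCM (44.1 kHz, 16-bit, stereo) measured in decimal megabytes.

```python
# Storage arithmetic for uncompressed audio.
# Assumption: CD-quality PCM and decimal units (1 MB = 10**6 bytes).
SAMPLE_RATE = 44_100   # samples per second per channel
BYTES_PER_SAMPLE = 2   # 16-bit samples
CHANNELS = 2           # stereo

bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS
print(bytes_per_second)                # 176400
print(bytes_per_second * 60 / 1e6)     # 10.584 MB per minute

song_mb = 27.2                         # song size from the slide
seconds_per_song = song_mb * 1e6 / bytes_per_second
print(round(seconds_per_song))         # 154 seconds, about 2.6 minutes
print(int(1000 / song_mb))             # 36 whole songs in 1000 MB
```

Straight division gives 36 whole songs, which is close to the slide's figure of 35.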

AUDIO COMPRESSION
• Audio data compression, as distinguished from dynamic range
compression, has the potential to reduce the transmission bandwidth
and storage requirements of audio data.
• Audio compression algorithms are implemented in software as audio
codecs.
• Lossy audio compression algorithms provide higher compression at
the cost of fidelity and are used in numerous audio applications.
• Lossless audio compression produces a representation of digital data
that decompresses to an exact digital duplicate of the original audio
stream
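The difference between lossless and lossy compression can be illustrated with standard-library tools. This sketch uses general-purpose DEFLATE (`zlib`) as a stand-in for a real lossless audio codec such as FLAC. For the lossy case it uses crude requantization to 8 bits, which is much simpler than how perceptual codecs like MP3 work. The test tone is hypothetical.

```python
import math
import struct
import zlib

# Test signal: one second of a 440 Hz tone as 16-bit PCM at 8 kHz (about 1/4 of full scale)
levels = [round(8000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(8000)]
pcm = struct.pack(f"<{len(levels)}h", *levels)

# Lossless: DEFLATE restores the exact original bytes
packed = zlib.compress(pcm, 9)
assert zlib.decompress(packed) == pcm

# Lossy (crude): keep only the top 8 bits of each sample, halving the size
coarse = struct.pack(f"<{len(levels)}b", *(v >> 8 for v in levels))
restored = [v << 8 for v in struct.unpack(f"<{len(levels)}b", coarse)]
max_error = max(abs(a - b) for a, b in zip(levels, restored))

print(len(pcm), len(packed), len(coarse), max_error)
```

The lossless path is bit-exact. The lossy path saves a guaranteed 50% but cannot recover the discarded low-order bits, so each sample may be off by up to 255 levels.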

AUDIO FILE FORMATS
• RIFF (Resource Interchange File Format): Microsoft .WAV and .AVI
• MPEG Audio Layer (MPEG) [.mpa, .mp3]
• AIFC (Apple, SGI) [.aiff, .aif]
• HCOM (Mac) [.hcom]
• SND (Sun, NeXT) [.snd]
• VOC (SoundBlaster card proprietary standard) [.voc]
• AND MANY OTHERS!

WHAT’S IN A SOUND FILE
• Header information
  – Magic cookie
  – Sampling rate
  – Bits/sample
  – Channels
  – Byte order (endianness)
  – Compression type
• Data
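These header fields can be seen in a real WAV file. The sketch below writes a tiny WAV into memory with Python's standard `wave` module, then reads the header back by hand with `struct`. The fixed byte offsets assume the canonical 44-byte PCM header. Real files may contain extra chunks, so a robust reader would walk the chunk list instead.

```python
import io
import struct
import wave

# Write a tiny WAV file into memory with the standard-library wave module
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)                   # channels: stereo
    w.setsampwidth(2)                   # 2 bytes = 16 bits per sample
    w.setframerate(44100)               # sampling rate
    w.writeframes(bytes(2 * 2 * 100))   # 100 silent stereo frames
data = buf.getvalue()

# Read the header back by hand; offsets assume the canonical 44-byte header
riff, riff_size, wave_id = struct.unpack("<4sI4s", data[:12])   # magic cookie
(fmt_id, fmt_size, audio_format, channels, rate,
 byte_rate, block_align, bits) = struct.unpack("<4sIHHIIHH", data[12:36])
data_id, data_size = struct.unpack("<4sI", data[36:44])

print(riff, wave_id)          # b'RIFF' b'WAVE'; "RIFF" (not "RIFX") also implies little-endian
print(rate, bits, channels)   # 44100 16 2
print(audio_format)           # 1 = uncompressed PCM (the compression type)
print(data_id, data_size)     # b'data' 400
```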

AUDIO PROCESSING

AUDIO PROCESSING
• Audio signal processing, sometimes referred to as audio processing,
is the intentional alteration of auditory signals, or sound, often
through an audio effect or effects unit.
• As audio signals may be electronically represented in either digital or
analog format, signal processing may occur in either domain.
• Analog processors operate directly on the electrical signal, while
digital processors operate mathematically on the digital
representation of that signal.
• Processing methods and application areas include storage, level
compression, data compression, transmission, etc.

AUDIO PROCESSING TECHNIQUES
• Equalization
• Modulation
• Delay
• Chorus
• Flanger
• Phaser
• Pitch Shifting
• Time Stretching
• Active Noise Control
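Two of these techniques show what it means for a digital processor to "operate mathematically" on samples: delay (a feed-forward echo) and modulation (here a tremolo, i.e. amplitude modulation by a low-frequency oscillator). This is a minimal sketch on lists of float samples. The default delay time, mix, rate and depth are arbitrary illustrative values.

```python
import math


def delay(samples: list[float], sample_rate: int,
          delay_s: float = 0.25, mix: float = 0.5) -> list[float]:
    """Feed-forward echo: y[n] = x[n] + mix * x[n - D], where D = delay in samples."""
    d = int(delay_s * sample_rate)
    return [x + (mix * samples[n - d] if n >= d else 0.0)
            for n, x in enumerate(samples)]


def tremolo(samples: list[float], sample_rate: int,
            rate_hz: float = 5.0, depth: float = 0.5) -> list[float]:
    """Amplitude modulation: scale the signal by a slow sine LFO between 1 - depth and 1."""
    out = []
    for n, x in enumerate(samples):
        lfo = 1.0 - depth * (0.5 + 0.5 * math.sin(2 * math.pi * rate_hz * n / sample_rate))
        out.append(x * lfo)
    return out


# An impulse makes the echo easy to see: a second, quieter click 250 samples later
impulse = [1.0] + [0.0] * 499
echoed = delay(impulse, sample_rate=1000)
print(echoed[0], echoed[250])  # 1.0 0.5
```

Chorus and flanger combine both ideas: the delay time itself is varied by a low-frequency oscillator.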

AUDIO FINGERPRINTING

AUDIO FINGERPRINT DEFINITION
An audio fingerprint is essentially the output of a hash function that maps an audio object of a
large number of bits to a ‘fingerprint’ of only a limited number of bits. The audio
object can be uniquely identified from this bit string.
[Figure: audio file (5MB) → fingerprint (100KB)]

AUDIO FINGERPRINTING ARCHITECTURE

CODEC LAYER
i) Samples (unsigned char* samples)
A buffer of the actual data samples (2 bytes, i.e. 16 bits, per sample)
ii) Byte order (int byteOrder)
The byte order of the samples in the buffer. This can be CONST_LITTLE_ENDIAN or CONST_BIG_ENDIAN.
iii) Number of samples (long size)
The number of samples read.
iv) Sample rate (int sRate)
The number of samples per second of audio (samples/sec).
v) Stereo (bool stereo)
A Boolean value indicating whether the audio is stereo.
vi) Duration
The duration of the original audio, regardless of the number of samples read.
vii) Format
The format of the original audio, expressed as a file extension (.mp3, .wav, etc.).
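The codec layer's output can be modelled as a small record type. This Python sketch mirrors the fields listed above. The numeric values of `CONST_LITTLE_ENDIAN` and `CONST_BIG_ENDIAN` are assumptions, as are the choices to decode the byte buffer as signed 16-bit integers and to average stereo pairs into mono. The helper methods are hypothetical, not part of the slide's interface.

```python
import struct
from dataclasses import dataclass

# Constant names taken from the slide; their numeric values are assumptions
CONST_LITTLE_ENDIAN = 0
CONST_BIG_ENDIAN = 1


@dataclass
class CodecOutput:
    samples: bytes     # raw sample buffer, 2 bytes (16 bits) per sample
    byte_order: int    # CONST_LITTLE_ENDIAN or CONST_BIG_ENDIAN
    size: int          # number of samples read
    s_rate: int        # samples per second
    stereo: bool       # True if samples are interleaved L, R, L, R, ...
    duration: float    # duration of the whole original audio, in seconds
    fmt: str           # original file extension, e.g. ".mp3" or ".wav"

    def to_ints(self) -> list[int]:
        """Decode the buffer into signed 16-bit sample values (assumed encoding)."""
        prefix = "<" if self.byte_order == CONST_LITTLE_ENDIAN else ">"
        return list(struct.unpack(f"{prefix}{self.size}h", self.samples[: 2 * self.size]))

    def to_mono(self) -> list[float]:
        """Average interleaved stereo pairs into one channel (assumed downmix)."""
        values = self.to_ints()
        if not self.stereo:
            return [float(v) for v in values]
        return [(values[i] + values[i + 1]) / 2 for i in range(0, len(values) - 1, 2)]


buf = struct.pack(">4h", 100, -100, 300, 100)   # two big-endian stereo frames
out = CodecOutput(buf, CONST_BIG_ENDIAN, 4, 44100, True, 1.0, ".wav")
print(out.to_ints(), out.to_mono())  # [100, -100, 300, 100] [0.0, 200.0]
```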

FINGERPRINTING LAYER
[Figure: WAV file (5MB) → fingerprint fea690b1-b11dce98-a… (100KB)]
The fingerprinting layer carries out the core mathematical analysis of the audio, converting
a 5MB audio file into a 100KB fingerprint (bit string).
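The slide does not specify the analysis, so the following is only a toy illustration of the idea: a long sample stream is reduced to a short bit string and printed in dash-separated hex groups like the example above. The hypothetical `toy_fingerprint` emits one bit per frame boundary, set when frame energy rises. It is loosely in the spirit of schemes that compare energies over time, but it is far too crude for real identification.

```python
import math


def toy_fingerprint(samples: list[float], frame_len: int = 256) -> str:
    """Hypothetical toy fingerprint: one bit per frame boundary, 1 if energy rises."""
    # Energy of each non-overlapping frame
    energies = [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    # Compare neighbouring frames; the bits depend only on relative loudness,
    # so a uniform volume change (ignoring rounding) leaves them unchanged
    bits = "".join("1" if cur > prev else "0" for prev, cur in zip(energies, energies[1:]))
    if not bits:
        return ""
    # Pack the bits into hex digits and split them into 8-character groups
    width = math.ceil(len(bits) / 4)
    hex_str = format(int(bits, 2), f"0{width}x")
    return "-".join(hex_str[i:i + 8] for i in range(0, len(hex_str), 8))


# A test signal whose loudness cycles frame by frame
signal = [math.sin(n * 0.05) * (1 + (n // 256) % 3) for n in range(256 * 40)]
print(toy_fingerprint(signal))
```

Because it compares energies rather than hashing raw bytes, the same fingerprint comes out when the volume is halved. A cryptographic hash would change completely.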

PROTOCOL LAYER
Flow: HTTP POST → Database → XML → XML Parser

HTTP POST request:
POST /path/script.cgi HTTP/1.0
From: XYZ@abc.com
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 32

client_id=42&fingerprint=fea690b1b11dce98a…

XML response from the database:
<?xml version="1.0" encoding="UTF-8"?>
<metadata fp="fea690b1b11dce98a…" id="42">
<album>Dark Side of the Moon</album>
<song>Comfortably Numb</song>
<artist>Pink Floyd</artist>
</metadata>

After the XML parser:
Album    Dark Side of the Moon
Song     Comfortably Numb
Artist   Pink Floyd
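Both ends of this exchange can be sketched with the Python standard library. One function builds the URL-encoded POST body, and another parses the XML metadata reply. The fingerprint string below is the visible prefix from the slide, used as a truncated placeholder rather than a real fingerprint. The function names are hypothetical, and the HTTP transport itself is omitted.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def build_request_body(client_id: int, fingerprint: str) -> str:
    """URL-encode the form fields sent in the HTTP POST body."""
    return urllib.parse.urlencode({"client_id": client_id, "fingerprint": fingerprint})


def parse_metadata(xml_text: str) -> dict:
    """Extract album, song and artist from the server's XML reply."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {tag: root.findtext(tag) for tag in ("album", "song", "artist")}


body = build_request_body(42, "fea690b1b11dce98a")  # truncated placeholder value
reply = """<?xml version="1.0" encoding="UTF-8"?>
<metadata fp="fea690b1b11dce98a" id="42">
  <album>Dark Side of the Moon</album>
  <song>Comfortably Numb</song>
  <artist>Pink Floyd</artist>
</metadata>"""

print(body)  # client_id=42&fingerprint=fea690b1b11dce98a
print(parse_metadata(reply))
```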

HOW SHAZAM WORKS
1. Beforehand, Shazam fingerprints a comprehensive catalog of
music, and stores the fingerprints in a database.
2. A user “tags” a song they hear, and the app fingerprints a 10-second
sample of the audio.
3. The Shazam app uploads the fingerprint to Shazam’s service, which
runs a search for a matching fingerprint in their database.
4. If a match is found, the song info is returned to the user, otherwise
an error is returned.

SPECTROGRAM FINGERPRINTING
• You can think of any piece of music as a time-frequency graph called
a spectrogram.
• On one axis is time, on another is frequency, and on the third is
intensity.
• Each point on the graph represents the
intensity of a given frequency at a specific
point in time.
• Assuming time is on the x-axis and frequency
is on the y-axis, a horizontal line would
represent a continuous pure tone and a
vertical line would represent an instantaneous
burst of white noise.
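A spectrogram like this can be computed by splitting the audio into short overlapping frames, applying a window to each frame, and taking the magnitude of its discrete Fourier transform. This sketch uses a naive O(N²) DFT and a Hann window to keep it dependency-free. A practical implementation would use an FFT library. Frame length and hop size are illustrative assumptions.

```python
import cmath
import math


def spectrogram(samples: list[float], frame_len: int = 256, hop: int = 128) -> list[list[float]]:
    """Return one list of DFT magnitudes (bins 0..frame_len/2) per frame.

    Bin k corresponds to frequency k * sample_rate / frame_len, and frame i
    starts at time i * hop / sample_rate.
    """
    # Periodic Hann window to reduce spectral leakage at the frame edges
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# A steady 1 kHz tone sampled at 8 kHz: 64-sample frames give 125 Hz bins,
# so the energy sits in bin 8 of every frame, a horizontal line in the plot
tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(640)]
frames = spectrogram(tone, frame_len=64, hop=64)
print(len(frames), [f.index(max(f)) for f in frames])
```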

SPECTROGRAM FINGERPRINTING
• The Shazam algorithm fingerprints a
song by generating this 3D graph and
identifying frequencies of “peak
intensity.”
• For each of these peak points, it keeps
track of the frequency and the amount
of time from the beginning of the track.
Frequency (Hz)    Time (s)
823.44            1.054
1892.31           1.321
712.84            1.703
...               ...
819.71            9.943
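A table like this one can be produced from spectrogram frames by keeping the strongest frequency bin in each frame and converting its indices to hertz and seconds. This sketch assumes frames are given as lists of magnitudes. It keeps one peak per frame, a simplification: the slide does not say how Shazam selects peaks, and real systems typically look for local maxima across neighbouring time and frequency bins. The `min_magnitude` threshold is a hypothetical parameter for skipping near-silent frames.

```python
def peak_points(frames: list[list[float]], sample_rate: int, frame_len: int,
                hop: int, min_magnitude: float = 1.0) -> list[tuple[float, float]]:
    """Return (frequency_hz, time_s) for the strongest bin of each frame."""
    points = []
    for i, mags in enumerate(frames):
        k = max(range(len(mags)), key=mags.__getitem__)   # index of the peak bin
        if mags[k] >= min_magnitude:                      # skip near-silent frames
            freq_hz = k * sample_rate / frame_len         # bin index -> Hz
            time_s = i * hop / sample_rate                # frame index -> seconds
            points.append((round(freq_hz, 2), round(time_s, 3)))
    return points


# Three tiny hand-made frames; the middle one is too quiet and is skipped
demo_frames = [[0.0, 5.0, 1.0], [0.0, 0.1, 0.2], [9.0, 1.0, 0.0]]
print(peak_points(demo_frames, sample_rate=8000, frame_len=64, hop=64))
# [(125.0, 0.0), (0.0, 0.016)]
```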

SPECTROGRAM FINGERPRINTING
• Shazam builds its fingerprint
catalog as a hash table,
where the key is the frequency.
• When Shazam receives a
fingerprint like the one above, it
uses the first key (in this case
823.44) and searches for all
matching songs.
Frequency (Hz)    Time (s), song information
823.43            53.352, “Song A” by Artist 1
823.44            34.678, “Song B” by Artist 2
823.45            108.65, “Song C” by Artist 3
...               ...
1892.31           34.945, “Song B” by Artist 2
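The catalog lookup can be sketched with a Python dictionary keyed by frequency, using the rows from the table above. Rounding keys to two decimals so that floating-point values compare reliably is an assumption. So is the single-frequency key itself: it follows the slide's simplified description, while production systems use richer hashes. For each candidate song, the lookup collects (time in song, time in sample) pairs for the next step.

```python
from collections import defaultdict


def build_index(catalog: dict) -> dict:
    """catalog: {song: [(freq_hz, time_s), ...]} -> {freq_key: [(time_s, song), ...]}"""
    index = defaultdict(list)
    for song, points in catalog.items():
        for freq, t in points:
            index[round(freq, 2)].append((t, song))
    return index


def lookup(index: dict, sample_points: list) -> dict:
    """Return {song: [(time_in_song, time_in_sample), ...]} for every key hit."""
    hits = defaultdict(list)
    for freq, t_sample in sample_points:
        for t_song, song in index.get(round(freq, 2), []):
            hits[song].append((t_song, t_sample))
    return hits


catalog = {
    "Song A by Artist 1": [(823.43, 53.352)],
    "Song B by Artist 2": [(823.44, 34.678), (1892.31, 34.945)],
    "Song C by Artist 3": [(823.45, 108.65)],
}
index = build_index(catalog)
hits = lookup(index, [(823.44, 1.054), (1892.31, 1.321)])
print(dict(hits))  # only Song B matches both keys
```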

SPECTROGRAM FINGERPRINTING
• If a specific song is hit multiple times, Shazam then checks
whether these frequencies correspond in time.
• It creates a 2D plot of frequency hits: one axis is the time
from the beginning of the track at which those frequencies
appear in the song, and the other axis is the time at which
they appear in the sample.
• If there is a temporal relation between the two sets of
points, the points will align along a diagonal.
• Another signal processing method is used to find this line;
if it exists with sufficient certainty, the song is labeled
a match.
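The slide does not name the method used to find the diagonal. One simple option is to note that points on a diagonal share the same offset, time_in_song − time_in_sample, and to count how many hits vote for each offset. This is a histogram, or Hough-style vote. In the example tables, 34.678 − 1.054 and 34.945 − 1.321 both give 33.624 s. The bin width and vote threshold below are arbitrary assumptions.

```python
from collections import Counter


def best_alignment(pairs: list, bin_s: float = 0.05):
    """pairs: [(time_in_song, time_in_sample), ...] for one candidate song.

    Points on a diagonal share the offset t_song - t_sample, so histogram the
    offsets and return (best offset in seconds, number of votes).
    """
    offsets = Counter(round((t_song - t_sample) / bin_s) for t_song, t_sample in pairs)
    if not offsets:
        return None, 0
    offset_bin, votes = offsets.most_common(1)[0]
    return offset_bin * bin_s, votes


def is_match(pairs: list, min_votes: int = 3, bin_s: float = 0.05) -> bool:
    """Declare a match when enough hits agree on one offset (threshold is arbitrary)."""
    _, votes = best_alignment(pairs, bin_s)
    return votes >= min_votes


# Three hits on the same diagonal (offset of about 33.6 s) plus one stray hit
pairs = [(34.678, 1.054), (34.945, 1.321), (35.327, 1.703), (80.0, 2.5)]
print(best_alignment(pairs), is_match(pairs))
```

A song whose hits fall at unrelated times spreads its votes across many offsets and fails the threshold, even when it shares many individual frequencies with the sample.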
