(Autonomous Institution, approved by UGC and Accredited by NAAC with ‘A’ Grade)
TECHNICAL SEMINAR
Presented by…
Mrinmoy Dalal
CSE A (13311A0506)
16 February 2016
AUDIO PROCESSING AND MUSIC RECOGNITION
SOUND
WHAT IS SOUND
DEFINITION
• Physical: sound as a disturbance in the air
• Psychophysical: sound as perceived by the ear
• Sound as a stimulus (a physical event) and as a sensation
• Pressure changes in the audible band, from 20 Hz to 20 kHz
• ACOUSTICS is the study of sound.
PHYSICAL TERMS
• Amplitude
• Frequency
• Spectrum
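To make the three physical terms concrete, here is a minimal Python sketch (an illustration, not part of the original slides) that builds a signal from two pure tones and computes its spectrum with a naive discrete Fourier transform. The sample rate, tone frequencies and amplitudes are arbitrary assumptions; the two peaks of the spectrum recover each tone's frequency and amplitude.

```python
import cmath
import math

def dft_magnitudes(x):
    """Single-sided amplitude spectrum of x via a naive O(N^2) DFT."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # DC and Nyquist bins appear once; every other bin is folded from +/- frequency.
        scale = 1 / n if k in (0, n // 2) else 2 / n
        mags.append(abs(s) * scale)
    return mags

fs = 256  # assumed sample rate in Hz
n = 256   # one second of samples, so each DFT bin is 1 Hz wide
signal = [1.0 * math.sin(2 * math.pi * 20 * t / fs)     # 20 Hz tone, amplitude 1.0
          + 0.5 * math.sin(2 * math.pi * 50 * t / fs)   # 50 Hz tone, amplitude 0.5
          for t in range(n)]

spectrum = dft_magnitudes(signal)
peaks = sorted(range(len(spectrum)), key=lambda k: spectrum[k], reverse=True)[:2]
for k in sorted(peaks):
    print(f"{k * fs / n:.0f} Hz: amplitude {spectrum[k]:.2f}")
# 20 Hz: amplitude 1.00
# 50 Hz: amplitude 0.50
```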
HOW DO WE HEAR
• The ear is connected to the brain: the left hemisphere is associated mainly with speech, the right mainly with music
• The ear's sensitivity to frequency is logarithmic
• Its frequency response is not flat: sensitivity varies with frequency
• Dynamic range is about 120 dB (at 3-4 kHz)
• Frequency discrimination is about 2 Hz (at 1 kHz)
• An intensity change of 1 dB can be detected
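The decibel figures above are logarithmic ratios. A small Python helper (illustrative, not from the slides) converts them back to linear ratios using the standard definitions L = 20·log10(p/p0) for sound pressure and L = 10·log10(I/I0) for intensity. It suggests why a logarithmic scale is used: a 120 dB dynamic range spans a million-fold range of sound pressure, while a just-detectable 1 dB step is only about a 12% change.

```python
import math

def db_to_pressure_ratio(db):
    """Sound-pressure (amplitude) ratio for a level difference: db = 20 * log10(p / p0)."""
    return 10 ** (db / 20)

def db_to_intensity_ratio(db):
    """Intensity (power) ratio for a level difference: db = 10 * log10(I / I0)."""
    return 10 ** (db / 10)

def pressure_ratio_to_db(ratio):
    """Inverse of db_to_pressure_ratio."""
    return 20 * math.log10(ratio)

print(f"{db_to_pressure_ratio(120):.0f}")   # 1000000: 120 dB is a million-fold pressure range
print(f"{db_to_intensity_ratio(120):.0e}")  # 1e+12 in intensity
print(f"{db_to_pressure_ratio(1):.3f}")     # 1.122: a 1 dB step is about 12% more pressure
print(f"{pressure_ratio_to_db(2):.2f}")     # 6.02: doubling the pressure adds about 6 dB
```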
DIGITAL AUDIO
FUNDAMENTALS
• Digital audio is sound reproduction using pulse-code modulation (PCM) and digital signals
• Digital audio systems include analog-to-digital conversion (ADC), digital-to-analog conversion (DAC), digital storage, processing and transmission components
• A primary benefit of digital audio is the convenience of storage, transmission and retrieval
• Digital audio is useful in the recording, manipulation, mass production and distribution of sound
• Modern distribution of music over the Internet through online stores depends on digital recording and digital compression algorithms
SOUND: PHYSICAL TO DIGITAL
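PULSE CODE MODULATION
PCM digitizes an analog signal in three steps:
1. Sampling: measuring the signal's amplitude at regular time intervals
2. Quantization: rounding each measurement to one of a finite set of levels
3. Binary encoding: representing each level as a fixed-width binary number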
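The three PCM steps can be sketched in a few lines of Python. This is a simplified illustration, not a model of real ADC hardware: the "analog" input is assumed to be a Python function of time returning values in [-1.0, 1.0], and quantization uses uniform levels with simple rounding.

```python
import math

def pcm_encode(signal, sample_rate, duration, bits):
    """Digitize `signal` (a function of time in seconds, range [-1, 1]) with PCM."""
    levels = 2 ** bits
    codes = []
    for n in range(int(sample_rate * duration)):
        # 1. Sampling: measure the signal every 1/sample_rate seconds.
        value = signal(n / sample_rate)
        # 2. Quantization: map [-1, 1] onto the integers 0 .. levels-1.
        level = round((value + 1.0) / 2.0 * (levels - 1))
        level = min(max(level, 0), levels - 1)  # clip out-of-range input
        # 3. Binary encoding: write each level as a fixed-width binary code word.
        codes.append(format(level, f"0{bits}b"))
    return codes

# A 1 Hz sine sampled 8 times per second with 3-bit (8-level) quantization.
codes = pcm_encode(lambda t: math.sin(2 * math.pi * t), sample_rate=8, duration=1.0, bits=3)
print(codes)
```

With only 3 bits there are just 8 levels, so the coarse rounding is easy to see; CD audio uses 16 bits (65,536 levels) at 44.1 kHz.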
SAMPLING
1 song (uncompressed) = 27.2 MB, so a 1 GB hard drive ($899 in 1995) would hold only about 35 songs
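Figures like these follow from simple PCM arithmetic. The sketch below assumes CD-quality audio (44.1 kHz, 16-bit, stereo) and decimal megabytes (1 MB = 10^6 bytes), neither of which the slide states; under those assumptions one minute of uncompressed audio takes about 10.6 MB, so a 27.2 MB song is roughly two and a half minutes long.

```python
def pcm_size_bytes(seconds, sample_rate=44_100, bits_per_sample=16, channels=2):
    """Bytes needed for uncompressed PCM audio (defaults assume CD quality)."""
    return int(seconds * sample_rate * bits_per_sample * channels) // 8

per_second = pcm_size_bytes(1)    # 176,400 bytes per second
per_minute = pcm_size_bytes(60)   # 10,584,000 bytes per minute
print(f"{per_minute / 1e6:.2f} MB per minute")                 # 10.58 MB per minute
print(f"{27.2e6 / per_second / 60:.1f} minutes in 27.2 MB")    # 2.6 minutes in 27.2 MB
```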
AUDIO COMPRESSION
• Audio data compression, as distinguished from dynamic range compression, can reduce the transmission bandwidth and storage requirements of audio data.
• Audio compression algorithms are implemented in software as audio codecs.
• Lossy audio compression algorithms provide higher compression at the cost of fidelity and are used in numerous audio applications.
• Lossless audio compression produces a representation of the digital data that decompresses to an exact digital duplicate of the original audio stream.
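The difference between the two kinds of compression can be shown with a short Python sketch. It is only an illustration under stated assumptions: zlib is a general-purpose compressor standing in for a real lossless audio codec (such as FLAC), and "lossy" here is crude requantization from 16 to 8 bits, not the perceptual coding used by MP3-style codecs. The test tone and its parameters are arbitrary.

```python
import math
import struct
import zlib

# One second of a 440 Hz test tone as 16-bit little-endian PCM at 8 kHz (assumed input).
samples = [int(10_000 * math.sin(2 * math.pi * 440 * n / 8_000)) for n in range(8_000)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: decompression restores every bit of the original stream.
packed = zlib.compress(pcm, 9)
assert zlib.decompress(packed) == pcm

# Lossy: keep only the top 8 bits of each 16-bit sample (half the size, detail discarded).
lossy = bytes((s >> 8) & 0xFF for s in samples)
restored = [(b - 256 if b >= 128 else b) << 8 for b in lossy]  # back to signed 16-bit
max_error = max(abs(a - b) for a, b in zip(samples, restored))

print(len(pcm), len(packed), len(lossy), max_error)
```

General-purpose compressors usually shrink raw audio only modestly, which is why dedicated lossless codecs model the signal (for example, with linear prediction) before coding it.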
AUDIO FILE FORMATS
• RIFF (Resource Interchange File Format)
• Microsoft WAV and AVI (both RIFF-based)
• MPEG Audio Layer (MPEG) [.mpa, .mp3]
• AIFF/AIFC (Apple, SGI) [.aiff, .aif]
• HCOM (Mac) [.hcom]
• SND (Sun, NeXT) [.snd]
• VOC (proprietary Sound Blaster card standard) [.voc]
• AND MANY OTHERS!
WHAT’S IN A SOUND FILE
• Header information: magic cookie (a signature that identifies the file type), sampling rate, bits per sample, number of channels, byte order (endianness) and compression type
• Data: the audio samples themselves
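For WAV (a RIFF format), these header fields sit at fixed offsets. The Python sketch below uses the standard-library `wave` module to write a tiny file in memory and then decodes its header with `struct`. The audio parameters are arbitrary, and the fixed 44-byte layout is an assumption that holds for simple PCM files such as those `wave` writes, not for every WAV file (files may carry extra chunks).

```python
import io
import struct
import wave

# Write 0.1 s of silence as a mono, 16-bit, 8 kHz WAV file in memory (assumed parameters).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)            # bytes per sample, i.e. 16 bits per sample
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 800)
data = buf.getvalue()

# Parse the canonical 44-byte RIFF/WAVE header; "<" means little-endian byte order.
(riff, riff_size, wave_id, fmt_id, fmt_size, audio_format, channels,
 sample_rate, byte_rate, block_align, bits_per_sample,
 data_id, data_size) = struct.unpack_from("<4sI4s4sIHHIIHH4sI", data, 0)

print(riff, wave_id)                                         # magic cookie: b'RIFF' b'WAVE'
print(sample_rate, bits_per_sample, channels, audio_format)  # 8000 16 1 1 (1 = uncompressed PCM)
print(data_size)                                             # 1600 bytes of sample data
```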
AUDIO PROCESSING
AUDIO PROCESSING
• Audio signal processing, sometimes referred to as audio processing, is the intentional alteration of auditory signals, or sound, often through an audio effect or effects unit.
• Because audio signals may be represented electronically in either digital or analog form, signal processing may occur in either domain.
• Analog processors operate directly on the electrical signal, while digital processors operate mathematically on its digital representation.
• Processing methods and application areas include storage, level (dynamic range) compression, data compression and transmission.
AUDIO PROCESSING TECHNIQUES
• Equalization
• Modulation
• Delay
• Chorus
• Flanger
• Phaser
• Pitch Shifting
• Time Stretching
• Active Noise Control
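Several of these effects are built from the same digital primitive: a delayed copy of the signal mixed back into the original. A minimal Python sketch of a fixed feedforward delay (a simple echo) follows; the function and parameter names are illustrative and not taken from the slides.

```python
def feedforward_delay(x, delay_samples, gain):
    """Echo effect: y[n] = x[n] + gain * x[n - delay_samples]."""
    # Extend the output so the delayed copy of the last samples is not cut off.
    y = list(x) + [0.0] * delay_samples
    for n, value in enumerate(x):
        y[n + delay_samples] += gain * value
    return y

# An impulse followed by silence: the echo appears 2 samples later at half amplitude.
print(feedforward_delay([1.0, 0.0, 0.0], delay_samples=2, gain=0.5))
# [1.0, 0.0, 0.5, 0.0, 0.0]
```

Broadly, making the delay vary slowly over time (a few milliseconds, swept by a low-frequency oscillator) turns this structure into a flanger, and longer modulated delays give a chorus.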
MUSIC RECOGNITION
AUDIO FINGERPRINTING
AUDIO FINGERPRINT DEFINITION
An audio fingerprint is the compact output of a hash-like function that maps an audio object consisting of a large number of bits to a ‘fingerprint’ of only a limited number of bits. The audio object can then be identified from this bit string.
[Figure: audio file (5 MB) → fingerprint (100 KB)]
AUDIO FINGERPRINTING ARCHITECTURE
CODEC LAYER
i) Samples (unsigned char* samples)
A buffer of the actual sample data (2 bytes, i.e. 16 bits, per sample)
ii) Byte order (int byteOrder)
The byte order of the samples: CONST_LITTLE_ENDIAN or CONST_BIG_ENDIAN
iii) Number of samples (long size)
The number of samples read
iv) Sample rate (int sRate)
The number of samples per second of audio
v) Stereo (bool stereo)
Boolean value indicating whether the audio is stereo
vi) Duration
Duration of the original audio, independent of the number of samples read
vii) Format
Format of the original audio, expressed as a file extension (.mp3, .wav, etc.)
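The codec layer's output can be modelled as a simple record. The Python sketch below is hypothetical: field names are adapted from the slide, the constants `LITTLE_ENDIAN` and `BIG_ENDIAN` stand in for CONST_LITTLE_ENDIAN and CONST_BIG_ENDIAN, and `size` is assumed to count every 16-bit sample value (both channels, for stereo). It shows how the byte-order field decides how the raw buffer is turned into signed sample values.

```python
import struct
from dataclasses import dataclass

# Hypothetical stand-ins for CONST_LITTLE_ENDIAN / CONST_BIG_ENDIAN.
LITTLE_ENDIAN, BIG_ENDIAN = 0, 1

@dataclass
class CodecOutput:
    samples: bytes     # raw sample buffer, 2 bytes (16 bits) per sample
    byte_order: int    # LITTLE_ENDIAN or BIG_ENDIAN
    size: int          # number of samples read
    s_rate: int        # samples per second
    stereo: bool       # True if the audio is stereo
    duration: float    # duration of the original audio, in seconds
    format: str        # original format as a file extension, e.g. ".mp3"

    def sample_values(self):
        """Decode the buffer into signed 16-bit integers using the stated byte order."""
        prefix = "<" if self.byte_order == LITTLE_ENDIAN else ">"
        return list(struct.unpack(f"{prefix}{self.size}h", self.samples[: 2 * self.size]))

# The same two sample values (1 and -1) stored in each byte order.
le = CodecOutput(b"\x01\x00\xff\xff", LITTLE_ENDIAN, 2, 44_100, False, 180.0, ".wav")
be = CodecOutput(b"\x00\x01\xff\xff", BIG_ENDIAN, 2, 44_100, False, 180.0, ".wav")
print(le.sample_values(), be.sample_values())  # [1, -1] [1, -1]
```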
FINGERPRINTING LAYER
[Figure: WAV file (5 MB) → fingerprint fea690b1-b11dce98-a… (100 KB)]
The fingerprinting layer carries out the core mathematical analysis of the audio, converting a 5 MB audio file into a 100 KB fingerprint (bit string).
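The slides do not specify the fingerprinting mathematics. As a hypothetical illustration only (not Shazam's or any product's algorithm), the Python sketch below reduces audio to one bit per frame, set when a frame carries more energy than the previous one, and packs the bits into a hex string. Because it compares energies instead of storing samples, the result is far smaller than the audio and does not change when the volume is scaled.

```python
def toy_fingerprint(samples, frame_size=256):
    """Hypothetical toy fingerprint: 1 bit per frame (louder than the previous frame?), as hex."""
    # Energy of each complete frame.
    energies = [
        sum(s * s for s in samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    # One bit per pair of consecutive frames.
    bits = "".join("1" if cur > prev else "0" for prev, cur in zip(energies, energies[1:]))
    bits += "0" * (-len(bits) % 4)  # pad to whole hex digits
    return "".join(format(int(bits[i:i + 4], 2), "x") for i in range(0, len(bits), 4))

# Hypothetical input: 9 frames whose level rises, falls, then rises again.
levels = [1, 2, 3, 2, 1, 2, 3, 4, 5]
audio = [float(level) for level in levels for _ in range(256)]
print(toy_fingerprint(audio))                     # cf
print(toy_fingerprint([0.5 * s for s in audio]))  # cf (same fingerprint at half volume)
```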
PROTOCOL LAYER
The client sends the fingerprint to the server in an HTTP POST request:
POST /path/script.cgi HTTP/1.0
From: XYZ@abc.com
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 32

client_id=42&fingerprint=fea690b1b11dce98a…
The server matches the fingerprint against its database and returns the metadata as XML:
<?xml version="1.0" encoding="UTF-8"?>
<metadata fp="fea690b1b11dce98a…" id="42">
<album>The Wall</album>
<song>Comfortably Numb</song>
<artist>Pink Floyd</artist>
</metadata>
An XML parser on the client extracts the fields:
Album    The Wall
Song     Comfortably Numb
Artist   Pink Floyd
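The client side of this exchange can be sketched with Python's standard library. Everything specific here is hypothetical: the URL, the placeholder fingerprint (the slide's value is truncated) and the metadata values. The sketch builds the form-encoded POST request without sending it, then parses an XML reply in the slide's format with `xml.etree.ElementTree`.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical values: the slide's fingerprint is truncated, so a placeholder is used.
body = urllib.parse.urlencode({"client_id": 42, "fingerprint": "0123abcd"}).encode("ascii")
request = urllib.request.Request(
    "http://example.com/path/script.cgi",  # hypothetical server URL
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.

# A hypothetical XML reply in the format shown on the slide.
reply = b"""<?xml version="1.0" encoding="UTF-8"?>
<metadata fp="0123abcd" id="42">
  <album>Example Album</album>
  <song>Example Song</song>
  <artist>Example Artist</artist>
</metadata>"""

root = ET.fromstring(reply)
track = {child.tag: child.text for child in root}
print(request.get_method(), body.decode())  # POST client_id=42&fingerprint=0123abcd
print(track)  # {'album': 'Example Album', 'song': 'Example Song', 'artist': 'Example Artist'}
```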
HOW SHAZAM WORKS
1. Beforehand, Shazam fingerprints a comprehensive catalog of music and stores the fingerprints in a database.
2. A user “tags” a song they hear, which fingerprints a 10-second sample of audio.
3. The Shazam app uploads the fingerprint to Shazam’s service, which searches its database for a matching fingerprint.
4. If a match is found, the song information is returned to the user; otherwise, an error is returned.
SPECTROGRAM FINGERPRINTING
• You can think of any piece of music as a time-
frequency graph called a spectrogram.
• One axis is time, another is frequency, and the third dimension (usually shown as color or brightness) is intensity.
• Each point on the graph represents the
intensity of a given frequency at a specific
point in time.
• Assuming time is on the x-axis and frequency
is on the y-axis, a horizontal line would
represent a continuous pure tone and a
vertical line would represent an instantaneous
burst of white noise.
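A spectrogram is computed by slicing the audio into short frames and taking the magnitude of the discrete Fourier transform of each frame. The pure-Python sketch below is an illustration under stated assumptions: the sample rate, frame size and signals are arbitrary, and each frame uses a plain rectangular window for simplicity where real systems usually taper the frame. It reproduces the two cases above: a pure tone gives a horizontal line, and a single-sample click (a broadband burst standing in for white noise) gives a vertical line.

```python
import cmath
import math

def spectrogram(x, frame_size, hop):
    """Magnitude spectrogram: one row per frame (time), one column per DFT bin (frequency)."""
    rows = []
    for start in range(0, len(x) - frame_size + 1, hop):
        frame = x[start:start + frame_size]  # rectangular window, for simplicity
        rows.append([
            abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / frame_size)
                    for t in range(frame_size)))
            for k in range(frame_size // 2 + 1)
        ])
    return rows

fs, n = 256, 512  # assumed sample rate (Hz) and signal length (samples)
tone = [math.sin(2 * math.pi * 32 * t / fs) for t in range(n)]  # pure 32 Hz tone
click = [1.0 if t == 200 else 0.0 for t in range(n)]            # single-sample burst

tone_spec = spectrogram(tone, frame_size=64, hop=64)
click_spec = spectrogram(click, frame_size=64, hop=64)

# Pure tone: every frame peaks in the same bin (32 Hz * 64 / 256 = bin 8), a horizontal line.
print([max(range(len(row)), key=row.__getitem__) for row in tone_spec])
# Click: only the frame containing sample 200 has energy, spread over all bins, a vertical line.
print([round(max(row), 3) for row in click_spec])
```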
SPECTROGRAM FINGERPRINTING
• The Shazam algorithm fingerprints a song by generating this 3D graph and identifying the frequencies of “peak intensity.”
• For each of these peak points, it keeps track of the frequency and the amount of time from the beginning of the track.
Frequency (Hz)   Time (s)
823.44           1.054
1892.31          1.321
712.84           1.703
...              ...
819.71           9.943
SPECTROGRAM FINGERPRINTING
• Shazam builds its fingerprint catalog as a hash table in which the key is the frequency.
• When Shazam receives a fingerprint like the one above, it takes the first key (in this case 823.44) and searches for all matching songs.
Frequency (Hz)   Time in song (s), song information
823.43           53.352, “Song A” by Artist 1
823.44           34.678, “Song B” by Artist 2
823.45           108.65, “Song C” by Artist 3
...              ...
1892.31          34.945, “Song B” by Artist 2
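The lookup described above can be sketched with a Python dictionary. This is a simplification under stated assumptions: keys are the peak frequencies rounded to 0.01 Hz, as in the table, whereas practical systems quantize frequencies more coarsely, and Shazam's published design hashes pairs of peaks rather than single frequencies. The catalog reuses the table's illustrative entries.

```python
from collections import defaultdict

def build_index(catalog):
    """catalog: {song: [(frequency_hz, time_s), ...]} -> {frequency: [(time_s, song), ...]}"""
    index = defaultdict(list)
    for song, peaks in catalog.items():
        for freq, t in peaks:
            index[round(freq, 2)].append((t, song))
    return index

def lookup(index, sample_peaks):
    """Return every hit as (song, time_in_song, time_in_sample)."""
    hits = []
    for freq, t_sample in sample_peaks:
        for t_song, song in index.get(round(freq, 2), []):
            hits.append((song, t_song, t_sample))
    return hits

catalog = {
    "Song A": [(823.43, 53.352)],
    "Song B": [(823.44, 34.678), (1892.31, 34.945)],
    "Song C": [(823.45, 108.65)],
}
sample = [(823.44, 1.054), (1892.31, 1.321)]  # peaks from the recorded sample
print(lookup(build_index(catalog), sample))
# [('Song B', 34.678, 1.054), ('Song B', 34.945, 1.321)]
```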
SPECTROGRAM FINGERPRINTING
• If a specific song is hit multiple times, Shazam then checks whether these frequencies also correspond in time.
• It builds a 2D plot of the frequency hits: one axis is the time from the beginning of the track at which those frequencies appear in the song, and the other is the time at which they appear in the sample.
• If there is a temporal relation between the two sets of points, the points will align along a diagonal.
• Another signal-processing method is used to find this line; if the line exists with sufficient certainty, the song is labeled a match.
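One common way to find the diagonal is to note that all points on it share the same offset (time in song minus time in sample), so counting offsets per song turns line detection into finding a histogram peak. The Python sketch below assumes that approach, since the slides only mention "another signal processing method"; the bucket width and vote threshold are arbitrary, and the hits reuse the table's illustrative values plus one unrelated hit.

```python
from collections import Counter

def best_match(hits, bucket_s=0.01, min_votes=2):
    """hits: (song, time_in_song, time_in_sample) tuples.

    Points on a diagonal share one offset (time_in_song - time_in_sample),
    so the song with the most votes for a single offset bucket is the likely match.
    """
    votes = Counter()
    for song, t_song, t_sample in hits:
        bucket = round((t_song - t_sample) / bucket_s)  # tolerate small timing jitter
        votes[(song, bucket)] += 1
    if not votes:
        return None
    (song, bucket), count = votes.most_common(1)[0]
    if count < min_votes:
        return None  # no diagonal with enough support: report "no match"
    return song, bucket * bucket_s, count

hits = [("Song B", 34.678, 1.054), ("Song B", 34.945, 1.321), ("Song A", 53.352, 1.703)]
song, offset, count = best_match(hits)
print(f"{song} matched {count} peaks, sample starts {offset:.2f} s into the song")
# Song B matched 2 peaks, sample starts 33.62 s into the song
```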