Ramin Anushiravani – CS 543 Spring 2015
Department of Electrical and Computer Engineering, College of Engineering, University of Illinois at Urbana-Champaign
Audio Enhancement: A Computer Vision Approach
Aim
Introduction
Many audio enhancement projects can be simplified by some form of
user interface. One example, studied in this project, is removing a
specific, user-designated noise from a recording. To illustrate the
goal, imagine recording a live concert or a lecture when, in the
middle of the recording, someone's cellphone rings. There is no easy
way to automatically identify the ringtone as an undesired noise. One
heuristic way of removing it is to identify every time-frequency bin
of the ringtone in the spectrogram and remove them manually, much
like editing an image in Adobe Photoshop. With some help from the
user, however, we can automate the process of removing or
transferring most sounds from most recordings, based on the
similarity between the two in the image spectrogram. The algorithm
developed in this project asks the user to mimic the noise they want
to remove from the recording. It then looks for the closest match to
the user's input in the time-frequency spectrogram of the noisy
recording.
Motivation
Since a spectrogram shows us what a sound looks like in the
time-frequency domain, we can think of editing the spectrogram of a
sound as editing an image. Inspired by this idea, I decided to apply
a computer vision method, object recognition, to remove a
user-designated noise from a recording. The problem of finding a
specific noise in a noisy recording is therefore analogous to the
problem of finding a cat in an image that contains one or more cats.
This is illustrated in the next section.
[Figure: panels (1) and (2)]
Both of these methods can be sped up using the following trick,
which is close in spirit to the idea of Viola-Jones features. By
convolving the image with the noise object image, we get a rough
idea of where the object is, so we can limit the scanning of the
image to those areas (white areas in the figure below). In effect, a
weak classifier is used first to discard many of the patches.
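A minimal NumPy sketch of this coarse pre-filter follows. It assumes both the spectrogram image and the noise object are 2D grayscale arrays, and it uses plain FFT-based cross-correlation (convolution with the flipped object). The function names and the `keep_fraction` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np

def correlation_map(image, template):
    """Cross-correlate image with template; one score per valid window position."""
    H, W = image.shape
    h, w = template.shape
    shape = (H + h - 1, W + w - 1)
    # Convolving with the flipped template is the same as cross-correlation.
    F = np.fft.rfft2(image, s=shape)
    T = np.fft.rfft2(template[::-1, ::-1], s=shape)
    full = np.fft.irfft2(F * T, s=shape)
    # Keep only positions where the window lies fully inside the image.
    return full[h - 1:H, w - 1:W]

def candidate_positions(image, template, keep_fraction=0.5):
    """Boolean map of window positions worth scanning (assumed threshold rule)."""
    cmap = correlation_map(image, template)
    return cmap >= keep_fraction * cmap.max()
```

Only the positions flagged by `candidate_positions` would then be passed on to the more expensive HOG comparison.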
• HOG Features
HOG features are descriptors that capture the distribution of edge
orientations within fixed-size cells of an image. They are robust to
small local deformations, although they are not inherently invariant
to scale. HOG features are best known from object detection
applications in computer vision. Since they require very careful
tuning and normalization, I used an external library, VLFeat [2], to
compute them. In this project I used a cell size of 8 and extracted
the HOG features from a grayscale image (instead of an RGB color
image).
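To make the idea of per-cell orientation histograms concrete, here is a deliberately simplified NumPy sketch. It is not VLFeat's HOG: it omits block normalization and bin interpolation, and the function name and the choice of 9 unsigned orientation bins are assumptions for illustration only.

```python
import numpy as np

def cell_orientation_histograms(img, cell=8, bins=9):
    """Magnitude-weighted histogram of gradient orientations for each cell."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi): opposite edge directions share a bin.
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    ny, nx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ny, nx, bins))
    for cy in range(ny):
        for cx in range(nx):
            sl = (slice(cy * cell, (cy + 1) * cell),
                  slice(cx * cell, (cx + 1) * cell))
            hist[cy, cx] = np.bincount(bin_idx[sl].ravel(),
                                       weights=mag[sl].ravel(),
                                       minlength=bins)
    # L2-normalize each cell so overall brightness matters less.
    norm = np.linalg.norm(hist, axis=2, keepdims=True)
    return hist / (norm + 1e-12)
```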
After extracting the HOG features from
each window in the noisy image and from
the defined noise object, we must check
which patches are most similar
to the noise object.
Classification
To classify each patch of the image, I used two different
methods.
1- K-Nearest Neighbor. Vectorize all the HOG features of the image
into one big matrix. The error function used in K-NN is a Euclidean
distance,
$error = \left( vec(noise_{hog})^2 - vec(image_{hog})^2 \right)$
This error function seems to give many misclassifications, so I
propose the following error function for better accuracy.
2- The modified error function is as follows,
$error = \| noise_{hog} - image_{hog} \|_{\ell_2} / \| noise_{hog} \|_{\ell_2}$
The latter error function seems to localize the noise object much
more accurately.
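The two scores can be sketched in NumPy as below. This assumes the first error is the plain Euclidean distance named in the text and the second is the ℓ2 norm of the difference divided by the ℓ2 norm of the noise-object HOG. The function names are hypothetical.

```python
import numpy as np

def euclidean_error(noise_hog, image_hog):
    """Euclidean distance between the vectorized HOG features."""
    return float(np.linalg.norm(np.ravel(noise_hog) - np.ravel(image_hog)))

def relative_error(noise_hog, image_hog):
    """Distance normalized by the size of the noise-object features."""
    n = np.ravel(noise_hog)
    return float(np.linalg.norm(n - np.ravel(image_hog)) / np.linalg.norm(n))
```

With either function, the patch with the smallest error is the closest match to the noise object.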
For example, even though
the audio samples are still
in the spectrogram, we can
barely see the pixels of the
clean signal or the desired
noise.
where ind_y is a two-element vector holding the start and end
y-positions of the spectrogram, w is the width of the image, and the
prime (') operator denotes the gradient of the image with respect to
the x and y positions. α is a threshold factor greater than one used
to pick out the major peaks in the mean gradient. The same procedure
can be applied to the transpose of the image, summing over the height
of the image, to extract the start and end x-positions.
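One plausible reading of this cropping step is sketched below in NumPy. It assumes a grayscale float image, approximates the gradient with first differences, and treats every row whose mean absolute gradient exceeds the largest peak divided by α as a major peak. These choices and the function names are assumptions, not the poster's implementation.

```python
import numpy as np

def find_extent(image, alpha=2.0):
    """Return (start, end) row indices, inclusive, between the outermost peaks."""
    # Mean absolute difference between consecutive rows, averaged over the width.
    profile = np.abs(np.diff(image, axis=0)).mean(axis=1)
    peaks = np.flatnonzero(profile > profile.max() / alpha)
    # diff index r marks the jump between rows r and r + 1.
    return int(peaks[0]) + 1, int(peaks[-1])

def crop_spectrogram(image, alpha=2.0):
    """Crop the white margin using the row extent and the column extent."""
    y0, y1 = find_extent(image, alpha)
    x0, x1 = find_extent(image.T, alpha)  # same procedure on the transpose
    return image[y0:y1 + 1, x0:x1 + 1]
```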
I used a Hanning window of 1024 samples with 25% overlap to
construct the STFTs, and overlap-add for the inverse STFT. For the
colormap I chose "hot" raised to the power of 0.35.
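A minimal NumPy sketch of the forward transform with these settings follows, assuming a mono signal `x`. The function names are hypothetical, and the 0.35 exponent is applied here to the normalized magnitude as a stand-in for raising the colormap itself to that power.

```python
import numpy as np

def stft(x, win_len=1024, overlap=0.25):
    """Short-time Fourier transform; returns an array of (freq bins, frames)."""
    hop = int(win_len * (1.0 - overlap))  # 768 samples for 25% overlap
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def spectrogram_image(spec, gamma=0.35):
    """Map the magnitudes to [0, 1] and compress them with a power law."""
    mag = np.abs(spec)
    return (mag / (mag.max() + 1e-12)) ** gamma
```

The power-law compression brightens weak time-frequency bins so that quiet components remain visible in the image.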
Object Extraction
When a user is asked to mimic the noise in a noisy signal, the
recording will likely contain some background noise and many
frequencies that do not correspond to the actual desired noise. To
create a better object, the stationary noise in the mimicked sound
is removed using an aggressive Spectral Subtraction algorithm [1]. A
threshold is then defined to extract just enough pixel information
from the mimicked sound to use as an object. This is illustrated
below.
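The two steps can be sketched as follows. This is a basic magnitude spectral subtraction, not the estimator of [1], and it assumes that the first few frames of the mimicked recording contain only background noise. The function name and the parameters `noise_frames`, `over_subtraction`, and `rel_threshold` are hypothetical.

```python
import numpy as np

def extract_object(mag, noise_frames=5, over_subtraction=2.0, rel_threshold=0.5):
    """Subtract a stationary noise floor, then keep only the strong pixels.

    mag is a magnitude spectrogram of shape (freq bins, frames).
    Returns the extracted object and its boolean mask.
    """
    # Per-frequency noise floor estimated from the leading frames (assumption).
    noise_floor = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    cleaned = np.maximum(mag - over_subtraction * noise_floor, 0.0)
    peak = cleaned.max()
    if peak == 0.0:
        return cleaned, np.zeros(mag.shape, dtype=bool)
    mask = cleaned >= rel_threshold * peak
    return cleaned * mask, mask
```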
The resulting objects for
the case of 50% overlap
are shown here. The score
at the top shows the value
of the latter error function.
The resulting object for the
12.5% overlap scanning is
similar.
Non-Maximum Suppression
The purpose of NMS is to check whether the objects found in the
image overlap. If they do, we pick the one with the highest score; if
they overlap only slightly, we keep both. The figure below shows the
amount of overlap between each patch and the resulting object. The
entries on the diagonal compare each patch with itself.
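A generic greedy NMS can be sketched as below. It assumes boxes are given as `(x, y, width, height)` tuples, that a higher score is better (an error value would be negated first), and that overlap is measured by intersection over union. The 0.5 threshold is an illustrative default, not a value from the poster.

```python
def iou(a, b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap strongly with a better one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```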
[Figure: example object; noisy image]
The user gives us an example object: an example image in the case
of images, or an example sound in the case of sounds (which the user
can also mimic). We can then localize the noise in the noisy signal
using object recognition algorithms.
[Figure: noise mimicked by the user; noisy spectrogram "image"]
When a figure is saved from MATLAB,
a white area around the image, including the
titles, is saved as well. To extract just the
spectrogram, we can do the following.
[Figure: user-mimicked noise; after Spectral Subtraction; final object]
• Vectorized method
There is also a vectorized way
of finding the most likely
object without scanning the
image, using an integral image
and the 2D Fourier transform to
speed up the recognition.
This is discussed in detail in
the paper.
Pre-processing
From Sound Samples to Image Pixels
When visualizing an audio signal, a time-domain representation does
not tell us much about what is going on in the signal. A better
visualization is obtained through the Short-Time Fourier Transform
(STFT). Since the purpose of this project is to treat audio as just
another image, we should choose a colormap that makes sense
visually.
$ind_y = \arg\max_i \left( \frac{\sum_{i=1}^{w} image'}{w} > \frac{\sum_{i=1}^{w} image'}{\alpha w} \right)$
Object Recognition
A common object recognition pipeline follows these steps:
• Scan the image with a fixed window at different scales.
• Extract Histogram of Oriented Gradients (HOG) features from each
patch.
• Score each patch by comparing it to the object HOG features.
• Perform Non-Maximum Suppression.
The object recognition algorithm in this project also follows these
steps, but the user interface gives us a few advantages. Since the
user is asked to mimic the noise in the noisy signal, we know how
long the noise lasts and approximately which frequencies matter most
(ideally the fundamental frequency). As a result, we know the size
of the search window (w, h) and do not need to search the image
spectrogram at different scales.
Scanning the Image
Scanning the image with overlapping windows can be very time
consuming, depending on the implementation, and can also greatly
affect the accuracy of the algorithm. I tried the following ways of
scanning the image spectrogram.
1- At each position, extract four windows with 50% overlap.
2- Extract windows in a row from the image with 12.5% overlap.
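A generic way to enumerate overlapping windows is sketched below. It assumes the same overlap fraction along both axes and a step rounded to whole pixels, and the function name is hypothetical. It does not reproduce the exact four-window layout of the first method.

```python
def window_positions(image_shape, window_shape, overlap):
    """Top-left (y, x) corners of fixed-size windows with the given overlap."""
    H, W = image_shape
    h, w = window_shape
    # Step between neighbouring windows, at least one pixel.
    step_y = max(1, int(round(h * (1.0 - overlap))))
    step_x = max(1, int(round(w * (1.0 - overlap))))
    return [(y, x)
            for y in range(0, H - h + 1, step_y)
            for x in range(0, W - w + 1, step_x)]
```

A larger overlap gives finer localization at the cost of more patches to score, which is the speed and accuracy trade-off described above.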
One patch of the noisy signal
Synthesize and Voila!
When resynthesizing the sound, we can either multiply the mask
with the spectrogram of the sound and get rid of the whole
object (right), or subtract only the noise template within the
mask from the signal (left).
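Both options can be sketched on a complex STFT as follows. This assumes a mask that is 1 inside the detected object and 0 elsewhere, a template holding the noise magnitudes, and that the original phase is reused. The function names are hypothetical.

```python
import numpy as np

def remove_object(spec, mask):
    """Zero every time-frequency bin inside the object mask."""
    return spec * (1.0 - mask)

def subtract_template(spec, mask, template_mag):
    """Subtract the template magnitude inside the mask, keeping the phase."""
    mag = np.abs(spec)
    phase = np.angle(spec)
    new_mag = np.maximum(mag - mask * template_mag, 0.0)  # never go negative
    return new_mag * np.exp(1j * phase)
```

Either result would then be passed to the inverse STFT to obtain the cleaned audio.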
Ideally, we would hope to subtract
all the noise without subtracting
any of the signal. For future work,
I suggest looking into ways to predict
the most likely pixels inside the
removed noise object. In addition, when localizing a deformed
object (when the user cannot mimic the noise accurately), it is
important to look for techniques that take this deformation into
account.
I then extracted the objects with
the highest overlap (they already
have the highest score). The
resulting object and its mask are
shown below.
This result was improved with the
12.5% overlap and a stronger NMS,
which is discussed in the paper.
[Figure: time-domain signal and its spectrogram]
References
[1] Y. Ephraim and D. Malah, "Speech enhancement using a minimum
mean-square error short-time spectral amplitude estimator," IEEE
Trans. Acoustics, Speech, Signal Processing, vol. 32, pp. 1109-1121,
Dec. 1984.
[2] A. Vedaldi and B. Fulkerson, "VLFeat: An Open and Portable Library of
Computer Vision Algorithms," 2008, http://www.vlfeat.org/
