DNN-based permutation solver for
frequency-domain independent component
analysis in two-source mixture case
Shuhei Yamaji and Daichi Kitamura
National Institute of Technology, Kagawa College
Japan
12th Asia-Pacific Signal and Information Processing
Association (APSIPA)
Introduction
➢ About audio source separation
➢ Applications of audio source separation
– Speech recognition
– Noise canceling
– Voice command devices, etc.
[Figure: audio source separation splits a mixture of two overlapping utterances ("Hello…" and "Nice to meet you...") into the individual speech signals]
Blind Source Separation
➢ Independent component analysis (ICA) [Comon, 1994]
⁃ Assumes independence between source signals
⁃ Estimates the demixing matrix without knowing the mixing matrix
➢ Actual audio mixing in a reverberant environment
⁃ The sources are convolved with the room impulse responses between the sources and the microphones
⁃ ICA is therefore extended to the frequency domain
[Figure: source signals → mixture signals → estimated signals]
Frequency-Domain ICA
➢ Frequency-domain ICA (FDICA) [Smaragdis, 1998]
– Apply ICA in each frequency bin
[Figure: the spectrogram (frequency bin × time frame) is processed by a separate ICA in each frequency bin (ICA1, ICA2, ICA3, …); the frequency-wise demixing matrix is the inverse of the frequency-wise mixing matrix]
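A minimal sketch of the frequency-wise relationship shown in the figure, assuming the idealized case in which the demixing matrix of each bin is exactly the inverse of a known 2×2 mixing matrix. Real FDICA estimates the demixing matrix blindly, so this only illustrates the per-bin matrix product. The helper names (`invert_2x2`, `apply_matrix`, `demix`) are illustrative and do not come from the slides.

```python
def invert_2x2(m):
    """Invert a 2x2 (possibly complex) matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def apply_matrix(m, x):
    """Multiply a 2x2 matrix by a 2-element vector."""
    return [m[0][0] * x[0] + m[0][1] * x[1],
            m[1][0] * x[0] + m[1][1] * x[1]]


def demix(demixing, observed):
    """Apply one demixing matrix per frequency bin.

    demixing[f] is the 2x2 matrix of bin f;
    observed[f][t] is the 2-channel observation [x1, x2] at bin f, frame t.
    """
    return [[apply_matrix(w, x) for x in frames]
            for w, frames in zip(demixing, observed)]
```

Because each bin is handled independently, nothing in this computation ties output channel 1 of one bin to output channel 1 of another bin, which is where the permutation problem on the next slide comes from.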
Permutation Problem in FDICA
➢ Permutation problem in frequency-domain ICA
– The order of the separated signals differs from one frequency bin to another
– The separated components must be aligned along the frequency axis
[Figure: sources 1 and 2 are observed as mixtures 1 and 2; FDICA over all frequency components yields a non-aligned signal, and a permutation solver turns it into estimated signals 1 and 2]
Conventional Permutation Solvers
➢ Popular permutation solvers
– Based on temporal structures
• FDICA + correlation-based alignment between adjacent frequencies [Murata+, 2001]
– Based on direction of arrival (DOA)
• FDICA + DOA alignment [Saruwatari+, 2006]
– Based on a relative correlation among frequencies
• Independent vector analysis (IVA) [Hiroe, 2006], [Kim+, 2006]
– Based on low-rank modeling of each source
• Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016]
[Figure: the components of a non-aligned signal are sorted along the frequency axis]
Motivation of Proposed Method
➢ Problems of conventional permutation solvers
– The correlation-based method sometimes fails to align components
– Even in IVA and ILRMA, the block permutation problem arises
➢ Proposed method: DNN-based permutation solver
– The permutation problem can be simulated by shuffling the frequency components of source signals
– Training data for the DNN are therefore easy to produce
[Figure: a DNN converts a non-aligned signal into a separated (aligned) signal]
Proposed method: DNN input and label
➢ Input and label
– Extract short-time activations at two frequencies, a reference frequency and another frequency, from the separated signal
– The DNN predicts whether the permutation of the two input frequencies is correct (correct = 0, incorrect = 1)
[Figure: the DNN receives the reference and the other frequency in a correct permutation case and in an incorrect permutation case]
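The shuffling idea and the 0/1 label can be sketched as follows. This is an illustration under stated assumptions, not the authors' data pipeline: each source is a list of per-frequency rows of activations, the input concatenates both separated signals at both frequencies, and `n_frames = 40` is chosen only so that the input length matches the 160 input units on the architecture slide (the slides do not state how the 160 values are composed). The function names are hypothetical.

```python
import random


def shuffle_bins(source_1, source_2, rng):
    """Randomly swap the two sources in each frequency bin.

    Returns the two shuffled signals and the per-bin swap flags
    (1 = swapped, 0 = kept), which serve as ground truth for labels.
    """
    flags = [rng.randint(0, 1) for _ in source_1]
    out_1 = [b if s else a for a, b, s in zip(source_1, source_2, flags)]
    out_2 = [a if s else b for a, b, s in zip(source_1, source_2, flags)]
    return out_1, out_2, flags


def make_example(sig_1, sig_2, flags, ref, other, start, n_frames):
    """Build one (input, label) pair for a reference and another frequency."""
    def segment(row):
        return row[start:start + n_frames]

    x = (segment(sig_1[ref]) + segment(sig_2[ref])
         + segment(sig_1[other]) + segment(sig_2[other]))
    # Label 0: both bins have the same source order (correct permutation).
    # Label 1: the source order differs between the two bins (incorrect).
    label = flags[ref] ^ flags[other]
    return x, label
```

Because the swap flags are known at shuffling time, labels come for free, which is the sense in which training data are easy to produce.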
Proposed method: DNN Architecture
➢ Simple model
– 6 hidden layers with ReLU or sigmoid activation functions
– Input layer (160 units)
– Hidden layer 1 (128 units, ReLU)
– Hidden layer 2 (128 units, ReLU)
– Hidden layer 3 (128 units, ReLU)
– Hidden layer 4 (64 units, ReLU)
– Hidden layer 5 (64 units, ReLU)
– Hidden layer 6 (1 unit, sigmoid)
– Output layer (1 unit)
– Target label (1 unit, 0 or 1)
– Trained to minimize the MSE between the output and the target label
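The layer sizes above can be turned into a small forward pass. This is a sketch in plain Python, assuming fully connected layers and treating the single sigmoid unit of hidden layer 6 as the value that reaches the output layer, which is one reading of the diagram. The initialization scheme and the names `init_params`, `forward`, and `mse_loss` are assumptions for illustration, not details from the slides.

```python
import math
import random

# Input (160) followed by the six layers listed on the slide.
LAYER_SIZES = [160, 128, 128, 128, 64, 64, 1]


def init_params(rng):
    """Create random weights and zero biases for every layer (assumed init)."""
    params = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        scale = math.sqrt(2.0 / n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def forward(params, x):
    """ReLU on layers 1 to 5, sigmoid on layer 6; returns a value in [0, 1]."""
    h = x
    for i, (weights, biases) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if i < len(params) - 1:
            h = [max(0.0, v) for v in z]
        else:
            h = [sigmoid(v) for v in z]
    return h[0]


def mse_loss(prediction, label):
    """Squared error between the DNN output and the 0/1 target label."""
    return (prediction - label) ** 2
```

With these sizes the network has 66,113 weights and biases in total, so it is small enough to evaluate many times per spectrogram.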
Proposed method: DNN predictions in subband frequency bins
➢ Apply the DNN within a subband (a local time-frequency area)
– Subband: the reference (center) frequency and several neighboring frequencies
➢ Take a majority decision along time frames
– to determine the subband permutation vector
[Figure: for each input vector, the DNN outputs 1 (different sound source) or 0 (same sound source); the per-frequency decisions are stored as the subband permutation vector]
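The majority decision along time frames can be sketched as below. The 0.5 threshold on the DNN output and the rule that ties resolve to 0 are assumptions made for this illustration; the slides only state that a majority decision is taken. The function name is hypothetical.

```python
def subband_permutation_vector(dnn_outputs, threshold=0.5):
    """Reduce per-frame DNN outputs to one 0/1 decision per frequency.

    dnn_outputs[k][t] is the DNN output for the k-th frequency of the
    subband at the t-th time-frame position.
    """
    vector = []
    for outputs in dnn_outputs:
        votes = [1 if y > threshold else 0 for y in outputs]
        # Majority vote over time frames; a tie resolves to 0 (assumption).
        vector.append(1 if 2 * sum(votes) > len(votes) else 0)
    return vector
```

For example, outputs of `[0.9, 0.8, 0.2]` at one frequency give two votes for "different source" against one, so that entry of the subband permutation vector becomes 1.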
Proposed method: construct a fullband permutation vector
➢ Alignment among subbands
– When the subband slides along the frequency axis, the reference (center) frequency component changes
• The meanings of the “0 (same)” and “1 (different)” labels are therefore not shared among subbands
– The orders of the source components must be aligned across all subbands after the DNN prediction has been made in every subband
Proposed method: construct a fullband permutation vector
➢ Objective
– Estimate a “fullband permutation vector” that assigns the two sources to “0” and “1”
➢ Step 1
– The subband permutation vector of the lowest-frequency subband is simply copied to the corresponding frequency bins of the fullband permutation vector
[Figure: time-frequency grid; the permutation vector of the lowest subband (1, 1, 0, 1, 0) is copied in two "Set" operations, ending in the fullband permutation vector]
Proposed method: construct a fullband permutation vector
➢ Step 2
– Slide the subband along the frequency axis
– Obtain the subband permutation vector of the current subband and its binary complement vector
– Measure the similarity between each of these two vectors and the fullband permutation vector by the mean squared error (MSE)
– Store the vector that minimizes the MSE in the memory
– Update the fullband permutation vector by taking a majority decision
[Figure: time-frequency grid showing 1. similarity comparison, 2. set, and 3. majority decision, which yields the updated fullband permutation vector]
Proposed method: construct a fullband permutation vector
➢ Step 3
– Iterate step 2 up to the highest-frequency subband
– Swap the separated components according to the fullband permutation vector
– Obtain the permutation-aligned estimated signals
[Figure: time-frequency grid; a majority decision over the stored subband vectors gives the fullband permutation vector, which is then used to replace (swap) the components]
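Steps 1 to 3 can be sketched in a few lines. This is an interpretation of the slides under stated assumptions: each subband is given as a start bin plus its 0/1 vector, subbands arrive from lowest to highest frequency, the MSE is computed only over bins that already have a fullband value, and ties in the majority vote resolve to 0. All function names are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally long 0/1 vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def majority(votes):
    """Majority vote over stored 0/1 labels; a tie resolves to 0 (assumption)."""
    return 1 if 2 * sum(votes) > len(votes) else 0


def build_fullband_vector(subbands, n_bins):
    """Merge subband permutation vectors into one fullband vector.

    subbands: list of (start_bin, labels), lowest-frequency subband first.
    """
    memory = [[] for _ in range(n_bins)]   # stored labels per frequency bin
    fullband = [None] * n_bins
    for start, labels in subbands:
        candidate = list(labels)
        overlap = [k for k in range(len(labels))
                   if fullband[start + k] is not None]
        if overlap:  # Step 2; the first subband has no overlap (Step 1)
            complement = [1 - v for v in labels]
            current = [fullband[start + k] for k in overlap]
            err_same = mse([labels[k] for k in overlap], current)
            err_flip = mse([complement[k] for k in overlap], current)
            if err_flip < err_same:
                candidate = complement
        for k, value in enumerate(candidate):
            memory[start + k].append(value)
            fullband[start + k] = majority(memory[start + k])
    return fullband


def apply_permutation(est_1, est_2, fullband):
    """Step 3: swap the two separated components wherever the vector is 1."""
    out_1 = [b if p else a for a, b, p in zip(est_1, est_2, fullband)]
    out_2 = [a if p else b for a, b, p in zip(est_1, est_2, fullband)]
    return out_1, out_2
```

The result is determined only up to a global flip: swapping the roles of "0" and "1" in every bin describes the same alignment with the two output channels exchanged.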
Experimental conditions
Training speech signals:
– Dry sources: JVS corpus [Takamichi+, 2019] (Japanese speech)
– Mixture: dry sources convolved with RWCP impulse responses [Nakamura+, 2000]
– Permutation: apply FDICA, then randomly shuffle the components
Test speech signals: speech signals obtained from the SiSEC2011 UND task [Araki+, 2012]
FFT length: 8192 (512 ms, Hamming window)
Shift length: 2048
Objective evaluation: average improvement of signal-to-distortion ratio (SDR)
Reverberation time
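The STFT settings imply a few quantities that the slide does not list. The short calculation below derives them, assuming that 8192 samples span exactly 512 ms and that a one-sided spectrum of a real-valued signal is used; the variable names are illustrative.

```python
fft_length = 8192      # samples, from the slide
window_ms = 512        # duration of one window, from the slide
shift_length = 2048    # samples, from the slide

# Sampling rate implied by 8192 samples lasting 512 ms.
sampling_rate_hz = fft_length * 1000 // window_ms         # 16000

# Duration of one frame shift and the overlap between adjacent windows.
shift_ms = shift_length * 1000 / sampling_rate_hz         # 128.0
overlap_ratio = 1 - shift_length / fft_length             # 0.75

# Number of frequency bins in a one-sided spectrum (assumption).
n_frequency_bins = fft_length // 2 + 1                    # 4097
```

Under these assumptions the permutation solver has to align on the order of four thousand frequency bins per mixture.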
Results
➢ Findings
– The proposed method achieves an SDR improvement of about 8 dB
– ILRMA achieves an SDR improvement of about 4 dB
– The proposed method is close to the upper-limit performance
[Figure: bar chart of SDR improvement [dB] (axis from 0 to 12, higher is better) for FDICA with ideal permutation solver (IPS, reference score), ILRMA (2 bases), ILRMA (3 bases), ILRMA (4 bases), and FDICA with DNN-based permutation solver (proposed)]
Conclusion
➢ In this paper
– We proposed a new DNN-based permutation solver for determined audio source separation using FDICA
– An SDR improvement of about 8 dB was achieved in experiments with a highly reverberant speech mixture signal
➢ Future work
– The proposed method suffers from a combinatorial explosion when there are three or more separated signals
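The slides do not spell out where the explosion comes from. One plausible reading is that with N separated signals each frequency bin has N! possible orderings, so a single 0/1 label per bin is no longer enough. The sketch below only counts those orderings; the function name is hypothetical.

```python
from itertools import permutations


def candidate_orderings(n_sources):
    """List every possible ordering of n_sources separated signals in one bin."""
    return list(permutations(range(n_sources)))


# Two sources give 2 orderings (same or swapped); the count grows as N!.
for n in (2, 3, 4, 5):
    print(n, len(candidate_orderings(n)))
```

For two sources the two orderings map directly onto the labels 0 and 1, which is why the binary formulation fits the two-source case.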
Thank you for your attention!
Demonstration: Original, Mixture, FDICA with IPS, FDICA with proposed method

Weitere ähnliche Inhalte

Was ist angesagt?

Blind Source Separation using Dictionary Learning
Blind Source Separation using Dictionary LearningBlind Source Separation using Dictionary Learning
Blind Source Separation using Dictionary Learning
Davide Nardone
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech Enhancement
NAVER Engineering
 
캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)
캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)
캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)
NAVER Engineering
 
Acoustic echo cancellation
Acoustic echo cancellationAcoustic echo cancellation
Acoustic echo cancellation
chintanajoshi
 

Was ist angesagt? (20)

Online divergence switching for superresolution-based nonnegative matrix fact...
Online divergence switching for superresolution-based nonnegative matrix fact...Online divergence switching for superresolution-based nonnegative matrix fact...
Online divergence switching for superresolution-based nonnegative matrix fact...
 
Efficient initialization for nonnegative matrix factorization based on nonneg...
Efficient initialization for nonnegative matrix factorization based on nonneg...Efficient initialization for nonnegative matrix factorization based on nonneg...
Efficient initialization for nonnegative matrix factorization based on nonneg...
 
Divergence optimization in nonnegative matrix factorization with spectrogram ...
Divergence optimization in nonnegative matrix factorization with spectrogram ...Divergence optimization in nonnegative matrix factorization with spectrogram ...
Divergence optimization in nonnegative matrix factorization with spectrogram ...
 
Depth estimation of sound images using directional clustering and activation-...
Depth estimation of sound images using directional clustering and activation-...Depth estimation of sound images using directional clustering and activation-...
Depth estimation of sound images using directional clustering and activation-...
 
Blind Source Separation using Dictionary Learning
Blind Source Separation using Dictionary LearningBlind Source Separation using Dictionary Learning
Blind Source Separation using Dictionary Learning
 
Superresolution-based stereo signal separation via supervised nonnegative mat...
Superresolution-based stereo signal separation via supervised nonnegative mat...Superresolution-based stereo signal separation via supervised nonnegative mat...
Superresolution-based stereo signal separation via supervised nonnegative mat...
 
Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Hybrid multichannel signal separation using supervised nonnegative matrix fac...Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Hybrid multichannel signal separation using supervised nonnegative matrix fac...
 
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
 
Depth Estimation of Sound Images Using Directional Clustering and Activation...
Depth Estimation of Sound Images Using  Directional Clustering and Activation...Depth Estimation of Sound Images Using  Directional Clustering and Activation...
Depth Estimation of Sound Images Using Directional Clustering and Activation...
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech Enhancement
 
Adaptive equalization
Adaptive equalizationAdaptive equalization
Adaptive equalization
 
Isolated words recognition using mfcc, lpc and neural network
Isolated words recognition using mfcc, lpc and neural networkIsolated words recognition using mfcc, lpc and neural network
Isolated words recognition using mfcc, lpc and neural network
 
Speaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet VocoderSpeaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet Vocoder
 
Voice Activity Detection using Single Frequency Filtering
Voice Activity Detection using Single Frequency FilteringVoice Activity Detection using Single Frequency Filtering
Voice Activity Detection using Single Frequency Filtering
 
Ibfd presentation
Ibfd presentationIbfd presentation
Ibfd presentation
 
Timing synchronization F Ling_v1.2
Timing synchronization F Ling_v1.2Timing synchronization F Ling_v1.2
Timing synchronization F Ling_v1.2
 
캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)
캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)
캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)
 
Sampling
SamplingSampling
Sampling
 
Sampling and Reconstruction of Signal using Aliasing
Sampling and Reconstruction of Signal using AliasingSampling and Reconstruction of Signal using Aliasing
Sampling and Reconstruction of Signal using Aliasing
 
Acoustic echo cancellation
Acoustic echo cancellationAcoustic echo cancellation
Acoustic echo cancellation
 

Ähnlich wie DNN-based permutation solver for frequency-domain independent component analysis in two-source mixture case

Applications of ann_in_microwave_engineering
Applications of ann_in_microwave_engineeringApplications of ann_in_microwave_engineering
Applications of ann_in_microwave_engineering
prasadhegdegn
 
BEC604 -COMMUNICATION ENG II.pdf
BEC604 -COMMUNICATION ENG II.pdfBEC604 -COMMUNICATION ENG II.pdf
BEC604 -COMMUNICATION ENG II.pdf
Sunil Manjani
 
Final presentation
Final presentationFinal presentation
Final presentation
Rohan Lad
 
Lte course
Lte courseLte course
Lte course
Ali Kamil
 
2015-04 PhD defense
2015-04 PhD defense2015-04 PhD defense
2015-04 PhD defense
Nil Garcia
 

Ähnlich wie DNN-based permutation solver for frequency-domain independent component analysis in two-source mixture case (20)

DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...
 
Introduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detectionIntroduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detection
 
Applications of ann_in_microwave_engineering
Applications of ann_in_microwave_engineeringApplications of ann_in_microwave_engineering
Applications of ann_in_microwave_engineering
 
20575-38936-1-PB.pdf
20575-38936-1-PB.pdf20575-38936-1-PB.pdf
20575-38936-1-PB.pdf
 
10-cdma_mobile_communication_and_IS-95.ppt
10-cdma_mobile_communication_and_IS-95.ppt10-cdma_mobile_communication_and_IS-95.ppt
10-cdma_mobile_communication_and_IS-95.ppt
 
10-cdma.ppt
10-cdma.ppt10-cdma.ppt
10-cdma.ppt
 
10-cdma.ppt
10-cdma.ppt10-cdma.ppt
10-cdma.ppt
 
Design of dfe based mimo communication system for mobile moving with high vel...
Design of dfe based mimo communication system for mobile moving with high vel...Design of dfe based mimo communication system for mobile moving with high vel...
Design of dfe based mimo communication system for mobile moving with high vel...
 
Introduction to adaptive filtering and its applications.ppt
Introduction to adaptive filtering and its applications.pptIntroduction to adaptive filtering and its applications.ppt
Introduction to adaptive filtering and its applications.ppt
 
BEC604 -COMMUNICATION ENG II.pdf
BEC604 -COMMUNICATION ENG II.pdfBEC604 -COMMUNICATION ENG II.pdf
BEC604 -COMMUNICATION ENG II.pdf
 
Lecture 9
Lecture 9Lecture 9
Lecture 9
 
Final presentation
Final presentationFinal presentation
Final presentation
 
Pulse Code Modulation
Pulse Code ModulationPulse Code Modulation
Pulse Code Modulation
 
Adaptive Channel Equalization using Multilayer Perceptron Neural Networks wit...
Adaptive Channel Equalization using Multilayer Perceptron Neural Networks wit...Adaptive Channel Equalization using Multilayer Perceptron Neural Networks wit...
Adaptive Channel Equalization using Multilayer Perceptron Neural Networks wit...
 
Equalization with the help of Non OMA.pptx
Equalization with the help of Non OMA.pptxEqualization with the help of Non OMA.pptx
Equalization with the help of Non OMA.pptx
 
Final presentation
Final presentationFinal presentation
Final presentation
 
Performance Comparison of Modified Variable Step Size Leaky LMS Algorithm for...
Performance Comparison of Modified Variable Step Size Leaky LMS Algorithm for...Performance Comparison of Modified Variable Step Size Leaky LMS Algorithm for...
Performance Comparison of Modified Variable Step Size Leaky LMS Algorithm for...
 
Lte course
Lte courseLte course
Lte course
 
Prior distribution design for music bleeding-sound reduction based on nonnega...
Prior distribution design for music bleeding-sound reduction based on nonnega...Prior distribution design for music bleeding-sound reduction based on nonnega...
Prior distribution design for music bleeding-sound reduction based on nonnega...
 
2015-04 PhD defense
2015-04 PhD defense2015-04 PhD defense
2015-04 PhD defense
 

Mehr von Kitamura Laboratory

Mehr von Kitamura Laboratory (20)

付け爪センサによる生体信号を用いた深層学習に基づく心拍推定
付け爪センサによる生体信号を用いた深層学習に基づく心拍推定付け爪センサによる生体信号を用いた深層学習に基づく心拍推定
付け爪センサによる生体信号を用いた深層学習に基づく心拍推定
 
STEM教育を目的とした動画像処理による二重振り子の軌跡推定
STEM教育を目的とした動画像処理による二重振り子の軌跡推定STEM教育を目的とした動画像処理による二重振り子の軌跡推定
STEM教育を目的とした動画像処理による二重振り子の軌跡推定
 
ギタータブ譜からのギターリフ抽出アルゴリズム
ギタータブ譜からのギターリフ抽出アルゴリズムギタータブ譜からのギターリフ抽出アルゴリズム
ギタータブ譜からのギターリフ抽出アルゴリズム
 
時間微分スペクトログラムに基づくブラインド音源分離
時間微分スペクトログラムに基づくブラインド音源分離時間微分スペクトログラムに基づくブラインド音源分離
時間微分スペクトログラムに基づくブラインド音源分離
 
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
 
周波数双方向再帰に基づく深層パーミュテーション解決法
周波数双方向再帰に基づく深層パーミュテーション解決法周波数双方向再帰に基づく深層パーミュテーション解決法
周波数双方向再帰に基づく深層パーミュテーション解決法
 
Heart rate estimation of car driver using radar sensors and blind source sepa...
Heart rate estimation of car driver using radar sensors and blind source sepa...Heart rate estimation of car driver using radar sensors and blind source sepa...
Heart rate estimation of car driver using radar sensors and blind source sepa...
 
双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価
双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価
双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価
 
深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討
深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討
深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討
 
多重解像度時間周波数表現に基づく独立低ランク行列分析,
多重解像度時間周波数表現に基づく独立低ランク行列分析,多重解像度時間周波数表現に基づく独立低ランク行列分析,
多重解像度時間周波数表現に基づく独立低ランク行列分析,
 
深層パーミュテーション解決法の基礎的検討
深層パーミュテーション解決法の基礎的検討深層パーミュテーション解決法の基礎的検討
深層パーミュテーション解決法の基礎的検討
 
深層学習に基づく音響特徴量からの振幅スペクトログラム予測
深層学習に基づく音響特徴量からの振幅スペクトログラム予測深層学習に基づく音響特徴量からの振幅スペクトログラム予測
深層学習に基づく音響特徴量からの振幅スペクトログラム予測
 
音楽信号処理における基本周波数推定を応用した心拍信号解析
音楽信号処理における基本周波数推定を応用した心拍信号解析音楽信号処理における基本周波数推定を応用した心拍信号解析
音楽信号処理における基本周波数推定を応用した心拍信号解析
 
調波打撃音モデルに基づく線形多チャネルブラインド音源分離
調波打撃音モデルに基づく線形多チャネルブラインド音源分離調波打撃音モデルに基づく線形多チャネルブラインド音源分離
調波打撃音モデルに基づく線形多チャネルブラインド音源分離
 
コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離
コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離
コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離
 
Linear multichannel blind source separation based on time-frequency mask obta...
Linear multichannel blind source separation based on time-frequency mask obta...Linear multichannel blind source separation based on time-frequency mask obta...
Linear multichannel blind source separation based on time-frequency mask obta...
 
Blind audio source separation based on time-frequency structure models
Blind audio source separation based on time-frequency structure modelsBlind audio source separation based on time-frequency structure models
Blind audio source separation based on time-frequency structure models
 
非負値行列因子分解を用いた被り音の抑圧
非負値行列因子分解を用いた被り音の抑圧非負値行列因子分解を用いた被り音の抑圧
非負値行列因子分解を用いた被り音の抑圧
 
独立成分分析に基づく信号源分離精度の予測
独立成分分析に基づく信号源分離精度の予測独立成分分析に基づく信号源分離精度の予測
独立成分分析に基づく信号源分離精度の予測
 
深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化
深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化
深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化
 

KĂźrzlich hochgeladen

Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
dharasingh5698
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
SUHANI PANDEY
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
KreezheaRecto
 

KĂźrzlich hochgeladen (20)

Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 

DNN-based permutation solver for frequency-domain independent component analysis in two-source mixture case

  • 1. DNN-based permutation solver for frequency-domain independent component analysis in two-source mixture case Shuhei Yamaji and Daichi Kitamura National Institute of Technology, Kagawa College Japan 12th Asia-Pacific Signal and Information Processing Association (APSIPA) 1
  • 2. Introduction  About audio source separation  Applications of audio source separation – Speech recognition – Noise canceling – Voice command device etc. Nice to meet you... Hello… Hello… Nice to meet you... Audio source separation 2
  • 3. Blind Source Separation  Independent component analysis (ICA) [Comon, 1994] ⁃ Assumes independence between source signals ⁃ Estimates demixing matrix without knowing mixing matrix Actual audio mixing in reverberant environment ⁃ Convolution with room impulse responses between sources mics ⁃ Extend ICA to the frequency domain Source signal Mixture signal Estimated signal 3
  • 4. Frequency-Domain ICA  Frequency-domain ICA (FDICA) [Smaragdis, 1998] – Apply ICA in each frequency bin Spectrogram ICA1 ICA2 ICA3 … … ICA Frequency bin Time frame … Inverse matrix Frequency-wise mixing matrix Frequency-wise demixing matrix 4
  • 5. Permutation Problem in FDICA  Permutation problem in frequency-domain ICA – Order of separated signals in each frequency is messed up – Separated components must be aligned along the frequency axis FDICA All frequency components Source 1 Source 2 Observed 1 Observed 2 Estimated signal 1 Estimated signal 2 Non-aligned signal Permutation Solver Time 5
  • 6.  Popular permutation solvers – Based on Temporal Structures • FDICA + correlation-based alignment between adjacent frequencies [Murata+, 2001] – Based on direction of arrival (DOA) • Frequency-domain ICA + DOA alignment [Saruwatari+, 2006] – Based on a relative correlation among frequencies • Independent vector analysis (IVA) [Hiroe, 2006], [Kim+, 2006] – Based on a low-rank modeling of each source • Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016] Conventional Permutation Solvers Time … … Sort Non-aligned signal Non-aligned signal 6
  • 7.  Problems of conventional permutation solvers – Correlation-based method sometimes fails to align components – Even in IVA and ILRMA, block permutation problem arise  Proposed method: DNN-based permutation solver – The permutation problems can be simulated by shuffling the frequency components of source signals – Training data for DNN are easy to produce Motivation of Proposed Method Non-aligned signal Non-aligned signal Time Separated signal Separated signal DNN DNN 7
  • 8. Proposed method: DNN input and label  Input and label – Extract two short-time activations of reference and another frequencies from the separated signal – DNN predicts whether the permutation of input two frequencies is correct (correct=0 and incorrect=1) 8 DNN Correct permutation case Incorrect permutation case DNN Reference Another Reference Another
  • 9.  Simple Model – 6 hidden layers with ReLU or Sigmoid functions Proposed method: DNN Architecture Hidden Layer 1 (128 units) ReLU Hidden Layer 2 (128 units) ReLU Hidden Layer 3 (128 units) ReLU Hidden Layer 4 (64 units) ReLU Hidden Layer 5 (64 units) ReLU Hidden Layer 6 (1 units) Sigmoid Output Layer (1 units) Target label (1 units) Input Layer (160 units) Minimum MSE 0 or 1 9
  • 10.  Apply DNN in subband frequency (local time-frequency area) – Subband: Reference (center) frequency several frequencies  Take majority decision along time frames – to determine the subband permutation vector Proposed method: DNN predictions in subband frequency bins DNN output Input vector 1 : Different sound source 1 : Different sound source 0 : Same sound source 1 : Different sound source 0 : Same sound source 10 Subband permutation vectorにして おく
  • 11. Proposed method: construct a fullband permutation vector  Alignment among subbands – When the subband slides along frequency axis, the reference (center) frequency component changes • The meanings of “0 (same)” and “1 (different)” labels are not shared among subbands – The orders of source components in all subbands must be aligned after the DNN prediction in all subbands 11
  • 12. Proposed method: construct a fullband permutation vector  Objective – Estimate “fullband permutation vector” that corresponds the two sources to “0” and “1”  Step1 – The subband permutation vector of the lowest frequency subband is simply set to the corresponding frequency bins in the fullband permutation vector Time Frequency 1 1 0 1 0 1 1 0 1 0 1 1 0 1 0 1. Set Fullband permutation vector 2. Set 12
  • 13.  Step2 – Slide the subband frequencies – Obtain the subband permutation vector of the current subband and its binary complement vector – The similarity between subband and fullband permutation vectors are measured by mean squared error (MSE) – Set the subband vector that minimize MSE to the memory – Update fullband permutation vector by taking majority decision Proposed method: construct a fullband permutation vector Time Frequency 1 0 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 0 1 0 2. Set 0 1 1 0 1 1. Similarity comparison 3. Majority decision Fullband permutation vector 13
  • 14. Proposed method: construct a fullband permutation vector. Step 3: iterate step 2 up to the highest-frequency subband, swap the frequency-wise components according to the fullband permutation vector, and obtain the permutation-aligned estimated signals. [Figure: majority decision yielding the fullband permutation vector, followed by replacement of the components; axes: time, frequency]
  • 15. Experimental conditions. Training speech signals: dry sources from the JVS corpus [Takamichi+, 2019] (Japanese speech); mixtures made by convolving the dry sources with RWCP impulse responses [Nakamura+, 2000]; permutation simulated by applying FDICA and randomly shuffling the components. Test speech signals: speech signals obtained from the SiSEC2011 UND task [Araki+, 2012]. FFT length: 8192 samples (512 ms, Hamming window). Shift length: 2048 samples. Evaluation score: average improvement of the signal-to-distortion ratio (SDR). [Figure: impulse responses and their reverberation time]
  • 16. Results. Findings: the proposed method achieves an SDR improvement of about 8 dB; ILRMA achieves about 4 dB; the proposed method is close to the upper-limit performance. [Figure: bar chart of SDR improvement [dB] (axis from 0 to 12, higher is better) for FDICA with ideal permutation solver (IPS, reference score), ILRMA (2 bases), ILRMA (3 bases), ILRMA (4 bases), and FDICA with DNN-based permutation solver (proposed)]
  • 17. Conclusion. In this paper: we proposed a new DNN-based permutation solver for determined audio source separation using FDICA; an SDR improvement of about 8 dB was achieved in experiments with highly reverberant speech mixtures. Future work: the proposed method suffers from a combinatorial explosion when there are three or more separated signals. Thank you for your attention!
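
The layer sizes listed on slide 9 can be turned into a small forward-pass sketch. This is only an illustration under stated assumptions, not the authors' implementation: the weights are random and untrained, the single sigmoid unit is treated as the final layer, and the names `LAYER_SIZES`, `init_params`, `forward`, and `count_params` are hypothetical. It uses only the Python standard library.

```python
import math
import random

# Input (160) -> five ReLU layers -> one sigmoid unit, as listed on slide 9.
LAYER_SIZES = [160, 128, 128, 128, 64, 64, 1]


def init_params(sizes, seed=0):
    """Create random (untrained) weights and zero biases for each layer."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)  # assumed initialization, not from the slides
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def forward(params, x):
    """Return the sigmoid output in (0, 1) for one 160-dimensional input."""
    h = x
    last = len(params) - 1
    for i, (weights, biases) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if i < last:
            h = [max(0.0, v) for v in z]                 # ReLU
        else:
            h = [1.0 / (1.0 + math.exp(-v)) for v in z]  # sigmoid
    return h[0]


def count_params(params):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)
```

In training, the output would be compared with the 0 or 1 target label using the MSE, as the slide states. Thresholding the output (for example at 0.5) to obtain a binary decision is an assumption of this sketch; the slides do not specify it.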

Editor's notes

  1. Hello everyone, I’m Shuhei Yamaji from the National Institute of Technology, Kagawa College, Japan. In this presentation, we talk about a DNN-based permutation solver for frequency-domain independent component analysis in the two-source mixture case.
  2. This presentation deals with audio source separation, which is a technique to separate a mixture signal into its individual audio sources. This technology can be used in many audio applications, such as speech recognition, noise canceling, voice command devices, and so on.
  3. A popular approach to audio source separation is independent component analysis, ICA for short. ICA assumes independence between the sources and estimates the demixing matrix W without knowing the mixing matrix A. This is represented in this figure. The source signals, s1 and s2, are mixed by A and then observed as x1 and x2. W can separate the sources in x, yielding y1 and y2, if W is the inverse matrix of A. Of course, we don’t know the mixing matrix A, so ICA estimates W using the statistical independence between the sources. In real situations, audio signals are mixed with room reverberation as a convolutive mixture, and simple ICA cannot separate the sources in that case. To solve this problem, frequency-domain ICA, FDICA for short, was proposed.
  4. This figure represents the mixture signals in the time-frequency domain, which are obtained by the short-time Fourier transform. In FDICA, simple ICA is applied to each frequency bin, as in this figure. Therefore, the demixing matrix W must be estimated in each frequency bin to achieve source separation.
  5. However, since ICA cannot determine the order of the separated signals, the output components of FDICA are not aligned, as shown here, and we have to reorder the separated red and blue components along the frequency axis. This is the so-called permutation problem. Thus, a permutation solver must be applied after FDICA as post-processing. In this presentation, we aim to solve the permutation problem over all frequency bins using a new, data-driven approach.
  6. A major approach to solving the permutation problem is based on the temporal structures of the separated components: we can reorder the components based on the correlation values between adjacent frequencies. When the positions of the microphones are known, the directions of arrival of the sources can also be used to solve the permutation problem. In recent years, algorithms that avoid the permutation problem have been proposed. For example, both independent vector analysis, IVA, and independent low-rank matrix analysis, ILRMA, estimate the frequency-wise demixing matrices while avoiding the permutation problem. ILRMA is a state-of-the-art algorithm for blind audio source separation.
  7. OK, let’s talk about our proposed method. This slide explains our motivation. The conventional correlation-based permutation solver sometimes fails to align the components correctly. Even in IVA or ILRMA, the components are sometimes misaligned in blocks, which is called the block permutation problem, as in this figure. To achieve a stable and accurate permutation solver, in this presentation we propose a DNN-based permutation solver, for which the training data can easily be obtained. This is because the permutation problem can be simulated by randomly shuffling the frequency components of the source signals.
  8. In this slide, we explain the input vector for the proposed DNN model. In our DNN model, we first extract short-time activations at the reference frequency and at another frequency from the separated signals. These activations are concatenated into a single vector, like this, and input to the DNN. Then, the DNN predicts whether the permutation of the two input frequencies is correct, where “zero” means that the current permutation is correct and “one” means that they are inverted. In the left-side figure, the reference frequency is red and blue, and the other frequency is also red and blue. So, the current permutation is correct, and its label should be zero. In the right-side figure, the reference frequency is red and blue, but the other frequency is blue and red. Therefore, the current permutation is wrong, and its label should be one.
  9. This figure depicts the architecture of the DNN used in the proposed permutation solver. This DNN model has 6 fully connected hidden layers, and its structure is very simple.
  10. Hereafter, we consider the process in a short-time subband, where the subband consists of the reference frequency plus and minus several frequencies. In the proposed method, we perform the DNN-based permutation prediction for all the combinations of the reference frequency and another frequency, where the reference frequency is fixed to the center of the subband. In this figure, the reference frequency is f3 and is fixed. The other frequency is chosen from f1 to f5, and all the combinations are input to the DNN, like this. Thus, we obtain these DNN outputs. Since the correct permutation does not depend on time, we slide this short-time subband along the time axis and collect the DNN outputs, as in this figure. Finally, we take a majority decision over the collected DNN outputs and obtain a subband permutation vector.
  11. After estimating the subband permutation vector, we slide the subband along the frequency axis, as in this figure. However, since the center frequency of the subband is always set as the reference frequency, the meanings of the labels “zero” and “one” are not shared among subbands. This is because the DNN outputs only indicate whether the components of the reference frequency and the other frequency are the same or different. For this reason, even if the subband components are aligned by the subband permutation vector, the order of the sources could differ among the subbands, as in this figure. To solve this problem, it is necessary to unify the results for all the subband vectors so that, for example, 0 indicates the red source and 1 indicates the blue source in all the subbands.
  12. This label unification can be achieved by the following 3 steps. The objective of these steps is to estimate a fullband permutation vector, which maps the red and blue sources to “zero” and “one,” respectively. In the first step, as shown in this figure, the subband permutation vector of the lowest subband is simply set to the corresponding frequency bins in the fullband permutation vector.
  13. In step 2, we slide the subband from the previous one and obtain the subband permutation vector in that subband. We also calculate the binary complement vector of the subband permutation vector, like this. These two vectors are compared with the corresponding part of the fullband vector using the mean squared error, then the vector that minimizes the error is selected and stored in memory. The fullband permutation vector is updated by taking a majority decision using the vectors stored in memory.
  14. By repeating step 2, the complete fullband vector can be obtained. Finally, the permutation problem can be solved by swapping the frequency-wise source components based on the estimated fullband vector.
  15. Let’s move on to the experiments. This table shows the conditions. In this experiment, as the training dataset, we used the JVS corpus, which is a Japanese speech dataset, as dry sources, and we mixed them using impulse responses. The permutation problem is simulated by randomly shuffling the frequency-wise components of the sources. The test speech dataset is obtained from the SiSEC UND task. The bottom figure shows the impulse responses used in this experiment, where the reverberation time is 470 ms.
  16. Here is the result of the experiment. The axis shows the average SDR improvement, which indicates the accuracy of the source separation. The leftmost one is FDICA with an ideal permutation solver, namely, the permutation is perfectly solved by using the completely separated source signals. So, this is an upper-bound score for the FDICA-based methods. ILRMA is the state-of-the-art blind source separation method. Since the reverberation time is long in this experiment, the performance of ILRMA is not so high. The rightmost one is our proposed method, where the DNN-based permutation solver is applied after FDICA. The proposed method achieves about 8 dB of improvement in SDR, which is close to the upper limit.
  17. This is the conclusion. Thank you for your attention.
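
The subband majority decision and the three-step fullband construction described in notes 10 to 14 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`majority`, `subband_vector`, `mse_on_known`, `build_fullband`) are hypothetical, the DNN outputs are assumed to be already binarized, the subband is assumed to slide by one frequency bin at a time, ties in the majority decision resolve to 0, and the MSE is computed only over bins that already have a value in the fullband vector.

```python
def majority(bits):
    """Return 1 if strictly more than half of the bits are 1, else 0 (ties -> 0, assumed)."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def subband_vector(dnn_outputs):
    """Majority decision along time.

    dnn_outputs[t][k] is the binary DNN decision at time position t for the
    k-th frequency bin of the subband (0 = same as reference, 1 = different).
    """
    n_bins = len(dnn_outputs[0])
    return [majority([frame[k] for frame in dnn_outputs]) for k in range(n_bins)]


def mse_on_known(candidate, fullband, start):
    """MSE between a candidate and the already-decided part of the fullband vector."""
    errors = [(c - fullband[start + k]) ** 2
              for k, c in enumerate(candidate)
              if fullband[start + k] is not None]
    return sum(errors) / len(errors)


def build_fullband(subband_vectors, n_freq):
    """Unify label meanings across subbands.

    subband_vectors[i] is assumed to cover bins i .. i + width - 1.
    """
    # votes[f] plays the role of the "memory": every label stored for bin f.
    votes = [[] for _ in range(n_freq)]

    # Step 1: copy the lowest subband into the fullband vector.
    for k, bit in enumerate(subband_vectors[0]):
        votes[k].append(bit)
    fullband = [majority(v) if v else None for v in votes]

    # Steps 2 and 3: slide upward, keep the vector or its complement,
    # whichever agrees better with the current fullband vector.
    for start in range(1, len(subband_vectors)):
        vec = subband_vectors[start]
        comp = [1 - bit for bit in vec]
        if mse_on_known(vec, fullband, start) <= mse_on_known(comp, fullband, start):
            chosen = vec
        else:
            chosen = comp
        for k, bit in enumerate(chosen):
            votes[start + k].append(bit)
        # Update the fullband vector by a per-bin majority decision.
        fullband = [majority(v) if v else None for v in votes]
    return fullband
```

Given the resulting vector, the separated components at every bin labeled 1 would be swapped to obtain permutation-aligned signals. Which source ends up labeled 0 depends on the lowest subband, which is harmless because the global order of the two outputs is arbitrary.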