Blind source separation based on
independent low-rank matrix analysis and
its extension to Student's t-distribution
Télécom ParisTech Visiting
September 4th
The University of Tokyo, Japan
Project Research Associate
Daichi Kitamura
• Name: Daichi Kitamura
• Age: 27 (born in 1990)
– Kagawa Pref. in Japan
• Background:
– NAIST, Japan
• Master’s degree (received in 2014)
– SOKENDAI, Japan
• Ph.D. degree (received in 2017)
– The University of Tokyo, Japan
• Project Research Associate
• Research topics
– Acoustic signal processing, statistical signal processing,
audio source separation, etc.
Self introduction
[Figure: map of Japan marking Kagawa and Tokyo]
Contents
• Background
– Blind source separation (BSS) for audio signals
– Motivation
• Related Methods
– Frequency-domain independent component analysis (FDICA)
– Independent vector analysis (IVA)
– Itakura–Saito nonnegative matrix factorization (ISNMF)
• Independent Low-Rank Matrix Analysis (ILRMA)
– Employ low-rank TF structures of each source in BSS
– Gaussian source model with TF-varying variance
– Relationship between ILRMA and multichannel NMF
– Student’s t source model with TF-varying scale parameters
• Conclusion
• Blind source separation (BSS) for audio signals
– separates original audio sources
– does not require prior information of recording conditions
• locations of mics and sources, room geometry, timbres, etc.
– is applicable to many audio applications
• Consider only the “determined” situation
Background
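[Figure: sources pass through the mixing system to give the observed signals, and the demixing system (BSS) gives the estimated signals, e.g. a recorded mixture is turned into a separated guitar; in the determined situation, # of mics = # of sources]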
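The determined setting above can be illustrated with a minimal sketch. The sources, the 2×2 mixing matrix `A`, and all variable names are hypothetical; a real BSS method has to estimate the demixing matrix `W` from the observations alone, whereas here it is simply taken as the inverse of the known `A` to show what a perfect demixing system would do.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sources = 2          # determined case: as many microphones as sources
n_mics = 2
n_samples = 1000

# Hypothetical non-Gaussian sources (Laplace-distributed samples).
s = rng.laplace(size=(n_sources, n_samples))

# Hypothetical time-invariant mixing system A (unknown in a real BSS task).
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s              # observed microphone signals

# With A known, the ideal demixing system is its inverse; BSS has to
# estimate such a W from x alone.
W = np.linalg.inv(A)
y = W @ x              # estimated sources

max_error = float(np.max(np.abs(y - s)))
print(f"max reconstruction error: {max_error:.2e}")
```

Because the number of microphones equals the number of sources, `A` is square and (generically) invertible, which is what makes a linear demixing system sufficient.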
• Basic theories and their evolution
History of BSS for audio signals
1994: Independent component analysis (ICA)
1998: Frequency-domain ICA (FDICA)
1999: Nonnegative matrix factorization (NMF)
2006: Independent vector analysis (IVA)
2009: Itakura–Saito NMF (ISNMF)
2011: Auxiliary-function-based IVA (AuxIVA)
2012: Time-varying Gaussian IVA
2013: Multichannel NMF
2016: Independent low-rank matrix analysis (ILRMA)
Timeline annotations: many permutation solvers for FDICA; NMF applied to many tasks; generative models in NMF; many extensions of NMF
*Depicting only popular methods
Motivation of ILRMA
• Conventional BSS techniques based on ICA
– Minimum distortion (linear demixing)
– Relatively fast and stable optimization
• FastICA [A. Hyvarinen, 1999], natural gradient [S. Amari, 1996], and auxiliary function technique [N. Ono+, 2010], [N. Ono, 2011]
– Cannot exploit source-specific assumptions
• Only assume a non-Gaussian p.d.f. for the sources
– The permutation problem is crucial and still difficult to solve
• IVA often fails, causing a “block permutation problem” [Y. Liang+, 2012]
• It is better to use a “specific source model” in the TF domain
– Independent low-rank matrix analysis (ILRMA) employs a low-rank property
[Figure: over the frequency bins and time frames, the observed signal is the frequency-wise mixing matrix applied to the source signals, and the estimated signal is the frequency-wise demixing matrix applied to the observed signal]
• Independent component analysis (ICA) [P. Comon, 1994]
– estimates the demixing matrix without knowing the mixing matrix
– Source model (scalar)
• Each source is non-Gaussian, and the sources are mutually independent
– Spatial model
• Mixing system is a time-invariant matrix
• Mixing system in audio signals
– Convolutive mixture with room reverberation
Related methods: ICA
[Figure: sources (source model) pass through the mixing matrix (spatial model) to give the observed signals, and the demixing matrix gives the estimated signals]
• Frequency-domain ICA (FDICA) [P. Smaragdis, 1998]
– estimates a frequency-wise demixing matrix
– Source model (scalar)
• Each source is complex-valued and non-Gaussian, and the sources are mutually independent
– Spatial model
• Frequency-wise mixing matrix is time-invariant
– Instantaneous mixture in each frequency band
– A.k.a. rank-1 spatial model [N.Q.K. Duong, 2010]
• Permutation problem?
– Order of estimated signals cannot be determined by ICA
– Alignment of frequency-wise estimated signals is required
• Many permutation solvers were proposed
Related methods: FDICA
[Figure: spectrograms (frequency bin × time frame) with a separate ICA (ICA 1, ICA 2, …, ICA I) applied in each frequency bin]
• FDICA requires signal alignment across all frequency bins
– Order of estimated signals cannot be determined by ICA*
Permutation problem
[Figure: sources 1 and 2 are mixed into observed signals 1 and 2; ICA is applied to all frequency components over time, and a permutation solver aligns the outputs into estimated signals 1 and 2]
*Signal scale should also be restored by a back-projection technique
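As a rough illustration of what a permutation solver has to do, the sketch below aligns the source order across frequency bins by correlating power envelopes with a running average. This is a generic correlation-based approach for two sources, not a specific solver named in the slides; the function name `align_permutation` and the toy data are assumptions made for the example.

```python
import numpy as np

def align_permutation(P):
    """Align the source order across frequency bins (two-source sketch).

    P: power envelopes of the frequency-wise estimates, shape (I, 2, J)
       for I frequency bins, 2 sources, and J time frames.
    Returns a copy of P in which each bin is swapped, if needed, to match
    the running average envelope of the bins processed so far.
    """
    aligned = P.copy()
    centroid = aligned[0].copy()            # reference envelopes, shape (2, J)
    for i in range(1, P.shape[0]):
        c = np.corrcoef(np.vstack([centroid, aligned[i]]))
        keep = c[0, 2] + c[1, 3]            # correlation if the order is kept
        swap = c[0, 3] + c[1, 2]            # correlation if the order is swapped
        if swap > keep:
            aligned[i] = aligned[i, ::-1].copy()
        centroid = (centroid * i + aligned[i]) / (i + 1)
    return aligned

rng = np.random.default_rng(0)
I, J = 16, 200
# Two hypothetical sources with distinct activity patterns over time.
act = np.stack([np.r_[np.ones(J // 2), 0.01 * np.ones(J // 2)],
                np.r_[0.01 * np.ones(J // 2), np.ones(J // 2)]])
P_true = (act[None]
          * rng.uniform(0.5, 2.0, (I, 2, 1))      # bin-dependent gains
          * rng.uniform(0.5, 1.5, (I, 2, J)))     # frame-wise fluctuation
# Swap the source order in a random subset of bins, as frequency-wise ICA might.
swapped = rng.random(I) < 0.5
P_perm = P_true.copy()
P_perm[swapped] = P_perm[swapped][:, ::-1]
P_aligned = align_permutation(P_perm)
```

The aligned result is consistent across bins but may still be globally swapped relative to the true order, which is the usual residual ambiguity of BSS.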
Related methods: IVA
• Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006]
– extends ICA to a multivariate probabilistic model that treats each sourcewise frequency vector as a variable
– Source model (vector)
• Each source vector is multivariate, spherical, complex-valued, and non-Gaussian, and the source vectors are mutually independent
– Spatial model
• Mixing system is a time-invariant matrix (rank-1 spatial model)
– Permutation-free estimation of the demixing matrix is achieved!
[Figure: source vectors obeying a multivariate non-Gaussian dist. (their components have higher-order correlations) pass through the mixing matrix to give the observed vectors, and the demixing matrix gives the estimated vectors]
• Spherical multivariate distribution [T. Kim, 2007]
• Why spherical distribution?
– Frequency bands that have similar activations will be merged together as one source, which avoids the permutation problem
Higher-order correlation assumed in IVA
[Figure: two mutually independent Laplace dist.s (x1 and x2 are mutually independent) versus a spherical Laplace dist. (x1 and x2 have higher-order correlation; the probability depends only on the norm)]
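The difference between the two source models can be checked numerically with a small sketch. The densities below are unnormalized and use a unit scale, which is an assumption made to keep the example short; the function names are hypothetical.

```python
import numpy as np

def log_independent_laplace(x):
    """Unnormalized log-density of independent unit-scale Laplace variables."""
    return -np.sum(np.abs(x), axis=-1)

def log_spherical_laplace(x):
    """Unnormalized log-density of a spherical Laplace: depends only on ||x||_2."""
    return -np.linalg.norm(x, axis=-1)

# Two points with the same Euclidean norm but different directions.
a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0]) / np.sqrt(2.0)

# The spherical model assigns both points the same density ...
sph_a, sph_b = log_spherical_laplace(a), log_spherical_laplace(b)
# ... whereas the independent model does not.
ind_a, ind_b = log_independent_laplace(a), log_independent_laplace(b)

# The independent log-density is additive over coordinates, the spherical one is not.
c = np.array([1.0, 1.0])
ind_additive = np.isclose(log_independent_laplace(c), 2 * ind_a)
sph_additive = np.isclose(log_spherical_laplace(c), 2 * sph_a)
print(sph_a, sph_b, ind_a, ind_b, ind_additive, sph_additive)
```

The lack of additivity in the spherical case is what couples the components of one source vector, even though they are uncorrelated.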
• Frequency-domain ICA (FDICA) [P. Smaragdis, 1998]
• Independent vector analysis (IVA)[A. Hiroe, 2006], [T. Kim, 2006]
Comparison of source models
[Figure: after the STFT, FDICA treats the components in each frequency bin as scalar r.v.s, whereas IVA treats each sourcewise frequency vector as vector (multivariate) r.v.s with a non-Gaussian spherical source dist. In both cases the demixing matrix (separation filter) is updated so that the current empirical dist. of the estimated signals obeys the assumed non-Gaussian source dist. of mutually independent sources. The mixture is close to a Gaussian signal because of the CLT, whereas each source obeys a non-Gaussian dist.]
Related method: NMF
• Nonnegative matrix factorization (NMF) [D. D. Lee, 1999]
– Low-rank decomposition with nonnegative constraint
• Limited number of nonnegative bases and their coefficients
– Spectrogram is decomposed in acoustic signal processing
• Frequently appearing spectral patterns and their activations
[Figure: a nonnegative matrix (power spectrogram, frequency × time) is approximated by the product of a basis matrix (spectral patterns) and an activation matrix (time-varying gains); the dimensions are the # of freq. bins, the # of time frames, and the # of bases]
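• ISNMF [C. Févotte, 2009]
– Equivalent to a generative model in which the complex-valued observed signal obeys a circularly symmetric complex Gaussian dist. with nonnegative variance
– The observed signal can be decomposed using the “stable property” of the Gaussian dist.
• If we define the observed signal as a sum of independent Gaussian components, the variance is also decomposed!
Related method: ISNMF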
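A minimal sketch of ISNMF on a toy power spectrogram follows. It uses the MM-type multiplicative updates with a square-root exponent that are common in the NMF literature; whether the slides use exactly this variant is not recoverable from the text, and the names `isnmf`, `T`, `V`, and the toy data are assumptions.

```python
import numpy as np

def is_divergence(X, R):
    """Itakura-Saito divergence between positive matrices X and R."""
    ratio = X / R
    return float(np.sum(ratio - np.log(ratio) - 1.0))

def isnmf(X, n_bases, n_iter=100, seed=0, eps=1e-12):
    """Factorize a power spectrogram X (I x J) as T (I x K) @ V (K x J)."""
    rng = np.random.default_rng(seed)
    I, J = X.shape
    T = rng.uniform(0.1, 1.0, (I, n_bases))   # spectral patterns
    V = rng.uniform(0.1, 1.0, (n_bases, J))   # time-varying gains
    costs = [is_divergence(X, T @ V)]
    for _ in range(n_iter):
        R = T @ V
        T *= np.sqrt(((X / R**2) @ V.T) / ((1.0 / R) @ V.T))
        T = np.maximum(T, eps)
        R = T @ V
        V *= np.sqrt((T.T @ (X / R**2)) / (T.T @ (1.0 / R)))
        V = np.maximum(V, eps)
        costs.append(is_divergence(X, T @ V))
    return T, V, costs

# Toy data: complex Gaussian samples whose variance is a rank-3 matrix.
rng = np.random.default_rng(1)
var = rng.uniform(0.1, 1.0, (32, 3)) @ rng.uniform(0.1, 1.0, (3, 50))
z = np.sqrt(var / 2) * (rng.standard_normal(var.shape)
                        + 1j * rng.standard_normal(var.shape))
X = np.abs(z) ** 2 + 1e-12                    # observed power spectrogram

T, V, costs = isnmf(X, n_bases=3)
print(f"IS divergence: {costs[0]:.1f} -> {costs[-1]:.1f}")
```

Here the product `T @ V` plays the role of the TF-varying variance of the complex Gaussian model, so fitting it to the power spectrogram corresponds to a maximum-likelihood estimate of that variance.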
• Power spectrogram corresponds to variances in the TF plane
Related method: ISNMF
Complex Gaussian distribution with TF-varying variance
If we marginalize in terms of time or frequency, the distribution becomes non-Gaussian even though each TF grid is defined by a Gaussian distribution
[Figure: power spectrogram (frequency bin × time frame); grayscale shows the value of the variance, from small to large values of power]
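The marginalization remark can be checked with a short simulation. The sketch compares complex Gaussian samples that share one variance with samples whose variance changes from draw to draw; the exponential distribution used for the varying variances is purely an assumption for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def excess_kurtosis(u):
    """Sample excess kurtosis (0 for a Gaussian)."""
    u = u - u.mean()
    return float(np.mean(u**4) / np.mean(u**2) ** 2 - 3.0)

def complex_gaussian(variance):
    """Draw circularly symmetric complex Gaussian samples with the given variances."""
    scale = np.sqrt(variance / 2.0)
    return scale * (rng.standard_normal(variance.shape)
                    + 1j * rng.standard_normal(variance.shape))

fixed = complex_gaussian(np.ones(n))                   # one shared variance
varying = complex_gaussian(rng.exponential(1.0, n))    # hypothetical varying variances

k_fixed = excess_kurtosis(fixed.real)
k_varying = excess_kurtosis(varying.real)
print(f"excess kurtosis, fixed variance:   {k_fixed:.2f}")
print(f"excess kurtosis, varying variance: {k_varying:.2f}")
```

Pooling samples over slots with different variances gives a clearly positive excess kurtosis, i.e. a heavier-tailed, non-Gaussian marginal, even though every individual slot is Gaussian.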
Extension of source model in IVA
• Source model in IVA
– has a frequency-uniform scale
• Multivariate Laplace with fixed scale
• Since scale cannot be determined, it is
not equivalent to the flat spectral basis
– Almost an NMF with only one basis
• Extend to ISNMF-based source model
– NMF with arbitrary number of bases
• can represent complicated TF structures
– can learn “co-occurrence” of each
source in TF domain
• Co-occurrence is captured as the variance
– The structure can easily be estimated
by NMF
• Spherical Laplace distribution in IVA
– Frequency vector (I-dimensional) with a frequency-uniform scale
• Gaussian distribution with TF-varying variance in ISNMF [C. Févotte+, 2009]
– Time-frequency matrix (IJ-dimensional) with a time-frequency-varying variance
– Complex-valued Gaussian in each TF bin; low-rank decomposition with NMF
Extension of source model in IVA
[Figure: spherical Laplace (bivariate) dist. for the IVA source model versus a complex-valued Gaussian in each TF bin for the ISNMF source model]
• Negative log-likelihood in ILRMA
– Combines the cost function in ICA (estimates the demixing matrix) and the cost function in ISNMF (estimates the low-rank source model), both evaluated on the estimated signal
– All the variables can easily be optimized by alternating between the update rules in ICA and the update rules in ISNMF
Cost function in ILRMA and partitioning function
Update rules of ILRMA
• ML-based iterative update rules
– Spatial model (demixing matrix): the update rule is based on iterative projection [N. Ono, 2011], which uses a one-hot vector that has 1 at the element of the source being updated
– Source model (NMF source model): the update rules for the NMF variables are based on the MM algorithm
– Pseudo code is available at
• http://d-kitamura.net/pdf/misc/AlgorithmsForIndependentLowRankMatrixAnalysis.pdf
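The alternation described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the author's reference implementation (see the pseudo code linked above for that). It assumes the commonly published form of the ILRMA cost, the sum over all TF slots and sources of |y|²/r + log r minus 2J times the sum over frequency bins of log|det W|, with r given by the NMF model of each source. It omits the partitioning function and any scale normalization, and all names (`ilrma`, `W`, `T`, `V`, `U`) are chosen for the example.

```python
import numpy as np

def ilrma(X, n_bases=2, n_iter=30, seed=0, eps=1e-12):
    """Determined ILRMA sketch. X: mixture STFT, shape (I, J, M), M mics = N sources."""
    rng = np.random.default_rng(seed)
    I, J, M = X.shape
    N = M
    W = np.tile(np.eye(N, dtype=complex), (I, 1, 1))    # demixing matrices, (I, N, M)
    T = rng.uniform(0.1, 1.0, (N, I, n_bases))          # bases of each source
    V = rng.uniform(0.1, 1.0, (N, n_bases, J))          # activations of each source
    E = np.eye(N, dtype=complex)                        # columns are one-hot vectors

    def separate(W):
        return np.einsum("inm,ijm->ijn", W, X)          # y_ij = W_i x_ij

    def cost(W, T, V):
        P = np.abs(separate(W)) ** 2
        R = np.einsum("nik,nkj->ijn", T, V)
        _, logabsdet = np.linalg.slogdet(W)
        return float(np.sum(P / R + np.log(R)) - 2.0 * J * np.sum(logabsdet))

    costs = [cost(W, T, V)]
    for _ in range(n_iter):
        P = np.abs(separate(W)) ** 2                    # power of estimates, (I, J, N)
        for n in range(N):
            Pn = P[:, :, n]
            # MM updates of the NMF source model (variance R = T V).
            R = T[n] @ V[n]
            T[n] = np.maximum(
                T[n] * np.sqrt(((Pn / R**2) @ V[n].T) / ((1 / R) @ V[n].T)), eps)
            R = T[n] @ V[n]
            V[n] = np.maximum(
                V[n] * np.sqrt((T[n].T @ (Pn / R**2)) / (T[n].T @ (1 / R))), eps)
            R = T[n] @ V[n]
            # Iterative projection update of the n-th demixing filter.
            U = np.einsum("ijm,ijl->iml", X / R[:, :, None], X.conj()) / J
            b = np.tile(E[:, n], (I, 1))[:, :, None]    # one-hot vector per bin
            w = np.linalg.solve(W @ U, b)[:, :, 0]      # (W_i U_i)^{-1} e_n
            norm = np.sqrt(np.einsum("im,iml,il->i", w.conj(), U, w).real)
            W[:, n, :] = (w / norm[:, None]).conj()     # row n holds w_n^H
        costs.append(cost(W, T, V))
    return separate(W), W, costs

# Toy mixture: two hypothetical sources with low-rank variances, mixed by
# random frequency-wise matrices.
rng = np.random.default_rng(1)
I, J, N = 8, 60, 2
var = np.einsum("nik,nkj->ijn",
                rng.uniform(0.1, 1.0, (N, I, 2)), rng.uniform(0.1, 1.0, (N, 2, J)))
S = np.sqrt(var / 2) * (rng.standard_normal(var.shape)
                        + 1j * rng.standard_normal(var.shape))
A = rng.standard_normal((I, N, N)) + 1j * rng.standard_normal((I, N, N))
X = np.einsum("imn,ijn->ijm", A, S)

Y, W, costs = ilrma(X)
print(f"cost: {costs[0]:.1f} -> {costs[-1]:.1f}")
```

Each NMF update and each iterative-projection update is designed not to increase the cost, so tracking `costs` is a simple sanity check for an implementation like this one.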
• ILRMA with partitioning function
– Appropriate number of bases for each source can
automatically be determined
– Useful when various types of sources are mixed
• Ex. drums are very low-rank but vocals are not so low-rank
Cost function in ILRMA and partitioning function
Update rules of ILRMA
• ML-based iterative update rules
– Spatial model (demixing matrix): the update rule is based on iterative projection [N. Ono, 2011], which uses a one-hot vector that has 1 at the element of the source being updated
– Source model (NMF source model): the update rules for the NMF variables are based on the MM algorithm
Optimization process in ILRMA
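• Demixing matrix and source model are alternately updated
– The precise modeling of low-rank TF structures will improve the estimation accuracy of the demixing matrix
[Figure: loop in which the demixing matrix is estimated from the mixture to give the separated signals, the NMF variables are estimated from each separated signal, and the resulting source model is used to update the demixing matrix]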
Comparison of source models
• FDICA source model: non-Gaussian scalar variable
• IVA source model: non-Gaussian vector variable with higher-order correlation
• ILRMA source model: non-Gaussian matrix variable with low-rank time-frequency structure
[Figure: rank of the TF matrix of the mixture compared with the rank of the TF matrix of each source]
• Multichannel NMF[A. Ozerov+, 2010], [H. Sawada+, 2013]
Multichannel extension of NMF
[Figure: the observed multichannel signal (multichannel vector) gives a simultaneous spatial covariance in each time-frequency slot; these spatial covariances are modeled by the spatial covariances of each source (spatial model: spatial property of each source) combined, through a partitioning function, with a basis matrix of spectral patterns and an activation matrix of gains (source model: timbre patterns of all sources)]
Relationship b/w ILRMA and multichannel NMF
• Difference b/w ILRMA and multichannel NMF?
– Source distribution: complex Gaussian distribution (same)
– ILRMA assumes a rank-1 spatial model
– Multichannel NMF assumes full-rank spatial covariance
• Assumption: rank-1 spatial model
– Spatial covariance of each source is a rank-1 matrix formed from the sourcewise steering vector
– Equivalent to the instantaneous mixing assumption in each frequency bin
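The rank-1 assumption can be made concrete with a small numerical sketch for a single time-frequency slot. The steering vectors (columns of a random mixing matrix `A`) and the source variances `r` are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mics = n_sources = 2

# Hypothetical steering vectors a_n (columns of the mixing matrix A) and
# source variances r_n for one time-frequency slot.
A = (rng.standard_normal((n_mics, n_sources))
     + 1j * rng.standard_normal((n_mics, n_sources)))
r = np.array([2.0, 0.5])

# Rank-1 spatial covariance of each source: a_n a_n^H.
spatial_covs = [np.outer(A[:, n], A[:, n].conj()) for n in range(n_sources)]
ranks = [int(np.linalg.matrix_rank(C)) for C in spatial_covs]

# Under instantaneous mixing x = A s with independent sources, the covariance
# of x is A diag(r) A^H, i.e. the variance-weighted sum of the rank-1 terms.
cov_sum = sum(r[n] * spatial_covs[n] for n in range(n_sources))
cov_mix = A @ np.diag(r) @ A.conj().T
print(ranks, np.allclose(cov_sum, cov_mix))
```

A full-rank spatial covariance, as in multichannel NMF, would not factor into a single outer product like this, which is why the mixing system can no longer be undone by one demixing matrix per frequency bin.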
Relationship b/w ILRMA and multichannel NMF
• Multichannel NMF with rank-1 spatial model
– Substitute the rank-1 spatial model into the cost function
– Transform the variables from the mixing system to the demixing matrix
Relationship b/w MNMF, IVA, and ILRMA
• From multichannel NMF side,
– Introducing the rank-1 spatial model transforms the problem from the estimation of the mixing system to that of the demixing matrix
• From IVA side,
– Increase the number of spectral bases in the source model
[Figure: map with a source model axis (limited to flexible) and a spatial model axis (limited to flexible) showing IVA, multichannel NMF, and ILRMA; ILRMA is reached from IVA through the NMF source model and from multichannel NMF through the rank-1 spatial model]
Experimental evaluation
• Conditions
Source signals: music signals obtained from SiSEC, convolved with impulse responses; two microphones and two sources
Window length: 512 ms (Hamming window)
Shift length: 128 ms (1/4 shift)
Number of bases: 30 for each source (ILRMA w/o partitioning function); 60 for all sources (ILRMA with partitioning function)
Evaluation score: improvement of signal-to-distortion ratio (SDR)
[Figure: recording conditions. Impulse response E2A (reverberation time: 300 ms): sources 1 and 2 at 2 m from a microphone pair with 5.66 cm spacing, at 50° on either side. Impulse response JR2 (reverberation time: 470 ms): same layout with the sources at 60° on either side.]
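To make the evaluation score concrete, here is a deliberately simplified sketch of an SDR improvement. It treats every deviation from the reference as distortion, whereas the SDR used in BSS evaluation toolkits also permits certain filtered versions of the reference; the function names and toy signals are assumptions, and this is not the evaluation code behind the reported results.

```python
import numpy as np

def sdr_db(reference, estimate):
    """Simplified SDR in dB: reference energy over error energy."""
    error = reference - estimate
    return float(10.0 * np.log10(np.sum(reference**2) / np.sum(error**2)))

def sdr_improvement_db(reference, mixture, estimate):
    """SDR of the separated signal minus SDR of the unprocessed mixture."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)

# Toy example: the separation attenuates the interference by a factor of 10.
t = np.arange(16000) / 16000.0
reference = np.sin(2 * np.pi * 440.0 * t)
interference = 0.5 * np.sin(2 * np.pi * 1000.0 * t)
mixture = reference + interference
estimate = reference + 0.1 * interference

improvement = sdr_improvement_db(reference, mixture, estimate)
print(f"SDR improvement: {improvement:.1f} dB")
```

Reporting the improvement rather than the raw SDR removes the dependence on how hard the original mixture was.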
Results: fort_minor-remember_the_name
[Figure: SDR improvement [dB] (from poor to good, axis from -8 to 16) for the violin synth. and vocals under E2A (300 ms) and JR2 (470 ms). Compared methods: directional clustering, IVA, Ozerov’s MNMF, Ozerov’s MNMF with random initialization, Sawada’s MNMF, Sawada’s MNMF initialized by ILRMA, ILRMA w/o partitioning function, and ILRMA with partitioning function]
Results: ultimate_nz_tour
[Figure: SDR improvement [dB] (from poor to good, axis from -5 to 20) for the guitar and synth. under E2A (300 ms) and JR2 (470 ms). Compared methods: directional clustering, IVA, Ozerov’s MNMF, Ozerov’s MNMF with random initialization, Sawada’s MNMF, Sawada’s MNMF initialized by ILRMA, ILRMA w/o partitioning function, and ILRMA with partitioning function]
Results: bearlin-roads
• Signal length: 14 s
[Figure: SDR improvement [dB] (from poor to good, axis from -2 to 12) versus iteration steps (0 to 400) for IVA, MNMF, ILRMA without Z, and ILRMA with Z; the computation times annotated on the plot are 11.5 s, 15.1 s, 60.7 s, and 7647.3 s]
Demonstration: music source separation
• Music source separation
– Listen carefully for the three parts (guitar, vocal, and keyboard) in the mixture
[Figure: a mixture of guitar, vocal, and keyboard passes through source separation, giving separated guitar, vocal, and keyboard]
Another demo is available at http://d-kitamura.net/en/index_en.html
• Source model based on the symmetric α-stable (SαS) distribution [A. Liutkus+, 2015], [U. Şimşekli+, 2015], [S. Leglaive+, 2017], [M. Fontaine+, 2017]
– which can validate the decomposition of complex-valued r.v.s as the decomposition of their parameters
– Heavy tail (sparse) when α approaches 0
• Student’s t-distribution is also used as a source model [C. Févotte+, 2006], [K. Yoshii+, 2016], [K. Kitamura+, 2016], [S. Leglaive+, 2017]
– that includes the Cauchy distribution (degree of freedom ν = 1) and the Gaussian distribution (ν → ∞)
Stable and Student’s t-distributions
[Figure: the SαS (stable family) and Student’s t (partially stable) families overlap at the Cauchy and Gauss distributions]
Source model of Student’s t-distribution
• Degree-of-freedom parameter ν
– Heavy tail when ν approaches 0
• Complex Student’s t-dist.
– Circularly symmetric (the phase is assumed to be uniform)
– Defined in each TF slot
– Scale corresponds to the NMF model
– Student’s t NMF (t-NMF) [K. Yoshii+ 2016]
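The role of the degree-of-freedom parameter can be illustrated with a short sketch. The density below is a standard parametrization of the circularly symmetric complex Student's t-distribution, (1/(πσ²))(1 + (2/ν)|y|²/σ²)^(−(2+ν)/2); it is assumed here because the slide's own equation did not survive extraction, and the function names are hypothetical.

```python
import numpy as np

def log_complex_gaussian(y, sigma):
    """Log-density of a circularly symmetric complex Gaussian with variance sigma^2."""
    return -np.log(np.pi * sigma**2) - np.abs(y) ** 2 / sigma**2

def log_complex_student_t(y, sigma, nu):
    """Log-density of a circularly symmetric complex Student's t with scale sigma."""
    z = np.abs(y) ** 2 / sigma**2
    return -np.log(np.pi * sigma**2) - 0.5 * (2.0 + nu) * np.log1p(2.0 * z / nu)

y, sigma = 2.0 + 1.0j, 1.5

# A very large degree of freedom approaches the Gaussian model.
gauss = log_complex_gaussian(y, sigma)
t_large_nu = log_complex_student_t(y, sigma, nu=1e8)

# nu = 1 (Cauchy) puts far more probability on a large-magnitude outlier.
outlier = 10.0 + 0.0j
tail_cauchy = log_complex_student_t(outlier, 1.0, nu=1.0)
tail_gauss = log_complex_gaussian(outlier, 1.0)
print(gauss, t_large_nu, tail_cauchy, tail_gauss)
```

Only the magnitude of `y` enters the density, which reflects the circular symmetry, and the scale `sigma` is the quantity that the NMF model would supply in each TF slot.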
Motivation for using Student’s t-dist.
• Better separation with t-NMF was reported[K. Yoshii+, 2016]
– in a very simple experiment using
only C4, E4, and G4 piano tones
• NMF with heavy tail distribution
– tends to provide excessive low-rank
approximation
• Sparse components (which may increase
the rank of model data) are considered as
outliers
• ILRMA based on Student’s t source model (t-ILRMA)
– may improve the separation accuracy by forcing NMF
source model to be excessively low-rank
– will be presented at MLSP2017! (preprint is available on arXiv)
• https://arxiv.org/abs/1708.04795
• The pth power spectrogram corresponds to scales in the TF plane
Source model based on Student’s t-distribution
Complex Student’s t-distribution with TF-varying scale
[Figure: pth power spectrogram (frequency bin × time frame); grayscale shows the value of the scale, from small to large values of power]
• Negative log-likelihood in ILRMA
Cost function in ILRMA based on Student’s t-dist.
– Gaussian ILRMA: models the power spectrogram by the variance
– Student’s t ILRMA: models the pth power spectrogram by the scale
– Student’s t ILRMA is a generalization of both the p.d.f. and the model domain
Experimental results: randomized t-ILRMA
• Examples
– Improved when
– Stable when but score is not sufficient
– Root spectrogram ( ) is preferable for speech signals
• In the case of
– Source model is over-fitted to the mixture
[Figure: results for music signals and speech signals]
Tempering parameter
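• Random initialization (previous result)
• Initialization based on Gaussian ILRMA
– (Tempering approach of parameter)
[Figure: block diagrams. Random initialization: t-ILRMA (iteration: 200) starting from an identity matrix and uniform random values. Initialization based on Gaussian ILRMA: the blocks Gauss ILRMA (iteration: 100, starting from an identity matrix and uniform random values), t-NMF (iteration: 100, starting from uniform random values), and t-ILRMA (iteration: 100), with the parameter set to an arbitrary val.]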
Experimental results: initialized t-ILRMA
• Examples
– Improved for all value of
– Could avoid over-fitting problem in the case
• Best parameter?
– Completely depends on the data
[Figure: results for music signals and speech signals]
Average results: music signals
Average results: speech signals
Conclusion
• Independent low-rank matrix analysis (ILRMA)
– Assumption
• Statistical independence between sources
• Low-rank time-frequency structure of each source
– Equivalent to multichannel NMF
• when the mixing assumption is valid
• Student’s t-distribution is newly introduced
– including two symmetric α-stable distributions
• Complex Cauchy distribution (ν = 1)
• Complex Gaussian distribution (ν → ∞)
• Further extensions
– Relaxation of rank-1 spatial model?
– Employ another distribution?
– Supervised ILRMA? User-guided ILRMA?

More Related Content

What's hot

【DL輪読会】“Gestalt Principles Emerge When Learning Universal Sound Source Separa...
【DL輪読会】“Gestalt Principles Emerge When Learning Universal Sound Source Separa...【DL輪読会】“Gestalt Principles Emerge When Learning Universal Sound Source Separa...
【DL輪読会】“Gestalt Principles Emerge When Learning Universal Sound Source Separa...Deep Learning JP
 
複数話者WaveNetボコーダに関する調査
複数話者WaveNetボコーダに関する調査複数話者WaveNetボコーダに関する調査
複数話者WaveNetボコーダに関する調査Tomoki Hayashi
 
DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相Takuya Yoshioka
 
直交化及び距離最大化則条件を用いた教師あり非負値行列因子分解による音楽信号分離
直交化及び距離最大化則条件を用いた教師あり非負値行列因子分解による音楽信号分離直交化及び距離最大化則条件を用いた教師あり非負値行列因子分解による音楽信号分離
直交化及び距離最大化則条件を用いた教師あり非負値行列因子分解による音楽信号分離奈良先端大 情報科学研究科
 
独立低ランク行列分析に基づく音源分離とその発展
独立低ランク行列分析に基づく音源分離とその発展独立低ランク行列分析に基づく音源分離とその発展
独立低ランク行列分析に基づく音源分離とその発展Kitamura Laboratory
 
非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法
非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法
非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法Daichi Kitamura
 
深層生成モデルに基づく音声合成技術
深層生成モデルに基づく音声合成技術深層生成モデルに基づく音声合成技術
深層生成モデルに基づく音声合成技術NU_I_TODALAB
 
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)Daichi Kitamura
 
ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...
ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...
ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...Daichi Kitamura
 
独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...
独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...
独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...Daichi Kitamura
 
Blind audio source separation based on time-frequency structure models
Blind audio source separation based on time-frequency structure modelsBlind audio source separation based on time-frequency structure models
Blind audio source separation based on time-frequency structure modelsKitamura Laboratory
 
[DL輪読会]Wavenet a generative model for raw audio
[DL輪読会]Wavenet a generative model for raw audio[DL輪読会]Wavenet a generative model for raw audio
[DL輪読会]Wavenet a generative model for raw audioDeep Learning JP
 
論文紹介 Unsupervised training of neural mask-based beamforming
論文紹介 Unsupervised training of neural  mask-based beamforming論文紹介 Unsupervised training of neural  mask-based beamforming
論文紹介 Unsupervised training of neural mask-based beamformingShinnosuke Takamichi
 
音源分離における音響モデリング(Acoustic modeling in audio source separation)
音源分離における音響モデリング(Acoustic modeling in audio source separation)音源分離における音響モデリング(Acoustic modeling in audio source separation)
音源分離における音響モデリング(Acoustic modeling in audio source separation)Daichi Kitamura
 
Neural text-to-speech and voice conversion
Neural text-to-speech and voice conversionNeural text-to-speech and voice conversion
Neural text-to-speech and voice conversionYuki Saito
 
信号の独立性に基づく多チャンネル音源分離
信号の独立性に基づく多チャンネル音源分離信号の独立性に基づく多チャンネル音源分離
信号の独立性に基づく多チャンネル音源分離NU_I_TODALAB
 
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...Daichi Kitamura
 
WaveNetが音声合成研究に与える影響
WaveNetが音声合成研究に与える影響WaveNetが音声合成研究に与える影響
WaveNetが音声合成研究に与える影響NU_I_TODALAB
 

What's hot (20)

【DL輪読会】“Gestalt Principles Emerge When Learning Universal Sound Source Separa...
【DL輪読会】“Gestalt Principles Emerge When Learning Universal Sound Source Separa...【DL輪読会】“Gestalt Principles Emerge When Learning Universal Sound Source Separa...
【DL輪読会】“Gestalt Principles Emerge When Learning Universal Sound Source Separa...
 
複数話者WaveNetボコーダに関する調査
複数話者WaveNetボコーダに関する調査複数話者WaveNetボコーダに関する調査
複数話者WaveNetボコーダに関する調査
 
DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相
 
直交化及び距離最大化則条件を用いた教師あり非負値行列因子分解による音楽信号分離
直交化及び距離最大化則条件を用いた教師あり非負値行列因子分解による音楽信号分離直交化及び距離最大化則条件を用いた教師あり非負値行列因子分解による音楽信号分離
直交化及び距離最大化則条件を用いた教師あり非負値行列因子分解による音楽信号分離
 
独立低ランク行列分析に基づく音源分離とその発展
独立低ランク行列分析に基づく音源分離とその発展独立低ランク行列分析に基づく音源分離とその発展
独立低ランク行列分析に基づく音源分離とその発展
 
非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法
非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法
非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法
 
深層生成モデルに基づく音声合成技術
深層生成モデルに基づく音声合成技術深層生成モデルに基づく音声合成技術
深層生成モデルに基づく音声合成技術
 
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
 
ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...
ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...
ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...
 
独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...
独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...
独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...
 
Ea2015 7for ss
Ea2015 7for ssEa2015 7for ss
Ea2015 7for ss
 
Blind audio source separation based on time-frequency structure models
Blind audio source separation based on time-frequency structure modelsBlind audio source separation based on time-frequency structure models
Blind audio source separation based on time-frequency structure models
 
Kameoka2017 ieice03
Kameoka2017 ieice03Kameoka2017 ieice03
Kameoka2017 ieice03
 
[DL輪読会]Wavenet a generative model for raw audio
[DL輪読会]Wavenet a generative model for raw audio[DL輪読会]Wavenet a generative model for raw audio
[DL輪読会]Wavenet a generative model for raw audio
 
論文紹介 Unsupervised training of neural mask-based beamforming
論文紹介 Unsupervised training of neural  mask-based beamforming論文紹介 Unsupervised training of neural  mask-based beamforming
論文紹介 Unsupervised training of neural mask-based beamforming
 
音源分離における音響モデリング(Acoustic modeling in audio source separation)
音源分離における音響モデリング(Acoustic modeling in audio source separation)音源分離における音響モデリング(Acoustic modeling in audio source separation)
音源分離における音響モデリング(Acoustic modeling in audio source separation)
 
Neural text-to-speech and voice conversion
Neural text-to-speech and voice conversionNeural text-to-speech and voice conversion
Neural text-to-speech and voice conversion
 
信号の独立性に基づく多チャンネル音源分離
信号の独立性に基づく多チャンネル音源分離信号の独立性に基づく多チャンネル音源分離
信号の独立性に基づく多チャンネル音源分離
 
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
 
WaveNetが音声合成研究に与える影響
WaveNetが音声合成研究に与える影響WaveNetが音声合成研究に与える影響
WaveNetが音声合成研究に与える影響
 

Viewers also liked

半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法Daichi Kitamura
 
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...Daichi Kitamura
 
Study on optimal divergence for superresolution-based supervised nonnegative ...
Study on optimal divergence for superresolution-based supervised nonnegative ...Study on optimal divergence for superresolution-based supervised nonnegative ...
Study on optimal divergence for superresolution-based supervised nonnegative ...Daichi Kitamura
 
Relaxation of rank-1 spatial constraint in overdetermined blind source separa...
Relaxation of rank-1 spatial constraint in overdetermined blind source separa...Relaxation of rank-1 spatial constraint in overdetermined blind source separa...
Relaxation of rank-1 spatial constraint in overdetermined blind source separa...Daichi Kitamura
 
Music signal separation using supervised nonnegative matrix factorization wit...
Music signal separation using supervised nonnegative matrix factorization wit...Music signal separation using supervised nonnegative matrix factorization wit...
Music signal separation using supervised nonnegative matrix factorization wit...Daichi Kitamura
 
擬似ハムバッキングピックアップの弦振動応答 (in Japanese)
擬似ハムバッキングピックアップの弦振動応答 (in Japanese)擬似ハムバッキングピックアップの弦振動応答 (in Japanese)
擬似ハムバッキングピックアップの弦振動応答 (in Japanese)Daichi Kitamura
 
Efficient initialization for nonnegative matrix factorization based on nonneg...
Efficient initialization for nonnegative matrix factorization based on nonneg...Efficient initialization for nonnegative matrix factorization based on nonneg...
Efficient initialization for nonnegative matrix factorization based on nonneg...Daichi Kitamura
 
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)Daichi Kitamura
 
Audio Source Separation Based on Low-Rank Structure and Statistical Independence
Audio Source Separation Based on Low-Rank Structure and Statistical IndependenceAudio Source Separation Based on Low-Rank Structure and Statistical Independence
Audio Source Separation Based on Low-Rank Structure and Statistical IndependenceDaichi Kitamura
 
Experimental analysis of optimal window length for independent low-rank matri...
Experimental analysis of optimal window length for independent low-rank matri...Experimental analysis of optimal window length for independent low-rank matri...
Experimental analysis of optimal window length for independent low-rank matri...Daichi Kitamura
 
TensorFlow を使った 機械学習ことはじめ (GDG京都 機械学習勉強会)
TensorFlow を使った機械学習ことはじめ (GDG京都 機械学習勉強会)TensorFlow を使った機械学習ことはじめ (GDG京都 機械学習勉強会)
TensorFlow を使った 機械学習ことはじめ (GDG京都 機械学習勉強会)徹 上野山
 

Viewers also liked (11)

半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
 
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
 
Blind source separation based on independent low-rank matrix analysis and its extension to Student's t-distribution

  • 1. Blind source separation based on independent low-rank matrix analysis and its extension to Student's t-distribution
      Télécom ParisTech visit, September 4th
      Daichi Kitamura, Project Research Associate, The University of Tokyo, Japan
  • 2. Self introduction
      • Name: Daichi Kitamura
      • Age: 27 (born in 1990)
        – Kagawa Pref. in Japan
      • Background:
        – NAIST, Japan
          • Master's degree (received in 2014)
        – SOKENDAI, Japan
          • Ph.D. degree (received in 2017)
        – The University of Tokyo, Japan
          • Project Research Associate
      • Research topics
        – Acoustic signal processing, statistical signal processing, audio source separation, etc.
      [Figure: map of Japan marking Kagawa and Tokyo]
  • 3. Contents
      • Background
        – Blind source separation (BSS) for audio signals
        – Motivation
      • Related Methods
        – Frequency-domain independent component analysis (FDICA)
        – Independent vector analysis (IVA)
        – Itakura–Saito nonnegative matrix factorization (ISNMF)
      • Independent Low-Rank Matrix Analysis (ILRMA)
        – Employ low-rank TF structures of each source in BSS
        – Gaussian source model with TF-varying variance
        – Relationship between ILRMA and multichannel NMF
        – Student’s t source model with TF-varying scale parameters
      • Conclusion
  • 4. Contents (same outline as slide 3)
  • 5. Background
      • Blind source separation (BSS) for audio signals
        – separates the original audio sources from a recorded mixture
        – does not require prior information about the recording conditions
          • locations of mics and sources, room geometry, timbres, etc.
        – is applicable to many audio applications
      • Only the “determined” situation is considered (# of mics = # of sources)
      [Figure: sources → mixing system → observed recording mixture → demixing system (BSS) → estimated signals, e.g. separated guitar]
  • 6. History of BSS for audio signals
      • Basic theories and their evolution (only popular methods are depicted)
        – 1994: Independent component analysis (ICA)
        – 1998: Frequency-domain ICA (FDICA)
        – 1999: Nonnegative matrix factorization (NMF)
        – 2006: Independent vector analysis (IVA)
        – 2009: Itakura–Saito NMF (ISNMF)
        – 2011: Auxiliary-function-based IVA (AuxIVA)
        – 2012: Time-varying Gaussian IVA
        – 2013: Multichannel NMF
        – 2016: Independent low-rank matrix analysis (ILRMA)
      • Annotations on the timeline: many permutation solvers for FDICA; NMF applied to many tasks; generative models in NMF; many extensions of NMF
  • 7. Motivation of ILRMA
      • Conventional BSS techniques based on ICA
        – Minimum distortion (linear demixing)
        – Relatively fast and stable optimization
          • FastICA [A. Hyvarinen, 1999], natural gradient [S. Amari, 1996], and auxiliary function technique [N. Ono+, 2010], [N. Ono, 2011]
        – Cannot use a “specific” assumption about the sources
          • Only a non-Gaussian p.d.f. is assumed for the sources
        – The permutation problem is crucial and still difficult to solve
          • IVA often fails, causing a “block permutation problem” [Y. Liang+, 2012]
      • It is better to use a “specific source model” in the TF domain
        – Independent low-rank matrix analysis (ILRMA) employs a low-rank property
      [Figure labels: observed signal, frequency-wise mixing matrix, source signals; estimated signal, frequency-wise demixing matrix; indices for frequency bins and time frames]
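The figure labels on slide 7 describe a frequency-wise linear mixing and demixing model. A minimal way to write it down is given here; the symbols x, s, y, A, W and the index names i, j are assumed for illustration (they follow common usage in the ILRMA literature) and are not taken from the slide text.

```latex
% Assumed notation: i = frequency bin, j = time frame
\boldsymbol{x}_{ij} = \boldsymbol{A}_i \boldsymbol{s}_{ij}
  \quad \text{(observed signal = frequency-wise mixing matrix} \times \text{source signals)}
\\
\boldsymbol{y}_{ij} = \boldsymbol{W}_i \boldsymbol{x}_{ij}
  \quad \text{(estimated signal = frequency-wise demixing matrix} \times \text{observed signal)}
\\
i = 1, \dots, I, \qquad j = 1, \dots, J
```

In the determined situation both matrices are square, so a perfect demixing matrix would equal the inverse of the mixing matrix up to the order and scale of the sources.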
  • 8. Contents (same outline as slide 3)
  • 9. Related methods: ICA
      • Independent component analysis (ICA) [P. Comon, 1994]
        – estimates the demixing matrix without knowing the mixing matrix
        – Source model (scalar)
          • Each source is non-Gaussian, and the sources are mutually independent
        – Spatial model
          • Mixing system is a time-invariant matrix
      • Mixing system in audio signals
        – Convolutive mixture with room reverberation
      [Figure: sources (source model) → mixing matrix (spatial model) → observed → demixing matrix → estimated]
  • 10. Related methods: FDICA
      • Frequency-domain ICA (FDICA) [P. Smaragdis, 1998]
        – estimates a frequency-wise demixing matrix
        – Source model (scalar)
          • Each source component is complex-valued and non-Gaussian, and the sources are mutually independent
        – Spatial model
          • Frequency-wise mixing matrix is time-invariant
            – Instantaneous mixture in each frequency band
            – A.k.a. rank-1 spatial model [N.Q.K. Duong, 2010]
      • Permutation problem
        – Order of estimated signals cannot be determined by ICA
        – Alignment of frequency-wise estimated signals is required
          • Many permutation solvers have been proposed
      [Figure: spectrograms (frequency bin × time frame) with a separate ICA in each frequency bin: ICA 1, ICA 2, ..., ICA I]
  • 11. Permutation problem
      • FDICA requires signal alignment across all frequencies
        – Order of estimated signals cannot be determined by ICA*
      [Figure: Source 1 and Source 2 → Observed 1 and Observed 2 → ICA on all frequency components → permutation solver → Estimated signal 1 and Estimated signal 2 (over time)]
      *Signal scale should also be restored by a back-projection technique
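To make the alignment step on slide 11 concrete, here is a minimal sketch of one simple permutation-solving strategy: reorder the signals in each frequency bin so that their magnitude envelopes correlate best with the envelopes already aligned in lower bins. This is only an illustration under assumptions; the function name `align_permutations`, the toy data, and the correlation criterion are hypothetical and are not the specific solvers the slides refer to.

```python
import itertools
import numpy as np

def align_permutations(env):
    """Align the source order across frequency bins (illustrative sketch).

    env: array of shape (n_freq, n_src, n_frames) holding the magnitude
    envelope of every estimated signal in every frequency bin.
    Returns the aligned envelopes and the permutation chosen for each bin.
    """
    n_freq, n_src, _ = env.shape
    aligned = np.empty_like(env)
    perms = np.empty((n_freq, n_src), dtype=int)
    aligned[0] = env[0]                 # bin 0 defines the reference order
    perms[0] = np.arange(n_src)
    for i in range(1, n_freq):
        # Reference: mean envelope of each source over the bins aligned so far.
        ref = aligned[:i].mean(axis=0)
        best_perm, best_score = None, -np.inf
        for perm in itertools.permutations(range(n_src)):
            score = sum(np.corrcoef(ref[n], env[i, perm[n]])[0, 1]
                        for n in range(n_src))
            if score > best_score:
                best_perm, best_score = perm, score
        perms[i] = best_perm
        aligned[i] = env[i, list(best_perm)]
    return aligned, perms

# Toy data: two sources with distinct activations, randomly swapped per bin.
rng = np.random.default_rng(0)
n_freq, n_src, n_frames = 16, 2, 200
activations = rng.random((n_src, n_frames))
gains = rng.random((n_freq, n_src, 1)) + 0.5
env = gains * activations[None, :, :] + 0.01 * rng.random((n_freq, n_src, n_frames))
swapped = rng.random(n_freq) < 0.5
swapped[0] = False                      # keep the reference bin in the true order
env[swapped] = env[swapped][:, ::-1, :]

aligned, perms = align_permutations(env)
```

The exhaustive search over permutations is only practical for a few sources, and the sketch ignores the scale restoration (back-projection) mentioned in the footnote.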
  • 12. Related methods: IVA
      • Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006]
        – extends ICA to a multivariate probabilistic model that treats the source-wise frequency vector as one variable
        – Source model (vector)
          • Each source vector is multivariate, spherical, complex-valued, and non-Gaussian, and the source vectors are mutually independent
        – Spatial model
          • Mixing system is a time-invariant matrix (rank-1 spatial model)
        – The frequency components of a source vector have higher-order correlations, so permutation-free estimation of the demixing matrix is achieved
      [Figure: source vector → mixing matrix → observed vector → demixing matrix → estimated vector; multivariate non-Gaussian dist.]
  • 13. Higher-order correlation assumed in IVA
      • Spherical multivariate distribution [T. Kim, 2007]
        – Two mutually independent Laplace dist.s: x1 and x2 are mutually independent
        – Spherical Laplace dist.: x1 and x2 have higher-order correlation; the probability depends only on the norm
      • Why a spherical distribution?
        – Frequency bands that have similar activations are merged together as one source, which avoids the permutation problem
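The contrast drawn on slide 13 can be written out for the bivariate case. The densities below are a sketch with unit scale and omitted normalizing constants (both simplifications are assumptions made here for readability).

```latex
% Two mutually independent Laplace variables: the density factorizes
p(x_1, x_2) \propto \exp\!\left(-|x_1| - |x_2|\right)
            = \exp\!\left(-|x_1|\right)\exp\!\left(-|x_2|\right)
\\
% Spherical Laplace: the density depends only on the norm and does not factorize
p(x_1, x_2) \propto \exp\!\left(-\sqrt{|x_1|^2 + |x_2|^2}\right)
```

Because the second density cannot be written as a product of a function of x1 and a function of x2, the two components are dependent even though they are uncorrelated; this is the higher-order correlation that IVA exploits across frequency bins.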
  • 14. Comparison of source models
      • Frequency-domain ICA (FDICA) [P. Smaragdis, 1998]
        – Scalar r.v.s: a non-Gaussian source dist. is assumed for each frequency bin of each source
      • Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006]
        – Vector (multivariate) r.v.s: a non-Gaussian spherical source dist. is assumed for the frequency vector of each source
      • Common to both
        – The mixture is close to a Gaussian signal because of the CLT, whereas each source obeys a non-Gaussian dist.
        – The separation filter (demixing matrix) is updated so that the estimated signals, which are modeled as mutually independent, obey the assumed non-Gaussian dist.
      [Figure: observed → STFT (frequency × time) → demixing matrix → estimated; current empirical dist. compared with the assumed source dist.]
  • 15. Related method: NMF
      • Nonnegative matrix factorization (NMF) [D. D. Lee, 1999]
        – Low-rank decomposition with nonnegative constraint
          • Limited number of nonnegative bases and their coefficients
        – In acoustic signal processing, the spectrogram is decomposed
          • Frequently appearing spectral patterns and their activations
      [Figure: nonnegative matrix (power spectrogram, frequency × time) ≈ basis matrix (spectral patterns) × activation matrix (time-varying gains); sizes given by the # of freq. bins, # of time frames, and # of bases]
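The decomposition on slide 15 can be sketched in a few lines. The example below uses the well-known multiplicative updates for the squared Euclidean cost, which is an assumption made for simplicity (the slide does not state a cost at this point); the function name `nmf` and the toy matrix are hypothetical.

```python
import numpy as np

def nmf(X, n_bases, n_iter=100, seed=0, eps=1e-12):
    """Factorize a nonnegative matrix X (freq x time) as T @ V.

    T: basis matrix (spectral patterns), shape (n_freq, n_bases)
    V: activation matrix (time-varying gains), shape (n_bases, n_frames)
    Uses multiplicative updates for the squared Euclidean cost.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    T = rng.random((n_freq, n_bases)) + eps
    V = rng.random((n_bases, n_frames)) + eps
    costs = []
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so T and V stay nonnegative.
        T *= (X @ V.T) / (T @ V @ V.T + eps)
        V *= (T.T @ X) / (T.T @ T @ V + eps)
        costs.append(np.sum((X - T @ V) ** 2))
    return T, V, costs

# Toy "spectrogram" that is exactly rank 3.
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 50))
T, V, costs = nmf(X, n_bases=3)
```

Multiplicative updates are convenient because nonnegativity is preserved without any projection step; the same idea carries over to the Itakura–Saito cost used later in the deck.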
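  • 16. Related method: ISNMF
      • ISNMF [C. Févotte, 2009]
        – The complex-valued observed signal can be decomposed using the “stable property” of the circularly symmetric complex Gaussian dist.
          • If each component is defined as a complex Gaussian variable with a nonnegative variance, the component-wise model and the model with the summed variance are equivalent
          • The variance is also decomposed!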
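The stable property named on slide 16 can be stated explicitly. The notation below (components c, bases t, activations v, model variance r) is assumed here for illustration.

```latex
% Independent components, each a circularly symmetric complex Gaussian
c_{ij,k} \sim \mathcal{N}_c\!\left(0,\; t_{ik} v_{kj}\right), \qquad
x_{ij} = \sum_{k} c_{ij,k}
\\
% Stable property: the sum is again complex Gaussian and the variances add
\Longrightarrow \quad x_{ij} \sim \mathcal{N}_c\!\left(0,\; r_{ij}\right), \qquad
r_{ij} = \sum_{k} t_{ik} v_{kj}
\\
% Negative log-likelihood equals the Itakura--Saito divergence up to a constant
-\log p(\boldsymbol{X}) = \sum_{i,j} \left( \frac{|x_{ij}|^2}{r_{ij}} + \log r_{ij} \right) + \mathrm{const.}
```

Maximum-likelihood estimation of t and v under this model is therefore the same problem as NMF of the power spectrogram with the Itakura–Saito divergence.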
  • 17. Related method: ISNMF
      • Power spectrogram corresponds to variances in the TF plane
        – Complex Gaussian distribution with TF-varying variance
        – If we marginalize in terms of time or frequency, the distribution becomes non-Gaussian even though each TF bin follows a Gaussian distribution
      [Figure: power spectrogram (frequency bin × time frame); grayscale shows the value of the variance, from small to large power]
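The marginalization remark on slide 17 can be checked numerically. The sketch below draws complex Gaussian samples once with a fixed variance and once with a variance that changes from sample to sample, then compares the kurtosis of the real parts (3 for a Gaussian). The exponential distribution of the variances is an arbitrary choice made for this illustration.

```python
import numpy as np

def kurtosis(x):
    """Non-excess kurtosis: equals 3 for a Gaussian variable."""
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2

rng = np.random.default_rng(0)
n = 200_000

def complex_gaussian(variance):
    """Circularly symmetric complex Gaussian samples with the given variances."""
    std = np.sqrt(variance / 2.0)   # variance is split equally over real and imaginary parts
    return std * (rng.standard_normal(variance.shape)
                  + 1j * rng.standard_normal(variance.shape))

fixed = complex_gaussian(np.ones(n))              # same variance in every sample
varying = complex_gaussian(rng.exponential(1.0, n))  # variance changes per sample

k_fixed = kurtosis(fixed.real)
k_varying = kurtosis(varying.real)
print(f"fixed variance: {k_fixed:.2f}, varying variance: {k_varying:.2f}")
```

Pooling samples whose variances differ yields a heavier-tailed (super-Gaussian) distribution, which is why a locally Gaussian model with TF-varying variance is still a non-Gaussian source model overall.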
  • 18. Contents (same outline as slide 3)
  • 19. Extension of source model in IVA
      • Source model in IVA
        – has a frequency-uniform scale
          • Multivariate Laplace with fixed scale
          • Since the scale cannot be determined, it is not equivalent to a flat spectral basis
        – Almost an NMF with only one basis
      • Extend to ISNMF-based source model
        – NMF with an arbitrary number of bases
          • can represent complicated TF structures
        – can learn the “co-occurrence” of each source in the TF domain
          • Co-occurrence is captured as the variance
        – The structure can easily be estimated by NMF
  • 20. Extension of source model in IVA • Spherical Laplace distribution in IVA – Frequency-uniform scale – Frequency vector (I-dimensional) • Gaussian distribution with TF-varying variance in ISNMF [C. Févotte+, 2009] – Complex-valued Gaussian in each TF bin – Time-frequency-varying variance, low-rank decomposition with NMF – Time-frequency matrix (IJ-dimensional) [Figure: spherical Laplace (bivariate)]
  • 21. Cost function in ILRMA and partitioning function • Negative log-likelihood in ILRMA – Cost function in ICA (estimates demixing matrix) – Cost function in ISNMF (estimates low-rank source model) – Estimated signal: the demixing matrix applied to the observed signal • All the variables can easily be optimized by alternating updates – Update rules in ICA – Update rules in ISNMF
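The cost function itself is missing from this transcript. As a sketch, the negative log-likelihood usually given for Gaussian ILRMA without the partitioning function is shown below. The notation is assumed: W_i is the demixing matrix at frequency bin i, y_{ij,n} is the nth element of W_i x_{ij}, and t_{ik,n}, v_{kj,n} are the NMF variables of source n.

```latex
\mathcal{L} = \sum_{i,j} \Bigg[
  \underbrace{\sum_{n} \frac{|y_{ij,n}|^2}{\sum_k t_{ik,n} v_{kj,n}}
              - 2 \log \left| \det W_i \right|}_{\text{ICA part: estimates } W_i}
  + \underbrace{\sum_{n} \log \sum_k t_{ik,n} v_{kj,n}}_{\text{with the first term: ISNMF part}}
\Bigg] + \text{const.}
```

The first term appears in both parts, which is why updating W_i with the source model fixed and updating the NMF variables with W_i fixed both decrease the same objective.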
  • 22. Update rules of ILRMA • ML-based iterative update rules – Spatial model (demixing matrix): the update rule is based on iterative projection [N. Ono, 2011], which uses a one-hot vector that has 1 at the element corresponding to the source being updated – Source model (NMF source model): the update rules for the NMF variables are based on the MM algorithm – Pseudo code is available at • http://d-kitamura.net/pdf/misc/AlgorithmsForIndependentLowRankMatrixAnalysis.pdf
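The two kinds of update can be sketched together in NumPy. This is a minimal illustration written from the commonly published form of the rules (MM updates with a square root for the NMF variables, iterative projection for each demixing vector), not the author's pseudo code. The function names, array layout, initial values, and the absence of scale normalization are assumptions made here.

```python
import numpy as np

def ilrma_cost(P, R, W):
    """Negative log-likelihood up to a constant. P, R: (I, J, N); W: (I, N, N)."""
    n_frames = P.shape[1]
    _, logabsdet = np.linalg.slogdet(W)
    return float(np.sum(P / R + np.log(R)) - 2.0 * n_frames * np.sum(logabsdet))

def ilrma(X, n_bases=2, n_iter=30, seed=0, eps=1e-12):
    """X: mixture STFT, shape (I, J, M). Determined case: N = M sources."""
    I, J, M = X.shape
    N = M
    rng = np.random.default_rng(seed)
    W = np.tile(np.eye(N, dtype=complex), (I, 1, 1))   # row n of W[i] is w_{i,n}^H
    T = rng.uniform(0.1, 1.0, (N, I, n_bases))         # bases of each source
    V = rng.uniform(0.1, 1.0, (N, n_bases, J))         # activations of each source
    Y = np.einsum('inm,ijm->ijn', W, X)                # y_ij = W_i x_ij
    P = np.abs(Y) ** 2
    R = np.stack([T[n] @ V[n] for n in range(N)], axis=-1)
    costs = [ilrma_cost(P, R, W)]
    for _ in range(n_iter):
        for n in range(N):
            Pn, Rn = P[:, :, n], R[:, :, n]
            # Source model: MM updates of the NMF variables.
            T[n] *= np.sqrt(((Pn / Rn ** 2) @ V[n].T) / ((1.0 / Rn) @ V[n].T))
            T[n] = np.maximum(T[n], eps)
            Rn = T[n] @ V[n]
            V[n] *= np.sqrt((T[n].T @ (Pn / Rn ** 2)) / (T[n].T @ (1.0 / Rn)))
            V[n] = np.maximum(V[n], eps)
            Rn = T[n] @ V[n]
            R[:, :, n] = Rn
            # Spatial model: iterative projection for the n-th demixing vector.
            U = np.einsum('ij,ijm,ijl->iml', 1.0 / Rn, X, X.conj()) / J
            e_n = np.zeros((I, N, 1), dtype=complex)   # one-hot vector per bin
            e_n[:, n, 0] = 1.0
            w = np.linalg.solve(W @ U, e_n)[:, :, 0]
            w /= np.sqrt(np.real(np.einsum('im,iml,il->i', w.conj(), U, w)))[:, None]
            W[:, n, :] = w.conj()
            Y[:, :, n] = np.einsum('im,ijm->ij', w.conj(), X)
            P[:, :, n] = np.abs(Y[:, :, n]) ** 2
        costs.append(ilrma_cost(P, R, W))
    return Y, W, T, V, costs
```

The returned `costs` list makes it easy to confirm that the objective does not increase from one iteration to the next, which is the property the ML-based updates are designed to have. A practical implementation would also fix the scale ambiguity of the separated signals, which this sketch leaves out.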
  • 23. Cost function in ILRMA and partitioning function • ILRMA with partitioning function – Appropriate number of bases for each source can automatically be determined – Useful when various types of sources are mixed • Ex. drums are very low-rank but vocals are not so low-rank
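The formula on this slide was lost. A sketch of how a partitioning function is usually written for ILRMA follows, in assumed notation: the K bases are shared by all sources, and z_{nk} says how much of basis k belongs to source n.

```latex
r_{ij,n} = \sum_{k=1}^{K} z_{nk}\, t_{ik}\, v_{kj},
\qquad z_{nk} \in [0, 1],
\qquad \sum_{n=1}^{N} z_{nk} = 1
```

Because each basis is distributed over the sources by z_{nk}, a source such as drums can end up with few bases and a source such as vocals with many, without fixing the split in advance.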
  • 24. Update rules of ILRMA • ML-based iterative update rules – Spatial model (demixing matrix): the update rule is based on iterative projection [N. Ono, 2011], which uses a one-hot vector that has 1 at the element corresponding to the source being updated – Source model (NMF source model): the update rules for the NMF variables are based on the MM algorithm
  • 25. Optimization process in ILRMA • Demixing matrix and source model are alternately updated – The precise modeling of low-rank TF structures will improve the estimation accuracy of the demixing matrix [Figure: loop between estimating the demixing matrix (mixture to separated signals) and estimating the NMF variables (one NMF per separated signal), whose source model is used to update the demixing matrix]
  • 26. Comparison of source models • FDICA source model: non-Gaussian scalar variable • IVA source model: non-Gaussian vector variable with higher-order correlation • ILRMA source model: non-Gaussian matrix variable with low-rank time-frequency structure – Rank of TF matrix of mixture is greater than rank of TF matrix of each source
  • 27. Multichannel extension of NMF • Multichannel NMF [A. Ozerov+, 2010], [H. Sawada+, 2013] – Observed multichannel signal (multichannel vector): spatial covariances in each time-frequency slot (simultaneous spatial covariance) – Spatial model: spatial covariances of each source (spatial property of each source) – Source model: basis matrix (spectral patterns), activation matrix (gains), and partitioning function; timbre patterns of all sources
  • 28. Relationship b/w ILRMA and multichannel NMF • Difference b/w ILRMA and multichannel NMF? – Source distribution: complex Gaussian distribution (same) – ILRMA assumes a rank-1 spatial model – Multichannel NMF assumes full-rank spatial covariance • Assumption: rank-1 spatial model – Spatial covariance of each source is a rank-1 matrix, formed from the sourcewise steering vector – Equivalent to simultaneous mixing assumption
  • 29. Relationship b/w ILRMA and multichannel NMF • Multichannel NMF with rank-1 spatial model – Substitute the rank-1 spatial model into the cost function – Transform the variables from the mixing system to the demixing matrix
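The substitution can be written out as a short derivation. This is a sketch in assumed notation (a_{i,n}: steering vector of source n; A_i: matrix whose columns are the steering vectors; W_i = A_i^{-1}; y_{ij} = W_i x_{ij}; D_{ij}: diagonal matrix of the source variances r_{ij,n}); the slide's own equations were not preserved.

```latex
% Rank-1 spatial covariances make the observed covariance model factorize.
\mathsf{R}_{ij} = \sum_{n} r_{ij,n}\, a_{i,n} a_{i,n}^{\mathsf H}
                = A_i D_{ij} A_i^{\mathsf H},
\qquad D_{ij} = \operatorname{diag}(r_{ij,1}, \dots, r_{ij,N})

% Per-slot multichannel NMF cost after substitution, using W_i = A_i^{-1}.
x_{ij}^{\mathsf H} \mathsf{R}_{ij}^{-1} x_{ij} + \log \det \mathsf{R}_{ij}
  = y_{ij}^{\mathsf H} D_{ij}^{-1} y_{ij} + \log \det D_{ij} + 2 \log \left| \det A_i \right|
  = \sum_{n} \left( \frac{|y_{ij,n}|^2}{r_{ij,n}} + \log r_{ij,n} \right)
    - 2 \log \left| \det W_i \right|
```

The last line is the per-slot Gaussian ILRMA cost, so under the rank-1 assumption estimating the mixing system A_i and estimating the demixing matrix W_i are the same problem.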
  • 30. Relationship b/w MNMF, IVA, and ILRMA • From multichannel NMF side, – Rank-1 spatial model is introduced, transforming the problem from the estimation of the mixing system to that of the demixing matrix • From IVA side, – Increase the number of spectral bases in source model [Figure: IVA, multichannel NMF, and ILRMA placed on two axes, source model and spatial model, each running from limited to flexible; ILRMA is reached from IVA via the NMF source model and from multichannel NMF via the rank-1 spatial model]
  • 31. Experimental evaluation • Conditions – Source signals: music signals obtained from SiSEC, convolved with impulse responses; two microphones and two sources – Window length: 512 ms (Hamming window) – Shift length: 128 ms (1/4 shift) – Number of bases: 30 per source (ILRMA w/o partitioning function), 60 for all sources (ILRMA with partitioning function) – Evaluation score: improvement of signal-to-distortion ratio (SDR) [Figure: recording layouts for impulse response E2A (reverberation time: 300 ms; sources at 2 m, microphone spacing 5.66 cm, angle labels 50 and 50) and impulse response JR2 (reverberation time: 470 ms; sources at 2 m, microphone spacing 5.66 cm, angle labels 60 and 60)]
  • 32. Results: fort_minor-remember_the_name [Figure: SDR improvement [dB] (higher is better) for violin synth. and vocals, under E2A (300 ms) and JR2 (470 ms); methods compared: directional clustering, IVA, Ozerov’s MNMF, Ozerov’s MNMF with random initialization, Sawada’s MNMF, Sawada’s MNMF initialized by ILRMA, ILRMA w/o partitioning function, ILRMA with partitioning function]
  • 33. Results: ultimate_nz_tour [Figure: SDR improvement [dB] (higher is better) for guitar and synth., under E2A (300 ms) and JR2 (470 ms); methods compared: directional clustering, IVA, Ozerov’s MNMF, Ozerov’s MNMF with random initialization, Sawada’s MNMF, Sawada’s MNMF initialized by ILRMA, ILRMA w/o partitioning function, ILRMA with partitioning function]
  • 34. Results: bearlin-roads • Signal length: 14 s [Figure: SDR improvement [dB] (higher is better) versus iteration steps (0 to 400) for IVA, MNMF, ILRMA without Z, and ILRMA with Z; computation times shown: 11.5 s, 15.1 s, 60.7 s, 7647.3 s]
  • 35. Demonstration: music source separation • Music source separation – Listen for the three parts in the mixture: guitar, vocal, and keyboard – Another demo is available at http://d-kitamura.net/en/index_en.html [Figure: mixture of guitar, vocal, and keyboard passed through source separation to give separate guitar, vocal, and keyboard signals]
  • 36. Stable and Student’s t-distributions • Source model based on symmetric α-stable (SαS) distribution [A. Liutkus+, 2015], [U. Şimşekli+, 2015], [S. Leglaive+, 2017], [M. Fontaine+, 2017] – which can validate the decomposition of complex-valued r.v.s as the decomposition of their parameters – Heavy tail (sparse) when α approaches 0 • Student’s t-distribution is also used as a source model [C. Févotte+, 2006], [K. Yoshii+, 2016], [K. Kitamura+, 2016], [S. Leglaive+, 2017] – that includes Cauchy distribution (degree of freedom ν = 1) and Gaussian distribution (ν → ∞) [Figure: SαS (stable family) and Student’s t (partially stable) shown as overlapping families that share Cauchy and Gauss]
  • 37. Source model of Student’s t-distribution • Degree-of-freedom parameter – Heavy tail when the degree of freedom approaches 0 • Complex Student’s t-dist. – Circularly symmetric: phase is assumed to be uniform – Defined in each TF slot – Student’s t NMF (t-NMF) [K. Yoshii+ 2016]: scale corresponds to NMF model
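The density on this slide is missing from the transcript. The sketch below uses the form of the circularly symmetric complex Student's t density that is common in the t-NMF and t-ILRMA literature, p(y) = (1 / (π σ²)) (1 + (2/ν) |y|² / σ²)^(-(2+ν)/2). Treat this formula, the symbol ν for the degree of freedom, and the function names as assumptions. The code checks two properties stated in the deck: the Gaussian limit and the heavy tail.

```python
import numpy as np

def complex_t_pdf(y, sigma, nu):
    """Assumed circularly symmetric complex Student's t density with scale sigma."""
    u = np.abs(y) ** 2 / sigma ** 2
    # log1p keeps the large-nu (near-Gaussian) case numerically accurate.
    return np.exp(-(1.0 + nu / 2.0) * np.log1p(2.0 * u / nu)) / (np.pi * sigma ** 2)

def complex_gaussian_pdf(y, sigma):
    """Circularly symmetric complex Gaussian density with variance sigma**2."""
    return np.exp(-np.abs(y) ** 2 / sigma ** 2) / (np.pi * sigma ** 2)

def complex_t_mass_within(radius, sigma, nu):
    """Closed-form probability that |y| <= radius under complex_t_pdf."""
    return 1.0 - (1.0 + 2.0 * radius ** 2 / (nu * sigma ** 2)) ** (-nu / 2.0)

# Large degree of freedom: the density approaches the complex Gaussian.
print(complex_t_pdf(1 + 1j, 1.5, 1e8), complex_gaussian_pdf(1 + 1j, 1.5))
# Degree of freedom 1 (Cauchy case): far more density in the tail.
print(complex_t_pdf(5.0, 1.0, 1.0), complex_gaussian_pdf(5.0, 1.0))
```

The density depends on `y` only through `abs(y)`, which is the uniform-phase assumption named on the slide.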
  • 38. Motivation for using Student’s t-dist. • Better separation with t-NMF was reported [K. Yoshii+, 2016] – in a very simple experiment using only C4, E4, and G4 piano tones • NMF with a heavy-tailed distribution – tends to provide an excessively low-rank approximation • Sparse components (which may increase the rank of model data) are treated as outliers • ILRMA based on Student’s t source model (t-ILRMA) – may improve the separation accuracy by forcing the NMF source model to be excessively low-rank – will be presented at MLSP2017! (preprint is available on arXiv) • https://arxiv.org/abs/1708.04795
  • 39. Source model based on Student’s t-distribution • pth power spectrogram corresponds to scales in the TF plane – Complex Student’s t-distribution with TF-varying scale [Figure: pth power spectrogram over frequency bin and time frame; grayscale shows the value of the scale, from small to large power]
  • 40. Cost function in ILRMA based on Student’s t-dist. • Negative log-likelihood in ILRMA – Gaussian ILRMA: modeling the power spectrogram by the variance – Student’s t ILRMA: modeling the pth power spectrogram by the scale – Generalization of the p.d.f. and the model domain
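The cost function is not preserved in this transcript. As a sketch, the negative log-likelihood that follows from a complex Student's t density of the form (1 / (π σ²)) (1 + (2/ν) |y|² / σ²)^(-(2+ν)/2) is given below. The notation is assumed: ν is the degree of freedom, σ_{ij,n} is the scale of source n in slot (i, j), and p is the domain parameter.

```latex
\mathcal{L}_t = \sum_{i,j} \Bigg[
  \sum_{n} \bigg( \Big(1 + \frac{\nu}{2}\Big)
     \log \Big( 1 + \frac{2}{\nu} \frac{|y_{ij,n}|^2}{\sigma_{ij,n}^2} \Big)
     + 2 \log \sigma_{ij,n} \bigg)
  - 2 \log \left| \det W_i \right|
\Bigg] + \text{const.},
\qquad
\sigma_{ij,n}^{p} = \sum_{k} t_{ik,n} v_{kj,n}
```

For ν → ∞ the first term tends to |y_{ij,n}|² / σ_{ij,n}², and with p = 2 the NMF model describes σ_{ij,n}² directly, so the expression reduces to the Gaussian ILRMA cost with the variance equal to σ_{ij,n}².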
  • 41. Experimental results: randomized t-ILRMA • Examples – Improved when – Stable when but score is not sufficient – Root spectrogram ( ) is preferable for speech signals • In the case of – Source model is over-fitted to mixture [Figures: results for music signals and speech signals]
  • 42. Tempering parameter • Random initialization (previous result) • Initialization based on Gaussian ILRMA – (Tempering approach of parameter) [Diagram labels: t-ILRMA (iteration: 200) with identity matrix and uniform random values; Gauss ILRMA (iteration: 100) with identity matrix and uniform random values; t-ILRMA (iteration: 100) with arbitrary val.; t-NMF (iteration: 100) with uniform random values]
  • 43. Experimental results: initialized t-ILRMA • Examples – Improved for all parameter values – Could avoid the over-fitting problem seen with random initialization • Best parameter? – Depends completely on the data [Figures: results for music signals and speech signals]
  • 47. Conclusion • Independent low-rank matrix analysis (ILRMA) – Assumption • Statistical independence between sources • Low-rank time-frequency structure of each source – Equivalent to multichannel NMF • when the mixing assumption is valid • Student’s t-distribution is newly introduced – including two symmetric α-stable distributions • Complex Cauchy distribution (degree of freedom ν = 1) • Complex Gaussian distribution (ν → ∞) • Further extensions – Relaxation of rank-1 spatial model? – Employ another distribution? – Supervised ILRMA? User-guided ILRMA?

Editor's Notes

  1. This talk deals with blind source separation (BSS), a technique for separating individual sources from a recorded mixture. The word “blind” means that the method does not require any prior information about the recording conditions, such as the locations of the microphones and sources or the room geometry. This kind of technique is very useful as front-end processing for many applications. In this talk, we consider only the “determined” situation, in which the numbers of microphones and sources are equal.
  2. This is a history of the basic theories in the audio BSS field. For acoustic signals, independent component analysis (ICA) was applied to frequency-domain signals as FDICA. After that, many permutation solvers for FDICA were proposed, and eventually an elegant solution, independent vector analysis (IVA), was proposed. It is still being extended to more flexible models. On the other hand, nonnegative matrix factorization (NMF) was also developed and extended to multichannel signals for source separation problems. Recently, we have developed a new framework that unifies these two powerful theories, called independent low-rank matrix analysis (ILRMA). I will explain it in detail.
  3. Here I explain the motivation of this talk.
  4. I briefly explain the separation mechanism in FDICA and IVA. In FDICA, ICA is applied to each frequency bin considering the scalar time-series as random variables, and we maximize its non-Gaussianity to estimate the frequency-wise demixing matrix. In IVA, we consider a vector time-series variable of all frequencies like this figure, then assume a multivariate non-Gaussian distribution, which has a spherical property. Since spherical property ensures higher-order correlation in frequency variable, namely among frequency bins, the permutation problem can be avoided.
  5. This is a graphical image of the source model in ISNMF. In each time-frequency slot, a zero-mean complex Gaussian distribution is defined, and the slots are mutually independent over all times and frequencies. The variances of these Gaussians correspond to the power spectrogram. Therefore, in a slot that has strong power, such as a spectral peak or one of its harmonics, a Gaussian with a large variance becomes the generative model. Note that, even though each slot is Gaussian, the marginal distribution is non-Gaussian because the variance fluctuates. So, we can use this model as a source model in ICA-based methods.
  6. This figure shows the difference of source models in IVA and ILRMA. Since IVA assumes frequency-uniform scale, it is almost an NMF with only one flat basis. On the other hand, ILRMA has more flexible source model with arbitrary number of spectral bases. So we can capture more precise TF structure of each source.
  7. The spherical source distribution in IVA can be extended to more flexible model. This is called local Gaussian model, which employs zero-mean complex Gaussian distribution with time-frequency-varying variances. Namely, in each time-frequency slot i and j, complex Gaussian is defined, and its variance varies depending on the time and frequency. This model is equivalent to Itakura-Saito NMF, and the variance can be decomposed into basis T and activation V.
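  8. The log-likelihood function of the proposed method, ILRMA, is obtained like this. Here, the spatial demixing filter W, circled in blue, and the NMF source model TV, circled in red, are the variables to be estimated. Furthermore, the first half of this expression is equivalent to the cost function of conventional IVA, and the second half is equivalent to the cost function of conventional NMF. Therefore, all the variables can easily be estimated by alternately iterating the update rules of IVA and NMF. In addition, an appropriate rank for each source can be determined adaptively with latent variables. As shown at the beginning, even among music signals, vocals are not very low-rank while drum signals are low-rank, so the appropriate rank differs from source to source. The role of these latent variables is to allocate the bases automatically in such situations under the maximum-likelihood criterion.
  9. The iterative update rules of ILRMA can be derived like this. All the variables are optimized by alternately updating the spatial demixing filter and the source model. Since these iterative calculations are guaranteed to increase the likelihood monotonically, convergence to a local solution near the initial value is guaranteed.
  10. The log-likelihood function of the proposed method, ILRMA, is obtained like this. Here, the spatial demixing filter W, circled in blue, and the NMF source model TV, circled in red, are the variables to be estimated. Furthermore, the first half of this expression is equivalent to the cost function of conventional IVA, and the second half is equivalent to the cost function of conventional NMF. Therefore, all the variables can easily be estimated by alternately iterating the update rules of IVA and NMF. In addition, an appropriate rank for each source can be determined adaptively with latent variables. As shown at the beginning, even among music signals, vocals are not very low-rank while drum signals are low-rank, so the appropriate rank differs from source to source. The role of these latent variables is to allocate the bases automatically in such situations under the maximum-likelihood criterion.
  11. The iterative update rules of ILRMA can be derived like this. All the variables are optimized by alternately updating the spatial demixing filter and the source model. Since these iterative calculations are guaranteed to increase the likelihood monotonically, convergence to a local solution near the initial value is guaranteed.
  12. In other words, the proposed method first learns the spatial demixing filter, then learns the timbre structure of the separated signals with NMF, and then reuses the resulting source model to learn the spatial demixing filter again, which yields more accurate separated signals. By repeating this process many times, a clear timbre structure is captured for each source, and the performance of the spatial demixing filter is expected to improve.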
  13. This is again a comparison of the source models in FDICA, IVA, and ILRMA. The important idea in ILRMA is that the rank of the TF matrix of the mixture signal is greater than the rank of the TF matrix of each source. So, if we assume not only independence between the sources but also a low-rank TF structure for each source, the separation can be done accurately.
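  14. The paper also clarifies that multichannel NMF, which is an extension of NMF to multichannel signals, and ILRMA are closely related. Briefly, ILRMA is equivalent to conventional multichannel NMF when the “spatial covariance matrix,” the model of spatial information defined in multichannel NMF, is constrained to be rank 1. However, multichannel NMF estimates the mixing system, which differs from techniques such as ILRMA and IVA that estimate the demixing system. For this reason, multichannel NMF is somewhat lacking in practicality in terms of computational efficiency and stability. This is shown in the comparative experiments.
  15. We conducted a separation experiment with music signals. These are the experimental conditions. Two music signals were played in this arrangement and recorded with two microphone channels. The reverberation time here is 300 ms. SDR is used as the evaluation score. It is a measure of overall performance that includes both the sound quality and the degree of separation.
  16. This is an example of the separation results for three sources. The horizontal axis shows the number of optimization iterations, and the vertical axis shows the separation accuracy. As you can see, the convergence speed with respect to the iterative updates is completely different from that of multichannel NMF: IVA and ILRMA are very fast. In addition, since the computational cost per iteration also differs greatly, the actual computation time is also much shorter. As for separation accuracy, ILRMA performs well, and the variant with the latent variables is the best, although its convergence is slightly slower.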
  17. Anyway, the next one is music source separation. Here we have a mixture signal of three parts, just like typical music. Please listen carefully for the three parts: guitar, vocal, and keyboard. OK? Let’s listen. Then, if we apply source separation, we can obtain these kinds of signals. So, we can remix them, re-edit them, or do anything we want. This is source separation.
  18. This is a graphical image of the source model in ISNMF. In each time-frequency slot, a zero-mean complex Gaussian distribution is defined, and the slots are mutually independent over all times and frequencies. The variances of these Gaussians correspond to the power spectrogram. Therefore, in a slot that has strong power, such as a spectral peak or one of its harmonics, a Gaussian with a large variance becomes the generative model. Note that, even though each slot is Gaussian, the marginal distribution is non-Gaussian because the variance fluctuates. So, we can use this model as a source model in ICA-based methods.
  19. I will not explain the detailed derivation of the update rules, but they can easily be derived in the same manner as for the previous Gaussian ILRMA.
  20. The source model in IVA, spherical Laplace, was extended to this ISNMF model, resulting in independent low-rank matrix analysis (ILRMA). So, ILRMA is a unification of IVA and ISNMF, and we employed the NMF source model to capture the low-rank time-frequency structure of each source. This source model can improve the estimation accuracy of the demixing matrix.
  21. As I already explained, the window length in the STFT affects the performance of ICA-based separation. If we use a short window, x=As no longer holds, and if we use a long window, the estimation becomes unstable because the number of time frames J decreases. However, ILRMA employs full time-frequency modeling of the sources. This model may improve the robustness to a decrease in J. This is our expectation. Let’s check this issue.
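The trade-off between window length and the number of time frames J can be made concrete with a small calculation. The sketch below is illustrative only. It assumes a 14 s signal (the length quoted for bearlin-roads), a 1/4 shift as in the experimental conditions, and no padding at the signal edges; the window lengths in the loop are example values.

```python
def num_frames(signal_ms, window_ms, shift_ms):
    """Number of full STFT frames without padding (lengths in integer milliseconds)."""
    if signal_ms < window_ms:
        return 0
    return (signal_ms - window_ms) // shift_ms + 1

# Longer windows leave fewer frames J for estimating the demixing matrix.
for window_ms in (128, 512, 1024):
    print(window_ms, num_frames(14_000, window_ms, window_ms // 4))
```

With these assumptions, the 512 ms window gives 106 frames, and doubling the window roughly halves J.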
  22. Here we used 4 music and 4 speech signals obtained from the SiSEC database, and we produced the observed signals by convolving them with the impulse responses shown at the bottom. We used two types of impulse response: one has a 300-ms-long reverberation, and the other has 470 ms.
  23. We compare 4 methods: FDICA + ideal permutation solver, FDICA + DOA-based permutation solver, IVA, and ILRMA. In FDICA+IPS, we used the reference, that is, the oracle source spectrogram. So this is an upper limit of FDICA. FDICA+DOA is a blind method that uses DOA clustering to solve the permutation problem. Of course, IVA and ILRMA are also blind methods. Then, we used a Hamming window with various window lengths.
  24. First, we show the results with ideal initialization. Namely, we first give a correct answer of demixing matrix using the oracle source. So, the initial value provides the best separation performance here. In addition, only for ILRMA, we set the initial value of NMF model T and V as the oracle values. Therefore, FDICA+DOA and IVA are using the spatial oracle initialization, and FDICA+IPS and ILRMA are using spatial and spectral oracle initialization.
  25. This is the result. The left plots are for music and the right plots are for speech, and the reverberation time is short (top) and long (bottom). The horizontal axis shows the window length, and the vertical axis shows the separation performance. The colored lines are the results of ILRMA with various numbers of NMF bases. In the music results, we can see that FDICA and IVA could not achieve good separation when the window becomes long. In ILRMA, the performance is maintained even with very long windows. This comes from the full modeling of the time-frequency structure of each source. However, for the speech signals, the performance of ILRMA becomes worse. We guess this is because speech is not low-rank, so the source model could not capture the precise TF structures.
  26. Next, we show the results in a fully blind situation. The initial W is set to the identity matrix, and the initial source model is randomized. Note that FDICA+IPS still uses the oracle spectrogram for solving the permutation.
  27. This is the result. We could not obtain the same results as the previous one. The performance of all the methods is degraded when the window length becomes long. Therefore, at least we can say that, ILRMA has a good potential to separate the sources even in a long window case, but in practice, the blind estimation of precise source model is a difficult problem.