Bag of Pursuits and Neural Gas for Improved
Sparse Coding
Kai Labusch, Erhardt Barth, and Thomas Martinetz
University of Lübeck
Institute for Neuro- and Bioinformatics
Ratzeburger Allee 160
23562 Lübeck, Germany
{labusch,barth,martinetz}@inb.uni-luebeck.de
Abstract. Sparse coding employs low-dimensional subspaces in order to encode
high-dimensional signals. Finding the optimal subspaces is a difficult optimization
task. We show that stochastic gradient descent is superior in finding the optimal
subspaces compared to MOD and K-SVD, which are both state-of-the-art methods.
The improvement is most significant in the difficult setting of highly overlapping
subspaces. We introduce the so-called "Bag of Pursuits", which is derived from
Orthogonal Matching Pursuit. It provides an improved approximation of the optimal
sparse coefficients, which, in turn, significantly improves the performance of the
gradient descent approach as well as of MOD and K-SVD. In addition, the "Bag of
Pursuits" makes it possible to employ a generalized version of the Neural Gas algorithm
for sparse coding, which finally leads to an even more powerful method.
Keywords: sparse coding, neural gas, dictionary learning, matching pursuit
1 Introduction
Many tasks in signal processing and machine learning can be simplified by
choosing an appropriate representation of given data. There are a number
of desirable properties one wants to achieve, e.g., coding-efficiency, resistance
against noise, or invariance against certain transformations. In machine learn-
ing, finding a good representation of given data is an important first step in
order to solve classification or regression tasks.
Suppose that we are given data $X = (x_1, \ldots, x_L)$, $x_i \in \mathbb{R}^N$. We want to
represent $X$ as a linear combination of some dictionary $C$, i.e., $x_i = Ca_i$,
where $C = (c_1, \ldots, c_M)$, $c_l \in \mathbb{R}^N$. In case of $M > N$, the dictionary is
overcomplete.
In this work, we consider the following framework for dictionary design:
We are looking for a dictionary C that minimizes the representation error
$$E_h = \frac{1}{L} \sum_{i=1}^{L} \|x_i - Ca_i\|_2^2 \qquad (1)$$
where $x_i^{\mathrm{opt}} = Ca_i$ with $a_i = \arg\min_a \|x_i - Ca\|$, $\|a\|_0 \le k$, denotes the best
$k$-term representation of $x_i$ in terms of $C$. The number of dictionary elements
M and the maximum number of non-zero entries k are user-defined model
parameters.
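To make the objective concrete, the following minimal NumPy sketch evaluates (1) for a given dictionary and a given sparse approximation routine. All names here (`representation_error`, `sparse_coder`) are ours, not from the paper, and the coder is assumed to return a coefficient vector with at most $k$ non-zero entries:

```python
import numpy as np

def representation_error(X, C, sparse_coder, k):
    """E_h = (1/L) * sum_i ||x_i - C a_i||_2^2, cf. (1).
    X: (N, L) data matrix, C: (N, M) dictionary,
    sparse_coder(x, C, k): returns a k-sparse coefficient vector."""
    L = X.shape[1]
    err = 0.0
    for i in range(L):
        a = sparse_coder(X[:, i], C, k)
        err += np.sum((X[:, i] - C @ a) ** 2)
    return err / L
```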
It has been shown that finding $a_i$ is in general NP-hard (Davis et al.
(1997)). Methods such as Orthogonal Matching Pursuit (OMP, Pati et al.
(1993)) or Optimized Orthogonal Matching Pursuit (OOMP, Rebollo-Neira and
Lowe (2002)) can be used in order to find an approximation of the coefficients
of the best k-term representation. The Method of Optimal Directions (MOD,
Engan et al. (1999)) and the K-SVD algorithm (Aharon et al. (2006)) can
employ an arbitrary approximation method for the coefficients in order to
learn a dictionary from the data. First, the coefficients are determined, then
they are considered fixed in order to update the dictionary.
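As an illustration of this two-step scheme, here is a minimal sketch (our own rendering, not the authors' code) of a MOD-style dictionary update: with the coefficient matrix held fixed, the dictionary is the least-squares solution $C = XA^T(AA^T)^{-1}$, followed by column renormalization:

```python
import numpy as np

def mod_update(X, A):
    """MOD dictionary update: minimize ||X - C A||_F^2 over C with the
    coefficient matrix A (M x L) fixed, then renormalize the columns."""
    C = X @ np.linalg.pinv(A)   # C = X A^T (A A^T)^{-1} via pseudo-inverse
    C /= np.linalg.norm(C, axis=0, keepdims=True)
    return C
```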
Using data that actually was generated as a sparse linear combination of
some given dictionary, it has been shown (Aharon et al. (2006)) that methods
such as MOD or K-SVD can be used in order to reconstruct the dictionary
only from the data even in highly overcomplete settings under the presence
of strong noise. However, our experiments show that even K-SVD, which
performed best in (Aharon et al. (2006)), requires highly sparse linear
combinations for good dictionary reconstruction performance. The SCNG algorithm
does not possess this deficiency (Labusch et al. (2009)), but, unlike MOD or
K-SVD, it is bound to a specific approximation method for the coefficients,
i.e., OOMP.
Here, we propose a new method for designing overcomplete dictionaries
that performs well even if the linear combinations are less sparse. Like MOD
or K-SVD, it can employ an arbitrary approximation method for the coef-
ficients. In order to demonstrate the performance of the method, we test it
on synthetically generated overcomplete linear combinations of known dic-
tionaries and compare the obtained performance against MOD and K-SVD.
2 From vector quantization to sparse coding
Vector quantization learns a representation of given data in terms of so-called
codebook vectors. Each given sample is encoded by the closest codebook
vector. Vector quantization can be understood as a special case of sparse
coding where the coefficients are constrained by $\|a_i\|_0 = 1$ and $\|a_i\|_2 = 1$,
i.e., vector quantization looks for a codebook $C$ that minimizes (1), choosing
the coefficients according to
$$(a_i)_m = 1,\quad (a_i)_l = 0\ \ \forall\, l \neq m, \quad \text{where } m = \arg\min_l \|c_l - x_i\|_2^2 . \qquad (2)$$
In order to learn a good codebook, many vector quantization algorithms
consider only the winner for learning, i.e., the codebook vector cm for which
(ai)m = 1 holds. As a consequence of that type of hard-competitive learning
scheme, problems such as bad quantization, initialization sensitivity, or slow
convergence can arise.
In order to remedy these problems soft-competitive vector quantization
methods such as the NG algorithm (Martinetz et al. (1993)) have been pro-
posed. In the NG algorithm all possible encodings are considered in each
learning step, i.e., $a_i^1, \ldots, a_i^M$ with $(a_i^j)_j = 1$. Then, the encodings are sorted
according to their reconstruction error:
$$\|x_i - Ca_i^{j_0}\| \le \|x_i - Ca_i^{j_1}\| \le \cdots \le \|x_i - Ca_i^{j_p}\| \le \cdots \le \|x_i - Ca_i^{j_M}\| . \qquad (3)$$
In contrast to the hard-competitive approaches, in each learning iteration
every codebook vector $c_l$ is updated. The update is weighted according to the
rank of the encoding that uses the codebook vector $c_l$. It has been shown in
(Martinetz et al. (1993)) that this type of update is equivalent to a gradient
descent on a well-defined cost function. Due to the soft-competitive learning
scheme, the NG algorithm shows robust convergence to near-optimal
distributions of the codebook vectors over the data manifold.
Here we want to apply this ranking approach to the learning of sparse
codes. Similar to the NG algorithm, for each given sample $x_i$, we consider
all $K$ possible coefficient vectors $a_i^j$, i.e., encodings that have at most $k$ non-zero
entries. The elements of each $a_i^j$ are chosen such that $\|x_i - Ca_i^j\|$ is
minimal. We order the coefficients according to the representation error that
is obtained by using them to approximate the sample $x_i$:
$$\|x_i - Ca_i^{j_0}\| < \|x_i - Ca_i^{j_1}\| < \cdots < \|x_i - Ca_i^{j_p}\| < \cdots < \|x_i - Ca_i^{j_K}\| . \qquad (4)$$
If there are coefficient vectors that lead to the same reconstruction error,
$$\|x_i - Ca_i^{m_1}\| = \|x_i - Ca_i^{m_2}\| = \cdots = \|x_i - Ca_i^{m_V}\| , \qquad (5)$$
we randomly pick one of them and do not consider the others. Note that this
is required only for theoretical reasons; in practice this situation almost
never occurs. Let $\mathrm{rank}(x_i, a_i^j, C) = p$ denote the number of coefficient
vectors $a_i^m$ with $\|x_i - Ca_i^m\| < \|x_i - Ca_i^j\|$. Introducing the neighborhood function
$h_{\lambda_t}(v) = e^{-v/\lambda_t}$, we consider the following modified error function
$$E_s = \sum_{i=1}^{L} \sum_{j=1}^{K} h_{\lambda_t}(\mathrm{rank}(x_i, a_i^j, C))\, \|x_i - Ca_i^j\|_2^2 \qquad (6)$$
which becomes equal to (1) for $\lambda_t \to 0$. In order to minimize (6), we consider
the gradient of $E_s$ with respect to $C$, which is
$$\frac{\partial E_s}{\partial C} = -2 \sum_{i=1}^{L} \sum_{j=1}^{K} h_{\lambda_t}(\mathrm{rank}(x_i, a_i^j, C))\,(x_i - Ca_i^j)\,(a_i^j)^T + R \qquad (7)$$
with
$$R = \sum_{i=1}^{L} \sum_{j=1}^{K} h_{\lambda_t}(\mathrm{rank}(x_i, a_i^j, C))\, \frac{\partial\,\mathrm{rank}(x_i, a_i^j, C)}{\partial C}\, \|x_i - Ca_i^j\|_2^2 . \qquad (8)$$
In order to show that $R = 0$, we adapt the proof given in (Martinetz et al.
(1993)) to our setting. With $e_i^j = \|x_i - Ca_i^j\|$, we write $\mathrm{rank}(x_i, a_i^j, C)$ as
$$\mathrm{rank}(x_i, a_i^j, C) = \sum_{m=1}^{K} \theta\!\left( (e_i^j)^2 - (e_i^m)^2 \right) \qquad (9)$$
where $\theta(x)$ is the Heaviside step function. The derivative of the Heaviside
step function is the delta distribution $\delta(x)$, with $\delta(x) = 0$ for $x \neq 0$ and
$\int \delta(x)\,dx = 1$. Therefore, we can write
$$R = \sum_{i=1}^{L} \sum_{j=1}^{K} h_{\lambda_t}(\mathrm{rank}(x_i, a_i^j, C))\,(e_i^j)^2 \sum_{m=1}^{K} \left( (a_i^j)^T - (a_i^m)^T \right) \delta\!\left( (e_i^j)^2 - (e_i^m)^2 \right) . \qquad (10)$$
Each term of (10) is non-vanishing only for those $a_i^j$ for which $(e_i^j)^2 = (e_i^m)^2$
holds. Since we explicitly excluded this case, we obtain $R = 0$. Hence,
we can perform a stochastic gradient descent on (6) with respect to $C$ by
applying $t = 0, \ldots, t_{\max}$ updates of $C$ using the gradient-based learning rule
$$\Delta C = \alpha_t \sum_{j=1}^{K} h_{\lambda_t}(\mathrm{rank}(x_i, a_i^j, C))\,(x_i - Ca_i^j)\,(a_i^j)^T \qquad (11)$$
for a randomly chosen $x_i \in X$, where $\lambda_t = \lambda_0 (\lambda_{\mathrm{final}}/\lambda_0)^{t/t_{\max}}$ is an exponentially
decreasing neighborhood size and $\alpha_t = \alpha_0 (\alpha_{\mathrm{final}}/\alpha_0)^{t/t_{\max}}$ an exponentially
decreasing learning rate. After each update has been applied, the
column vectors of $C$ are renormalized to unit length. Then the $a_i^j$ are re-determined
and the next update of $C$ can be performed.
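A minimal sketch of one such learning step (our own rendering, assuming the candidate coefficient vectors for the current sample are available as the columns of a matrix `A_cand`; the annealing schedules would be precomputed as `lam0 * (lam_final / lam0) ** (t / t_max)`, and analogously for the learning rate):

```python
import numpy as np

def sgd_update(C, x, A_cand, lam_t, alpha_t):
    """One soft-competitive update (11) for a single sample x.
    A_cand: (M, K) matrix whose columns are candidate coefficient
    vectors; ranks are determined by the reconstruction errors."""
    residuals = np.sum((x[:, None] - C @ A_cand) ** 2, axis=0)
    order = np.argsort(residuals)              # best candidate first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(residuals))   # rank of each candidate
    dC = np.zeros_like(C)
    for j in range(A_cand.shape[1]):
        a = A_cand[:, j]
        dC += np.exp(-ranks[j] / lam_t) * np.outer(x - C @ a, a)
    C = C + alpha_t * dC
    return C / np.linalg.norm(C, axis=0, keepdims=True)  # unit columns
```

Setting `lam_t` practically to zero recovers the hard-competitive variant, in which only the best candidate contributes to the update.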
3 A bag of orthogonal matching pursuits
So far, for each training sample $x_i$, all possible coefficient vectors $a_i^j$,
$j = 1, \ldots, K$ with $\|a_i^j\|_0 \le k$ have been considered. $K$ grows exponentially with
$M$ and $k$. Therefore, this approach is not applicable in practice. However,
since in (6) all those contributions in the sum whose rank is larger
than the neighborhood size $\lambda_t$ can be neglected, we actually do not need all
possible coefficient vectors. We only need the first best ones with respect to
the reconstruction error.
There are a number of approaches that try to find the best coefficient
vector, e.g., OMP or OOMP. It has been shown that in well-behaved cases
they can find at least good approximations (Tropp (2004)). In the following,
we extend OOMP such that not only the best but the $K_{\mathrm{user}}$ first best
coefficients are determined, at least approximately.
OOMP is a greedy method that iteratively constructs a given sample $x$
out of the columns of the dictionary $C$. The algorithm starts with $U_0^j = \emptyset$,
$R_0^j = (r_1^{0,j}, \ldots, r_M^{0,j}) = C$, and $\epsilon_0^j = x$. The set $U_n^j$ contains the indices of those
columns of $C$ that have been used during the $j$-th pursuit with respect to $x$ up
to the $n$-th iteration. $R_n^j$ is a temporary matrix that has been orthogonalized
with respect to the columns of $C$ that are indexed by $U_n^j$. $r_l^{n,j}$ is the $l$-th
column of $R_n^j$. $\epsilon_n^j$ is the residual in the $n$-th iteration of the $j$-th pursuit with
respect to $x$.
In iteration $n$, the algorithm looks for that column of $R_n^j$ whose inclusion
in the linear combination leads to the smallest residual $\epsilon_{n+1}^j$ in the next
iteration of the algorithm, i.e., the column that has the maximum overlap with
the current residual. Hence, with
$$y_n^j = \left( \frac{(r_1^{n,j})^T \epsilon_n^j}{\|r_1^{n,j}\|}, \ldots, \frac{(r_l^{n,j})^T \epsilon_n^j}{\|r_l^{n,j}\|}, \ldots, \frac{(r_M^{n,j})^T \epsilon_n^j}{\|r_M^{n,j}\|} \right) \qquad (12)$$
it looks for $l_{\mathrm{win}}(n,j) = \arg\max_{l,\, l \notin U_n^j} (y_n^j)_l$. Then, the orthogonal projection
of $R_n^j$ onto $r_{l_{\mathrm{win}}(n,j)}^{n,j}$ is removed from $R_n^j$:
$$R_{n+1}^j = R_n^j - \frac{r_{l_{\mathrm{win}}(n,j)}^{n,j} \left( (R_n^j)^T r_{l_{\mathrm{win}}(n,j)}^{n,j} \right)^T}{(r_{l_{\mathrm{win}}(n,j)}^{n,j})^T\, r_{l_{\mathrm{win}}(n,j)}^{n,j}} . \qquad (13)$$
Furthermore, the orthogonal projection of $\epsilon_n^j$ onto $r_{l_{\mathrm{win}}(n,j)}^{n,j}$ is removed from $\epsilon_n^j$:
$$\epsilon_{n+1}^j = \epsilon_n^j - \frac{(\epsilon_n^j)^T r_{l_{\mathrm{win}}(n,j)}^{n,j}}{(r_{l_{\mathrm{win}}(n,j)}^{n,j})^T\, r_{l_{\mathrm{win}}(n,j)}^{n,j}}\, r_{l_{\mathrm{win}}(n,j)}^{n,j} . \qquad (14)$$
The algorithm stops if $\epsilon_n^j = 0$ or $n = k$. The $j$-th approximation of the
coefficients of the best $k$-term approximation, i.e., $a^j$, can be obtained by
recursively tracking the contribution of each column of $C$ that has been used
during the iterations of pursuit $j$; a minimal sketch of a single pursuit is given below.
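This sketch is our own simplified rendering, assuming unit-norm dictionary columns: the final coefficients are recovered by a least-squares fit on the selected columns rather than by the recursive tracking described above, and the selection uses the absolute overlap:

```python
import numpy as np

def oomp_pursuit(x, C, k, eps=1e-12):
    """Greedy OOMP-style pursuit, cf. (12)-(14): returns the selected
    column indices U and a k-sparse coefficient vector a."""
    R = C.copy()          # working copy, deflated in every iteration
    r = x.copy()          # current residual
    U = []
    for n in range(k):
        norms = np.maximum(np.linalg.norm(R, axis=0), eps)
        y = np.abs(R.T @ r) / norms          # overlaps, cf. (12)
        y[U] = 0.0                           # exclude selected columns
        l = int(np.argmax(y))
        if y[l] < eps:
            break
        U.append(l)
        q = R[:, l] / np.linalg.norm(R[:, l])
        R -= np.outer(q, q @ R)              # deflate columns, cf. (13)
        r -= (q @ r) * q                     # deflate residual, cf. (14)
        if np.linalg.norm(r) < eps:
            break
    a = np.zeros(C.shape[1])
    if U:
        a[U] = np.linalg.lstsq(C[:, U], x, rcond=None)[0]
    return U, a
```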
In order to obtain a set of approximations $a^1, \ldots, a^{K_{\mathrm{user}}}$, where $K_{\mathrm{user}}$ is chosen
by the user, we want to conduct $K_{\mathrm{user}}$ matching pursuits. To obtain $K_{\mathrm{user}}$
different pursuits, we implement the following function:
$$Q(l, n, j) = \begin{cases} 0 & \text{if no pursuit performed with respect to } x \text{ so far is equal to the } j\text{-th} \\ & \text{pursuit up to the } n\text{-th iteration and selects column } l \text{ in that iteration,} \\ 1 & \text{else.} \end{cases} \qquad (15)$$
Then, while a pursuit is performed, we track all overlaps $y_n^j$ that have been
computed during that pursuit. For instance, once $a^1$ has been determined, we
have $y_0^1, \ldots, y_n^1, \ldots, y_{s_1-1}^1$, where $s_1$ is the number of iterations of the first
pursuit with respect to $x$. In order to find $a^2$, we now look for the largest
overlap in the previous pursuit that has not been used so far:
$$n_{\mathrm{target}} = \arg\max_{n=0,\ldots,s_1-1}\ \max_{l:\, Q(l,n,1)=0} (y_n^1)_l \qquad (16)$$
$$l_{\mathrm{target}} = \arg\max_{l} (y_{n_{\mathrm{target}}}^1)_l . \qquad (17)$$
We replay the first pursuit up to iteration $n_{\mathrm{target}}$. In that iteration, we select
column $l_{\mathrm{target}}$ instead of the previous winner and continue with the pursuit
until the stopping criterion has been reached. If $m$ pursuits have been performed,
we look among all previous pursuits for the largest overlap that has
not been used so far:
$$j_{\mathrm{target}} = \arg\max_{j=1,\ldots,m}\ \max_{n=0,\ldots,s_j-1}\ \max_{l:\, Q(l,n,j)=0} (y_n^j)_l \qquad (18)$$
$$n_{\mathrm{target}} = \arg\max_{n=0,\ldots,s_{j_{\mathrm{target}}}-1}\ \max_{l:\, Q(l,n,j_{\mathrm{target}})=0} (y_n^{j_{\mathrm{target}}})_l \qquad (19)$$
$$l_{\mathrm{target}} = \arg\max_{l:\, Q(l,n_{\mathrm{target}},j_{\mathrm{target}})=0} (y_{n_{\mathrm{target}}}^{j_{\mathrm{target}}})_l . \qquad (20)$$
We replay pursuit $j_{\mathrm{target}}$ up to iteration $n_{\mathrm{target}}$. In that iteration, we select
column $l_{\mathrm{target}}$ instead of the previous winner and continue with the pursuit
until the stopping criterion has been reached. We repeat this procedure until
$K_{\mathrm{user}}$ pursuits have been performed; the sketch below illustrates this replay scheme.
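The following sketch outlines the whole scheme (our own simplified rendering, building on the single-pursuit sketch above): each pursuit records its overlap vectors, the set `used` plays the role of $Q$ in (15) by remembering which (pursuit prefix, column) branches were already taken, and each new pursuit is spawned at the largest unused overlap:

```python
import numpy as np

def bag_of_pursuits(x, C, k, K_user, eps=1e-12):
    """Return up to K_user index sets from replayed OOMP-style pursuits.
    Coefficients for each index set can then be obtained by least
    squares, as in the single-pursuit sketch above."""

    def run(plan):
        """One pursuit that first selects the columns in 'plan', then
        continues greedily; returns selections and overlap traces."""
        R, r, U, traces = C.copy(), x.copy(), [], []
        for n in range(k):
            norms = np.maximum(np.linalg.norm(R, axis=0), eps)
            y = np.abs(R.T @ r) / norms
            y[U] = 0.0
            traces.append(y.copy())
            l = plan[n] if n < len(plan) else int(np.argmax(y))
            if y[l] < eps:
                break
            U.append(int(l))
            q = R[:, l] / np.linalg.norm(R[:, l])
            R -= np.outer(q, q @ R)
            r -= (q @ r) * q
            if np.linalg.norm(r) < eps:
                break
        return U, traces

    pursuits, used = [], set()

    def record(U):      # mark all branch points of this pursuit as used
        for n in range(len(U)):
            used.add((tuple(U[:n]), U[n]))

    U, traces = run([])
    pursuits.append((U, traces))
    record(U)
    while len(pursuits) < K_user:
        best = None     # (overlap, prefix, column) of the branch point
        for U, traces in pursuits:
            for n, y in enumerate(traces):
                prefix = tuple(U[:n])
                for l in np.argsort(-y):       # candidates, best first
                    if y[l] < eps:
                        break
                    if (prefix, int(l)) not in used:
                        if best is None or y[l] > best[0]:
                            best = (y[l], list(prefix), int(l))
                        break
        if best is None:                       # no unused branch left
            break
        _, prefix, l = best
        U, traces = run(prefix + [l])
        pursuits.append((U, traces))
        record(U)
    return [U for U, _ in pursuits]
```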
4 Experiments
In the experiments we use synthetic data that can actually be represented as
sparse linear combinations of some dictionary. We perform the experiments
in order to assess two questions: (i) How well is the target function (1)
minimized? (ii) Is it possible to obtain the generating dictionary from the
given data alone?
In the following, $C^{\mathrm{true}} = (c_1^{\mathrm{true}}, \ldots, c_{50}^{\mathrm{true}}) \in \mathbb{R}^{20 \times 50}$ denotes a synthetic
dictionary. Each entry of $C^{\mathrm{true}}$ is uniformly chosen in the interval $[-0.5, 0.5]$.
Furthermore, we set $\|c_l^{\mathrm{true}}\| = 1$. Using such a dictionary, we create a training
set $X = (x_1, \ldots, x_{1500})$, $x_i \in \mathbb{R}^{20}$, where each training sample $x_i$ is a sparse
linear combination of the columns of the dictionary:
$$x_i = C^{\mathrm{true}} b_i . \qquad (21)$$
We choose the coefficient vectors $b_i \in \mathbb{R}^{50}$ such that they contain $k$ non-zero
entries. The selection of the positions of the non-zero entries in the coefficient
vectors is performed according to three different data generation scenarios:
Random dictionary elements: In this scenario all combinations of $k$ dictionary
elements are possible. Hence, the positions of the non-zero entries
in each coefficient vector $b_i$ are uniformly chosen from $\{1, \ldots, 50\}$.
Independent subspaces: In this case the training samples are located in
a small number of $k$-dimensional subspaces. We achieve this by defining
$50/k$ groups of dictionary elements, each group containing $k$ randomly
selected dictionary elements. The groups do not intersect, i.e., each
dictionary element is a member of at most one group. In order to generate a
training sample, we uniformly choose one group of dictionary elements
and obtain the training sample as a linear combination of the dictionary
elements that belong to the selected group.
Dependent subspaces: In this case, similar to the previous scenario, the
training samples are located in a small number of $k$-dimensional subspaces.
In contrast to the previous scenario, the subspaces intersect
strongly, i.e., they share basis vectors. In order to achieve this, we
uniformly select $k-1$ dictionary elements. Then, we use $50-k+1$ groups
of dictionary elements, where each group consists of the $k-1$ selected
dictionary elements plus one further dictionary element. Again, in order to
generate a training sample, we uniformly choose one group of dictionary
elements and obtain the training sample as a linear combination of the
dictionary elements that belong to the selected group.
The values of the non-zero entries are always chosen uniformly in the interval
$[-0.5, 0.5]$.
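A minimal data-generation sketch for the first scenario (the subspace scenarios differ only in how the non-zero positions are grouped; sizes and intervals are taken from the text, the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dictionary: 20 x 50 with uniform entries and unit-norm columns
C_true = rng.uniform(-0.5, 0.5, size=(20, 50))
C_true /= np.linalg.norm(C_true, axis=0, keepdims=True)

def generate_random_elements(k, L=1500):
    """'Random dictionary elements' scenario: for each sample, draw k
    non-zero positions uniformly and k values uniformly in [-0.5, 0.5]."""
    X = np.zeros((20, L))
    for i in range(L):
        b = np.zeros(50)
        pos = rng.choice(50, size=k, replace=False)
        b[pos] = rng.uniform(-0.5, 0.5, size=k)
        X[:, i] = C_true @ b
    return X
```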
We apply MOD, K-SVD, and the stochastic gradient descent method that
is proposed in this paper to the training data. Let $C^{\mathrm{learned}} = (c_1^{\mathrm{learned}}, \ldots, c_{50}^{\mathrm{learned}})$
denote the dictionary that has been learned by one of these methods
on the basis of the training samples. In order to measure the performance
of the methods with respect to the minimization of the target function, we
consider
$$E_h = \frac{1}{1500} \sum_{i=1}^{1500} \|x_i - C^{\mathrm{learned}} a_i\|_2^2 \qquad (22)$$
where $a_i$ is obtained from the best pursuit out of $K_{\mathrm{user}} = 20$ pursuits that
have been performed according to the approach described in section 3. In
order to assess whether the true dictionary can be reconstructed from the training
data, we consider the mean maximum overlap between each element of the
true dictionary and the learned dictionary:
$$\mathrm{MMO} = \frac{1}{50} \sum_{l=1}^{50} \max_{k=1,\ldots,50} \left| (c_l^{\mathrm{true}})^T c_k^{\mathrm{learned}} \right| . \qquad (23)$$
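Both evaluation measures are straightforward to compute; a minimal sketch of the MMO (23), assuming unit-norm columns in both dictionaries (the function name is ours):

```python
import numpy as np

def mean_max_overlap(C_true, C_learned):
    """MMO (23): for each true dictionary element, take the largest
    absolute inner product with any learned element, then average."""
    G = np.abs(C_true.T @ C_learned)   # pairwise overlaps
    return float(G.max(axis=1).mean())
```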
The number of non-zero entries $k$ is varied from 1 to 11. For the stochastic
gradient descent method, we perform $100 \times 1500$ update steps, i.e., 100 learning
epochs. For MOD and K-SVD, we perform 100 learning iterations, each
iteration using 1500 training samples. We repeat all experiments 50 times
and report the mean result over all experiments.
In the first set of experiments, for all dictionary learning methods, i.e.,
MOD, K-SVD, and stochastic gradient descent, a single orthogonal matching
pursuit is used in order to obtain the dictionary coefficients during learning.
Furthermore, the stochastic gradient descent is performed in hard-competitive
mode, which uses a neighborhood size that is practically zero, i.e., $\lambda_0 = \lambda_{\mathrm{final}} = 10^{-10}$.
The results of this experiment are depicted in Table 1 (a)-(c) and (j)-(l).
In case of the random dictionary elements scenario (see (a) and (j)), the
stochastic gradient approach clearly outperforms MOD and K-SVD. From
the mean maximum overlap it can be seen that almost all dictionary elements
are well reconstructed with up to 6 non-zero coefficients in the linear
combinations.
8 Labusch et al.
[Table 1: Eighteen panels (a)-(r), arranged in columns for the random dictionary elements, independent subspaces, and dependent subspaces scenarios. Panels (a)-(c) and (j)-(l) show the representation error $E_h$ and the mean maximum overlap (MMO) as functions of $k$ for HC-SGD, K-SVD, and MOD; panels (d)-(f) and (m)-(o) show the same quantities for HC-SGD(BOP), K-SVD(BOP), and MOD(BOP); panels (g)-(i) and (p)-(r) compare SC-SGD with HC-SGD(BOP).]

Table 1. Experimental results. See text for details. SC-SGD: soft-competitive
stochastic gradient descent ($\lambda_0 = 20$, $\lambda_{\mathrm{final}} = 0.65$). HC-SGD: hard-competitive
stochastic gradient descent ($\lambda_0 = \lambda_{\mathrm{final}} = 10^{-10}$).
If the dictionary elements cannot be reconstructed anymore,
i.e., for k > 6, the mean representation error $E_h$ starts to grow. In case of the
independent subspaces and dependent subspaces scenarios, the stochastic
gradient method also outperforms MOD and K-SVD in terms of minimization
of the representation error (see (b) and (c)), whereas in terms of dictionary
reconstruction performance a clear gain over MOD and K-SVD can be seen
only in the intersecting subspaces scenario ((k) and (l)).
This might be caused by the fact that in the independent subspaces
scenario it is sufficient to find dictionary elements that span the subspaces
in which the data is located in order to minimize the target function, i.e., the
scenario does not force the method to find the true dictionary elements in
order to minimize the target function.
In the second experiment, for all methods, the dictionary coefficients were
obtained from the best pursuit out of $K_{\mathrm{user}} = 20$ pursuits that were performed
according to the "Bag of Pursuits" approach described in section 3.
The results of this experiment are depicted in Table 1 (d)-(f) and (m)-(o).
Compared to the results of the first experiment (see (a)-(c)), it can be seen
that the computationally more demanding method for the approximation of
the best coefficients leads to significantly improved performance of MOD,
K-SVD, and the stochastic gradient descent with respect to the minimization
of the representation error $E_h$ (see (d)-(f)). The most obvious improvement
can be seen in the dependent subspaces scenario, where the dictionary
reconstruction performance also improves significantly (see (l) and (o)). In
the random dictionary elements (see (j) and (m)) and independent subspaces
scenarios (see (k) and (n)) there are only small improvements with respect to
the reconstruction of the true dictionary.
In the third experiment, we employed soft-competitive learning in the
stochastic gradient descent, i.e., the coefficients corresponding to each of
the $K_{\mathrm{user}} = 20$ pursuits were used in the update step according to (11) with
$\lambda_0 = 20$ and $\lambda_{\mathrm{final}} = 0.65$. The results of this experiment are depicted in
Table 1 (g)-(i) and (p)-(r). It can be seen that for less sparse scenarios, i.e., k > 6,
the soft-competitive learning further improves the performance. Particularly
in the dependent subspaces scenario, a significant improvement in
terms of dictionary reconstruction performance can be seen for k > 4 (see (r)).
For very sparse settings, i.e., k ≤ 4, the hard-competitive approach seems to
perform better than the soft-competitive variant. Again, in case of the independent
subspaces scenario only the representation error decreases (see (h)), whereas
no performance gain for the dictionary reconstruction can be seen (see (q)).
Again, this might be caused by the fact that in the subspace scenario
learning dictionary elements that span the subspaces is sufficient in order
to minimize the target function.
5 Conclusion
We proposed a stochastic gradient descent method that can be used either for
hard-competitive or for soft-competitive learning of sparse codes. We introduced
the so-called "Bag of Pursuits" in order to compute a better estimate of the
best k-term approximation of given data. This method can be used together
with a generalization of the neural gas approach to perform soft-competitive
stochastic gradient learning of (overcomplete) dictionaries for sparse coding.
Our experiments on synthetic data show that, compared to other state-of-the-art
methods such as MOD or K-SVD, a significant performance improvement
can be observed, both in terms of minimization of the representation error
and in terms of reconstruction of the true dictionary elements that were used
to generate the data. While a significant performance gain is already obtained
by hard-competitive stochastic gradient descent, an even better performance
is obtained by using the "Bag of Pursuits" together with soft-competitive
learning. In particular, as a function of decreasing sparseness, the performance
of the method described in this paper degrades much more slowly than
that of MOD and K-SVD. This should improve the design of overcomplete
dictionaries also in more complex settings.
References
AHARON, M., ELAD, M. and BRUCKSTEIN, A. (2006): K-SVD: An Algorithm
for Designing Overcomplete Dictionaries for Sparse Representation. IEEE
Transactions on Signal Processing, 54(11):4311–4322.
DAVIS, G., MALLAT, S. and AVELLANEDA, M. (1997): Greedy adaptive approximation.
J. Constr. Approx., 13:57–89.
ENGAN, K., AASE, S. O. and HAKON HUSOY, J. (1999): Method of optimal
directions for frame design. In ICASSP '99: Proceedings of the IEEE
International Conference on Acoustics, Speech, and Signal Processing, pages
2443–2446, Washington, DC, USA. IEEE Computer Society.
LABUSCH, K., BARTH, E. and MARTINETZ, T. (2009): Sparse Coding Neural
Gas: Learning of Overcomplete Data Representations. Neurocomputing, 72(7-9):1547–1555.
MARTINETZ, T., BERKOVICH, S. and SCHULTEN, K. (1993): "Neural-Gas"
Network for Vector Quantization and its Application to Time-Series Prediction.
IEEE Transactions on Neural Networks, 4(4):558–569.
PATI, Y., REZAIIFAR, R. and KRISHNAPRASAD, P. (1993): Orthogonal Matching
Pursuit: Recursive Function Approximation with Applications to Wavelet
Decomposition. Proceedings of the 27th Annual Asilomar Conference on Signals,
Systems, and Computers.
REBOLLO-NEIRA, L. and LOWE, D. (2002): Optimized orthogonal matching
pursuit approach. IEEE Signal Processing Letters, 9(4):137–140.
TROPP, J. A. (2004): Greed is good: algorithmic results for sparse approximation.
IEEE Transactions on Information Theory, 50(10):2231–2242.
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Bag of Pursuits and Neural Gas for Improved Sparse Codin

  • 1. Bag of Pursuits and Neural Gas for Improved Sparse Coding Kai Labusch, Erhardt Barth, and Thomas Martinetz University of L¨ubeck Institute for Neuro- and Bioinformatics Ratzeburger Allee 160 23562 L¨ubeck, Germany {labusch,barth,martinetz}@inb.uni-luebeck.de Abstract. Sparse coding employs low-dimensional subspaces in order to encode high-dimensional signals. Finding the optimal subspaces is a difficult optimization task. We show that stochastic gradient descent is superior in finding the optimal subspaces compared to MOD and K-SVD, which are both state-of-the art methods. The improvement is most significant in the difficult setting of highly overlapping subspaces. We introduce the so-called ”Bag of Pursuits” that is derived from Or- thogonal Matching Pursuit. It provides an improved approximation of the optimal sparse coefficients, which, in turn, significantly improves the performance of the gradient descent approach as well as MOD and K-SVD. In addition, the ”Bag of Pursuits” allows to employ a generalized version of the Neural Gas algorithm for sparse coding, which finally leads to an even more powerful method. Keywords: sparse coding, neural gas, dictionary learning, matching pursuit 1 Introduction Many tasks in signal processing and machine learning can be simplified by choosing an appropriate representation of given data. There are a number of desirable properties one wants to achieve, e.g., coding-efficiency, resistance against noise, or invariance against certain transformations. In machine learn- ing, finding a good representation of given data is an important first step in order to solve classification or regression tasks. Suppose that we are given data X = (x1, . . . , xL), xi ∈ RN . We want to represent X as a linear combination of some dictionary C, i.e., xi = Cai, where C = (c1, . . . , cM ), cl ∈ RN . In case of M > N, the dictionary is overcomplete. In this work, we consider the following framework for dictionary design: We are looking for a dictionary C that minimizes the representation error Eh = 1 L L i=1 xi − Cai 2 2 (1) where xopt i = Cai with ai = arg mina xi − Ca , a 0 ≤ k denotes the best k-term representation of xi in terms of C. The number of dictionary elements
does not possess this deficiency (Labusch et al. (2009)), but, unlike MOD or K-SVD, it is bound to a specific approximation method for the coefficients, i.e., OOMP.

Here, we propose a new method for designing overcomplete dictionaries that performs well even if the linear combinations are less sparse. Like MOD or K-SVD, it can employ an arbitrary approximation method for the coefficients. In order to demonstrate the performance of the method, we test it on synthetically generated overcomplete linear combinations of known dictionaries and compare the obtained performance against MOD and K-SVD.

2 From vector quantization to sparse coding

Vector quantization learns a representation of given data in terms of so-called codebook vectors. Each given sample is encoded by the closest codebook vector. Vector quantization can be understood as a special case of sparse coding where the coefficients are constrained by $\|a_i\|_0 = 1$ and $\|a_i\|_2 = 1$, i.e., vector quantization looks for a codebook $C$ that minimizes (1), choosing the coefficients according to

$$(a_i)_m = 1,\quad (a_i)_l = 0 \;\; \forall l \neq m \quad\text{where}\quad m = \arg\min_l \|c_l - x_i\|_2^2. \tag{2}$$

In order to learn a good codebook, many vector quantization algorithms consider only the winner for learning, i.e., the codebook vector $c_m$ for which $(a_i)_m = 1$ holds. As a consequence of this type of hard-competitive learning scheme, problems such as bad quantization, initialization sensitivity, or slow convergence can arise.
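As a baseline for what follows, the winner selection of (2) takes only a few lines. This is a minimal NumPy sketch; the function name and toy sizes are our own, for illustration:

```python
import numpy as np

def vq_encode(x, C):
    """Encode sample x by its closest codebook vector (Eq. 2).

    C holds one codebook vector per column; the returned coefficient
    vector is one-hot, with its single non-zero entry at the winner."""
    dists = np.sum((C - x[:, None]) ** 2, axis=0)  # ||c_l - x||_2^2 for all l
    m = np.argmin(dists)                           # winner index m
    a = np.zeros(C.shape[1])
    a[m] = 1.0
    return a

# toy usage: 5-dimensional data, 8 codebook vectors
rng = np.random.default_rng(0)
C = rng.normal(size=(5, 8))
x = rng.normal(size=5)
print(vq_encode(x, C))
```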
In order to remedy these problems, soft-competitive vector quantization methods such as the NG algorithm (Martinetz et al. (1993)) have been proposed. In the NG algorithm all possible encodings are considered in each learning step, i.e., $a_i^1, \dots, a_i^M$ with $(a_i^j)_j = 1$. Then, the encodings are sorted according to their reconstruction error:

$$\|x_i - C a_i^{j_0}\| \le \|x_i - C a_i^{j_1}\| \le \dots \le \|x_i - C a_i^{j_p}\| \le \dots \le \|x_i - C a_i^{j_M}\|. \tag{3}$$

In contrast to the hard-competitive approaches, in each learning iteration every codebook vector $c_l$ is updated. The update is weighted according to the rank of the encoding that uses the codebook vector $c_l$. It has been shown in (Martinetz et al. (1993)) that this type of update is equivalent to a gradient descent on a well-defined cost function. Due to the soft-competitive learning scheme, the NG algorithm shows robust convergence to close-to-optimal distributions of the codebook vectors over the data manifold.

Here we want to apply this ranking approach to the learning of sparse codes. Similar to the NG algorithm, for each given sample $x_i$, we consider all $K$ possible coefficient vectors $a_i^j$, i.e., encodings that have at most $k$ non-zero entries. The elements of each $a_i^j$ are chosen such that $\|x_i - C a_i^j\|$ is minimal. We order the coefficients according to the representation error that is obtained by using them to approximate the sample $x_i$:

$$\|x_i - C a_i^{j_0}\| < \|x_i - C a_i^{j_1}\| < \dots < \|x_i - C a_i^{j_p}\| < \dots < \|x_i - C a_i^{j_K}\|. \tag{4}$$

If there are coefficient vectors that lead to the same reconstruction error,

$$\|x_i - C a_i^{m_1}\| = \|x_i - C a_i^{m_2}\| = \dots = \|x_i - C a_i^{m_V}\|, \tag{5}$$

we randomly pick one of them and do not consider the others. Note that this is required only for theoretical reasons; in practice this situation almost never occurs. Let $\mathrm{rank}(x_i, a_i^j, C) = p$ denote the number of coefficient vectors $a_i^m$ with $\|x_i - C a_i^m\| < \|x_i - C a_i^j\|$. Introducing the neighborhood function $h_{\lambda_t}(v) = e^{-v/\lambda_t}$, we consider the following modified error function

$$E_s = \sum_{i=1}^{L} \sum_{j=1}^{K} h_{\lambda_t}(\mathrm{rank}(x_i, a_i^j, C)) \, \|x_i - C a_i^j\|_2^2, \tag{6}$$

which becomes equal to (1) for $\lambda_t \to 0$. In order to minimize (6), we consider the gradient of $E_s$ with respect to $C$, which is

$$\frac{\partial E_s}{\partial C} = -2 \sum_{i=1}^{L} \sum_{j=1}^{K} h_{\lambda_t}(\mathrm{rank}(x_i, a_i^j, C)) \, (x_i - C a_i^j) \, {a_i^j}^T + R \tag{7}$$

with

$$R = \sum_{i=1}^{L} \sum_{j=1}^{K} h_{\lambda_t}(\mathrm{rank}(x_i, a_i^j, C)) \, \frac{\partial\, \mathrm{rank}(x_i, a_i^j, C)}{\partial C} \, \|x_i - C a_i^j\|_2^2. \tag{8}$$
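The ranking (4) and the neighborhood weighting in (6) are simple to compute once candidate coefficient vectors are available. The following minimal sketch (names are ours, for illustration) makes the weighting concrete before we turn to the term $R$:

```python
import numpy as np

def rank_and_weight(x, C, candidates, lam):
    """Order candidate coefficient vectors by reconstruction error (Eq. 4)
    and return the neighborhood weights h_lambda(rank) of Eq. (6).
    Ties are assumed absent, as in the text."""
    errors = np.array([np.linalg.norm(x - C @ a) ** 2 for a in candidates])
    ranks = np.argsort(np.argsort(errors))   # rank 0 = best reconstruction
    return ranks, np.exp(-ranks / lam)       # h_lambda(v) = exp(-v / lambda)
```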
In order to show that $R = 0$, we adapt the proof given in (Martinetz et al. (1993)) to our setting. With $e_i^j = \|x_i - C a_i^j\|$, we write $\mathrm{rank}(x_i, a_i^j, C)$ as

$$\mathrm{rank}(x_i, a_i^j, C) = \sum_{m=1}^{K} \theta\!\left((e_i^j)^2 - (e_i^m)^2\right), \tag{9}$$

where $\theta(x)$ is the Heaviside step function. The derivative of the Heaviside step function is the delta distribution $\delta(x)$ with $\delta(x) = 0$ for $x \neq 0$ and $\int \delta(x)\,dx = 1$. Therefore, we can write

$$R = \sum_{i=1}^{L} \sum_{j=1}^{K} h_{\lambda_t}(\mathrm{rank}(x_i, a_i^j, C)) \, (e_i^j)^2 \sum_{m=1}^{K} \left({a_i^j}^T - {a_i^m}^T\right) \delta\!\left((e_i^j)^2 - (e_i^m)^2\right). \tag{10}$$

Each term of (10) is non-vanishing only for those $a_i^j$ for which $(e_i^j)^2 = (e_i^m)^2$ holds. Since we explicitly excluded this case, we obtain $R = 0$. Hence, we can perform a stochastic gradient descent on (6) with respect to $C$ by applying $t = 0, \dots, t_{\max}$ updates of $C$ using the gradient-based learning rule

$$\Delta C = \alpha_t \sum_{j=1}^{K} h_{\lambda_t}(\mathrm{rank}(x_i, a_i^j, C)) \, (x_i - C a_i^j) \, {a_i^j}^T \tag{11}$$

for a randomly chosen $x_i \in X$, where $\lambda_t = \lambda_0 (\lambda_{\text{final}}/\lambda_0)^{t/t_{\max}}$ is an exponentially decreasing neighborhood size and $\alpha_t = \alpha_0 (\alpha_{\text{final}}/\alpha_0)^{t/t_{\max}}$ an exponentially decreasing learning rate. After each update has been applied, the column vectors of $C$ are renormalized to unit length. Then the $a_i^j$ are re-determined, and the next update of $C$ can be performed.
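A single learning step then combines the ranking with the update rule (11) and the renormalization of the columns of $C$. The sketch below is one possible reading, assuming the candidate coefficient vectors are supplied externally (e.g., by the pursuits of the next section); it is not the authors' implementation:

```python
import numpy as np

def sgd_step(x, C, candidates, lam, alpha):
    """One soft-competitive update of the dictionary C (Eq. 11).

    candidates: coefficient vectors a^j with at most k non-zero entries,
    e.g., from a bag of pursuits. Ranks come from the squared
    reconstruction errors; columns of C are renormalized afterwards."""
    errors = np.array([np.linalg.norm(x - C @ a) ** 2 for a in candidates])
    ranks = np.argsort(np.argsort(errors))              # rank(x, a^j, C)
    dC = np.zeros_like(C)
    for a, r in zip(candidates, ranks):
        dC += np.exp(-r / lam) * np.outer(x - C @ a, a)  # h_lambda(rank) term
    C = C + alpha * dC
    return C / np.linalg.norm(C, axis=0)                # renormalize columns

def schedule(v0, v_final, t, t_max):
    """Exponential decay used for both lambda_t and alpha_t."""
    return v0 * (v_final / v0) ** (t / t_max)
```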
3 A bag of orthogonal matching pursuits

So far, for each training sample $x_i$, all possible coefficient vectors $a_i^j$, $j = 1, \dots, K$ with $\|a_i^j\|_0 \le k$ have been considered. $K$ grows exponentially with $M$ and $k$; therefore, this approach is not applicable in practice. However, since in (6) all contributions to the sum for which the rank is larger than the neighborhood size $\lambda_t$ can be neglected, we actually do not need all possible coefficient vectors. We only need the first best ones with respect to the reconstruction error.

There are a number of approaches that try to find the best coefficient vector, e.g., OMP or OOMP. It has been shown that in well-behaved cases they can find at least good approximations (Tropp (2004)). In the following, we extend OOMP such that not only the best but the first $K_{\text{user}}$ best coefficient vectors are determined, at least approximately.

OOMP is a greedy method that iteratively constructs a given sample $x$ out of the columns of the dictionary $C$. The algorithm starts with $U_n^j = \emptyset$, $R_0^j = (r_1^{0,j}, \dots, r_M^{0,j}) = C$ and $\epsilon_0^j = x$. The set $U_n^j$ contains the indices of those columns of $C$ that have been used during the $j$-th pursuit with respect to $x$ up to the $n$-th iteration. $R_n^j$ is a temporary matrix that has been orthogonalized with respect to the columns of $C$ indexed by $U_n^j$; $r_l^{n,j}$ is the $l$-th column of $R_n^j$, and $\epsilon_n^j$ is the residual in the $n$-th iteration of the $j$-th pursuit with respect to $x$.

In iteration $n$, the algorithm looks for that column of $R_n^j$ whose inclusion in the linear combination leads to the smallest residual $\epsilon_{n+1}^j$ in the next iteration, i.e., the column that has the maximum overlap with the current residual. Hence, with

$$y_n^j = \left( \frac{{r_1^{n,j}}^T \epsilon_n^j}{\|r_1^{n,j}\|}, \;\dots,\; \frac{{r_l^{n,j}}^T \epsilon_n^j}{\|r_l^{n,j}\|}, \;\dots,\; \frac{{r_M^{n,j}}^T \epsilon_n^j}{\|r_M^{n,j}\|} \right), \tag{12}$$

it looks for $l_{\text{win}}(n, j) = \arg\max_{l,\, l \notin U_n^j} (y_n^j)_l$. Then, the orthogonal projection of $R_n^j$ onto $r_{l_{\text{win}}(n,j)}^{n,j}$ is removed from $R_n^j$:

$$R_{n+1}^j = R_n^j - \frac{r_{l_{\text{win}}(n,j)}^{n,j} \left({R_n^j}^T r_{l_{\text{win}}(n,j)}^{n,j}\right)^T}{{r_{l_{\text{win}}(n,j)}^{n,j}}^T \, r_{l_{\text{win}}(n,j)}^{n,j}}. \tag{13}$$

Furthermore, the orthogonal projection of $\epsilon_n^j$ onto $r_{l_{\text{win}}(n,j)}^{n,j}$ is removed from $\epsilon_n^j$:

$$\epsilon_{n+1}^j = \epsilon_n^j - \frac{{\epsilon_n^j}^T r_{l_{\text{win}}(n,j)}^{n,j}}{{r_{l_{\text{win}}(n,j)}^{n,j}}^T \, r_{l_{\text{win}}(n,j)}^{n,j}} \, r_{l_{\text{win}}(n,j)}^{n,j}. \tag{14}$$

The algorithm stops if $\epsilon_n^j = 0$ or $n = k$. The $j$-th approximation $a^j$ of the coefficients of the best $k$-term approximation can be obtained by recursively tracking the contribution of each column of $C$ that has been used during the iterations of pursuit $j$.
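The following sketch implements one such pursuit, with the deflation steps (13) and (14) and a least-squares fit on the selected columns as one way to realize the recursive coefficient tracking; the absolute overlap is used for the winner selection, which is the common choice. This is our illustrative reading, not the authors' code:

```python
import numpy as np

def oomp(x, C, k, tol=1e-10):
    """One Optimized Orthogonal Matching Pursuit for sample x.

    Returns a coefficient vector with at most k non-zero entries.
    R is deflated as in Eq. (13), the residual as in Eq. (14)."""
    R = C.copy()                  # temporary matrix R_n
    eps = x.copy()                # residual eps_n
    used = []                     # set U_n of selected column indices
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        norms[norms < 1e-12] = np.inf            # guard deflated columns
        y = (R.T @ eps) / norms                  # normalized overlaps, Eq. (12)
        y[used] = 0.0                            # exclude already-used columns
        l = int(np.argmax(np.abs(y)))            # winner l_win(n, j)
        used.append(l)
        r = R[:, l].copy()
        rr = float(r @ r)
        eps = eps - (eps @ r) / rr * r           # Eq. (14)
        R = R - np.outer(r, R.T @ r) / rr        # Eq. (13)
        if np.linalg.norm(eps) < tol:            # stop if residual vanishes
            break
    a = np.zeros(C.shape[1])
    a[used] = np.linalg.lstsq(C[:, used], x, rcond=None)[0]
    return a
```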
In order to obtain a set of approximations $a^1, \dots, a^{K_{\text{user}}}$, where $K_{\text{user}}$ is chosen by the user, we want to conduct $K_{\text{user}}$ matching pursuits. To obtain $K_{\text{user}}$ different pursuits, we use the following indicator function:

$$Q(l, n, j) = \begin{cases} 0, & \text{if no pursuit performed so far with respect to } x \text{ is equal to the } j\text{-th pursuit} \\ & \text{up to the } n\text{-th iteration with column } l \text{ selected in that iteration,} \\ 1, & \text{otherwise.} \end{cases} \tag{15}$$

Then, while a pursuit is performed, we track all overlaps $y_n^j$ that have been computed during that pursuit. For instance, once $a^1$ has been determined, we have $y_0^1, \dots, y_n^1, \dots, y_{s_1-1}^1$, where $s_1$ is the number of iterations of the first pursuit with respect to $x$. In order to find $a^2$, we now look for the largest overlap in the previous pursuit that has not been used so far:

$$n_{\text{target}} = \arg\max_{n=0,\dots,s_1-1} \;\; \max_{l,\, Q(l,n,1)=0} (y_n^1)_l \tag{16}$$

$$l_{\text{target}} = \arg\max_{l} \, (y_{n_{\text{target}}}^1)_l. \tag{17}$$

We replay the first pursuit up to iteration $n_{\text{target}}$. In that iteration, we select column $l_{\text{target}}$ instead of the previous winner and continue with the pursuit until the stopping criterion has been reached. If $m$ pursuits have been performed, we look among all previous pursuits for the largest overlap that has not been used so far:

$$j_{\text{target}} = \arg\max_{j=1,\dots,m} \;\; \max_{n=0,\dots,s_j-1} \;\; \max_{l,\, Q(l,n,j)=0} (y_n^j)_l \tag{18}$$

$$n_{\text{target}} = \arg\max_{n=0,\dots,s_{j_{\text{target}}}-1} \;\; \max_{l,\, Q(l,n,j_{\text{target}})=0} (y_n^{j_{\text{target}}})_l \tag{19}$$

$$l_{\text{target}} = \arg\max_{l,\, Q(l,\, n_{\text{target}},\, j_{\text{target}})=0} (y_{n_{\text{target}}}^{j_{\text{target}}})_l. \tag{20}$$

We replay pursuit $j_{\text{target}}$ up to iteration $n_{\text{target}}$, select column $l_{\text{target}}$ instead of the previous winner in that iteration, and continue with the pursuit until the stopping criterion has been reached. We repeat this procedure until $K_{\text{user}}$ pursuits have been performed.
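The search in (18)-(20) only requires the recorded overlaps and a record of which (pursuit, iteration, column) triples have already been expanded into a pursuit, i.e., the triples with $Q = 1$. A sketch of this bookkeeping, assuming a hypothetical layout in which `overlaps[j][n]` holds the vector $y_n^j$ of pursuit $j$:

```python
import numpy as np

def next_branch(overlaps, used_triples):
    """Find (j_target, n_target, l_target) per Eqs. (18)-(20): the largest
    recorded overlap whose (pursuit, iteration, column) triple has not
    been expanded into a pursuit yet (Q = 0)."""
    best, target = -np.inf, None
    for j, pursuit in enumerate(overlaps):       # over pursuits, Eq. (18)
        for n, y in enumerate(pursuit):          # over iterations, Eq. (19)
            for l, v in enumerate(y):            # over columns, Eq. (20)
                if (j, n, l) not in used_triples and v > best:
                    best, target = v, (j, n, l)
    return target

# after replaying the chosen branch, mark it as used:
# used_triples.add(next_branch(overlaps, used_triples))
```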
4 Experiments

In the experiments we use synthetic data that actually can be represented as sparse linear combinations of some dictionary. We perform the experiments in order to assess two questions: (i) How well is the target function (1) minimized? (ii) Is it possible to obtain the generating dictionary only from the given data?

In the following, $C^{\text{true}} = (c_1^{\text{true}}, \dots, c_{50}^{\text{true}}) \in \mathbb{R}^{20 \times 50}$ denotes a synthetic dictionary. Each entry of $C^{\text{true}}$ is chosen uniformly from the interval $[-0.5, 0.5]$, and the columns are then normalized such that $\|c_l^{\text{true}}\| = 1$. Using such a dictionary, we create a training set $X = (x_1, \dots, x_{1500})$, $x_i \in \mathbb{R}^{20}$, where each training sample $x_i$ is a sparse linear combination of the columns of the dictionary:

$$x_i = C^{\text{true}} b_i. \tag{21}$$

We choose the coefficient vectors $b_i \in \mathbb{R}^{50}$ such that they contain $k$ non-zero entries. The positions of the non-zero entries in the coefficient vectors are selected according to three different data generation scenarios:

Random dictionary elements: In this scenario all combinations of $k$ dictionary elements are possible. Hence, the positions of the non-zero entries in each coefficient vector $b_i$ are chosen uniformly from $\{1, \dots, 50\}$.

Independent subspaces: In this case the training samples are located in a small number of $k$-dimensional subspaces. We achieve this by defining $50/k$ groups of dictionary elements, each group containing $k$ randomly selected dictionary elements. The groups do not intersect, i.e., each dictionary element is a member of at most one group. In order to generate a training sample, we uniformly choose one group of dictionary elements and obtain the training sample as a linear combination of the dictionary elements that belong to the selected group.

Dependent subspaces: In this case, similar to the previous scenario, the training samples are located in a small number of $k$-dimensional subspaces. In contrast to the previous scenario, the subspaces intersect strongly, i.e., they share basis vectors. In order to achieve this, we uniformly select $k-1$ dictionary elements. Then, we use $50-k+1$ groups of dictionary elements, where each group consists of the $k-1$ selected dictionary elements plus one further dictionary element. Again, in order to generate a training sample, we uniformly choose one group of dictionary elements and obtain the training sample as a linear combination of the dictionary elements that belong to the selected group.

The values of the non-zero entries are always chosen uniformly from the interval $[-0.5, 0.5]$.

We apply MOD, K-SVD, and the stochastic gradient descent method proposed in this paper to the training data. Let $C^{\text{learned}} = (c_1^{\text{learned}}, \dots, c_{50}^{\text{learned}})$ denote the dictionary that has been learned by one of these methods on the basis of the training samples. In order to measure the performance of the methods with respect to the minimization of the target function, we consider

$$E_h = \frac{1}{1500} \sum_{i=1}^{1500} \|x_i - C^{\text{learned}} a_i\|_2^2, \tag{22}$$

where $a_i$ is obtained from the best pursuit out of $K_{\text{user}} = 20$ pursuits performed according to the approach described in section 3. In order to assess whether the true dictionary can be reconstructed from the training data, we consider the mean maximum overlap between each element of the true dictionary and the learned dictionary:

$$\text{MMO} = \frac{1}{50} \sum_{l=1}^{50} \max_{m=1,\dots,50} \left| {c_l^{\text{true}}}^T c_m^{\text{learned}} \right|. \tag{23}$$

The number of non-zero entries $k$ is varied from 1 to 11. For the stochastic gradient descent method, we perform $100 \times 1500$ update steps, i.e., 100 learning epochs. For MOD and K-SVD, we perform 100 learning iterations, each iteration using 1500 training samples. We repeat all experiments 50 times and report the mean result over all experiments.
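Both evaluation measures are easy to reproduce. A sketch of (22) and (23), assuming unit-norm dictionary columns and an externally supplied encoder (the `encode` callable is a placeholder for the best-pursuit approximation):

```python
import numpy as np

def representation_error(X, C, encode):
    """E_h of Eq. (22): mean squared residual over all samples.
    X holds one sample per column; encode(x, C) returns the
    coefficient vector of the best pursuit for sample x."""
    return np.mean([np.linalg.norm(x - C @ encode(x, C)) ** 2 for x in X.T])

def mean_max_overlap(C_true, C_learned):
    """MMO of Eq. (23): for each true dictionary element, the largest
    absolute overlap with any learned element (columns unit-norm)."""
    G = np.abs(C_true.T @ C_learned)   # matrix of pairwise overlaps
    return np.mean(G.max(axis=1))
```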
[Table 1: Experimental results; see text for details. Columns: random dictionary elements, independent subspaces, dependent subspaces. Panels (a)-(i) plot the representation error $E_h$ and panels (j)-(r) the mean maximum overlap (MMO) as a function of $k$: (a)-(c) and (j)-(l) compare HC-SGD, K-SVD, and MOD; (d)-(f) and (m)-(o) compare the same methods with the "Bag of Pursuits" (BOP); (g)-(i) and (p)-(r) compare SC-SGD with HC-SGD(BOP). SC-SGD: soft-competitive stochastic gradient descent ($\lambda_0 = 20$, $\lambda_{\text{final}} = 0.65$). HC-SGD: hard-competitive stochastic gradient descent ($\lambda_0 = \lambda_{\text{final}} = 10^{-10}$).]
In the first set of experiments, for all dictionary learning methods, i.e., MOD, K-SVD, and stochastic gradient descent, a single orthogonal matching pursuit is used in order to obtain the dictionary coefficients during learning. Furthermore, the stochastic gradient descent is performed in hard-competitive mode, i.e., with a neighborhood size that is practically zero ($\lambda_0 = \lambda_{\text{final}} = 10^{-10}$). The results of this experiment are depicted in Table 1 (a)-(c) and (j)-(l). In case of the random dictionary elements scenario (see (a) and (j)), the stochastic gradient approach clearly outperforms MOD and K-SVD. From the mean maximum overlap it can be seen that almost all dictionary elements are well reconstructed with up to 6 non-zero coefficients in the linear combinations. If the dictionary elements cannot be reconstructed any more, i.e., for $k > 6$, the mean representation error $E_h$ starts to grow. In case of the independent subspaces and dependent subspaces scenarios, the stochastic gradient method also outperforms MOD and K-SVD in terms of minimization of the representation error (see (b) and (c)), whereas in terms of dictionary reconstruction performance a clear gain over MOD and K-SVD can be seen only in the intersecting subspaces scenario (see (k) and (l)). This might be caused by the fact that in the independent subspaces scenario it is sufficient to find dictionary elements that span the subspaces in which the data is located in order to minimize the target function, i.e., the scenario does not force the method to find the true dictionary elements.

In the second experiment, for all methods the dictionary coefficients were obtained from the best pursuit out of $K_{\text{user}} = 20$ pursuits performed according to the "Bag of Pursuits" approach described in section 3. The results of this experiment are depicted in Table 1 (d)-(f) and (m)-(o). Compared to the results of the first experiment (see (a)-(c)), it can be seen that the computationally more demanding method for the approximation of the best coefficients leads to significantly improved performance of MOD, K-SVD, and the stochastic gradient descent with respect to the minimization of the representation error $E_h$ (see (d)-(f)). The most obvious improvement can be seen in case of the dependent subspaces scenario, where the dictionary reconstruction performance also improves significantly (see (l) and (o)). In the random dictionary elements (see (j) and (m)) and independent subspaces scenarios (see (k) and (n)) there are only small improvements with respect to the reconstruction of the true dictionary.

In the third experiment, we employed soft-competitive learning in the stochastic gradient descent, i.e., the coefficients corresponding to each of the $K_{\text{user}} = 20$ pursuits were used in the update step according to (11), with $\lambda_0 = 20$ and $\lambda_{\text{final}} = 0.65$. The results of this experiment are depicted in Table 1 (g)-(i) and (p)-(r). It can be seen that for less sparse scenarios, i.e., $k > 6$, the soft-competitive learning further improves the performance. Particularly in case of the dependent subspaces scenario, a significant improvement in terms of dictionary reconstruction performance can be seen for $k > 4$ (see (r)). For very sparse settings, i.e., $k \le 4$, the hard-competitive approach seems to perform better than the soft-competitive variant. Again, in case of the independent subspaces scenario only the representation error decreases (see (h)), whereas no performance gain for the dictionary reconstruction can be seen (see (q)). Again, this might be caused by the fact that in the subspace scenario, learning dictionary elements that span the subspaces is sufficient to minimize the target function.
5 Conclusion

We proposed a stochastic gradient descent method that can be used either for hard-competitive or soft-competitive learning of sparse codes. We introduced the so-called "bag of pursuits" in order to compute a better estimate of the best $k$-term approximation of given data. This method can be used together with a generalization of the neural gas approach to perform soft-competitive stochastic gradient learning of (overcomplete) dictionaries for sparse coding. Our experiments on synthetic data show a significant performance improvement over other state-of-the-art methods such as MOD and K-SVD, both in terms of minimization of the representation error and in terms of reconstruction of the true dictionary elements that were used to generate the data. While a significant performance gain is already obtained by hard-competitive stochastic gradient descent, an even better performance is obtained by using the "bag of pursuits" and soft-competitive learning. In particular, as a function of decreasing sparseness, the performance of the method described in this paper degrades much more slowly than that of MOD and K-SVD. This should improve the design of overcomplete dictionaries also in more complex settings.

References

AHARON, M., ELAD, M. and BRUCKSTEIN, A. (2006): K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Transactions on Signal Processing, 54(11):4311-4322.

DAVIS, G., MALLAT, S. and AVELLANEDA, M. (1997): Greedy adaptive approximation. J. Constr. Approx., 13:57-89.

ENGAN, K., AASE, S. O. and HAKON HUSOY, J. (1999): Method of optimal directions for frame design. In ICASSP '99: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 2443-2446, Washington, DC, USA. IEEE Computer Society.

LABUSCH, K., BARTH, E. and MARTINETZ, T. (2009): Sparse Coding Neural Gas: Learning of Overcomplete Data Representations. Neurocomputing, 72(7-9):1547-1555.

MARTINETZ, T., BERKOVICH, S. and SCHULTEN, K. (1993): "Neural-gas" Network for Vector Quantization and its Application to Time-Series Prediction. IEEE Transactions on Neural Networks, 4(4):558-569.

PATI, Y., REZAIIFAR, R. and KRISHNAPRASAD, P. (1993): Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition. Proceedings of the 27th Annual Asilomar Conference on Signals, Systems, and Computers.

REBOLLO-NEIRA, L. and LOWE, D. (2002): Optimized orthogonal matching pursuit approach. IEEE Signal Processing Letters, 9(4):137-140.

TROPP, J. A. (2004): Greed is good: algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10):2231-2242.