Mat-2.108 Independent Research Project in Applied Mathematics
Perfusion Deconvolution
via EM Algorithm
27th January 2004
Helsinki University of Technology
Department of Engineering Physics and Mathematics
Systems Analysis Laboratory
Helsinki Brain Research Center
Functional Brain Imaging Unit
Tero Tuominen
51687J
List of abbreviations and symbols
MRI Magnetic Resonance Imaging
fMRI Functional Magnetic Resonance Imaging
PWI Perfusion Weighted Imaging
EM Expectation-Maximization
MLE Maximum Likelihood Estimate
MTT Mean Transit Time
CBV Cerebral Blood Volume
CBF Cerebral Blood Flow
SNR Signal-to-Noise Ratio
TR Time-to-Repeat
TE Time-to-Echo
EPI Echo-Planar Imaging
AIF a(t) Arterial Input Function
TCC c(t) Tissue Concentration Curve
r(t) Residue Function
Ψ(t) Impulse Response; Ψ(t) = CBF · r(t)
a    a vector or matrix, a ∈ ℝⁿˣᵐ, n, m > 1
a    a scalar, a ∈ ℝ
A    a random variable
a    a realization of a random variable
A    a random vector or matrix
a    a realization of a random vector or matrix
1 Introduction
Since its introduction in 1988, perfusion weighted fMRI has gained widespread
interest in the field of medical imaging. It offers an easy and, most importantly, a
non-invasive method for monitoring brain perfusion and even its minor changes
in vivo. The general principles of perfusion weighted imaging (PWI) were introduced
by Villinger et al. in 1988 [1] and further developed by Rosen et al. in 1989 [2].
By injecting a bolus of an intravascular paramagnetic contrast agent and observing
its first-passage concentration-time curves in the brain they were able to gain
valuable insight into the functioning of the living organ.
The theory of the kinetics of intravascular tracers was developed by Meier and
Zierler in 1954 [3]. To gain all the knowledge methodologically possible one must
recover the so-called impulse response function for each volume of interest. This
function characterises the local perfusion properties. According to the work of Meier
and Zierler, however, in order to recover this function one must solve an integral
equation of the form

c(t) = ∫₀ᵗ a(τ) Ψ(t − τ) dτ.
This is a typical equation of the class of equations known as Fredholm integral
equations. The integral also represents a so-called convolution; thus solving this
kind of equation is widely known as deconvolution.
Deconvolution belongs to the class of inversion problems. That is, the theory of
Meier and Zierler (the equation above) describes the change in the input function a
as it experiences the changes resulting from the properties of the vasculature and
local perfusion (characterized by the impulse response Ψ). The result is a new
function c. The inverse of this problem emerges when one measures the input function a
and its counterpart c and asks from what kind of mechanism these changes
originate, i.e. what is the impulse response Ψ.
Several methods have been proposed to solve the inversion problem. Tradi-
tional methods such as Fourier and Laplace techniques fail in this case due to the
significant amount of noise that is present in the measurements. The noisy data
and the form of the problem as a typically hard-to-solve Fredholm equation
impose an additional requirement on the method used to solve the problem: the
solution has to be recovered so that the effect of noise is either cancelled out or
in some other way ignored, because an exact solution computed directly from the
noisy data is heavily biased and physiologically meaningless. This fact highlights
the significance of the physical model on which the solution method is based.
The current standard method is based on an algebraic decomposition method
known as Singular Value Decomposition (SVD). It requires the discretization of
the equation and then regularises the ill-conditioned system of equations by
cutting off the smallest singular values. The method was introduced to the field by
Østergaard et al. [4].
An alternative methodology for inversion is based on a probabilistic formulation
of the model for the problem and then solving it in terms of maximum likelihood.
Such a method was first introduced by Vonken et al. in 1999 [5]. It is based
on the Expectation-Maximization (EM) algorithm developed by Dempster et al. in
1977 [6]. The EM algorithm was introduced to the field of medical imaging
independently by Shepp and Vardi in 1982 [7] and Lange and Carson in 1984 [8]
and further developed by Vardi, Shepp and Kaufman [9]. Vonken's work relies
heavily on that of Lange.
There are four goals for this work. First, there is no comprehensive description
of the EM-based perfusion deconvolution; Vonken's paper is very dense and
brief in what comes to the theory. In some parts it is even inaccurate and falsely
justified. So here we try to offer a comprehensive and thorough description of the
EM algorithm and its application. We shall take great care to formulate
our presentation in a mathematically rigorous form.
Secondly, Vonken tries to base his version of the algorithm on the physical
model but fails to some extent. He simplifies at the expense of the physical model
by borrowing one result directly from Lange. The problem is that the result is
derived assuming a Poisson distribution for random variates which in reality follow
a normal distribution. In this work we correct this assumption as well as the other
inaccurate parts of Vonken's work and see whether the results are affected.
Third, we try to repeat Vonken's results, and for this purpose a computer
program had to be created. We also implement the proposed changes and compare
their effects. These programs are to be created in such a manner that they can
later serve as research tools at the Helsinki Brain Research Center. The HBRC
currently lacks such tools. The comparison of the methods is carried out by Monte
Carlo simulations. Since the main interest in this report, however, is in the
theoretical aspects of the EM application we do not concentrate too much on the
simulations and thus they are not meant to fully cover the subject.
The fourth and last goal for this report is to fulfill the requirements of the course
Mat-2.108 Independent Research Project in Applied Mathematics at Helsinki University
of Technology in the Systems Analysis Laboratory.
This report is organized as follows. First, in chapter 2, the perfusion model and
the problem description are presented. The SVD solution method and the
discretization are also dealt with. Then chapter 3 describes the general EM algorithm. It
is followed by an introductory example of the use of EM in a typical problem, that
is, the EM complete-data embedding derived and used by Lange [8] and later
adopted by Vonken [5] is revisited. The aim is to offer a simple example and lay
the grounds for the later developments and the presentation of Vonken's work. Such
a derivation is not present even in Lange's original article. Chapter 4
is entirely devoted to the derivation of the corrected probabilistic model and the
EM algorithm based on it. Since the simplifications used by Vonken are omitted
the derivation is tedious.
The later chapters include the description of the simulations and their results.
The last chapter gives the conclusions.
2 Perfusion Model and Problem Description
Villinger and Rosen introduced the general principles of MR perfusion imaging
in 1988 and 1989 ([1],[2]). Using a paramagnetic intravascular contrast agent they
were able to detect a measurable change in the time series of the MR signal S(t).
Assuming a linear relationship between the concentration of a contrast agent c(t)
and the change in the transverse relaxation rate ∆R2, the concentration as a function
of time can be characterized as
c(t) ∝ ∆R2 = −(1/TE) ln( S(t)/S0 ),   (1)
where S0 is the baseline intensity of the signal.
For intravascular tracers, i.e. tracers that remain strictly inside the vasculature,
the theoretical framework for mathematical analysis was developed by Meier and
Zierler in 1954 [3]. According to their work the concentration of a contrast agent
in the vasculature as a function of time can be represented as
c(t) = F ∫₀ᵗ a(τ) r(t − τ) dτ,   (2)
where a(t) is the concentration in a large artery (also called the Arterial Input
Function, AIF) feeding the volume of interest (VOI). c(t) on the left hand side of
equation 2 typically refers to the concentration further in the tissue and is thus also
called the Tissue Concentration Curve or TCC. r(t) is the so-called residue function,
which is the fraction of tracer remaining in the system at time t. Formally it is defined as
r(t) = 1 − ∫₀ᵗ h(s) ds,   (3)
where h(t) is the distribution of transit times, i.e. the time a plasma particle takes
to travel through the capillary vasculature detectable by dynamic susceptibility
contrast MRI (DSC-MRI). That is, h(t) is a probability density function. Hence
r(t) has the following properties: r(0) = 1 and r(∞) = 0. In practice it is also
possible that the TCC is delayed by some time td due to the non-zero distance
from where the AIF is measured to where the TCC is measured. In theory, this
shifts r(t) to the right. Hence, a more general form of the residue function is
rd(t) = { 0 for t < td;  r(t − td) for t ≥ td }   (4)
From now on we will use the more general rd(t) without explicit statement and
denote it simply as r(t).
In perfusion weighted imaging the TCC c(t) and AIF a(t) are measured. The
goal is to find the solution to the integral equation 2, i.e. to find the impulse
response Ψ(t) = F · r(t). This impulse response characterizes the properties of the
underlying vasculature to the extent that is methodologically possible.
In practical PWI the main interest, however, is in the parameters MTT and CBF,
whose interdependency is characterized by the Central Volume Theorem [3]
CBV = MTT · CBF   (5)
MTT is the so-called Mean Transit Time, i.e. the expectation of h(t), and CBF is the
Cerebral Blood Flow, that is, F in equation 2. The CBV is simply the area under the
c(t) curve. In this work we concentrate on recovering only the CBF. Nevertheless, for
this purpose the whole impulse response has to be recovered.
2.1 Discretization: 0th order approximation
The measurements of a(t) and c(t) are made at discrete time points {t0, t1, t2, . . . , tn}
where the time between each measurement is ∆t = TR. This represents a natural
discretization for the problem 2. Traditionally eq. 2 is discretized directly with the
assumption that both a(t) and c(t) are constant over the time interval
∆t [4].
This zeroth order (step function) approximation of the convolution integral 2
leads to the following linear formulation of the problem
c(tj) = cj = ∫₀^{tj} a(τ) Ψ(tj − τ) dτ ≈ ∆t Σ_{i=0}^{j} ai Ψj−i   (6)
where a(ti ) = ai and Ψ(tj ) = Ψj .
By defining the matrix a0◦ ∈ ℝⁿˣⁿ as

a0◦ = ∆t ·
  [ a0      0    · · ·    0  ]
  [ a1     a0    · · ·    0  ]
  [  ⋮              ⋱     ⋮  ]
  [ an    an−1   · · ·   a0  ]      (7)

and the discrete versions of Ψ(t) and c(t) as column vectors Ψ ∈ ℝⁿˣ¹ and c ∈ ℝⁿˣ¹,
it is possible to rewrite the approximated eq. 6 briefly as
c = a0◦ · Ψ (8)
In practice, however, TR is of the magnitude of seconds while a(t) varies by a
magnitude of 10 to 30 within a few seconds. This naturally gives rise to
discretization errors.
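The zeroth-order forward model can be sketched in code. The following NumPy sketch is illustrative only (the actual implementation in this work was written in MATLAB, and the sample values below are made up): it builds the lower-triangular Toeplitz matrix of eq. 7 and applies the forward model of eq. 8.

```python
import numpy as np

def conv_matrix_0(a, dt):
    """Build the zeroth-order convolution matrix of eq. 7:
    element (i, j) is dt * a[i - j] for i >= j, zero above the diagonal."""
    n = len(a)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            m[i, j] = a[i - j]
    return dt * m

# Forward model c = a0 @ psi (eq. 8) with made-up sample values
a = np.array([0.0, 1.0, 2.0, 1.0])      # hypothetical AIF samples
psi = np.array([1.0, 0.5, 0.25, 0.0])   # hypothetical impulse response
c = conv_matrix_0(a, dt=1.0) @ psi
```

Each row of the matrix is a shifted copy of the AIF samples, which is exactly the discrete convolution of eq. 6.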
2.2 SVD Solution to Deconvolution
Traditionally in perfusion fMRI the equation 8 is solved via Singular Value
Decomposition (SVD) [4]. This regularises the typically ill-conditioned system of linear
equations 8. In general the SVD of a matrix a ∈ ℝᵐˣⁿ is

a = U · D · Vᵀ   (9)

where U ∈ ℝᵐˣᵐ and V ∈ ℝⁿˣⁿ are orthogonal so that Uᵀ·U = Vᵀ·V = I. I is
an identity matrix. D is a diagonal matrix with the same dimensionality as a and its
elements are the so-called singular values {σi}ᵢ₌₁ⁿ, i.e. D = diag{σi}.
SVD's regularizing properties come up simply in inverting the decomposed
matrix a. From 9 it is easy to see that

a⁻¹ = V · diag{1/σi} · Uᵀ   (10)

Now, if singular values are very small, i.e. σi << 1, the inversion becomes unstable
as the elements on the diagonal grow. Hence a pseudo-inversion is performed in the
case of small singular values, that is, the large elements 1/σi corresponding to small
singular values σi are simply set to zero. In practice this requires a threshold under
which singular values are ignored. In the case of perfusion inversion this threshold
has been shown to be 0.2 × the largest singular value [4].
The SVD solution (pseudo-inverse) is not suitable for the approximation presented
in the next subsection because the trapezoidal approximation weights separate
elements of a differently.
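The truncated pseudo-inverse described above can be sketched as follows (a NumPy illustration, not the implementation of [4]; the relative threshold of 0.2 is the one quoted above):

```python
import numpy as np

def svd_deconvolve(a0, c, rel_threshold=0.2):
    """Solve c = a0 @ psi by truncated SVD (eq. 10): reciprocals of singular
    values below rel_threshold * sigma_max are set to zero before inverting."""
    u, s, vt = np.linalg.svd(a0)
    s_inv = np.zeros_like(s)
    keep = s >= rel_threshold * s.max()
    s_inv[keep] = 1.0 / s[keep]
    # pseudo-inverse V diag(1/sigma_i) U^T applied to c
    return vt.T @ (s_inv * (u.T @ c))
```

For a well-conditioned matrix every singular value is kept and the exact solution is recovered; only for a nearly singular system does the truncation take effect.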
2.3 Discretization: 1st order approximation
The first order (trapezoidal) approximation of the convolution integral 2 is adopted
from Jacquez [10]. The measurements of a(t) and c(t) are made at discrete time
points {t0, t1, t2, . . . , tn}. Now 2 at time tj is approximated as
cj ≈ (∆t/2) Σ_{i=1}^{j} ( aj−i Ψi + aj−i+1 Ψi−1 )   (11)
Assuming a0 = 0 and defining a1◦ as

a1◦ = (∆t/2) ·
  [ a1      0     · · ·    0  ]
  [ a2     2a1    · · ·    0  ]
  [ a3     2a2    2a1      0  ]
  [  ⋮                ⋱    ⋮  ]
  [ an    2an−1   · · ·   2a1 ]      (12)
we can write 11 briefly in vector notation as
c = a1◦ · Ψ (13)
This does not help in the case of the SVD solution but might be of assistance where a
direct discrete convolution is needed. EM is one of these cases.
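The trapezoidal matrix of eq. 12 can be sketched as follows (NumPy, illustrative function name; the input must satisfy a0 = 0, and the rows correspond to times t1, . . . , tn):

```python
import numpy as np

def conv_matrix_1(a, dt):
    """Build the first-order (trapezoidal) convolution matrix of eq. 12.
    Row i (1-based) is [a_i, 2a_{i-1}, ..., 2a_1, 0, ...], scaled by dt/2."""
    n = len(a) - 1                  # a[0] is assumed to be zero
    m = np.zeros((n, n))
    for i in range(1, n + 1):
        m[i - 1, 0] = a[i]          # first column: plain a_i
        for j in range(2, i + 1):
            m[i - 1, j - 1] = 2 * a[i - j + 1]
    return dt / 2 * m
```

Compared with the zeroth-order matrix of eq. 7, the interior columns carry the factor 2 of the trapezoidal rule, which is why the plain SVD truncation is not directly applicable here.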
3 EM Algorithm
McLachlan encapsulates the essence of the EM algorithm as follows [11]:
The Expectation-Maximization (EM) algorithm is a broadly applicable
approach to the iterative computation of maximum likelihood (ML)
estimates, useful in a variety of incomplete-data problems [. . . ] On
each iteration of the EM algorithm, there are two steps – called the
expectation step or the E-Step and the maximization step or the M-step.
[. . . ] The notion of ’incomplete-data’ includes the conventional sense
of missing data, but it also applies to situations where the complete
data represents what would be available from some hypothetical ex-
periment. [. . . ] even when a problem does not at first appear to be an
incomplete-data one, computation of the MLE is often greatly facilitated
by artificially formulating it to be as such.
The first general treatment of the EM algorithm was published by Dempster et
al. in 1977 [6]. Since then it has been applied in numerous different fields. In per-
fusion fMRI it was first used by Vonken et al. in 1999 [5]. Vonken's work relies
heavily on that of Lange in 1984 [8]. Lange, however, applied EM to PET image
reconstruction.
In this chapter a brief overview of the EM algorithm is offered first. It culminates
in the statements of both the E- and M-steps in eqs. 18 and 19. This is followed
by an introductory overview of Lange's method [8], which is meant to offer a
comprehensive example of the use of EM in a typical problem. Next Vonken's method [5]
is introduced. Great care has been taken to formulate the assumptions made
in a mathematically rigorous form.
3.1 Overview of EM Algorithm
Here we offer a brief recap of the EM theory following McLachlan's book [11].
Let Y be the random vector corresponding to the observed data y, that is,
y is Y's realization. Y has the probability density function (pdf) g(y; Ψ) where Ψ
is the vector containing the unknown parameters to be estimated. Respectively, the
complete-data random vector will be denoted by X and its realization
by x. X has the pdf f (x; Ψ).
The complete-data log likelihood function that could be formed for Ψ if x
were fully observable is
ln L(Ψ) = ln f (x; Ψ) (14)
Define h as a many-to-one mapping from the complete-data sample space X to the
incomplete-data sample space Y

h : X → Y   (15)
Now we do not observe the complete data x in X but instead the incomplete data
y = h(x) in Y. Thus,
g(y; Ψ) = ∫_{X(y)} f (x; Ψ) dx,   (16)
where X (y) is the subset of the complete-data sample space X determined by the
equation y = h(x).
Eq. 16 in discrete form is

g(y; Ψ) = Σ_{x: h(x)=y} f (x; Ψ)   (17)
The problem here is to solve the incomplete-data (observable-data) log likelihood
maximization. The main idea of EM is to solve it in terms of the complete-data
likelihood L(Ψ) = f (x; Ψ). As this is unobservable it is replaced by its conditional
expectation given y and the current fit for Ψ, which at iteration n is denoted by Ψ⁽ⁿ⁾.
In other words, the entire likelihood function is replaced by its conditional
expectation, not merely the complete-data variates.
To crystallize the heuristic EM approach into concrete steps we have the following.
First choose an initial value/guess Ψ⁽⁰⁾ for the iteration to begin with.
Next carry out the E-step, i.e. calculate the conditional expectation of the
log likelihood function given the current parameter estimate Ψ⁽ⁿ⁾ and the
observations y
Q(Ψ; Ψ⁽ⁿ⁾) = E_{Ψ⁽ⁿ⁾}[ ln L(Ψ) | y, Ψ⁽ⁿ⁾ ]   (18)
Finally the M-step: maximize Q(Ψ; Ψ⁽ⁿ⁾) with respect to the parameters Ψ

Ψ⁽ⁿ⁺¹⁾ = arg max_Ψ Q(Ψ; Ψ⁽ⁿ⁾)   (19)
Now, if there are terms independent of Ψ in eq. 19 they do not contribute to the
new Ψ⁽ⁿ⁺¹⁾ because they drop out in differentiation (i.e. maximization) with respect
to Ψ. In some cases this eases the derivation.
3.2 EM Algorithm applied to Perfusion Deconvolution
3.2.1 Lange’s Method in PET image reconstruction
Here we review Lange's derivation of his version of the physically based EM
algorithm. It is meant to serve as an introductory example and to clarify the use
of EM in practice.
The idea in PET is to recover the values of the emission intensity Ψj when one
sees only the sum of the emissions over a finite time interval. Let the number of
emissions from pixel j during projection i be the random variate Xij

Xij ∼ Poisson(cij Ψj)   (20)
where the cij's are assumed to be known constants. Next define the observable
quantity, i.e. their sum, the number of emissions recorded for projection i, as the
random variate Yi
Yi = Σ_j Xij   (21)
Hence
Yi ∼ Poisson( Σ_j cij Ψj )   (22)
From 20 it follows that
P[Xij = xij] = ( (cij Ψj)^{xij} / xij! ) e^{−cij Ψj}   (23)
and so
f (x; Ψ) = Π_i Π_j P[Xij = xij]   (24)
Thus with 14 we have
ln L(Ψ) = Σ_i Σ_j { xij ln(cij Ψj) − cij Ψj − ln xij! }   (25)
and eq. 18 yields, based on the linearity of the expectation,

Q(Ψ; Ψ⁽ⁿ⁾) = E_{Ψ⁽ⁿ⁾}[ ln L(Ψ) | y, Ψ⁽ⁿ⁾ ]
           = Σ_i Σ_j { E[ Xij | y, Ψ⁽ⁿ⁾ ] ln(cij Ψj) − cij Ψj } + R   (26)
R does not depend on Ψ. It includes the term E[ ln Xij ! | y, Ψ(n) ] which would
be difficult to calculate.
The conditional expectation can be derived as follows

E[ Xij | y, Ψ⁽ⁿ⁾ ] = Σ_{k=0}^{yi} k · P[ Xij = k | y, Ψ⁽ⁿ⁾ ]   (27)
where

P[ Xij = k | y, Ψ⁽ⁿ⁾ ] = P[Xij = k, Yi = yi] / P[Yi = yi]
                       = P[Xij = k, Σ_{p≠j} Xip = yi − k] / P[Yi = yi]
                       = C(yi, k) · (cij Ψj⁽ⁿ⁾)ᵏ ( Σ_{p≠j} cip Ψp⁽ⁿ⁾ )^{yi−k} / ( Σ_p cip Ψp⁽ⁿ⁾ )^{yi}   (28)

where C(yi, k) denotes the binomial coefficient,
because Ψ⁽ⁿ⁾ is a parameter vector and Xij is independent of the other Yj's except for
the Yi to which it itself contributes. Substituting this into eq. 27 and using
Σ_{k=0}^{n} C(n, k) aᵏ b^{n−k} = (a + b)ⁿ   (29)
and
k · C(yi, k) = yi · C(yi − 1, k − 1),   yi ≥ k ≥ 1   (30)
we finally get the conditional expectation for Xij and denote it by Nij

Nij = E[ Xij | y, Ψ⁽ⁿ⁾ ] = yi cij Ψj⁽ⁿ⁾ / Σ_p cip Ψp⁽ⁿ⁾   (31)
Now, if the initial guess Ψ⁽⁰⁾ is positive then the Nij's are all positive. Hence the
E-step is completed and yields

Q(Ψ; Ψ⁽ⁿ⁾) = Σ_i Σ_j { Nij ln(cij Ψj) − cij Ψj } + R   (32)
Now the M-step is performed by differentiating eq. 32 with respect to Ψ and
equating its derivatives to zero. Differentiation yields

∂Q(Ψ; Ψ⁽ⁿ⁾)/∂Ψj = Σ_i Nij/Ψj − Σ_i cij   (33)
and setting it to zero and solving for Ψj yields the new estimate Ψj⁽ⁿ⁺¹⁾

Ψj⁽ⁿ⁺¹⁾ = Σ_i Nij / Σ_i cij = ( Ψj⁽ⁿ⁾ / Σ_i cij ) · Σ_i ( yi cij / Σ_p cip Ψp⁽ⁿ⁾ )   (34)
This solution truly maximizes Q. This can be seen as follows. Q's second derivative
is

∂²Q(Ψ; Ψ⁽ⁿ⁾)/∂Ψk∂Ψj = −Σ_i Nij/Ψj²   (35)

when k = j, and zero otherwise. Thus the quadratic form ΨᵀH(Ψ)Ψ, where H
denotes the Hessian matrix of Q, is strictly negative for all Ψ with positive components
Ψj > 0. That is, eq. 34 represents the point of a concave function where its gradient
is equal to the zero vector.
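The update eq. 34 is the familiar MLEM iteration. A NumPy sketch of one update (illustrative, not Lange's code; it assumes the constants cij are collected in a matrix with strictly positive column sums and that the current projections are positive):

```python
import numpy as np

def mlem_update(psi, c_mat, y):
    """One iteration of eq. 34:
    psi_j <- psi_j / (sum_i c_ij) * sum_i y_i c_ij / (sum_p c_ip psi_p)."""
    projections = c_mat @ psi                  # sum_p c_ip psi_p for each i
    return psi / c_mat.sum(axis=0) * (c_mat.T @ (y / projections))

# Hypothetical 3-projection, 2-pixel system
c_mat = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
psi = np.array([2.0, 3.0])
y = c_mat @ psi                                # noiseless "observations"
```

Two well-known properties can be checked directly: exact data is a fixed point of the update, and the total expected count Σ_j (Σ_i cij) Ψj equals Σ_i yi after every iteration.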
3.2.2 Vonken's Method
Here we review the application of the EM algorithm to perfusion weighted fMRI
published by Vonken in 1999 [5]. First the article is briefly summarized and then some
of its flaws are pointed out. The notation has been changed to correspond to this
document but no changes beyond this have been made. In the next section we try to
offer a more exact and thorough treatment of the subject and correct the contradictions
in Vonken's work.
Vonken starts by defining the convolution operator a as a square matrix whose
elements are defined as
aij = { A_{i−j} if i − j ≥ 0;  0 otherwise }   (36)

where A_{i−j} denotes the AIF at time ti − tj, i.e. A(ti − tj). Thus the operator a
corresponds to the zeroth order approximation of the convolution integral, i.e. eq. 8
on page 5.
The next two steps are responsible for cleverly formulating the complete-data
embedding. For this purpose Vonken assumes two distributions, one for the complete
and one for the observed data. The first one has the pdf f (X; Ψ) and it is
assumed to follow the normal distribution. The observed data is also assumed to
follow the normal distribution. Its pdf is g(C; Ψ). These normality assumptions are
satisfactorily justified; especially the normality of C is treated thoroughly. First
Vonken defines the elements of the complete-data matrix as
xij = aij Ψj (37)
and then naturally the linkage to the observed (incomplete-) data as
ci = Σ_k xik = Σ_k aik Ψk   (38)
The notation for the current estimate of the ci's based on the current estimate Ψ⁽ⁿ⁾ is

c̃i = Σ_k aik Ψk⁽ⁿ⁾   (39)
Next Vonken moves onwards to define the complete-data log likelihood function
based on the assumption that the complete-data xij are distributed normally,
i.e. Xij ∼ N(aij Ψj, σij²). The variances σij² are later taken to be equal, and in the
M-step they eventually cancel out. He says:
" . . . using Eq. 38 and the expectancy E[Xij | c, Ψ⁽ⁿ⁾] = ci · aij Ψj⁽ⁿ⁾ / Σ_j aij Ψj⁽ⁿ⁾
= aij Ψj⁽ⁿ⁾ ci/c̃i ≡ Nij⁽ⁿ⁾. This gives

E[ln f (X; Ψ) | c, Ψ⁽ⁿ⁾] = Σ_i Σ_j ln P[Xij] = −Σ_i Σ_j (aij Ψj − Nij⁽ⁿ⁾)² / (2σij²)
with P[Xij] the probability of Xij and σij the standard deviation in the
complete-data representation."
From this Vonken proceeds to the M-step. He takes the derivative of the conditional
expectation above and equates it to zero. This yields a set of equations

Σ_i aij (aij Ψj − Nij⁽ⁿ⁾) = 0   (40)
i.e. an equation for each Ψj⁽ⁿ⁺¹⁾. To finish, Vonken says: "A program has been
implemented that numerically solves Eq. 40 using a Newton-Raphson scheme."
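For fixed Nij⁽ⁿ⁾ eq. 40 is linear in Ψj, so each iteration of the scheme has a closed-form root. A NumPy sketch of one such iteration (an illustration of the published update, not Vonken's program; it assumes strictly positive current projections c̃i and non-zero column sums of a²):

```python
import numpy as np

def vonken_update(psi, a, c):
    """One iteration solving eq. 40 for each psi_j, with
    N_ij = c_i * a_ij * psi_j / c_tilde_i  (eqs. 38-39):
    psi_j <- sum_i a_ij N_ij / sum_i a_ij^2."""
    c_tilde = a @ psi                                   # eq. 39
    n_ij = (c / c_tilde)[:, None] * a * psi[None, :]    # conditional expectations
    return (a * n_ij).sum(axis=0) / (a ** 2).sum(axis=0)
```

As with the Poisson version, exact data c = a · Ψ is a fixed point, and scaling the data scales the estimate.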
The above summarization is not meant to be a complete description of Vonken's
article; rather it tries to describe the essential points of his derivation in
order to illustrate the facts that are to be changed here. Here are the points
that seem to need changes.
First, Vonken's notation could be more exact. He does not make a notational
difference between random variates and their realizations. This might be a
consequence of Lange's work being the reference point throughout his work.
Secondly, a more explicit statement of the assumptions used might clarify the
derivation. In particular, even though Vonken lets the reader believe that the entire
derivation is faithfully based on the normality assumptions, there is one point
where this is not the case. Namely, when Vonken takes the conditional expectation
E[Xij | c, Ψ⁽ⁿ⁾] he does not mention its origins. In fact it is taken directly from
Lange [8]. The result, however, is derived based on the assumption of a Poisson
distribution Xij ∼ Poisson(aij Ψj⁽ⁿ⁾). This may serve as a satisfactory approximation
but is clearly incorrect and ungrounded here. Vonken's obvious goal is to try
to ground his work on the physical model like Lange, but here he deviates from
this without any explanation.
Finally, the calculation of the log likelihood of the complete data is questionable.
In EM theory the conditional expectation is taken of the entire log likelihood
function ln L(Ψ) as stated in eq. 18. If the log likelihood function is linear
in x in the terms containing the parameter Ψj, the result looks just as if the xij's had
simply been replaced by their conditional expectations. For an example see eq. 25
on page 10. Here, however, the normality assumption leads to the non-linear term
(aij Ψj − xij)², whose conditional expectation, with the notation E[Xij | c, Ψ⁽ⁿ⁾] ≡ Nij⁽ⁿ⁾,
is not (aij Ψj − Nij⁽ⁿ⁾)² as derived by Vonken. This might be the explanation for the
fast and sometimes unstable convergence of the algorithm.
4 Improved Application of EM
Vonken’s reasoning in complete-data embedding is adopted and the convolution
operator a is defined as
aij = { A_{i−j} if i − j ≥ 0;  0 otherwise }   (41)
where A_{i−j} denotes the AIF at time ti − tj, i.e. A(ti − tj). Thus a represents a zeroth
order approximation of the convolution integral, as eq. 8 on page 5 shows.
The distribution of the measured values of the time series of the AIF and TCC is
assumed to be normal, as Vonken argued. This is also intuitively appealing as the
values at issue are measurement values of a physical quantity after an almost linear
transformation.
Now A refers both to the convolution matrix, which is treated as a random
variate (matrix), and to the random vector of AIF values. After measurement
A is realized as a; first as the AIF and then, after transformation 41, also as the
convolution operator a. These two differ only at the notational level: aj refers to
an element of the AIF whereas aij is an element of the operator 41.
Based on the previous reasoning the AIF values Aij are assumed to be normally
distributed around their means, which will be denoted here by the parameters µij, i.e.
E[Aij] = µij. Later, when the actual measurements are made and the developed
algorithm is used to recover the residual, this parameter will be replaced by
the measured aij, i.e. Aij's realization. The variance associated with the parameter
is naturally σ²_AIF. Explicitly
Aij ∼ N(µij, σ²_AIF)   (42)
From this the distribution of the complete-data elements Xij can be easily de-
rived. They are defined as Xij = Aij Ψj and thus
Xij ∼ N(µij Ψj, (Ψj σ_AIF)²)   (43)
Thus the complete-data pdf is of the familiar exponential form and is from now
on denoted by fX (x; Ψ).
Now, as the observed data are defined as

Ci = Σ_k Xik   (44)

we have

Ci ∼ N( Σ_k µik Ψk , Σ_k (Ψk σ_AIF)² )   (45)
From now on the pdf of random observed data vector C is denoted by gC (c; Ψ).
From eq. 43 one can easily formulate the complete-data log likelihood function
which is needed in the E-step
ln L(Ψ) = Σ_i Σ_j { − ln √(2π (Ψj σ_AIF)²) − (µij Ψj − xij)² / (2 (Ψj σ_AIF)²) }   (46)
Expanding the square and denoting by R the terms independent of Ψ, the
conditional expectation of the log likelihood can be written as

Q(Ψ; Ψ⁽ⁿ⁾) = E_{Ψ⁽ⁿ⁾}[ ln L(Ψ) | c, Ψ⁽ⁿ⁾ ]
           = Σ_i Σ_j { − ln √(2π (Ψj σ_AIF)²)
             + ( µij / (Ψj σ²_AIF) ) E[ Xij | c, Ψ⁽ⁿ⁾ ]
             − ( 1 / (2 (Ψj σ_AIF)²) ) E[ Xij² | c, Ψ⁽ⁿ⁾ ] } + R   (47)
From eq. 47 it is clear that two different conditional expectations are needed:

E[ Xij | c, Ψ⁽ⁿ⁾ ] = ∫ xij f_{X|C,Ψ⁽ⁿ⁾}(xij | ci, Ψ⁽ⁿ⁾) dxij   (48)

E[ Xij² | c, Ψ⁽ⁿ⁾ ] = ∫ xij² f_{X|C,Ψ⁽ⁿ⁾}(xij | ci, Ψ⁽ⁿ⁾) dxij   (49)
where f_{X|C,Ψ⁽ⁿ⁾}(xij | ci, Ψ⁽ⁿ⁾) refers to the current conditional pdf of Xij given c and
Ψ⁽ⁿ⁾. This can be found using a basic property familiar from probability theory

f_{X|C,Ψ⁽ⁿ⁾}(xij | ci, Ψ⁽ⁿ⁾) = g_{C|X,Ψ⁽ⁿ⁾}(ci | xij, Ψ⁽ⁿ⁾) · f_{X|Ψ⁽ⁿ⁾}(xij | Ψ⁽ⁿ⁾) / g_{C|Ψ⁽ⁿ⁾}(ci | Ψ⁽ⁿ⁾)   (50)
where gC|X,Ψ(n) (ci |xij , Ψ(n) ) refers respectively to the conditional pdf of Ci given
xij and current Ψ(n) . fX (xij ) and gC (ci ) are merely the pdfs of Xij and Ci .
The functions in eq. 50 expressed explicitly are:

f_{X|Ψ⁽ⁿ⁾}(xij | Ψ⁽ⁿ⁾) = ( 1 / √(2π (Ψj⁽ⁿ⁾ σ_AIF)²) ) exp( −(µij Ψj⁽ⁿ⁾ − xij)² / (2 (Ψj⁽ⁿ⁾ σ_AIF)²) )   (51)
and

g_{C|Ψ⁽ⁿ⁾}(ci | Ψ⁽ⁿ⁾) = ( 1 / √(2π Σ_k (Ψk⁽ⁿ⁾ σ_AIF)²) ) exp( −( Σ_k µik Ψk⁽ⁿ⁾ − ci )² / (2 Σ_k (Ψk⁽ⁿ⁾ σ_AIF)²) )   (52)
and

g_{C|X,Ψ⁽ⁿ⁾}(ci | xij, Ψ⁽ⁿ⁾) = ( 1 / √(2π Σ_{k≠j} (Ψk⁽ⁿ⁾ σ_AIF)²) ) exp( −( Σ_{k≠j} µik Ψk⁽ⁿ⁾ + xij − ci )² / (2 Σ_{k≠j} (Ψk⁽ⁿ⁾ σ_AIF)²) )   (53)
The notation Σ_{k≠j} means that the sum is taken over all k except j; in other
words, Σ_{k≠j} zk = Σ_k zk − zj.
From equations 50 through 53 it is obvious that the results will get messy.
Therefore we define the following short-hand notations
γij = µij Ψj⁽ⁿ⁾
αj = (Ψj⁽ⁿ⁾ σ_AIF)²
α = Σ_k (Ψk⁽ⁿ⁾ σ_AIF)²
βj = Σ_{k≠j} (Ψk⁽ⁿ⁾ σ_AIF)²
γi = Σ_k µik Ψk⁽ⁿ⁾
δij = Σ_{k≠j} µik Ψk⁽ⁿ⁾
One must not confuse Ψ and Ψ⁽ⁿ⁾, because the maximization in the M-step
requires differentiation of Q with respect to each Ψj while the iterates Ψj⁽ⁿ⁾ are treated
as constant parameters. Since eq. 52 has no dependency on xij it will be denoted
merely by gC(ci) from now on.
With the defined notation the conditional expectations yield

E[ Xij | c, Ψ⁽ⁿ⁾ ] = ( (ci αj + γij βj − αj δij) / (√(2π) gC(ci) α^{3/2}) ) exp( −(γi − ci)² / (2α) )   (54)
and

E[ Xij² | c, Ψ⁽ⁿ⁾ ] = ( 1 / (√(2π) gC(ci) α^{5/2}) ) [ (ci αj)² + (γij βj)²
    + 2 ci αj (γij βj − αj δij)
    + αj βj (βj − 2 γij δij)
    + αj² (βj + δij²) ] exp( −(γi − ci)² / (2α) )   (55)
Substituting these into Q (eq. 47) completes the E-step.
Now, Q is of the form

Q(Ψ; Ψ⁽ⁿ⁾) = Σ_i Σ_j Kij(Ψj)   (56)
thus differentiation with respect to each Ψj yields

∂Q(Ψ; Ψ⁽ⁿ⁾)/∂Ψj = Σ_i ∂Kij(Ψj)/∂Ψj   (57)
where the derivative of Kij(Ψj) can be written as

∂Kij(Ψj)/∂Ψj = Λij Ψj⁻³ − Ωij Ψj⁻² − Ψj⁻¹   (58)
where we have again defined the following short-hand notations

Λij = ( 1 / (√(2π) σ²_AIF gC(ci) α^{5/2}) ) [ (ci αj)² + (γij βj)²
    + 2 ci αj (γij βj − αj δij)
    + αj βj (βj − 2 γij δij)
    + αj² (βj + δij²) ] exp( −(γi − ci)² / (2α) )   (59)
and

Ωij = ( µij (ci αj + γij βj − αj δij) / (√(2π) σ²_AIF gC(ci) α^{3/2}) ) exp( −(γi − ci)² / (2α) )   (60)
Hence, after the summation over i and multiplication by Ψj³, we have the
equation for the root of the derivative eq. 57

Ψj² Σ_i 1 + Ψj Σ_i Ωij − Σ_i Λij = 0   (61)
This second-degree equation can easily be solved for Ψj. Choosing the positive
root we have

Ψj = Ψj⁽ⁿ⁺¹⁾ = ( −Σ_i Ωij + √( (Σ_i Ωij)² + 4 (Σ_i 1)(Σ_i Λij) ) ) / (2 Σ_i 1)   (62)
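The update eq. 62 is simply the positive root of the quadratic eq. 61; a small sketch (illustrative names; Σ_i 1 is the number of measurements n):

```python
import math

def m_step_root(sum_omega, sum_lambda, n):
    """Positive root of eq. 61,  n * psi^2 + sum_omega * psi - sum_lambda = 0,
    i.e. the M-step update of eq. 62.  sum_omega = sum_i Omega_ij,
    sum_lambda = sum_i Lambda_ij, n = sum_i 1."""
    return (-sum_omega + math.sqrt(sum_omega ** 2 + 4 * n * sum_lambda)) / (2 * n)
```

For positive ΣΛ the discriminant exceeds (ΣΩ)², so the chosen root is always positive, consistent with the constraint Ψj > 0.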
5 This Work
5.1 Overview
The two main goals of this report are to describe the EM-based deconvolution
method published by Vonken [5], to try to improve it, and then to evaluate the
changes made by simulations. For this purpose both Vonken's method and
the new method were implemented on the MATLAB platform. Both methods
were also implemented using both the 0th and the 1st order approximations
of the convolution integral. Therefore in total four different methods were to be
evaluated.
As stated in the introduction, however, this report concentrates mainly on
the theoretical aspects and a full evaluation is not included. Instead only the
reproducibility of the CBF was studied.
The methods are evaluated using Monte Carlo simulation. For this purpose
the true values of AIF, TCC and impulse response Ψ have to be known. This
was achieved by creating a numerical integrator which computes the "true" TCC
based on a given AIF and impulse response using eq. 2. This avoids the effect of
discretization errors arising from the discretized eq. 8, for example. This method also
enables us to easily change all the parameters affecting the impulse response;
most importantly, the delay is not bound to multiples of TR.
After the true functions are known, Gaussian noise is added to the TCC using
eq. 1. This noisy TCC is then used when performing the deconvolution by the
methods to be tested. The numerical values used in this work were S0 = 300 and
k = 1. The signal-to-noise ratio was set to the clinically interesting value of SNR = 35.
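The noise generation can be sketched as follows. This is a hypothetical reconstruction of the procedure just described (the precise role of the constant k is not spelled out in the text, so here it is simply absorbed into the inverse of eq. 1 together with TE, and SNR is interpreted as S0 divided by the noise standard deviation):

```python
import numpy as np

def noisy_tcc(c, s0=300.0, k=1.0, snr=35.0, seed=None):
    """Add Gaussian noise to a clean TCC through the signal domain:
    map c(t) to an MR signal with the inverse of eq. 1, add noise with
    standard deviation s0 / snr, and map back with eq. 1."""
    rng = np.random.default_rng(seed)
    s = s0 * np.exp(-np.asarray(c, dtype=float) / k)    # inverse of eq. 1
    s_noisy = s + rng.normal(0.0, s0 / snr, size=s.shape)
    return -k * np.log(s_noisy / s0)                    # eq. 1
```

Because the noise is added to the signal and mapped through the logarithm, the resulting concentration noise is not exactly Gaussian, which matches the motivation for treating the normality of C carefully in chapter 4.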
Vonken reported difficulties in deciding the optimal number of iterations needed.
In his clinical experiment he used four iterations. This number is adopted here,
also. It is used without any further investigation for both the zeroth
and the first order approximations.
However, the convergence properties of the algorithm change dramatically
when the proposed changes are implemented. Empirically (by trial and error) the
following iteration numbers were found: the zeroth order approximation was
iterated 100 times whereas in the case of the first order approximation the maximum
number of iterations was set to 400.
Another problematic area not described by Vonken was the tendency of the
recovered impulse response to "raise its tail". In other words, the convergence
nearly always produced an impulse response whose last, and sometimes even
second-to-last, elements were clearly incorrectly large. This, however, did not
seem to affect the preceding elements. The same was observed with the
new EM version. This may result in an erroneously determined CBF if the tail rises
higher than the true maximum of the impulse response. To overcome this difficulty
in CBF estimation, the last four elements of the estimated impulse response
were simply set to zero.
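The tail correction amounts to a one-line operation on the estimated impulse response; a minimal sketch with a hypothetical estimate whose raised tail would otherwise mask the true maximum of 0.05:

```python
import numpy as np

# Hypothetical estimated impulse response whose tail has "risen"
psi_est = np.array([0.01, 0.05, 0.04, 0.03, 0.02, 0.015, 0.30, 0.40])

psi_est[-4:] = 0.0        # zero the last four elements before reading off CBF
cbf_est = psi_est.max()   # CBF taken as the maximum of the impulse response
```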
The new algorithm was found to suffer from minor numerical instabilities. The
values of eq. 52 are typically very small, and if the initial guess differs greatly
from the measured data, the values of eq. 52 fall below the available numerical
accuracy. A good initial guess is therefore needed. To guarantee equal treatment of
all methods, a common initial guess was set to a constant function of value 0.02. The
insufficient numerical accuracy was, however, in some cases so severe that
occasionally (very rarely; in the present simulations 5 times
out of 13 · 512 = 6656 runs) the algorithm could not proceed, in which
case it was set to produce a NaN (Not a Number) result. The means and standard
deviations of the estimates were calculated ignoring these NaN values.
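Ignoring the NaN results in the summary statistics can be done directly with NumPy's NaN-aware reductions; a minimal sketch with hypothetical estimate values:

```python
import numpy as np

# Hypothetical CBF estimates from one flow level; one failed run produced NaN
estimates = np.array([0.051, 0.049, np.nan, 0.052, 0.048])

mean_cbf = np.nanmean(estimates)  # NaN entries are excluded from the mean
std_cbf = np.nanstd(estimates)    # ... and from the standard deviation
```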
5.2 Detailed description and the parameters used
There were two different sets of simulations: one with zero delay (td = 0) and
one with a 2.7 second delay, i.e. td = 2.7 in eq. 4. Both were carried out in a
similar manner. The CBF was varied between 0.01 and 0.13 [arbitrary units] in
steps of 0.01. At each flow level 512 different noisy TCCs were generated, and each
of them was deconvolved with every method. The average CBF estimate and its
standard deviation were then calculated. The original residue function (see eq. 2
on p. 4) was generated from an h(t) of the form

h(t) = [Γ(α + β) / (Γ(α)Γ(β))] (t1 − t0)^(1−α−β) (t − t0)^(α−1) (t1 − t)^(β−1),

which empirically seems to be a reasonable model for h(t) [12]. The numerical values
were set to t0 = 0, t1 = 8, α = 2.3 and β = 3.8, corresponding to a physiologically
typical MTT ≈ 3 s. The AIF was modeled as a gamma-variate function of the
form

AIF(t) = a (t − t0)^b e^(−(t − t0)/c),

where a = 2, b = 4 and c = 1.1. Throughout, TR was kept at 1.5 s. For
comparison the SVD solution was also calculated.
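The two model functions with the stated parameter values can be implemented as follows. This is a sketch under the report's parameter choices; the grid spacing and function names are choices made here. Note that h(t) is a Beta(α, β) density rescaled to [t0, t1], so it integrates to one, and the residue function follows as r(t) = 1 − integral of h from 0 to t.

```python
import numpy as np
from math import gamma

t0, t1, alpha, beta = 0.0, 8.0, 2.3, 3.8   # the report's parameter values

def h(t):
    # Transit-time distribution: a Beta(alpha, beta) density rescaled to [t0, t1]
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    m = (t > t0) & (t < t1)
    coef = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    out[m] = (coef * (t1 - t0) ** (1 - alpha - beta)
              * (t[m] - t0) ** (alpha - 1) * (t1 - t[m]) ** (beta - 1))
    return out

def aif(t, a=2.0, b=4.0, c=1.1, td=0.0):
    # Gamma-variate arterial input function AIF(t) = a (t - td)^b exp(-(t - td)/c)
    t = np.asarray(t, dtype=float)
    return np.where(t > td, a * (t - td) ** b * np.exp(-(t - td) / c), 0.0)

dt = 1e-3
tt = np.arange(t0, t1, dt)
r = 1.0 - np.cumsum(h(tt)) * dt   # residue function r(t) = 1 - cumulative h
```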
The simulations were very time consuming: each of the two sets described
above took nearly two days to complete on a 2.4 GHz AMD platform.
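For reference, the standard SVD deconvolution used for comparison can be sketched as follows: build the discrete convolution matrix from the AIF samples and invert it with a truncated SVD, in the spirit of Østergaard et al. [4]. This is a minimal illustration of the general technique; the function name and default threshold are choices made here, not the report's.

```python
import numpy as np

def svd_deconvolve(aif, tcc, dt, thresh=0.2):
    # Lower-triangular convolution matrix: A[i, j] = aif[i - j] * dt for j <= i
    n = len(aif)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, :i + 1] = aif[i::-1] * dt
    # Truncated-SVD pseudo-inverse: drop singular values below thresh * s_max
    U, s, Vt = np.linalg.svd(A)
    s_inv = np.where(s > thresh * s[0], 1.0 / s, 0.0)
    return Vt.T @ (s_inv * (U.T @ tcc))
```

With thresh = 0 and noiseless data this recovers the impulse response exactly; the truncation threshold trades noise amplification against bias in the estimate.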
6 Results
The simulation results are depicted in figures 1 and 2 on pages 21 and 22. The first
depicts the normal case, whereas in the latter the TCC is delayed by 2.7 seconds.
The corresponding results for the standard SVD deconvolution method are shown
in figure 3 on page 23. Figures 1 and 2 each contain four pictures corresponding to
the four different versions of the EM based deconvolution: the first two (upper row)
depict the performance of the new EM algorithm using the zeroth order and the
first order approximation of the convolution integral, respectively. The lower row
depicts the performance of Vonken's EM algorithm in the zeroth and first order
cases.
The two most eye-catching features are the enormous standard deviation of
the traditional Vonken EM based CBF estimate and the tendency of Vonken's
original algorithm to yield dramatically overestimated CBF estimates at low CBF
values. Standard deviations of this magnitude were not reported in Vonken's
original paper, nor was the obviously incorrect convergence at low CBF values.
Since the last elements of the impulse responses recovered here were set to zero,
this huge variation in the CBF estimates has to originate from the physically
meaningful part of the impulse responses.
The principal differences between the results obtained by the different methods
are as follows. When no delay is present, Vonken's original algorithm seems to
provide results equal to those of the new zeroth order version developed here:
despite the major difference in standard deviation, the means of the results appear
equal. The simultaneous appearance of a huge change in the standard deviation
alongside a smaller change in the mean CBF value may indicate the existence of a
few major outliers.
The first order approximation of the convolution integral results in a more
faithful estimate of the CBF. In both cases, traditional and new EM deconvolution,
the estimated CBF seems to follow the true value well. The new version, however,
is prone to overestimation. The original version of the EM deconvolution, equipped
with the more accurate approximation, yields very good results. The constant
overestimation of the new algorithm may be a result of a poor choice of the
maximum number of iterations.
The presence of a 2.7 second delay generally deteriorates the performance of
both methods. The standard deviations are not affected, but the CBF estimates are
lower throughout the range than before. The new algorithm with the higher order
approximation (upper right corner of figure 2 on p. 22), however, gives extremely
good results with a modest standard deviation. Nevertheless, the biased estimation
in the no-delay situation and the behaviour of the original algorithm with the
higher order approximation suggest that one bias is here compensated by another.
[Figure 1: four panels, "new EM−d0 CBF" and "new EM−d1 CBF" (upper row),
"trad. EM−d0 CBF" and "trad. EM−d1 CBF" (lower row), each plotting estimated
CBF [arb. units] against true CBF [arb. units].]

Figure 1: Simulation results when there is no delay (td = 0). The pictures in the
upper row correspond to the new version of the EM algorithm, whereas the lower
row corresponds to Vonken's original version. The left pictures are computed with
the original zeroth order approximation of the convolution integral; in the right
ones the linear approximation is used. The thicker lines give the mean of the
deconvolved CBF estimate vs. the true CBF. The dashed lines are the mean CBF ±
its standard deviation. The dotted line corresponds to a perfect match.
[Figure 2: four panels, "new EM−d0 CBF" and "new EM−d1 CBF" (upper row),
"trad. EM−d0 CBF" and "trad. EM−d1 CBF" (lower row), each plotting estimated
CBF [arb. units] against true CBF [arb. units].]

Figure 2: Simulation results when there is a 2.7 second delay (td = 2.7). The
pictures in the upper row correspond to the new version of the EM algorithm,
whereas the lower row corresponds to Vonken's original version. The left pictures
are computed with the original zeroth order approximation of the convolution
integral; in the right ones the linear approximation is used. The thicker lines give
the mean of the deconvolved CBF estimate vs. the true CBF. The dashed lines are
the mean CBF ± its standard deviation. The dotted line corresponds to a perfect
match.
[Figure 3: a single panel, "SVD−CBF norm. & delayed", plotting estimated CBF
[arb. units] against true CBF [arb. units].]

Figure 3: SVD results with and without delay. The dashed line is the case with
delay.
7 Conclusions
In this work the EM based deconvolution method developed by Vonken et al. [5]
was reviewed. Some theoretical background was also given, with particular
attention paid to discretization accuracy. Some flaws in Vonken's article were
pointed out and corrected, resulting in an entirely new version of the EM
deconvolution algorithm.
The new EM based algorithm was tedious to derive. The first major change with
respect to Vonken's algorithm was the implementation of the more natural and
better grounded normality assumption concerning the distribution of the
complete-data variates. The second, more fundamental change amended Vonken's
conditional expectation of the complete-data log-likelihood function. This is
likely to be the source of the new algorithm's different convergence properties.
After implementing the first order approximation and the new version of the
algorithm, there were four different versions of the algorithm to be tested.
Simulations were carried out with and without a delay between the AIF and the
TCC. For comparison, traditional SVD deconvolution was also carried out.
The results were surprising. First of all, the strange behaviour of Vonken's
original algorithm is in contrast to that reported in his original article. It seems
to be prone to dramatically overestimating low CBF values, and in addition it
suffers from a large standard deviation. These problems are likely to originate
from the wrongly derived equation 40 on page 13.
The new version of the algorithm converges much more slowly and hence requires
more iterations. Neither the optimal number of iterations nor the initial guess
was studied here. Regardless, the results were promising. The standard deviation
was of the same magnitude as that of SVD; in fact, the zeroth order approximation
yielded almost identical results to SVD. The first order approximation resulted
in a minor overestimation of the CBF, but notably the magnitude of the bias does
not change as the CBF does. The higher order approximation, however, results in
a somewhat greater standard deviation of the estimate. The absence of any
improvement from the higher order approximation in the case of a delayed TCC
suggests that the excellent performance of the new algorithm with the higher
order approximation results from one bias being compensated by another.
All in all, the developments described in this report seem promising. They
guaranteed nearly certain convergence with a modest spread of CBF estimates, a
clear improvement over Vonken's original algorithm. The price paid was slower
convergence and longer computation time. Further research still has to be carried
out. The reproducibility of the full impulse response is of great importance in
some applications. The effects of different delays, and especially of different
shapes of the residue function, also remain to be investigated.
References
[1] A. Villringer, B. Rosen, J. Belliveau, J. Ackerman, R. Lauffer, R. Buxton,
Y. Chao, V. Wedeen, and T. Brady, “Dynamic imaging with lanthanide
chelates in normal brain: contrast due to magnetic susceptibility effects,”
Magnetic Resonance In Medicine, vol. 6, no. 2, pp. 164–174, 1988.
[2] B. R. Rosen, J. W. Belliveau, and D. Chien, "Perfusion Imaging by Nuclear
Magnetic Resonance," Magnetic Resonance Quarterly, vol. 5, no. 4, pp. 263–281,
1989.
[3] P. Meier and K. L. Zierler, “On the Theory of the Indicator-Dilution Method
for Measurement of Blood Flow and Volume,” Journal of Applied Physiology,
vol. 6, no. 12, pp. 731–744, 1954.
[4] L. Østergaard, R. M. Weisskoff, D. A. Chesler, C. Gyldensted, and B. R.
Rosen, “High Resolution Measurement of Cerebral Blood Flow using In-
travascular Tracer Bolus Passages. Part 1: Mathematical Approach and Sta-
tistical Analysis,” Magnetic Resonance in Medicine, vol. 36, pp. 715–725, 1996.
[5] E.-J. P. Vonken, F. J. Beekman, C. J. Bakker, and M. A. Viergever, “Maxi-
mum Likelihood Estimation of Cerebral Blood Flow in Dynamic Suscepti-
bility Contrast MRI,” Magnetic Resonance in Medicine, vol. 41, pp. 343–350,
1999.
[6] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum Likelihood from
Incomplete Data via EM Algorithm,” Journal of the Royal Statistical Society.
Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.
[7] L. A. Shepp and Y. Vardi, “Maximum Likelihood Reconstruction for Emis-
sion Tomography,” IEEE Transactions on Medical Imaging, vol. 1, pp. 113–122,
1982.
[8] K. Lange and R. Carson, "EM Reconstruction Algorithms for Emission and
Transmission Tomography," Journal of Computer Assisted Tomography, vol. 8,
no. 2, pp. 306–316, 1984.
[9] Y. Vardi, L. A. Shepp, and L. Kaufman, “A Statistical Model for Positron
Emission Tomography,” Journal of the American Statistical Association, vol. 80,
no. 389, pp. 8–20, 1985.
[10] J. A. Jacquez, Compartmental Analysis in Biology and Medicine. The University
of Michigan Press, 2 ed., 1985.
[11] G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions. Wiley
Series in Probability and Statistics, Wiley, 1997.
[12] L. Østergaard, D. A. Chesler, R. M. Weisskoff, A. G. Sorensen, and B. R.
Rosen, "Modeling Cerebral Blood Flow and Flow Heterogeneity From Magnetic
Resonance Residue Data," Journal of Cerebral Blood Flow and Metabolism,
vol. 19, pp. 690–699, 1999.