A brief discussion of Multivariate Gaussin, Rayleigh & Rician distributions
Prof. H.Amindavar complementary notes for the first session of "Advanced communications theory" course, Spring 2021
1. Advanced Communication’s Theory
Session 2
H.Amindavar
February 2021
Multivariate Normal Distribution
Definitions
The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly)
correlated real-valued random variables each of which clusters around a mean value. The multivariate normal
distribution of a k−dimensional random vector
X = (X1, . . . , Xk)T
can be written in the following notation:
X ∼ N (µ, Σ),
or to make it explicitly known that X is k−dimensional,
Xk ∼ N (µk, σk),
with k−dimensional mean vector
µ = E[X] = (E[X1], E[X2], . . . , E[Xk])T
,
and k × k covariance matrix
Σi,j := E[(Xi − µi)(Xj − µj)] = Cov[Xi, Xj]
such that 1 ≤ i, j ≤ k.
The inverse of the covariance matrix is called the precision matrix, denoted by Q = Σ−1
.
A real random vector X = (X1, . . . , Xk)T
is called a standard normal random vector if all of its
components Xn are independent and each is a zero-mean unit-variance normally distributed random vari-
able, i.e. if Xn ∼ N (0, 1) for all n.
A real random vector X = (X1, . . . , Xk)T
is called a centered normal random vector if there exists
a deterministic k × ` matrix A such that AZ has the same distribution as X where Z is a standard normal
random vector with ` components.
A real random vector X = (X1, . . . , Xk)T
is called a normal random vector if there exists a random
` -vector Z, which is a standard normal random vector, a k−vector µ, and a k × ` matrix A, such that
X = AZ + µ.
X ∼ N (µ, Σ) ⇐⇒ there exist µ ∈ Rk
, A ∈ Rk×`
such that X = AZ + µ for Zn ∼ N (0, 1), i.i.d.
1
2. Figure 1: joint probability density function of bivariate Gaussian with zero mean and [0.25 0.3; 0.3 1]
covariance matrix
Here the covariance matrix is Σ = AAT
.
In the degenerate case where the covariance matrix is singular, the corresponding distribution has no
density. This case arises frequently in statistics; for example, in the distribution of the vector of residuals in
the ordinary least squares regression. The Xi are in general not independent; they can be seen as the result
of applying the matrix A to a collection of independent Gaussian variables Z.
A random vector X = (X1, . . . , Xk)T
has a multivariate normal distribution if it satisfies one of
the following equivalent conditions.
• Every linear combination Y = a1X1 + · · · + akXk of its components is normally distributed. That
is, for any constant vector a ∈ Rk
, the random variable Y = aT
X has a univariate normal distribution,
where a univariate normal distribution with zero variance is a point mass on its mean.
• There is a k-vector µ and a symmetric, positive semidefinite k × k matrix ΣΣ, such that the charac-
teristic function of X is
ϕX(u) = exp
iuT
µ − 1
2
uT
Σu
.
The spherical normal distribution can be characterised as the unique distribution where components are
independent in any orthogonal coordinate system.
2
3. Density function
Non-degenerate case
The multivariate normal distribution is said to be ”non-degenerate” when the symmetric covariance matrix
Σ is positive definite. In this case the distribution has density:
fX(x1, . . . , xk) =
exp −1
2
(x − µ)T
Σ−1
(x − µ)
p
(2π)k | Σ|
where x is a real k−dimensional column vector and |Σ| ≡ det Σ is the determinant of Σ.
The equation above reduces to that of the univariate normal distribution if Σ is a 1 × 1 matrix (i.e. a single
real number).
the locus of points in k−dimensional space each of which gives the same particular value of the density
— is an ellipse or its higher-dimensional generalization; hence the multivariate normal is a special case of
the elliptical distributions.
The quantity
q
(x − µ)TΣ−1
(x − µ) is known as the Mahalanobis distance, which represents the
distance of the test point x from the mean µ. In the case when k = 1, the distribution reduces to a univariate
normal distribution and the Mahalanobis distance reduces to the absolute value of the standard score.
Bivariate case
In the 2-dimensional nonsingular case k = rank (Σ) = 2, the probability density function of a vector [X Y]’
is:
f(x, y) =
1
2πσXσY
p
1 − ρ2
e
− 1
2(1−ρ2)
h
(
x−µX
σX
)2
−2ρ(
x−µX
σX
)(
y−µY
σY
)+(
y−µY
σY
)2
i
where ρ is the correlation between X and Y and where σX 0 and σY 0. In this case,
µ =
µX
µY
, Σ =
σ2
X ρσXσY
ρσXσY σ2
Y
.
In the bivariate case, the first equivalent condition for multivariate reconstruction of normality can be
made less restrictive as it is sufficient to verify that countably many distinct linear combinations of X and
Y are normal in order to conclude that the vector of [X Y]’ is bivariate normal.
The bivariate iso-density loci plotted in the x, y-plane are ellipses, whose principal axes are defined by
the eigenvectors of the covariance matrix Σ (the major and minor semidiameters of the ellipse equal the
square-root of the ordered eigenvalues).
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
x1
x2
Figure 2: samples of bivariate normal distribution and the contour plot of it’s pdf
3
4. As the absolute value of the correlation parameter ρ increases, these loci are squeezed toward the following
line :
y(x) = sgn(ρ)
σY
σX
(x − µX) + µY .
is the best linear unbiased prediction of Y given a value of X.
The equidensity contours of a non-singular multivariate normal distribution are ellipsoids (i.e. linear
transformations of hyperspheres) centered at the mean. Hence the multivariate normal distribution is an
example of the class of elliptical distributions.
The directions of the principal axes of the ellipsoids are given by the eigen-vectors of the covariance matrix
Σ. The squared relative lengths of the principal axes are given by the corresponding eigenvalues.
If Σ = UΛUT
is an eigendecomposition where the columns of U are unit eigenvectors and Λ is a diagonal
matrix of the eigenvalues, then we have
X ∼ N (µ, Σ) ⇐⇒ X ∼ µ + UΛ1/2
N (0, I) ⇐⇒ X ∼ µ + UN (0, Λ)
Moreover, U can be chosen to be a rotation matrix, as inverting an axis does not have any effect on
N (0, Λ), but inverting a column changes the sign of Us determinant. The distribution N (µ, Σ) is in effect
N (0, I) scaled by Λ, rotated by U and translated by µ.
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
x2
x1
Figure 3: the iso-density loci of pdf and the eigen-vectors of covariance matrix of the bivariate normal
distribution
4
5. Rayleigh Distribution
In probability theory and statistics, the Rayleigh distribution is a continuous probability distribution
for nonnegative-valued random variables. It is essentially a chi distribution with two degrees of freedom.
The probability density function of the Rayleigh distribution is
f(x; σ) =
x
σ2
e−x2
/(2σ2
)
, x ≥ 0,
where σ is the scale parameter of the distribution. The cumulative distribution function is
F (x; σ) = 1 − e−x2
/(2σ2
)
for x ∈ [0, ∞).
0
0.2
0.4
0.6
0.8
1
1.2
0 2 4 6 8 10
0
0.2
0.4
0.6
0.8
1
1.2
0 2 4 6 8 10
σ=0.5
σ=0.5
σ=0.5
σ=1
σ=1
σ=1
σ=2
σ=2
σ=2
σ=3
σ=3
σ=3
σ=4
σ=4
σ=4
0
0.2
0.4
0.6
0.8
1
1.2
0 2 4 6 8 10
σ=0.5
σ=1
σ=2
σ=3
σ=4
0
0.2
0.4
0.6
0.8
1
1.2
0 2 4 6 8 10
σ=0.5
σ=1
σ=2
σ=3
σ=4
Figure 4: probability density function of Rayleigh distribution
Consider the two-dimensional vector Y = (U, V ) which has components that are normally distributed,
centered at zero, and independent. Then U and V have density functions
fU (x; σ) = fV (x; σ) =
e−x2
/(2σ2
)
√
2πσ2
.
Let X be the length of Y That is, X =
p
U2 + V 2 Then X has cumulative distribution function
FX(x; σ) =
ZZ
Dx
fU (u; σ)fV (v; σ) dA
where Dx is the disk
Dx =
n
(u, v) :
p
u2 + v2 x
o
Writing the double integral in polar coordinates, it becomes
FX(x; σ) =
1
2πσ2
Z 2π
0
Z x
0
re−r2
/(2σ2
)
dr dθ =
1
σ2
Z x
0
re−r2
/(2σ2
)
dr.
Finally, the probability density function for X is the derivative of its cumulative distribution function,
which is
fX(x; σ) =
d
dx
FX(x; σ) =
x
σ2
e−x2
/(2σ2
)
5
6. which is the Rayleigh distribution. It is straightforward to generalize to vectors of dimension other than
2. There are also generalizations when the components have unequal variance or correlations, or when the
vector Y follows a bivariate Student t-distribution.
Rice Distribution
In probability theory, the Rice distribution or Rician distribution (or, less commonly, Ricean
distribution) is the probability distribution of the magnitude of a circularly-symmetric bivariate normal
random variable, possibly with non-zero mean (noncentral). It was named after Stephen O. Rice.
f(x | ν, σ) =
x
σ2
exp
−(x2
+ ν2
)
2σ2
I0
xν
σ2
,
Figure 5: probability density function of Rayleigh distribution
where I0(z) is the modified Bessel function of the first kind with order zero,
Iα(x) = i−α
Jα(ix) =
∞
X
m=0
1
m! Γ(m + α + 1)
x
2
2m+α
,
Kα(x) =
π
2
I−α(x) − Iα(x)
sin απ
,
α is not an integer; when α is an integer, then the limit is used. Iα(x) and Kα(x) are the two linearly
independent solutions to the modified Bessel’s equation:
x2 d2
y
dx2
+ x
dy
dx
− x2
+ α2
y = 0.
R ∼ Rice (|ν|, σ) if R =
p
X2 + Y 2 where N ν cos θ, σ2
and N ν sin θ, σ2
are statistically in-
dependent normal random variables and θ is any real number.
6