1. Data Analysis and Its Applications in Asteroseismology
Olga Moreira
April 2005
DEA en Sciences
“Astérosismologie”
Lectured by Anne Thoul
2. Outline
Principles of data analysis:
• Introduction
• Merit functions and parameter fitting
  – Maximum Likelihood Estimator
• Maximization/minimization problem
  – Ordinary methods
  – Exotic methods
• Goodness-of-fit
  – Chi-square test
  – K-S test
• The beauty of synthetic data
  – Monte-Carlo simulations
  – Hare-and-Hounds game
Introduction to spectral analysis:
• Fourier analysis
  – Fourier transform
  – Power spectrum estimation
• Deconvolution analysis
  – CLEAN
  – All poles
• Phase Dispersion Minimization
  – Period search
• Wavelet analysis
  – Wavelet transform and its applications
12. Analysis Method
A complete analysis should provide:
Parameters;
Error estimates on the parameters;
A statistical measure of the goodness-of-fit
Ignoring the third step will have drastic consequences.
14. Maximum Likelihood Estimators (MLE)
λ = (λ_1, …, λ_k): set of parameters
x = (x_1, …, x_N): set of random variables
p(x; λ): probability distribution characterized by λ
The a posteriori probability of a single measurement is given by:
P_i = p(x_i; λ)
If the x_i are independent and identically distributed (i.i.d.), then the joint probability function becomes:
P = ∏_{i=1..N} p(x_i; λ)
where L(λ) = ∏_{i=1..N} p(x_i; λ) is defined as the likelihood.
• The best-fit set of parameters is the one that maximizes the likelihood.
15. It is common to find ℓ defined as the likelihood, but in fact ℓ is just the logarithm of the likelihood, which is easier to work with:
ℓ = ln L = Σ_{i=1..N} ln p(x_i; λ), or equivalently one minimizes S = −ℓ.
The a posteriori probability is the probability after the event; under no circumstances should the likelihood be confused with a probability density.
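As an illustration of working with ℓ = ln L, the sketch below fits the mean of a Gaussian sample by minimizing the negative log-likelihood on a grid (all names and parameters here are ours, purely illustrative):

```python
import numpy as np

# Sketch: maximum-likelihood fit of a Gaussian mean by minimizing -ln L
# on a grid of trial values (illustrative, not from the talk).
rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(loc=5.0, scale=sigma, size=1000)   # synthetic i.i.d. sample

def neg_log_like(mu):
    # -ln L(mu) for a Gaussian with known sigma (additive constants dropped)
    return np.sum((x - mu) ** 2) / (2.0 * sigma ** 2)

grid = np.linspace(0.0, 10.0, 2001)
mu_hat = grid[np.argmin([neg_log_like(m) for m in grid])]
# For a Gaussian, the MLE of the mean is the sample mean:
assert abs(mu_hat - x.mean()) < 0.01
```

In practice a numerical optimizer replaces the grid, but the estimate is the same: the maximizer of L is the minimizer of −ln L.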
16. Error Estimate
The shape of L(λ) determines how easily errors can be quoted:
• Gaussian shape: λ = λ̂ ± Δλ, with Δλ ≈ σ.
• Non-Gaussian shape with a single extremum: no problem in determining the maximum, although the error-bar estimation can present some difficulties.
• Non-Gaussian shape with several local extrema: problems both in determining the maximum and in estimating the error bars.
17. Estimator: Desirable properties
Unbiased: E(λ̂) − λ → 0.
Minimum variance: σ²(λ̂) as small as possible.
Information inequality (Cramér–Rao inequality):
σ²(λ̂) ≥ 1 / I(λ), where the Fisher information is
I(λ) = E[(∂ ln L/∂λ)²] = −E[∂² ln L/∂λ²].
• The larger the information, the smaller the variance.
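A quick Monte-Carlo sketch of the information inequality: for a Gaussian with known σ, the Fisher information per point is 1/σ², so the bound on the variance of the mean estimator is σ²/N, which the sample mean attains (parameters are illustrative):

```python
import numpy as np

# Sketch: check that the sample mean attains the Cramer-Rao bound
# sigma^2 / N for a Gaussian with known sigma (illustrative).
rng = np.random.default_rng(1)
N, sigma, trials = 50, 1.5, 4000
means = rng.normal(0.0, sigma, size=(trials, N)).mean(axis=1)
empirical_var = means.var()
cramer_rao = sigma ** 2 / N        # 1 / (N * Fisher information per point)
assert abs(empirical_var / cramer_rao - 1.0) < 0.1
```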
18. MLE: asymptotically unbiased
Expanding ∂ln L/∂λ around the maximum λ̂, neglecting the higher orders, and taking N → ∞:
ln L(λ) ≈ ln L(λ̂) − (λ − λ̂)² / (2σ²(λ̂))
The likelihood function therefore takes the form of a normal distribution, with
σ²(λ̂) = −1 / (∂² ln L/∂λ²)|_{λ̂}
and:
λ = λ̂ ± σ(λ̂)
20. If cov(λ̂_i, λ̂_j) = 0 for i ≠ j, then λ_i ≈ λ̂_i ± σ(λ̂_i).
If cov(λ̂_i, λ̂_j) ≠ 0 for i ≠ j, there is an error region defined not only by the σ(λ̂_i) but by the complete covariance matrix. For instance, in 2D the error region defines an ellipse.
21. Least-square and chi-square fit
1. Suppose one measures values y_i with errors that are independently and normally distributed around the true value y(x_i).
2. The standard deviations σ are the same for all points.
Then the joint probability for the N measurements is given by
P ∝ ∏_{i=1..N} exp[−(1/2)((y_i − y(x_i))/σ)²] Δy
Maximizing P is the same as minimizing
S = Σ_{i=1..N} (y_i − y(x_i))²
The least-square fit is the MLE of the fitted parameters if the measurements are independent and normally distributed.
3. If the deviations differ from point to point, σ_i ≠ σ_j, then:
χ² = Σ_{i=1..N} ((y_i − y(x_i))/σ_i)²
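The chi-square fit of point 3 can be sketched as a weighted linear least-squares problem; here a hypothetical straight-line model y = a + b·x with per-point errors σ_i (all values are illustrative):

```python
import numpy as np

# Sketch: chi-square fit of y = a + b*x with differing sigma_i,
# solved by weighting each equation by 1/sigma_i (illustrative).
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 40)
sig = 0.2 + 0.3 * rng.random(40)              # per-point errors sigma_i
y = 1.0 + 2.0 * x + rng.normal(0.0, sig)      # true a = 1, b = 2
A = np.vstack([np.ones_like(x), x]).T / sig[:, None]
a, b = np.linalg.lstsq(A, y / sig, rcond=None)[0]
chi2 = np.sum(((y - (a + b * x)) / sig) ** 2)  # should be ~ N - 2
```

Minimizing χ² with these weights is exactly the MLE under independent Gaussian errors, and the resulting χ² value feeds directly into the goodness-of-fit test discussed later.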
22. Limitations:
Real data most of the time violate the i.i.d. condition.
Sometimes one has only a limited sample.
In practice everything depends on the behaviour of ln L(λ).
The MLE grants a solution that is invariant under a change of parameters α = α(λ):
∂ℓ/∂λ = (∂ℓ/∂α)·(∂α/∂λ) = 0 ⇔ ∂ℓ/∂α = 0
so maximizing with respect to λ or to α yields the same estimate. But the uncertainty of the estimate depends on the specific choice of λ, because probability contents transform with the Jacobian:
∫_{−∞}^{+∞} L(λ) dλ = ∫_{−∞}^{+∞} L(λ(α)) (∂λ/∂α) dα, which in general differs from ∫_{−∞}^{+∞} L(α) dα.
25. Going “Downhill” Methods
Finding a global extremum is in general very difficult.
For one-dimensional minimization there are usually two types of methods:
• Methods that bracket the minimum: golden section search, and parabolic interpolation (Brent's method).
• Methods that use first-derivative information.
For multidimensional minimization there are three kinds of methods:
• Direction-set methods; Powell's method is the prototype.
• The downhill simplex method.
• Methods that use gradient information.
Adapted from Press et al. (1992)
26. Falling in the wrong valley
The downhill methods lack efficiency/robustness. For instance, the simplex method can be very fast for some functions and very slow for others.
They depend on a priori knowledge of the overall structure of the parameter space, and may require repeated manual intervention.
If the function to minimize is not well known then, numerically speaking, even a smooth hill can become a headache.
They also don't solve the famous combinatorial optimization problem:
the traveling salesman problem.
27. Exotic Methods
Solving “the traveling salesman problem”: a salesman has to visit each city on a given list; knowing the distance between all cities, he will try to minimize the length of his tour.
Methods available:
Simulated annealing: based on an analogy with thermodynamics.
Genetic algorithms: based on an analogy with evolutionary selection rules.
Nearest neighbour.
Neural networks: based on the observation of biological neural networks (brains).
Knowledge-based systems, etc.
Adapted from Charbonneau (1995)
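A minimal sketch of simulated annealing on a 1-D double-well function, started in the wrong valley (the function and cooling schedule are ours, purely illustrative):

```python
import math
import random

# Sketch: simulated annealing escaping a local minimum (illustrative).
def f(x):
    return (x * x - 1.0) ** 2 + 0.3 * x   # global minimum near x = -1

random.seed(0)
x, temp = 1.0, 2.0                         # start in the wrong valley
best = x
for step in range(5000):
    cand = x + random.gauss(0.0, 0.5)      # random neighbour
    delta = f(cand) - f(x)
    # Accept downhill moves always, uphill moves with prob exp(-dE/T):
    if delta < 0 or random.random() < math.exp(-delta / temp):
        x = cand
    if f(x) < f(best):
        best = x
    temp *= 0.999                          # slow cooling schedule
assert abs(best - (-1.0)) < 0.2
```

The occasional uphill acceptance is what lets the walker cross the barrier that would trap a pure downhill method in the right-hand valley.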
29. Chi-square test
χ² = Σ_i (N_i − n_i)² / n_i
N_i: the number of events observed in the ith bin.
n_i: the number expected according to some known distribution.
H0: the data follow the specified distribution.
The significance level is determined by Q(χ² | ν), the probability that chi-square should exceed the observed value by chance (an incomplete gamma function; Press et al. 1992).
ν is the number of degrees of freedom: ν = (number of bins) − (number of constraints).
Normally acceptable models have Q ≳ 0.001, but day-in and day-out one finds accepted models with considerably smaller Q.
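A sketch of the test using SciPy's chisquare; the uniform null distribution and bin counts are illustrative assumptions, not from the talk:

```python
import numpy as np
from scipy.stats import chisquare

# Sketch: chi-square goodness-of-fit of binned counts against a
# uniform expectation (illustrative).
rng = np.random.default_rng(3)
events = rng.integers(0, 10, size=5000)        # draws that really are uniform
observed = np.bincount(events, minlength=10)   # N_i: events observed in bin i
expected = np.full(10, 5000 / 10.0)            # n_i: expected under H0
stat, q = chisquare(observed, expected)
# H0 (uniform) is true here, so Q should normally exceed 0.001:
assert q > 0.001
```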
30. Kolmogorov-Smirnov (K-S) test
S_N(x): cumulative distribution of the data.
P(x): known cumulative distribution.
D: maximum absolute difference between the two cumulative functions:
D = max_{−∞ < x < +∞} |S_N(x) − P(x)|
The significance of an observed value of D is given approximately by:
℘(D > observed) = Q_KS([√N + 0.12 + 0.11/√N]·D)
where
Q_KS(λ) = 2 Σ_{j=1..∞} (−1)^{j−1} exp(−2 j² λ²)
Q_KS is a monotonic function with limiting values:
Q_KS(0) = 1: largest agreement.
Q_KS(∞) = 0: smallest agreement.
Adapted from Press et al. (1992)
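The Q_KS series above can be evaluated directly by truncation (a sketch; note that the truncated series is only accurate when the argument is not too small):

```python
import math

# Sketch: the Kolmogorov-Smirnov significance function
# Q_KS(lambda) = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lambda^2),
# truncated at a finite number of terms (illustrative).
def q_ks(lam, terms=100):
    return 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                     for j in range(1, terms + 1))

# Spot checks against tabulated values of the distribution:
assert abs(q_ks(1.0) - 0.2700) < 1e-3
assert abs(q_ks(0.5) - 0.9639) < 1e-3
assert q_ks(3.0) < 1e-7          # monotonic decay toward 0
```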
32. Monte-Carlo simulations
If one knows something about the process that generated the data then, given an assumed set of parameters λ, one can figure out how to simulate “synthetic” realizations of those parameters. The procedure is to draw random numbers from appropriate distributions so as to mimic our best understanding of the underlying processes and measurement errors.
Figure: Stello et al. (2004), ξ Hya
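A minimal sketch of such a synthetic realization, assuming a single sinusoidal mode plus Gaussian measurement noise (all parameters are illustrative):

```python
import numpy as np

# Sketch: a synthetic time series -- deterministic sinusoid plus
# Gaussian noise -- of the kind used in Monte-Carlo and
# hare-and-hounds exercises (illustrative parameters).
rng = np.random.default_rng(4)
t = np.arange(0.0, 100.0, 0.1)                  # uniform sampling
signal = 3.0 * np.sin(2.0 * np.pi * 0.25 * t)   # assumed mode: A=3, nu=0.25
noise = rng.normal(0.0, 1.0, t.size)
series = signal + noise
snr = signal.std() / noise.std()
```

Re-running this with many noise realizations (different seeds) gives the Monte-Carlo distribution of any estimator applied to `series`.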
33. Hare-and-Hounds game
Team A: generates theoretical mode frequencies and synthetic
time series.
Team B: analyses the time series, performs the mode identification and fitting, and does the structure inversion.
Rules: the teams only have access to the time series. Nothing else is allowed.
34. End of Part I
Options available :
• Questions
• Coffee break
• “Get on with it !!!”
36. Fourier transform
F(ν) = FT(f(t)) = ∫_{−∞}^{+∞} f(t) e^{−2πiνt} dt
Properties:
Linearity: FT(f(t) + g(t)) = F(ν) + G(ν)
Scaling: FT(f(at)) = (1/|a|) F(ν/a)
Convolution theorem: FT(f(t)·g(t)) = F(ν) ⊗ G(ν) and FT(f(t) ⊗ g(t)) = F(ν)·G(ν)
Parseval's Theorem:
The power of a signal represented by f(t) is the same whether computed in time space or in frequency space:
∫_{−∞}^{+∞} |f(t)|² dt = ∫_{−∞}^{+∞} |F(ν)|² dν
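The convolution theorem can be checked numerically for periodic discrete signals, for which FT(f ⊗ g) = F·G holds with circular convolution (a sketch with arbitrary test vectors):

```python
import numpy as np

# Sketch: numerical check of the convolution theorem for discrete,
# periodic signals (illustrative).
rng = np.random.default_rng(5)
f = rng.random(64)
g = rng.random(64)
# Circular convolution computed directly from its definition...
conv = np.array([np.sum(f * np.roll(g[::-1], k + 1)) for k in range(64)])
# ...and via the product of the transforms:
conv_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real
assert np.allclose(conv, conv_fft)
```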
37. Sampling theorem
Sampling in time is multiplication by a comb of delta functions, which in frequency space becomes a convolution: f(t)·Ш(t) ⇔ F(ν) ⊗ Ш(ν).
For a bandlimited signal, which has no components above the frequency ν_c, the sampling theorem states that the real signal can be reconstructed without error from samples taken uniformly at a rate ν_s ≥ 2ν_c. The minimum sampling frequency, ν_N = 2ν_c, is called the Nyquist frequency, corresponding to the sampling interval Δt = 1/(2ν_c).
Adapted from Bracewell (1986)
38. Undersampling
The sampling theorem assumes that the signal is limited in frequency, but in practice the signal is time limited. If the sampling interval exceeds 1/(2ν_c), the signal is undersampled: overlapping tails appear in the spectrum, the alias spectrum.
Aliasing — examining the terms of the undersampled Fourier transform (Bracewell 1986):
• The undersampled FT is evener than the complete FT; as a consequence the sampling procedure discriminates against components near ν = ν_c.
• There is leakage of the high frequencies (aliasing).
Adapted from Bracewell (1986)
39. Discrete Fourier transform
F(ν_k) = Σ_{j=0..N−1} f(t_j) e^{−2πiν_k t_j}
Discrete form of Parseval's theorem:
Σ_{j=0..N−1} |f(t_j)|² = (1/N) Σ_{k=0..N−1} |F(ν_k)|²
Fast Fourier Transform (FFT):
The FFT is a discrete Fourier transform algorithm which reduces the number of computations for N points from O(N²) to O(N log₂N). This is done by means of the Danielson-Lanczos lemma, whose basic idea is to break a transform of length N into two transforms of length N/2.
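The discrete Parseval relation can be verified with NumPy's FFT (a sketch with an arbitrary random signal):

```python
import numpy as np

# Sketch: discrete Parseval check, sum |f_j|^2 = (1/N) sum |F_k|^2
# (illustrative).
rng = np.random.default_rng(6)
f = rng.normal(size=256)
F = np.fft.fft(f)
time_power = np.sum(np.abs(f) ** 2)
freq_power = np.sum(np.abs(F) ** 2) / f.size
assert np.allclose(time_power, freq_power)
```

The 1/N factor depends on the DFT normalization convention; numpy's default puts it on the inverse transform, hence its appearance here.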
40. Power spectrum estimation
Periodogram:
P(ν) = (1/N) |F(ν)|² = (1/N) |Σ_j f(t_j) e^{−2πiνt_j}|²
     = (1/N) [(Σ_j f(t_j) cos 2πνt_j)² + (Σ_j f(t_j) sin 2πνt_j)²]
If f contains a periodic signal, i.e.:
f(t_j) = g(t_j) + n(t_j), with random noise n(t_j) and g(t) = A sin(2πν₀t + φ),
then at ν = ν₀ there is a large contribution to the sum; for other values the terms in the sum will be randomly negative and positive, yielding a small contribution. Thus a peak in the periodogram reveals the existence of an embedded periodic signal.
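A sketch of the periodogram recovering an injected frequency ν₀ from a noisy sinusoid (all parameters are illustrative):

```python
import numpy as np

# Sketch: periodogram peak at the frequency of an embedded sinusoid
# (illustrative parameters).
rng = np.random.default_rng(7)
nu0, dt = 0.13, 1.0
t = np.arange(512) * dt
f = np.sin(2.0 * np.pi * nu0 * t) + rng.normal(0.0, 0.5, t.size)
power = np.abs(np.fft.rfft(f)) ** 2 / t.size       # one-sided periodogram
freqs = np.fft.rfftfreq(t.size, d=dt)
nu_peak = freqs[np.argmax(power[1:]) + 1]          # skip the zero-frequency bin
assert abs(nu_peak - nu0) < 1.0 / (t.size * dt)    # within one frequency bin
```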
41. Frequency leakage:
• Leakage from nearby frequencies, usually described by a spectral window, primarily a product of the finite length of the data.
• Leakage from high frequencies, due to the data sampling: the aforementioned aliasing.
Tapering functions: sometimes also called data windowing. These functions try to smooth the leakage between frequencies by bringing the interference slowly back to zero. The main goal is to narrow the peak and suppress the side lobes. Smoothing can in certain cases represent a loss of information.
Adapted from Press et al. (1992)
42. Further complications
Closely spaced frequencies: a direct contribution to the first aforementioned leakage:
FT(f₁ + f₂) = F₁(ν) + F₂(ν)
|F₁(ν) + F₂(ν)|² = |F₁(ν)|² + |F₂(ν)|² + 2 Re[F₁(ν) F₂*(ν)]
Damping:
f(t) = A sin(2πνt − φ) e^{−ηt}
The peak in the power spectrum will have a Lorentzian profile.
43. Power spectrum of random noise
f(t_j) = g(t_j) + n(t_j)
n(t_j) → independent Gaussian noise with variance σ_n²
g(t_j) → deterministic signal
The estimate of the spectral density is built on the autocovariance γ_n of the noise:
ρ(ν) = Σ_τ γ_n(τ) e^{−2πiντ}, with γ_n(τ) → σ_n² δ_{τ,0} for white noise.
Thus:
P_n(ν) = σ_n²   (1)
No matter how much one increases the number of points, N, the signal-to-noise ratio will tend to be constant.
For unevenly spaced data (missing data) equation (1) isn't always valid; indeed it is only valid for homogeneous white noise (independent and identically distributed normal random variables).
44. Filling gaps
The unevenly spaced data problem can be solved by (a few suggestions):
• Finding a way to reduce the unevenly spaced sample to an evenly spaced one. Basic idea: interpolation of the missing points (problem: doesn't work for long gaps).
• Using the Lomb-Scargle periodogram.
• Doing a deconvolution analysis (filters).
45. Lomb-Scargle periodogram
P(ν) = (1/2) { [Σ_j (f_j − f̄) cos 2πν(t_j − τ)]² / Σ_j cos² 2πν(t_j − τ)
             + [Σ_j (f_j − f̄) sin 2πν(t_j − τ)]² / Σ_j sin² 2πν(t_j − τ) }
where τ is defined by:
tan(4πντ) = Σ_j sin 4πνt_j / Σ_j cos 4πνt_j
It is like weighting the data on a “per point” basis instead of a “per time interval” basis, which makes it independent of sampling irregularity.
It has an exponential probability distribution with unit mean, which means one can establish a false-alarm probability of the null hypothesis (significance level):
P(> z) = 1 − (1 − e^{−z})^M ≈ M e^{−z}
where M is the number of independent frequencies.
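The Lomb-Scargle formula above can be implemented directly in a few lines; the unevenly sampled series and all its parameters are illustrative assumptions:

```python
import numpy as np

# Sketch: direct implementation of the Lomb-Scargle periodogram on
# irregularly sampled data (illustrative).
rng = np.random.default_rng(8)
t = np.sort(rng.uniform(0.0, 100.0, 300))          # irregular sampling times
y = np.sin(2.0 * np.pi * 0.2 * t) + rng.normal(0.0, 0.5, t.size)
y = y - y.mean()                                   # (f_j - fbar)

def lomb_scargle(nu):
    omega = 2.0 * np.pi * nu
    # tan(2*omega*tau) = sum sin(2*omega*t) / sum cos(2*omega*t)
    tau = np.arctan2(np.sum(np.sin(2.0 * omega * t)),
                     np.sum(np.cos(2.0 * omega * t))) / (2.0 * omega)
    c, s = np.cos(omega * (t - tau)), np.sin(omega * (t - tau))
    return 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))

nus = np.linspace(0.01, 0.5, 2000)
power = np.array([lomb_scargle(nu) for nu in nus])
nu_best = nus[np.argmax(power)]
assert abs(nu_best - 0.2) < 0.01
```

Production code would use an existing implementation (e.g. Astropy's LombScargle), but the sketch shows where the time-shift τ enters.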
47. Deconvolution
The observed transform is the signal's transform convolved with the spectral window, plus noise:
F(ν) ⊗ W(ν) = G(ν) + ε(ν)   (signal G, noise ε)
Linear algorithms: inverse filtering or Wiener filtering. They are inapplicable to incomplete (irregular) sampling of spatial frequency.
Non-linear algorithms: CLEAN, all poles.
Problem: the deconvolution usually does not have a unique solution.
48. Högbom CLEAN algorithm
The first CLEAN method was developed by Högbom (1974). It constructs discrete approximations C of the clean map from the convolution equation
B ⊗ C = D,
where D is the dirty map and B the dirty beam. Starting with R⁰ = D, it searches for the largest value in the residual map
Rⁱ = D − B ⊗ Cⁱ⁻¹.
After locating the largest residual of given amplitude, it subtracts it to yield Rⁱ. The iteration continues until the root-mean-square (RMS) of the residuals decreases to some level. Each subtracted location is saved in the so-called CLEAN map; the resulting final residual map is assumed to contain mainly noise.
49. CLEAN algorithm
The basic steps of the CLEAN algorithm used in asteroseismology are:
1. Compute the power spectrum of the signal and identify the dominant period.
2. Perform a least-square fit to the data to obtain the amplitude and phase of the identified mode.
3. Construct the time series corresponding to that single mode and subtract it from the original signal to obtain a new signal.
4. Repeat all steps until all that is left is noise.
Stello et al. (2004) proposed an improvement to this algorithm: after subtracting a frequency, it recalculates the amplitudes, phases and frequencies of the previously subtracted peaks while fixing the frequency of the latest extracted peak.
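Steps 1–3 above can be sketched as iterative prewhitening; for simplicity the two hypothetical test modes are placed exactly on Fourier bins so that the subtraction is clean (all values are illustrative):

```python
import numpy as np

# Sketch: one-peak-at-a-time prewhitening in the spirit of the CLEAN
# steps above (illustrative; a real code iterates to a noise floor).
rng = np.random.default_rng(9)
t = np.arange(1024.0)
nu1, nu2 = 90.0 / 1024.0, 236.0 / 1024.0   # hypothetical modes, on Fourier bins
y = (2.0 * np.sin(2.0 * np.pi * nu1 * t)
     + 1.0 * np.sin(2.0 * np.pi * nu2 * t + 0.7)
     + rng.normal(0.0, 0.3, t.size))
found = []
for _ in range(2):
    power = np.abs(np.fft.rfft(y)) ** 2
    freqs = np.fft.rfftfreq(t.size)
    nu = freqs[np.argmax(power[1:]) + 1]            # step 1: dominant peak
    A = np.vstack([np.sin(2.0 * np.pi * nu * t),
                   np.cos(2.0 * np.pi * nu * t)]).T
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # step 2: amplitude & phase
    y = y - A @ coef                                # step 3: subtract the mode
    found.append(nu)
found.sort()
```

For frequencies between bins, the fit should be done at the interpolated peak frequency; otherwise the imperfect subtraction leaves residual power, which is exactly what the Stello et al. refinement addresses.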
50. All poles
Writing z ≡ e^{2πiνΔ}, the discrete FT, F(ν) = Σ_j f_j z^j, is a particular case of the (unilateral) z-transform:
F(z) = Σ_{j=0..+∞} f_j z^j
It turns out that one can gain some advantages from the following approximation:
P(ν) ≈ a₀ / |1 + Σ_{k=1..M} a_k z^k|²
The notable fact is that this equation allows poles, i.e. infinite spectral power density, on the unit z-circle (at real frequencies in the Nyquist interval), and such poles can provide an accurate representation for underlying power spectra that have sharp discrete “lines” or delta functions. M is called the number of poles. This approximation goes under several names: all-poles model, maximum entropy method (MEM), autoregressive model (AR).
Adapted from Press et al. (1992)
52. Definitions
A discrete set of observations can be represented by two vectors: the magnitudes x_i and the observation times t_i (with i = 1, …, N). The variance of x is given by:
σ² = Σ_{i=1..N} (x_i − x̄)² / (N − 1)
Suppose that one divides the initial set into several subsets/samples. If M is the number of samples, with variances s_j² and containing n_j data points each, then the overall variance for all the samples is given by:
s² = Σ_{j=1..M} (n_j − 1) s_j² / (Σ_{j=1..M} n_j − M)
53. PDM as a period search method
Suppose that one wants to minimize the variance of a data set with respect to the mean light curve. The phase vector for a trial period P is given by:
φ_i = (t_i − t₀)/P − int[(t_i − t₀)/P]
i.e. the fractional part of (t_i − t₀)/P. Considering x as a function of phase, the variance of the phase-bin samples gives the scatter around the mean light curve. Defining:
Θ = s² / σ²
If P is not the true period, then s² ≈ σ² and Θ ≈ 1.
If P is the true value, then Θ will reach a local minimum.
Mathematically, PDM is a least-square fit, but rather than fitting a given curve, it fits relative to the mean curve as defined by the means of each bin; simultaneously one obtains the best period.
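A sketch of Θ(P) with simple non-overlapping phase bins (real PDM implementations use overlapping bin structures; all parameters here are illustrative):

```python
import numpy as np

# Sketch: phase dispersion minimization with 10 plain phase bins
# (illustrative).
rng = np.random.default_rng(10)
true_period = 3.7
t = np.sort(rng.uniform(0.0, 200.0, 400))
x = np.sin(2.0 * np.pi * t / true_period) + rng.normal(0.0, 0.2, t.size)

def theta(period, nbins=10):
    phase = (t / period) % 1.0                    # fractional part of t/P
    idx = np.minimum((phase * nbins).astype(int), nbins - 1)
    num = sum(((x[idx == b] - x[idx == b].mean()) ** 2).sum()
              for b in range(nbins) if (idx == b).sum() > 1)
    s2 = num / (t.size - nbins)                   # pooled sample variance s^2
    return s2 / x.var(ddof=1)                     # Theta = s^2 / sigma^2

periods = np.linspace(2.0, 6.0, 4000)
thetas = np.array([theta(p) for p in periods])
best = periods[np.argmin(thetas)]
assert abs(best - true_period) < 0.02
```

Away from the true period Θ stays near 1; at the true period the folded curve is coherent within each bin and Θ drops well below 1.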
55. Wavelet transform
Wavelets are a class of functions used to localize a given function in both time and scale. A family of wavelets can be constructed from a function ψ(t), sometimes known as the “mother wavelet”, which is confined to a finite interval. The “daughter wavelets” ψ_{a,b}(t) are then formed by translation (b) and contraction (a). An individual wavelet can be written as:
ψ_{a,b}(t) = (1/√a) ψ((t − b)/a)
Then the wavelet transform is given by:
W(a, b) = (1/√a) ∫_{−∞}^{+∞} f(t) ψ*((t − b)/a) dt
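A direct (slow) evaluation of W(a, b) with a Mexican-hat mother wavelet, showing that the response peaks at the scale matching the signal's frequency (an illustrative sketch, not the talk's method):

```python
import numpy as np

# Sketch: brute-force wavelet transform with a Mexican-hat mother
# wavelet (illustrative).
def mexican_hat(u):
    # psi(u) = (1 - u^2) * exp(-u^2 / 2), a common real mother wavelet
    return (1.0 - u * u) * np.exp(-0.5 * u * u)

dt = 0.01
t = np.arange(0.0, 20.0, dt)
f = np.sin(2.0 * np.pi * 1.0 * t)              # 1 Hz test signal

def wavelet_coeff(a, b):
    # W(a, b) = (1/sqrt(a)) * integral f(t) psi((t - b)/a) dt
    return np.sum(f * mexican_hat((t - b) / a)) * dt / np.sqrt(a)

# Scan scales at a fixed translation b (chosen at a crest of the signal):
scales = np.linspace(0.05, 2.0, 100)
response = np.array([abs(wavelet_coeff(a, 10.25)) for a in scales])
best_scale = scales[np.argmax(response)]
```

The maximizing scale tracks the signal frequency (for this wavelet, roughly a ≈ √2.5 / ω₀ ≈ 0.25 for a 1 Hz sine), which is what "localizing in scale" means in practice.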
58. Short overview
Data analysis results must never be subjective: an analysis should return the best-fit parameters, the underlying errors, and the accuracy of the fitted model. All the statistical information provided must be clear.
Because data analysis is necessary in all scientific fields, there are plenty of methods for optimization, merit functions, spectral analysis, and so on. Therefore it is sometimes not easy to decide which method is the ideal one; most of the time the decision depends on the data to be analyzed.
Everything considered here was the case of a deterministic signal (a fixed amplitude) added to random noise. Sometimes the signal itself is probabilistic.