1. Data Analysis and Its Applications in Asteroseismology
Olga Moreira
April 2005
DEA en Sciences
“Astérosismologie”
Lectured by Anne Thoul
2. Outline
Principles of data analysis:
• Introduction
• Merit functions and parameter fitting
  – Maximum Likelihood Estimator
• Maximization/minimization problem
  – Ordinary methods
  – Exotic methods
• Goodness-of-fit
  – Chi-square test
  – K-S test
• The beauty of synthetic data
  – Monte-Carlo simulations
  – Hare-and-Hounds game
Introduction to spectral analysis:
• Fourier analysis
  – Fourier transform
  – Power spectrum estimation
• Deconvolution analysis
  – CLEAN
  – All poles
• Phase Dispersion Minimization
  – Period search
• Wavelet analysis
  – Wavelet transform and its applications
12. Analysis Method
A complete analysis should provide:
Parameters;
Error estimates on the parameters;
A statistical measure of the goodness-of-fit
Ignoring the third step will have drastic consequences.
14. Maximum Likelihood Estimators (MLE)
λ = (λ_1, …, λ_k): set of parameters
x = (x_1, …, x_N): set of random variables
p(x; λ): probability distribution characterized by λ
The a posteriori probability of a single measurement is given by:
P_i = p(x_i; λ)
If the x_i are independent and identically distributed (i.i.d.), then the joint probability function becomes:
P = ∏_{i=1..N} p(x_i; λ)
where L(λ) = ∏_{i=1..N} p(x_i; λ) is defined as the likelihood.
• The best-fit set of parameters is the one that maximizes the likelihood.
15. It is common to find ℓ defined as the likelihood, but in fact ℓ is just the logarithm of the likelihood, which is easier to work with:
ℓ = ln L = Σ_{i=1..N} ln p(x_i; λ), or equivalently one minimizes S = −ℓ.
The a posteriori probability is the probability after the event; under no circumstances should the likelihood be confused with a probability density.
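As an illustration of working with ℓ = ln L, the sketch below fits the mean of a Gaussian sample by minimizing the negative log-likelihood on a grid (all names and parameters here are ours, purely illustrative):

```python
import numpy as np

# Sketch: maximum-likelihood fit of a Gaussian mean by minimizing -ln L
# on a grid of trial values (illustrative, not from the talk).
rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(loc=5.0, scale=sigma, size=1000)   # synthetic i.i.d. sample

def neg_log_like(mu):
    # -ln L(mu) for a Gaussian with known sigma (additive constants dropped)
    return np.sum((x - mu) ** 2) / (2.0 * sigma ** 2)

grid = np.linspace(0.0, 10.0, 2001)
mu_hat = grid[np.argmin([neg_log_like(m) for m in grid])]
# For a Gaussian, the MLE of the mean is the sample mean:
assert abs(mu_hat - x.mean()) < 0.01
```

In practice a numerical optimizer replaces the grid, but the estimate is the same: the maximizer of L is the minimizer of −ln L.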
16. Error Estimate
The shape of L(λ) determines how easily errors can be quoted:
• Gaussian shape: λ = λ̂ ± Δλ, with Δλ ≈ σ.
• Non-Gaussian shape with a single extremum: no problem in determining the maximum, although the error-bar estimation can present some difficulties.
• Non-Gaussian shape with several local extrema: problems both in determining the maximum and in estimating the error bars.
17. Estimator: Desirable properties
Unbiased: E(λ̂) − λ → 0.
Minimum variance: σ²(λ̂) as small as possible.
Information inequality (Cramér–Rao inequality):
σ²(λ̂) ≥ 1 / I(λ), where the Fisher information is
I(λ) = E[(∂ ln L/∂λ)²] = −E[∂² ln L/∂λ²].
• The larger the information, the smaller the variance.
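A quick Monte-Carlo sketch of the information inequality: for a Gaussian with known σ, the Fisher information per point is 1/σ², so the bound on the variance of the mean estimator is σ²/N, which the sample mean attains (parameters are illustrative):

```python
import numpy as np

# Sketch: check that the sample mean attains the Cramer-Rao bound
# sigma^2 / N for a Gaussian with known sigma (illustrative).
rng = np.random.default_rng(1)
N, sigma, trials = 50, 1.5, 4000
means = rng.normal(0.0, sigma, size=(trials, N)).mean(axis=1)
empirical_var = means.var()
cramer_rao = sigma ** 2 / N        # 1 / (N * Fisher information per point)
assert abs(empirical_var / cramer_rao - 1.0) < 0.1
```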
18. MLE: asymptotically unbiased
Expanding ∂ln L/∂λ around the maximum λ̂, neglecting the higher orders, and taking N → ∞:
ln L(λ) ≈ ln L(λ̂) − (λ − λ̂)² / (2σ²(λ̂))
The likelihood function therefore takes the form of a normal distribution, with
σ²(λ̂) = −1 / (∂² ln L/∂λ²)|_{λ̂}
and:
λ = λ̂ ± σ(λ̂)
20. If cov(λ̂_i, λ̂_j) = 0 for i ≠ j, then λ_i ≈ λ̂_i ± σ(λ̂_i).
If cov(λ̂_i, λ̂_j) ≠ 0 for i ≠ j, there is an error region defined not only by the σ(λ̂_i) but by the complete covariance matrix. For instance, in 2D the error region defines an ellipse.
21. Least-square and chi-square fit
1. Suppose one measures values y_i with errors that are independently and normally distributed around the true value y(x_i).
2. The standard deviations σ are the same for all points.
Then the joint probability for the N measurements is given by
P ∝ ∏_{i=1..N} exp[−(1/2)((y_i − y(x_i))/σ)²] Δy
Maximizing P is the same as minimizing
S = Σ_{i=1..N} (y_i − y(x_i))²
The least-square fit is the MLE of the fitted parameters if the measurements are independent and normally distributed.
3. If the deviations differ from point to point, σ_i ≠ σ_j, then:
χ² = Σ_{i=1..N} ((y_i − y(x_i))/σ_i)²
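The chi-square fit of point 3 can be sketched as a weighted linear least-squares problem; here a hypothetical straight-line model y = a + b·x with per-point errors σ_i (all values are illustrative):

```python
import numpy as np

# Sketch: chi-square fit of y = a + b*x with differing sigma_i,
# solved by weighting each equation by 1/sigma_i (illustrative).
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 40)
sig = 0.2 + 0.3 * rng.random(40)              # per-point errors sigma_i
y = 1.0 + 2.0 * x + rng.normal(0.0, sig)      # true a = 1, b = 2
A = np.vstack([np.ones_like(x), x]).T / sig[:, None]
a, b = np.linalg.lstsq(A, y / sig, rcond=None)[0]
chi2 = np.sum(((y - (a + b * x)) / sig) ** 2)  # should be ~ N - 2
```

Minimizing χ² with these weights is exactly the MLE under independent Gaussian errors, and the resulting χ² value feeds directly into the goodness-of-fit test discussed later.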
22. Limitations:
Real data most of the time violate the i.i.d. condition.
Sometimes one has only a limited sample.
In practice everything depends on the behaviour of ln L(λ).
The MLE grants a solution that is invariant under a change of parameters α = α(λ):
∂ℓ/∂λ = (∂ℓ/∂α)·(∂α/∂λ) = 0 ⇔ ∂ℓ/∂α = 0
so maximizing with respect to λ or to α yields the same estimate. But the uncertainty of the estimate depends on the specific choice of λ, because probability contents transform with the Jacobian:
∫_{−∞}^{+∞} L(λ) dλ = ∫_{−∞}^{+∞} L(λ(α)) (∂λ/∂α) dα, which in general differs from ∫_{−∞}^{+∞} L(α) dα.
25. Going “Downhill” Methods
Finding a global extremum is in general very difficult.
For one-dimensional minimization there are usually two types of methods:
• Methods that bracket the minimum: golden section search, and parabolic interpolation (Brent's method).
• Methods that use first-derivative information.
For multidimensional minimization there are three kinds of methods:
• Direction-set methods; Powell's method is the prototype.
• The downhill simplex method.
• Methods that use gradient information.
Adapted from Press et al. (1992)
26. Falling in the wrong valley
The downhill methods lack efficiency/robustness. For instance, the simplex method can be very fast for some functions and very slow for others.
They depend on a priori knowledge of the overall structure of the parameter space, and may require repeated manual intervention.
If the function to minimize is not well known then, numerically speaking, even a smooth hill can become a headache.
They also don't solve the famous combinatorial optimization problem:
the traveling salesman problem.
27. Exotic Methods
Solving “the traveling salesman problem”: a salesman has to visit each city on a given list; knowing the distance between all cities, he will try to minimize the length of his tour.
Methods available:
Simulated annealing: based on an analogy with thermodynamics.
Genetic algorithms: based on an analogy with evolutionary selection rules.
Nearest neighbour.
Neural networks: based on the observation of biological neural networks (brains).
Knowledge-based systems, etc.
Adapted from Charbonneau (1995)
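A minimal sketch of simulated annealing on a 1-D double-well function, started in the wrong valley (the function and cooling schedule are ours, purely illustrative):

```python
import math
import random

# Sketch: simulated annealing escaping a local minimum (illustrative).
def f(x):
    return (x * x - 1.0) ** 2 + 0.3 * x   # global minimum near x = -1

random.seed(0)
x, temp = 1.0, 2.0                         # start in the wrong valley
best = x
for step in range(5000):
    cand = x + random.gauss(0.0, 0.5)      # random neighbour
    delta = f(cand) - f(x)
    # Accept downhill moves always, uphill moves with prob exp(-dE/T):
    if delta < 0 or random.random() < math.exp(-delta / temp):
        x = cand
    if f(x) < f(best):
        best = x
    temp *= 0.999                          # slow cooling schedule
assert abs(best - (-1.0)) < 0.2
```

The occasional uphill acceptance is what lets the walker cross the barrier that would trap a pure downhill method in the right-hand valley.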
29. Chi-square test
χ² = Σ_i (N_i − n_i)² / n_i
N_i: the number of events observed in the ith bin.
n_i: the number expected according to some known distribution.
H0: the data follow the specified distribution.
The significance level is determined by Q(χ² | ν), the probability that chi-square should exceed the observed value by chance (an incomplete gamma function; Press et al. 1992).
ν is the number of degrees of freedom: ν = (number of bins) − (number of constraints).
Normally acceptable models have Q ≳ 0.001, but day-in and day-out one finds accepted models with considerably smaller Q.
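A sketch of the test using SciPy's chisquare; the uniform null distribution and bin counts are illustrative assumptions, not from the talk:

```python
import numpy as np
from scipy.stats import chisquare

# Sketch: chi-square goodness-of-fit of binned counts against a
# uniform expectation (illustrative).
rng = np.random.default_rng(3)
events = rng.integers(0, 10, size=5000)        # draws that really are uniform
observed = np.bincount(events, minlength=10)   # N_i: events observed in bin i
expected = np.full(10, 5000 / 10.0)            # n_i: expected under H0
stat, q = chisquare(observed, expected)
# H0 (uniform) is true here, so Q should normally exceed 0.001:
assert q > 0.001
```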
30. Kolmogorov-Smirnov (K-S) test
S_N(x): cumulative distribution of the data.
P(x): known cumulative distribution.
D: maximum absolute difference between the two cumulative functions:
D = max_{−∞ < x < +∞} |S_N(x) − P(x)|
The significance of an observed value of D is given approximately by:
℘(D > observed) = Q_KS([√N + 0.12 + 0.11/√N]·D)
where
Q_KS(λ) = 2 Σ_{j=1..∞} (−1)^{j−1} exp(−2 j² λ²)
Q_KS is a monotonic function with limiting values:
Q_KS(0) = 1: largest agreement.
Q_KS(∞) = 0: smallest agreement.
Adapted from Press et al. (1992)
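The Q_KS series above can be evaluated directly by truncation (a sketch; note that the truncated series is only accurate when the argument is not too small):

```python
import math

# Sketch: the Kolmogorov-Smirnov significance function
# Q_KS(lambda) = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lambda^2),
# truncated at a finite number of terms (illustrative).
def q_ks(lam, terms=100):
    return 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                     for j in range(1, terms + 1))

# Spot checks against tabulated values of the distribution:
assert abs(q_ks(1.0) - 0.2700) < 1e-3
assert abs(q_ks(0.5) - 0.9639) < 1e-3
assert q_ks(3.0) < 1e-7          # monotonic decay toward 0
```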
32. Monte-Carlo simulations
If one knows something about the process that generated the data then, given an assumed set of parameters λ, one can figure out how to simulate “synthetic” realizations of those parameters. The procedure is to draw random numbers from appropriate distributions so as to mimic our best understanding of the underlying processes and measurement errors.
Figure: Stello et al. (2004), ξ Hya
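A minimal sketch of such a synthetic realization, assuming a single sinusoidal mode plus Gaussian measurement noise (all parameters are illustrative):

```python
import numpy as np

# Sketch: a synthetic time series -- deterministic sinusoid plus
# Gaussian noise -- of the kind used in Monte-Carlo and
# hare-and-hounds exercises (illustrative parameters).
rng = np.random.default_rng(4)
t = np.arange(0.0, 100.0, 0.1)                  # uniform sampling
signal = 3.0 * np.sin(2.0 * np.pi * 0.25 * t)   # assumed mode: A=3, nu=0.25
noise = rng.normal(0.0, 1.0, t.size)
series = signal + noise
snr = signal.std() / noise.std()
```

Re-running this with many noise realizations (different seeds) gives the Monte-Carlo distribution of any estimator applied to `series`.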
33. Hare-and-Hounds game
Team A: generates theoretical mode frequencies and synthetic
time series.
Team B: analyses the time series, performs the mode identification and fitting, and does the structure inversion.
Rules: the teams only have access to the time series. Nothing else is allowed.
34. End of Part I
Options available :
• Questions
• Coffee break
• “Get on with it !!!”
36. Fourier transform
F(ν) = FT(f(t)) = ∫_{−∞}^{+∞} f(t) e^{−2πiνt} dt
Properties:
Linearity: FT(f(t) + g(t)) = F(ν) + G(ν)
Scaling: FT(f(at)) = (1/|a|) F(ν/a)
Convolution theorem: FT(f(t)·g(t)) = F(ν) ⊗ G(ν) and FT(f(t) ⊗ g(t)) = F(ν)·G(ν)
Parseval's Theorem:
The power of a signal represented by f(t) is the same whether computed in time space or in frequency space:
∫_{−∞}^{+∞} |f(t)|² dt = ∫_{−∞}^{+∞} |F(ν)|² dν
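The convolution theorem can be checked numerically for periodic discrete signals, for which FT(f ⊗ g) = F·G holds with circular convolution (a sketch with arbitrary test vectors):

```python
import numpy as np

# Sketch: numerical check of the convolution theorem for discrete,
# periodic signals (illustrative).
rng = np.random.default_rng(5)
f = rng.random(64)
g = rng.random(64)
# Circular convolution computed directly from its definition...
conv = np.array([np.sum(f * np.roll(g[::-1], k + 1)) for k in range(64)])
# ...and via the product of the transforms:
conv_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real
assert np.allclose(conv, conv_fft)
```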
37. Sampling theorem
Sampling in time is multiplication by a comb of delta functions, which in frequency space becomes a convolution: f(t)·Ш(t) ⇔ F(ν) ⊗ Ш(ν).
For a bandlimited signal, which has no components above the frequency ν_c, the sampling theorem states that the real signal can be reconstructed without error from samples taken uniformly at a rate ν_s ≥ 2ν_c. The minimum sampling frequency, ν_N = 2ν_c, is called the Nyquist frequency, corresponding to the sampling interval Δt = 1/(2ν_c).
Adapted from Bracewell (1986)
38. Undersampling
The sampling theorem assumes that the signal is limited in frequency, but in practice the signal is time limited. If the sampling interval exceeds 1/(2ν_c), the signal is undersampled: overlapping tails appear in the spectrum, the alias spectrum.
Aliasing — examining the terms of the undersampled Fourier transform (Bracewell 1986):
• The undersampled FT is evener than the complete FT; as a consequence the sampling procedure discriminates against components near ν = ν_c.
• There is leakage of the high frequencies (aliasing).
Adapted from Bracewell (1986)
39. Discrete Fourier transform
F(ν_k) = Σ_{j=0..N−1} f(t_j) e^{−2πiν_k t_j}
Discrete form of Parseval's theorem:
Σ_{j=0..N−1} |f(t_j)|² = (1/N) Σ_{k=0..N−1} |F(ν_k)|²
Fast Fourier Transform (FFT):
The FFT is a discrete Fourier transform algorithm which reduces the number of computations for N points from O(N²) to O(N log₂N). This is done by means of the Danielson-Lanczos lemma, whose basic idea is to break a transform of length N into two transforms of length N/2.
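The discrete Parseval relation can be verified with NumPy's FFT (a sketch with an arbitrary random signal):

```python
import numpy as np

# Sketch: discrete Parseval check, sum |f_j|^2 = (1/N) sum |F_k|^2
# (illustrative).
rng = np.random.default_rng(6)
f = rng.normal(size=256)
F = np.fft.fft(f)
time_power = np.sum(np.abs(f) ** 2)
freq_power = np.sum(np.abs(F) ** 2) / f.size
assert np.allclose(time_power, freq_power)
```

The 1/N factor depends on the DFT normalization convention; numpy's default puts it on the inverse transform, hence its appearance here.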
40. Power spectrum estimation
Periodogram:
P(ν) = (1/N) |F(ν)|² = (1/N) |Σ_j f(t_j) e^{−2πiνt_j}|²
     = (1/N) [(Σ_j f(t_j) cos 2πνt_j)² + (Σ_j f(t_j) sin 2πνt_j)²]
If f contains a periodic signal, i.e.:
f(t_j) = g(t_j) + n(t_j), with random noise n(t_j) and g(t) = A sin(2πν₀t + φ),
then at ν = ν₀ there is a large contribution to the sum; for other values the terms in the sum will be randomly negative and positive, yielding a small contribution. Thus a peak in the periodogram reveals the existence of an embedded periodic signal.
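A sketch of the periodogram recovering an injected frequency ν₀ from a noisy sinusoid (all parameters are illustrative):

```python
import numpy as np

# Sketch: periodogram peak at the frequency of an embedded sinusoid
# (illustrative parameters).
rng = np.random.default_rng(7)
nu0, dt = 0.13, 1.0
t = np.arange(512) * dt
f = np.sin(2.0 * np.pi * nu0 * t) + rng.normal(0.0, 0.5, t.size)
power = np.abs(np.fft.rfft(f)) ** 2 / t.size       # one-sided periodogram
freqs = np.fft.rfftfreq(t.size, d=dt)
nu_peak = freqs[np.argmax(power[1:]) + 1]          # skip the zero-frequency bin
assert abs(nu_peak - nu0) < 1.0 / (t.size * dt)    # within one frequency bin
```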
41. Frequency leakage:
• Leakage from nearby frequencies, usually described by a spectral window, primarily a product of the finite length of the data.
• Leakage from high frequencies, due to the data sampling: the aforementioned aliasing.
Tapering functions: sometimes also called data windowing. These functions try to smooth the leakage between frequencies by bringing the interference slowly back to zero. The main goal is to narrow the peak and suppress the side lobes. Smoothing can in certain cases represent a loss of information.
Adapted from Press et al. (1992)
42. Further complications
Closely spaced frequencies: a direct contribution to the first aforementioned leakage:
FT(f₁ + f₂) = F₁(ν) + F₂(ν)
|F₁(ν) + F₂(ν)|² = |F₁(ν)|² + |F₂(ν)|² + 2 Re[F₁(ν) F₂*(ν)]
Damping:
f(t) = A sin(2πνt − φ) e^{−ηt}
The peak in the power spectrum will have a Lorentzian profile.
43. Power spectrum of random noise
f(t_j) = g(t_j) + n(t_j)
n(t_j) → independent Gaussian noise with variance σ_n²
g(t_j) → deterministic signal
The estimate of the spectral density is built on the autocovariance γ_n of the noise:
ρ(ν) = Σ_τ γ_n(τ) e^{−2πiντ}, with γ_n(τ) → σ_n² δ_{τ,0} for white noise.
Thus:
P_n(ν) = σ_n²   (1)
No matter how much one increases the number of points, N, the signal-to-noise ratio will tend to be constant.
For unevenly spaced data (missing data) equation (1) isn't always valid; indeed it is only valid for homogeneous white noise (independent and identically distributed normal random variables).
44. Filling gaps
The unevenly spaced data problem can be solved by (a few suggestions):
• Finding a way to reduce the unevenly spaced sample to an evenly spaced one. Basic idea: interpolation of the missing points (problem: doesn't work for long gaps).
• Using the Lomb-Scargle periodogram.
• Doing a deconvolution analysis (filters).
45. Lomb-Scargle periodogram
P(ν) = (1/2) { [Σ_j (f_j − f̄) cos 2πν(t_j − τ)]² / Σ_j cos² 2πν(t_j − τ)
             + [Σ_j (f_j − f̄) sin 2πν(t_j − τ)]² / Σ_j sin² 2πν(t_j − τ) }
where τ is defined by:
tan(4πντ) = Σ_j sin 4πνt_j / Σ_j cos 4πνt_j
It is like weighting the data on a “per point” basis instead of a “per time interval” basis, which makes it independent of sampling irregularity.
It has an exponential probability distribution with unit mean, which means one can establish a false-alarm probability of the null hypothesis (significance level):
P(> z) = 1 − (1 − e^{−z})^M ≈ M e^{−z}
where M is the number of independent frequencies.
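The Lomb-Scargle formula above can be implemented directly in a few lines; the unevenly sampled series and all its parameters are illustrative assumptions:

```python
import numpy as np

# Sketch: direct implementation of the Lomb-Scargle periodogram on
# irregularly sampled data (illustrative).
rng = np.random.default_rng(8)
t = np.sort(rng.uniform(0.0, 100.0, 300))          # irregular sampling times
y = np.sin(2.0 * np.pi * 0.2 * t) + rng.normal(0.0, 0.5, t.size)
y = y - y.mean()                                   # (f_j - fbar)

def lomb_scargle(nu):
    omega = 2.0 * np.pi * nu
    # tan(2*omega*tau) = sum sin(2*omega*t) / sum cos(2*omega*t)
    tau = np.arctan2(np.sum(np.sin(2.0 * omega * t)),
                     np.sum(np.cos(2.0 * omega * t))) / (2.0 * omega)
    c, s = np.cos(omega * (t - tau)), np.sin(omega * (t - tau))
    return 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))

nus = np.linspace(0.01, 0.5, 2000)
power = np.array([lomb_scargle(nu) for nu in nus])
nu_best = nus[np.argmax(power)]
assert abs(nu_best - 0.2) < 0.01
```

Production code would use an existing implementation (e.g. Astropy's LombScargle), but the sketch shows where the time-shift τ enters.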
47. Deconvolution
The observed transform is the signal's transform convolved with the spectral window, plus noise:
F(ν) ⊗ W(ν) = G(ν) + ε(ν)   (signal G, noise ε)
Linear algorithms: inverse filtering or Wiener filtering. They are inapplicable to incomplete (irregular) sampling of spatial frequency.
Non-linear algorithms: CLEAN, all poles.
Problem: the deconvolution usually does not have a unique solution.
48. Högbom CLEAN algorithm
The first CLEAN method was developed by Högbom (1974). It constructs discrete approximations C of the clean map from the convolution equation
B ⊗ C = D,
where D is the dirty map and B the dirty beam. Starting with R⁰ = D, it searches for the largest value in the residual map
Rⁱ = D − B ⊗ Cⁱ⁻¹.
After locating the largest residual of given amplitude, it subtracts it to yield Rⁱ. The iteration continues until the root-mean-square (RMS) of the residuals decreases to some level. Each subtracted location is saved in the so-called CLEAN map; the resulting final residual map is assumed to contain mainly noise.
49. CLEAN algorithm
The basic steps of the CLEAN algorithm used in asteroseismology are:
1. Compute the power spectrum of the signal and identify the dominant period.
2. Perform a least-square fit to the data to obtain the amplitude and phase of the identified mode.
3. Construct the time series corresponding to that single mode and subtract it from the original signal to obtain a new signal.
4. Repeat all steps until all that is left is noise.
Stello et al. (2004) proposed an improvement to this algorithm: after subtracting a frequency, it recalculates the amplitudes, phases and frequencies of the previously subtracted peaks while fixing the frequency of the latest extracted peak.
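Steps 1–3 above can be sketched as iterative prewhitening; for simplicity the two hypothetical test modes are placed exactly on Fourier bins so that the subtraction is clean (all values are illustrative):

```python
import numpy as np

# Sketch: one-peak-at-a-time prewhitening in the spirit of the CLEAN
# steps above (illustrative; a real code iterates to a noise floor).
rng = np.random.default_rng(9)
t = np.arange(1024.0)
nu1, nu2 = 90.0 / 1024.0, 236.0 / 1024.0   # hypothetical modes, on Fourier bins
y = (2.0 * np.sin(2.0 * np.pi * nu1 * t)
     + 1.0 * np.sin(2.0 * np.pi * nu2 * t + 0.7)
     + rng.normal(0.0, 0.3, t.size))
found = []
for _ in range(2):
    power = np.abs(np.fft.rfft(y)) ** 2
    freqs = np.fft.rfftfreq(t.size)
    nu = freqs[np.argmax(power[1:]) + 1]            # step 1: dominant peak
    A = np.vstack([np.sin(2.0 * np.pi * nu * t),
                   np.cos(2.0 * np.pi * nu * t)]).T
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # step 2: amplitude & phase
    y = y - A @ coef                                # step 3: subtract the mode
    found.append(nu)
found.sort()
```

For frequencies between bins, the fit should be done at the interpolated peak frequency; otherwise the imperfect subtraction leaves residual power, which is exactly what the Stello et al. refinement addresses.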
50. All poles
Writing z ≡ e^{2πiνΔ}, the discrete FT, F(ν) = Σ_j f_j z^j, is a particular case of the (unilateral) z-transform:
F(z) = Σ_{j=0..+∞} f_j z^j
It turns out that one can gain some advantages from the following approximation:
P(ν) ≈ a₀ / |1 + Σ_{k=1..M} a_k z^k|²
The notable fact is that this equation allows poles, i.e. infinite spectral power density, on the unit z-circle (at real frequencies in the Nyquist interval), and such poles can provide an accurate representation for underlying power spectra that have sharp discrete “lines” or delta functions. M is called the number of poles. This approximation goes under several names: all-poles model, maximum entropy method (MEM), autoregressive model (AR).
Adapted from Press et al. (1992)
52. Definitions
A discrete set of observations can be represented by two vectors: the magnitudes x_i and the observation times t_i (with i = 1, …, N). The variance of x is given by:
σ² = Σ_{i=1..N} (x_i − x̄)² / (N − 1)
Suppose that one divides the initial set into several subsets/samples. If M is the number of samples, with variances s_j² and containing n_j data points each, then the overall variance for all the samples is given by:
s² = Σ_{j=1..M} (n_j − 1) s_j² / (Σ_{j=1..M} n_j − M)
53. PDM as a period search method
Suppose that one wants to minimize the variance of a data set with respect to the mean light curve. The phase vector for a trial period P is given by:
φ_i = (t_i − t₀)/P − int[(t_i − t₀)/P]
i.e. the fractional part of (t_i − t₀)/P. Considering x as a function of phase, the variance of the phase-bin samples gives the scatter around the mean light curve. Defining:
Θ = s² / σ²
If P is not the true period, then s² ≈ σ² and Θ ≈ 1.
If P is the true value, then Θ will reach a local minimum.
Mathematically, PDM is a least-square fit, but rather than fitting a given curve, it fits relative to the mean curve as defined by the means of each bin; simultaneously one obtains the best period.
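A sketch of Θ(P) with simple non-overlapping phase bins (real PDM implementations use overlapping bin structures; all parameters here are illustrative):

```python
import numpy as np

# Sketch: phase dispersion minimization with 10 plain phase bins
# (illustrative).
rng = np.random.default_rng(10)
true_period = 3.7
t = np.sort(rng.uniform(0.0, 200.0, 400))
x = np.sin(2.0 * np.pi * t / true_period) + rng.normal(0.0, 0.2, t.size)

def theta(period, nbins=10):
    phase = (t / period) % 1.0                    # fractional part of t/P
    idx = np.minimum((phase * nbins).astype(int), nbins - 1)
    num = sum(((x[idx == b] - x[idx == b].mean()) ** 2).sum()
              for b in range(nbins) if (idx == b).sum() > 1)
    s2 = num / (t.size - nbins)                   # pooled sample variance s^2
    return s2 / x.var(ddof=1)                     # Theta = s^2 / sigma^2

periods = np.linspace(2.0, 6.0, 4000)
thetas = np.array([theta(p) for p in periods])
best = periods[np.argmin(thetas)]
assert abs(best - true_period) < 0.02
```

Away from the true period Θ stays near 1; at the true period the folded curve is coherent within each bin and Θ drops well below 1.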
55. Wavelet transform
Wavelets are a class of functions used to localize a given function in both time and scale. A family of wavelets can be constructed from a function ψ(t), sometimes known as the “mother wavelet”, which is confined to a finite interval. The “daughter wavelets” ψ_{a,b}(t) are then formed by translation (b) and contraction (a). An individual wavelet can be written as:
ψ_{a,b}(t) = (1/√a) ψ((t − b)/a)
Then the wavelet transform is given by:
W(a, b) = (1/√a) ∫_{−∞}^{+∞} f(t) ψ*((t − b)/a) dt
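A direct (slow) evaluation of W(a, b) with a Mexican-hat mother wavelet, showing that the response peaks at the scale matching the signal's frequency (an illustrative sketch, not the talk's method):

```python
import numpy as np

# Sketch: brute-force wavelet transform with a Mexican-hat mother
# wavelet (illustrative).
def mexican_hat(u):
    # psi(u) = (1 - u^2) * exp(-u^2 / 2), a common real mother wavelet
    return (1.0 - u * u) * np.exp(-0.5 * u * u)

dt = 0.01
t = np.arange(0.0, 20.0, dt)
f = np.sin(2.0 * np.pi * 1.0 * t)              # 1 Hz test signal

def wavelet_coeff(a, b):
    # W(a, b) = (1/sqrt(a)) * integral f(t) psi((t - b)/a) dt
    return np.sum(f * mexican_hat((t - b) / a)) * dt / np.sqrt(a)

# Scan scales at a fixed translation b (chosen at a crest of the signal):
scales = np.linspace(0.05, 2.0, 100)
response = np.array([abs(wavelet_coeff(a, 10.25)) for a in scales])
best_scale = scales[np.argmax(response)]
```

The maximizing scale tracks the signal frequency (for this wavelet, roughly a ≈ √2.5 / ω₀ ≈ 0.25 for a 1 Hz sine), which is what "localizing in scale" means in practice.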
58. Short overview
Data analysis results must never be subjective: an analysis should return the best-fit parameters, the underlying errors, and the accuracy of the fitted model. All the statistical information provided must be clear.
Because data analysis is necessary in all scientific fields, there are plenty of methods for optimization, merit functions, spectral analysis, and so on. Therefore it is sometimes not easy to decide which method is the ideal one; most of the time the decision depends on the data to be analyzed.
Everything considered here was the case of a deterministic signal (a fixed amplitude) added to random noise. Sometimes the signal itself is probabilistic.