This tutorial defines and describes the concepts of auto-correlation and cross-correlation and explains how to calculate their coefficients. The procedure for finding the auto- and cross-correlation coefficients is described with examples.
1. AUTO-CORRELATION AND CROSS-CORRELATION
DR. MRINMOY MAJUMDER (ORCID ID: 0000-0001-6231-5989)
Available at: “LEARN ABOUT SOFT-COMPUTATION FOR OPTIMIZATION”, www.utilizeoptimally.com
Lecture 1 of the “INTRODUCTION TO DATA ANALYSIS TECHNIQUES” course and of the “HYDRO-INFORMATICS”/“ADVANCED OPTIMIZATION TECHNIQUES” courses of MTECH (HIE)
3. DEFINITION OF AUTO-CORRELATION
• Auto-correlation is the degree of correlation between values of the same time series separated by a given interval.
• The analysis usually examines how the correlation changes as the separation distance increases.
• The separation distance is called the lag and is denoted by τ (tau) or t.
• The correlation between adjacent values of a time series (separated by one time interval) is known as lag-1 auto-correlation.
• The correlation between values separated by two time intervals is known as lag-2 auto-correlation.
• A plot of the auto-correlation coefficient against lag is called the correlogram.
4. PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT BETWEEN TWO SEGMENTS OF THE SAME TIME SERIES SEPARATED BY A LAG OF TAU
• Let N be the total number of data points in the series and t the lag.
• A = sum of the products x_i · x_{i+t} for i = 1 to N - t
• B = sum of x_i for i = 1 to N - t
• C = sum of x_i for i = t + 1 to N
• D = N - t
• E = sum of the squares of x_i for i = 1 to N - t
• F = sum of the squares of x_i for i = t + 1 to N
• Then the lag-t auto-correlation coefficient is

$$R(t) = \frac{A - D^{-1}(B \times C)}{\sqrt{E - D^{-1}B^{2}} \times \sqrt{F - D^{-1}C^{2}}}$$
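The formula can be checked numerically. The following is a minimal sketch, assuming the series is a plain Python list of numbers; the function name `lag_autocorrelation` is a hypothetical helper, not part of the original course material. It builds the sums A to F exactly as defined above and combines them into R(t).

```python
import math

def lag_autocorrelation(x, t):
    """Lag-t auto-correlation coefficient R(t) built from the sums A-F."""
    n = len(x)
    if not 0 <= t < n - 1:
        raise ValueError("lag t must satisfy 0 <= t < N - 1")
    head = x[: n - t]   # x_i     for i = 1 .. N-t (slide's 1-based i)
    tail = x[t:]        # x_{i+t} for i = 1 .. N-t, i.e. x_i for i = t+1 .. N
    a = sum(p * q for p, q in zip(head, tail))   # A
    b = sum(head)                                # B
    c = sum(tail)                                # C
    d = n - t                                    # D
    e = sum(p * p for p in head)                 # E
    f = sum(q * q for q in tail)                 # F
    numerator = a - (b * c) / d
    denominator = math.sqrt(e - b * b / d) * math.sqrt(f - c * c / d)
    return numerator / denominator

series = [4, 3, 5, 4, 6, 5, 8]   # data from the lag-1 example problem
print(round(lag_autocorrelation(series, 0), 4))  # prints 1.0
print(round(lag_autocorrelation(series, 1), 4))  # prints 0.1661
```

The t = 0 case reproduces R(0) = 1, because A, E and F then coincide and B equals C. A segment with no variation (all values equal) makes the denominator zero, so this sketch would raise `ZeroDivisionError` for such data.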
5. POINTS TO REMEMBER
• At t = 0, R(t) = 1.
• As t increases, fewer value pairs (N - t) are available to calculate R(t), and the correlogram begins to oscillate.
• The maximum lag t should be equal to or less than 10% of N.
• A strong secular trend produces high auto-correlation at small lags.
• A periodic component produces a peak in the correlogram at a lag equal to the period of that component.
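The last two points can be illustrated with a small correlogram. This is a sketch under stated assumptions: the series `periodic` is hypothetical data with a period of 4, `correlogram` is a hypothetical helper, and its default maximum lag of N // 10 simply applies the 10% rule above.

```python
import math

def lag_autocorrelation(x, t):
    # R(t) from the sums A-F on the Pearson product-moment slide
    n = len(x)
    head, tail = x[: n - t], x[t:]
    d = n - t
    a = sum(p * q for p, q in zip(head, tail))
    b, c = sum(head), sum(tail)
    e = sum(p * p for p in head)
    f = sum(q * q for q in tail)
    return (a - b * c / d) / (math.sqrt(e - b * b / d) * math.sqrt(f - c * c / d))

def correlogram(x, max_lag=None):
    """R(t) for t = 0 .. max_lag; by default max_lag is 10% of N."""
    if max_lag is None:
        max_lag = max(1, len(x) // 10)
    return [lag_autocorrelation(x, t) for t in range(max_lag + 1)]

# Hypothetical series with a periodic component of period 4 (N = 40)
periodic = [0, 1, 0, -1] * 10
for t, r in enumerate(correlogram(periodic)):
    print(t, round(r, 3))
```

For this series the maximum lag is 4, and R(t) is 1 at t = 0 and t = 4, -1 at t = 2 and 0 at the odd lags, so the correlogram peaks again at the period of the component.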
6. HOW TO CALCULATE THE AUTOCORRELATION COEFFICIENT
1) To calculate R(t), first create a table whose first column holds the index i.
2) The second column holds x_i.
3) The third column holds x_{i+t}.
4) The fourth column holds the square of x_i.
5) After determining D, the fifth to ninth columns hold A, B, C, E and F respectively (as defined on the Pearson product-moment slide).
6) The tenth column holds R(t), calculated with the equation on the Pearson product-moment slide.
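The table-based procedure above can be sketched as follows. The helper `autocorrelation_worksheet` is hypothetical; it prints the first four columns row by row (leaving x_{i+t} blank where it does not exist) and returns the single-valued entries A to F, D and R(t) that would fill the later columns.

```python
import math

def autocorrelation_worksheet(x, t):
    """Print the worksheet columns and return A-F, D and R(t) (hypothetical helper)."""
    n = len(x)
    d = n - t                                   # D = N - t
    print(f"{'i':>3} {'x_i':>6} {'x_i+t':>6} {'x_i^2':>7}")
    for i in range(1, n + 1):                   # slide's 1-based index i
        xi = x[i - 1]
        lead = f"{x[i - 1 + t]:g}" if i <= d else ""   # x_{i+t} exists only for i <= N-t
        print(f"{i:>3} {xi:>6g} {lead:>6} {xi * xi:>7g}")
    s = {
        "A": sum(x[i - 1] * x[i - 1 + t] for i in range(1, d + 1)),
        "B": sum(x[i - 1] for i in range(1, d + 1)),
        "C": sum(x[i - 1] for i in range(t + 1, n + 1)),
        "D": d,
        "E": sum(x[i - 1] ** 2 for i in range(1, d + 1)),
        "F": sum(x[i - 1] ** 2 for i in range(t + 1, n + 1)),
    }
    s["R"] = (s["A"] - s["B"] * s["C"] / d) / (
        math.sqrt(s["E"] - s["B"] ** 2 / d) * math.sqrt(s["F"] - s["C"] ** 2 / d)
    )
    return s

sheet = autocorrelation_worksheet([4, 3, 5, 4, 6, 5, 8], t=1)
print(sheet)
```

Only the fourth column (x_i squared) is needed for both E and F; the two sums differ only in the range of i they cover.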
7. AUTOCORRELATION EXAMPLE: PROBLEM
• Calculate the lag-1 autocorrelation coefficient of the following data series: 4, 3, 5, 4, 6, 5, 8
Indicate the range of the dataset used to calculate B.
Indicate the range of the dataset used to calculate C.
Indicate the range of the dataset used to calculate E.
Indicate the range of the dataset used to calculate F.
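For checking a hand calculation, the lag-1 sums for this series (N = 7, t = 1, D = 6) work out as follows: B and E use x_1 to x_6, while C and F use x_2 to x_7.

$$
\begin{aligned}
A &= 4\cdot3 + 3\cdot5 + 5\cdot4 + 4\cdot6 + 6\cdot5 + 5\cdot8 = 141\\
B &= \sum_{i=1}^{6} x_i = 4+3+5+4+6+5 = 27\\
C &= \sum_{i=2}^{7} x_i = 3+5+4+6+5+8 = 31\\
E &= \sum_{i=1}^{6} x_i^{2} = 16+9+25+16+36+25 = 127\\
F &= \sum_{i=2}^{7} x_i^{2} = 9+25+16+36+25+64 = 175\\
R(1) &= \frac{141 - \frac{27 \times 31}{6}}{\sqrt{127 - \frac{27^{2}}{6}} \times \sqrt{175 - \frac{31^{2}}{6}}} = \frac{1.5}{\sqrt{5.5} \times \sqrt{14.833}} \approx 0.166
\end{aligned}
$$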
10. DEFINITION OF CROSS-CORRELATION
• Cross-correlation measures the correlation between two different time series at a given lag; the objective is to identify the significance of that correlation, and thus the predictability of one series from the other.
• Cross-correlation coefficients can be plotted against lag to produce a cross-correlogram.
• The coefficient is calculated in the same way as the auto-correlation coefficient.
• The only change is that the x_{i+t} term is replaced by y_{i+t}, where y denotes the data points of the other series.
11. DIFFERENCE BETWEEN AUTO (A) AND CROSS (C) CORRELATION
For A: the coefficient at lag = 0 is always 1.
For C: the coefficient at lag = 0 can take any value between -1 and +1.
For A: the correlogram peaks at lag = 0.
For C: the peak of the cross-correlogram can occur at any lag, not necessarily at 0.
For A: calculating positive lags is enough, since R(-t) = R(t).
For C: both positive and negative lags must be calculated where both are physically feasible (for example, today's rainfall can have no impact on yesterday's runoff).
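To see why the cross-correlogram need not peak at lag 0, and how negative lags can be handled, here is a minimal sketch. It assumes (this is not stated on the slides) that a negative lag -T is computed by swapping the roles of the two series, i.e. correlating y_i with x_{i+T}. The names `cross_correlation`, `rain` and `runoff` are hypothetical, and the data are made up so that runoff repeats rainfall exactly two steps later.

```python
import math

def cross_correlation(x, y, lag):
    """Rc(lag) from the sums A-F; negative lags swap the series (assumption)."""
    if lag < 0:
        x, y, lag = y, x, -lag        # Rc(-T): correlate y_i with x_{i+T}
    n = len(x)
    d = n - lag
    xs, ys = x[:d], y[lag:]           # x_i (i = 1..N-T) paired with y_{i+T}
    a = sum(p * q for p, q in zip(xs, ys))
    b, c = sum(xs), sum(ys)
    e = sum(p * p for p in xs)
    f = sum(q * q for q in ys)
    return (a - b * c / d) / (math.sqrt(e - b * b / d) * math.sqrt(f - c * c / d))

# Hypothetical data: runoff is rainfall delayed by two time steps
rain = [1, 3, 2, 5, 4, 6, 3, 7, 5, 8, 6, 9]
runoff = [0, 0] + rain[:-2]
cc = {T: cross_correlation(rain, runoff, T) for T in range(-3, 4)}
for T, r in cc.items():
    print(f"lag {T:+d}: {r:.3f}")
```

Here the cross-correlogram peaks at lag +2 (where Rc = 1), not at lag 0. With rainfall as x, the negative lags pair runoff with later rainfall, which is the physically infeasible direction mentioned above.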
12. CALCULATION OF THE CROSS-CORRELATION COEFFICIENT BETWEEN TWO TIME SERIES x_i (INDEPENDENT/CAUSE) AND y_i (DEPENDENT/EFFECT)
• Let N be the total number of data points in each series and T the lag.
• A = sum of the products x_i · y_{i+T} for i = 1 to N - T
• B = sum of x_i for i = 1 to N - T
• C = sum of y_i for i = T + 1 to N
• D = N - T
• E = sum of the squares of x_i for i = 1 to N - T
• F = sum of the squares of y_i for i = T + 1 to N
• G = sum of y_i for i = T + 1 to N
• Then the lag-T cross-correlation coefficient is

$$R_c(T) = \frac{A - D^{-1}(B \times C)}{\sqrt{E - D^{-1}B^{2}} \times \sqrt{F - D^{-1}G^{2}}}$$
Note: the absolute value of the lag τ (T) is used in these expressions.
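A direct transcription of the sums A to G into code may help when checking results. This is a sketch only: `lag_cross_correlation` is a hypothetical helper, it assumes both series are equal-length Python lists, and it handles non-negative lags T only. The usage lines apply it to the data from the cross-correlation example problem.

```python
import math

def lag_cross_correlation(x, y, T):
    """Lag-T cross-correlation Rc(T) from the sums A-G (T >= 0 only)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length N")
    n = len(x)
    if not 0 <= T < n - 1:
        raise ValueError("this sketch needs 0 <= T < N - 1")
    d = n - T                                     # D = N - T
    a = sum(x[i] * y[i + T] for i in range(d))    # A: x_i * y_{i+T}, slide's i = 1..N-T
    b = sum(x[:d])                                # B: x_i,   i = 1..N-T
    c = sum(y[T:])                                # C: y_i,   i = T+1..N
    e = sum(v * v for v in x[:d])                 # E: x_i^2, i = 1..N-T
    f = sum(v * v for v in y[T:])                 # F: y_i^2, i = T+1..N
    g = sum(y[T:])                                # G: y_i,   i = T+1..N (same as C here)
    return (a - b * c / d) / (math.sqrt(e - b * b / d) * math.sqrt(f - g * g / d))

x = [5, 4.8, 3.7, 2.8, 3.6, 3.3, 2.9]
y = [2.5, 2.1, 2, 1.3, 1.7, 2, 1.8]
for T in (0, 1, 2):
    print(f"Rc({T}) = {lag_cross_correlation(x, y, T):.3f}")   # Rc(1) prints as 0.526
```

Because G is summed over the same range as C, the two are equal in this sketch; G is kept as a separate variable only to mirror the slide's formula term by term.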
13. HOW TO CALCULATE THE CROSS-CORRELATION COEFFICIENT
1) To calculate Rc(T), first create a table whose first column holds the index i.
2) The second column holds x_i.
3) The third column holds y_i.
4) The fourth column holds y_{i+T}.
5) The fifth column holds the square of x_i.
6) The sixth column holds the square of y_i.
7) After determining D, the seventh to eleventh columns hold A, B, C, E and F respectively (as defined on slide 12). C and G are generally the same quantity when T is positive.
8) The twelfth column holds Rc(T), calculated with the equation on slide 12.
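The twelve-column procedure can be sketched in the same way as the auto-correlation worksheet. The helper `cross_correlation_worksheet` is hypothetical; it prints columns 1 to 6 row by row and returns the single-valued entries A to G, D and Rc(T), here for the data of the lag-1 example problem.

```python
import math

def cross_correlation_worksheet(x, y, T):
    """Print columns 1-6 and return A-G, D and Rc(T) (hypothetical helper)."""
    n = len(x)
    d = n - T                                          # D = N - T
    print(f"{'i':>3} {'x_i':>6} {'y_i':>6} {'y_i+T':>6} {'x_i^2':>8} {'y_i^2':>8}")
    for i in range(1, n + 1):                          # slide's 1-based index i
        lead = f"{y[i - 1 + T]:g}" if i <= d else ""   # y_{i+T} exists only for i <= N-T
        print(f"{i:>3} {x[i - 1]:>6g} {y[i - 1]:>6g} {lead:>6} "
              f"{x[i - 1] ** 2:>8.4g} {y[i - 1] ** 2:>8.4g}")
    s = {
        "A": sum(x[i] * y[i + T] for i in range(d)),
        "B": sum(x[:d]),
        "C": sum(y[T:]),
        "D": d,
        "E": sum(v * v for v in x[:d]),
        "F": sum(v * v for v in y[T:]),
        "G": sum(y[T:]),
    }
    s["Rc"] = (s["A"] - s["B"] * s["C"] / d) / (
        math.sqrt(s["E"] - s["B"] ** 2 / d) * math.sqrt(s["F"] - s["G"] ** 2 / d)
    )
    return s

x = [5, 4.8, 3.7, 2.8, 3.6, 3.3, 2.9]
y = [2.5, 2.1, 2, 1.3, 1.7, 2, 1.8]
sheet = cross_correlation_worksheet(x, y, T=1)
print({k: round(v, 4) for k, v in sheet.items()})
```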
14. CROSS-CORRELATION EXAMPLE: PROBLEM
• Calculate the lag-1 cross-correlation coefficient between the following data series:

i     x_i    y_i
1     5      2.5
2     4.8    2.1
3     3.7    2
4     2.8    1.3
5     3.6    1.7
6     3.3    2
7     2.9    1.8

Indicate the range of the dataset used to calculate B.
Indicate the range of the dataset used to calculate C.
Indicate the range of the dataset used to calculate E.
Indicate the range of the dataset used to calculate F.
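For checking a hand calculation, the lag-1 sums for these series (N = 7, T = 1, D = 6) work out as follows: B and E use x_1 to x_6, while C, F and G use y_2 to y_7.

$$
\begin{aligned}
A &= 5(2.1) + 4.8(2) + 3.7(1.3) + 2.8(1.7) + 3.6(2) + 3.3(1.8) = 42.81\\
B &= \sum_{i=1}^{6} x_i = 5 + 4.8 + 3.7 + 2.8 + 3.6 + 3.3 = 23.2\\
C = G &= \sum_{i=2}^{7} y_i = 2.1 + 2 + 1.3 + 1.7 + 2 + 1.8 = 10.9\\
E &= \sum_{i=1}^{6} x_i^{2} = 25 + 23.04 + 13.69 + 7.84 + 12.96 + 10.89 = 93.42\\
F &= \sum_{i=2}^{7} y_i^{2} = 4.41 + 4 + 1.69 + 2.89 + 4 + 3.24 = 20.23\\
R_c(1) &= \frac{42.81 - \frac{23.2 \times 10.9}{6}}{\sqrt{93.42 - \frac{23.2^{2}}{6}} \times \sqrt{20.23 - \frac{10.9^{2}}{6}}} = \frac{0.6633}{\sqrt{3.7133} \times \sqrt{0.4283}} \approx 0.526
\end{aligned}
$$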