correlation.pptx

  1. Applied Techniques for Economists. Josphat Omanga, Copenhagen Business College.
  2. Quantitative Methods, 4th edition, Swift & Piff. Recap
     Describing Data Toolkit
     • Displays for numerical data: frequency charts, relative frequency, histogram
     • Displays for categorical data: bar charts, pie charts, contingency tables
     Summarising Data Toolkit for quantitative (numerical) data, covering location and spread
     • Mean, median, mode, range, variance, standard deviation, quartiles, index
  3. Main Objectives
     • Scatter plot
     • Measures of association: (a) covariance, (b) correlation
     • Contingency table and dependence/independence
  4. Why analyse two (or more) variables together?
     The main aim is to examine whether there is any pattern in the responses of two or more variables, in order to:
     - understand the relationship better;
     - influence one variable by changing the other;
     - forecast.
     The appropriate tools depend on the type of variable:
     • Quantitative (numerical): scatter plot, covariance, correlation coefficient
     • Qualitative (categorical): contingency table and dependence analysis
  5. Scatter plot
     As mentioned briefly in Data and Graphing Data, a scatter plot can show the relationship between two variables. Example: returns of two shares over 9 consecutive months.
  6. Scatter plot examples
  7. Scatter plot examples
  8. Scatter plot
     The owner of an ice cream store wants to examine the relationship between daily sales and atmospheric temperature. A sample of 25 consecutive days is selected, and ice cream consumption per head (in pints) and average temperature (in Fahrenheit) are recorded.
  9. Scatter plot
     • Does a relationship between daily sales and temperature exist?
     • What kind of relationship exists?
     • How can this relationship be measured (if it exists)?
     The shape of the "cloud" of data points indicates a positive relationship. How can we measure this relationship more precisely?
  10. Measures of Association: Covariance
     • Covariance is a measure of the linear relationship between two variables.
     • A positive value indicates a direct (increasing) linear relationship, while a negative value indicates an inverse (decreasing) linear relationship.
     The covariance is calculated as:
     $$\mathrm{Cov}(x,y) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$$
     Note that this is the formula for the sample covariance; for the population covariance, the sum is divided by n instead of n − 1.
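To make the formula concrete, here is a minimal Python sketch (the slides themselves use manual calculation and Excel). The helper name `covariance` and its `sample` flag are our own, hypothetical choices; the data are the smoking-years and lung-capacity figures from the slides' worked example.

```python
from statistics import fmean


def covariance(x: list[float], y: list[float], sample: bool = True) -> float:
    """Covariance of paired observations x and y.

    sample=True divides by n - 1 (sample covariance);
    sample=False divides by n (population covariance).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    # Sum of the cross-products of deviations from the two means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / (n - 1 if sample else n)


# Smoking years (x) and lung capacity (y) from the worked example
x = [0, 5, 10, 15, 20]
y = [45, 42, 33, 31, 29]
print(covariance(x, y))                # -53.75
print(covariance(x, y, sample=False))  # -43.0
```

On Python 3.10 or later, `statistics.covariance(x, y)` should return the same sample value; in Excel, `COVARIANCE.S` and `COVARIANCE.P` give the sample and population versions respectively.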
  11. Measures of Association: Covariance, example
     Data (number of observations n = 5):
     x (smoking years): 0, 5, 10, 15, 20
     y (lung capacity): 45, 42, 33, 31, 29
     $$\bar{x} = \frac{0+5+10+15+20}{5} = 10, \qquad \bar{y} = \frac{45+42+33+31+29}{5} = 36$$
     $$\mathrm{Cov}(x,y) = \frac{(0-10)(45-36) + (5-10)(42-36) + (10-10)(33-36) + (15-10)(31-36) + (20-10)(29-36)}{n-1}$$
     $$= \frac{(-10)(9) + (-5)(6) + (0)(-3) + (5)(-5) + (10)(-7)}{5-1} = \frac{-90-30+0-25-70}{4} = \frac{-215}{4} = -53.75$$
  12. Measures of Association: Covariance, example (continued)
     Data (n = 5): x (smoking years): 0, 5, 10, 15, 20; y (lung capacity): 45, 42, 33, 31, 29
     Cov(x, y) = −53.75
  13. Practice question 1
     The table below shows the sickness records:

     Person   Age   Days off work last year
     A        19    21
     B        21    18
     C        32    15
     D        27    17
     E        45     8

     (a) Calculate the covariance of the two variables above.
     - Compute it manually using the formula.
     - Compute it in Excel using a spreadsheet:
       - step by step;
       - using an Excel formula.
  14. Measures of Association: Covariance properties
     When cov(x, y) = 0, the two variables have no linear relationship (they are uncorrelated). This does not by itself prove that they are independent, because a non-linear relationship may still exist. A positive or negative covariance, on the other hand, indicates that the variables are dependent.
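A zero covariance rules out only a linear relationship. In the short Python sketch below, y is an exact function of x (y = x²), so the two variables are clearly dependent, yet their covariance is zero because the relationship is not linear. The data are made up for illustration, and the helper `sample_covariance` is hypothetical.

```python
from statistics import fmean


def sample_covariance(x, y):
    """Sample covariance: sum of cross-products of deviations divided by n - 1."""
    x_bar, y_bar = fmean(x), fmean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)


# y is completely determined by x, so the variables are dependent...
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]
# ...yet the covariance is zero, because the relationship is a parabola, not a line.
print(y)                        # [4, 1, 0, 1, 4]
print(sample_covariance(x, y))  # 0.0
```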
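  15. Measures of Association: Correlation Coefficient
     Covariance is useful for checking whether two variables are linearly related. However, it does not indicate how strong the relationship is, because its size depends on the units in which x and y are measured. The correlation coefficient is a normalised index of association, defined as:
     $$r = \frac{\mathrm{Cov}(x,y)}{s_x \, s_y}$$
     which is the same as
     $$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \, \sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
     where s_x and s_y are the sample standard deviations of x and y.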
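The correlation coefficient can be written either as Cov(x, y) divided by the product of the sample standard deviations, or as the sum of cross-products divided by the square root of the product of the two sums of squared deviations. A minimal Python sketch can check that both forms agree on the smoking example. Both helper names are hypothetical; `statistics.stdev` uses the n − 1 divisor, matching the sample covariance, so the n − 1 factors cancel.

```python
from math import sqrt
from statistics import fmean, stdev


def correlation(x, y):
    """Pearson r from sums of deviations (no n - 1 factor needed)."""
    x_bar, y_bar = fmean(x), fmean(y)
    dx = [xi - x_bar for xi in x]
    dy = [yi - y_bar for yi in y]
    s_xy = sum(a * b for a, b in zip(dx, dy))
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)
    return s_xy / sqrt(s_xx * s_yy)


def correlation_from_cov(x, y):
    """Same coefficient via r = Cov(x, y) / (s_x * s_y), all with n - 1."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    return cov / (stdev(x) * stdev(y))


x = [0, 5, 10, 15, 20]
y = [45, 42, 33, 31, 29]
print(round(correlation(x, y), 4))           # -0.9615
print(round(correlation_from_cov(x, y), 4))  # -0.9615
```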
  16. Measures of Association: Correlation Coefficient vs Covariance
     The correlation coefficient has advantages over covariance for determining the strength of a relationship:
     • Covariance can take any value and is expressed in the units of x multiplied by the units of y, whereas the correlation coefficient is unit-free and always lies between −1 and 1.
     • Correlation is therefore comparable across data sets and is useful for determining how strong the relationship is.
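To see the unit-free property, the Python sketch below recomputes both measures after expressing smoking history in months instead of years (a hypothetical change of units applied to the worked-example data). The covariance is multiplied by 12, while r is unchanged. The helper `cov_and_corr` is our own illustrative name.

```python
from math import sqrt
from statistics import fmean


def cov_and_corr(x, y):
    """Return (sample covariance, Pearson correlation) for paired data."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / (n - 1), s_xy / sqrt(s_xx * s_yy)


years = [0, 5, 10, 15, 20]
capacity = [45, 42, 33, 31, 29]
months = [12 * t for t in years]  # same smoking history, measured in months

cov_years, r_years = cov_and_corr(years, capacity)
cov_months, r_months = cov_and_corr(months, capacity)
print(cov_years, cov_months)                  # -53.75 -645.0
print(round(r_years, 4), round(r_months, 4))  # -0.9615 -0.9615
```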
  17. Measures of Association: Correlation Coefficient, example
     Data (number of observations n = 5):
     x (smoking years): 0, 5, 10, 15, 20
     y (lung capacity): 45, 42, 33, 31, 29
     $$\bar{x} = \frac{0+5+10+15+20}{5} = 10, \qquad \bar{y} = \frac{45+42+33+31+29}{5} = 36$$
     $$r = \frac{(0-10)(45-36) + (5-10)(42-36) + (10-10)(33-36) + (15-10)(31-36) + (20-10)(29-36)}{\sqrt{\left[(0-10)^2+(5-10)^2+(10-10)^2+(15-10)^2+(20-10)^2\right]\left[(45-36)^2+(42-36)^2+(33-36)^2+(31-36)^2+(29-36)^2\right]}}$$
     $$= \frac{(-10)(9) + (-5)(6) + (0)(-3) + (5)(-5) + (10)(-7)}{\sqrt{(100+25+0+25+100)(81+36+9+25+49)}} = \frac{-90-30+0-25-70}{\sqrt{250 \times 200}} = \frac{-215}{\sqrt{50000}} = -0.9615$$
  18. Measures of Association: Correlation examples
     • The closer r is to −1, the stronger the negative linear relationship.
     • The closer r is to 1, the stronger the positive linear relationship.
     • The closer r is to 0, the weaker the linear relationship.
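As an illustration of the two ends of the scale, consider a minimal Python sketch: points lying exactly on an increasing straight line give r = 1, and points on a decreasing straight line give r = −1. The points and the helper `pearson_r` are invented for illustration.

```python
from math import isclose, sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient r of paired observations."""
    x_bar, y_bar = fmean(x), fmean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]
rising = [3 * v + 2 for v in x]    # exact increasing straight line
falling = [50 - 2 * v for v in x]  # exact decreasing straight line
print(isclose(pearson_r(x, rising), 1.0))    # True
print(isclose(pearson_r(x, falling), -1.0))  # True
```

Real data rarely fall exactly on a line; the smoking example's r = −0.9615 is close to −1, indicating a strong negative linear relationship.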
  19. Practice question 1 (continued)
     The table below shows the sickness records:

     Person   Age   Days off work last year
     A        19    21
     B        21    18
     C        32    15
     D        27    17
     E        45     8

     (a) Calculate the correlation coefficient of the two variables above.
     - Compute it manually using the formula.
     - Compute it in Excel using a spreadsheet:
       - step by step;
       - using an Excel formula.
  20. Contingency Table & Dependence/Independence
     The contingency table, briefly introduced in Data and Graphing Data, can be used to study the relationships that may exist between two qualitative variables.
  21. Contingency Table & Dependence/Independence: the case of the Titanic
     Many well-known facts are reflected in the survival rates of the various classes of passenger. The British Board of Trade originally collected the passenger data in its investigation of the sinking, which makes it possible to check, for example, whether the "women and children first" or the "first-class passengers first" policy was fully followed during the rescue operations. Here we look at the "first-class passengers first" policy.
  22. Contingency Table & Dependence/Independence: the case of the Titanic
     The data contain two qualitative variables: the class in which each passenger was travelling, and whether or not the passenger survived the disaster. (Note that, unfortunately, primary sources do not fully agree on the exact numbers on board, rescued, or lost.)
     Extract: part of the long list of survivors.
  23. Contingency Table & Dependence/Independence: the case of the Titanic, univariate description by tables and graphs
  24. Contingency Table & Dependence/Independence: the case of the Titanic, contingency table by frequency and by relative frequency
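The frequency and relative-frequency versions of a contingency table can be produced with a small Python sketch. The counts below are hypothetical, invented only to show the mechanics; they are not the Titanic figures used on the slides. Each relative frequency is a cell count divided by the grand total, so all relative frequencies sum to 1. The helper `margins_and_relative` is our own illustrative name.

```python
# Hypothetical survival counts by class, invented for illustration only:
# these are NOT the Titanic figures shown on the slides.
counts = {
    "1st": {"survived": 80, "died": 40},
    "2nd": {"survived": 50, "died": 70},
    "3rd": {"survived": 60, "died": 200},
}


def margins_and_relative(table):
    """Row totals, column totals, grand total and relative joint frequencies."""
    rows = list(table)
    cols = list(table[rows[0]])
    grand = sum(table[r][c] for r in rows for c in cols)
    row_tot = {r: sum(table[r][c] for c in cols) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    rel = {r: {c: table[r][c] / grand for c in cols} for r in rows}
    return row_tot, col_tot, grand, rel


row_tot, col_tot, grand, rel = margins_and_relative(counts)
print(row_tot)  # {'1st': 120, '2nd': 120, '3rd': 260}
print(col_tot)  # {'survived': 190, 'died': 310}
print(grand)    # 500
for cls, cells in rel.items():
    print(cls, {k: round(v, 2) for k, v in cells.items()})
# 1st {'survived': 0.16, 'died': 0.08}
# 2nd {'survived': 0.1, 'died': 0.14}
# 3rd {'survived': 0.12, 'died': 0.4}
```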
  25. Contingency Table & Dependence/Independence: the case of the Titanic, contingency table
  26. Contingency Table & Dependence/Independence: the case of the Titanic, contingency table
  27. Contingency Table & Dependence/Independence: the case of the Titanic, conditional relative distributions
     • Three conditional relative distributions of "survived" (X) given "class" (Y), one for each class, written X|Y.
     • Two conditional relative distributions of "class" (Y) given "survived" (X), one for each outcome, written Y|X.
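The two kinds of conditional relative distribution can be sketched in Python, again with hypothetical counts that are not the Titanic figures. Dividing each cell by its row total gives survival given class (X|Y, one distribution per class); dividing by its column total gives class given survival (Y|X, one distribution per outcome). If the two variables were independent, every conditional distribution of survival given class would match the overall (marginal) survival distribution. The helper names are our own.

```python
# Hypothetical counts for illustration only, NOT the Titanic figures.
counts = {
    "1st": {"survived": 80, "died": 40},
    "2nd": {"survived": 50, "died": 70},
    "3rd": {"survived": 60, "died": 200},
}


def conditional_by_row(table):
    """X | Y: distribution of the column variable within each row (survival given class)."""
    return {r: {c: n / sum(cells.values()) for c, n in cells.items()}
            for r, cells in table.items()}


def conditional_by_column(table):
    """Y | X: distribution of the row variable within each column (class given survival)."""
    cols = list(next(iter(table.values())))
    col_tot = {c: sum(cells[c] for cells in table.values()) for c in cols}
    return {c: {r: cells[c] / col_tot[c] for r, cells in table.items()} for c in cols}


survival_given_class = conditional_by_row(counts)     # three distributions
class_given_survival = conditional_by_column(counts)  # two distributions

# Marginal survival rate, to compare against each conditional rate
overall = sum(c["survived"] for c in counts.values()) / sum(
    sum(c.values()) for c in counts.values()
)
for cls, dist in survival_given_class.items():
    print(cls, round(dist["survived"], 3))
print("overall", round(overall, 3))
# 1st 0.667
# 2nd 0.417
# 3rd 0.231
# overall 0.38
```

In this invented table the survival rates differ markedly across classes rather than all matching the overall rate, which would point to dependence between class and survival.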
  28. Contingency Table & Dependence/Independence: the case of the Titanic, conditional relative distributions
  29. Contingency Table & Dependence/Independence: why not just the frequencies? Assume that
  30. The Maths and Notations: Boring but Necessary
  31. Contingency Table & Dependence/Independence
  32. Contingency Table & Dependence/Independence
  33. Contingency Table & Dependence/Independence: the case of the Titanic
  34. Contingency Table & Dependence/Independence: the case of the Titanic
  35. Make sure that you can:
     • calculate the covariance and the correlation coefficient;
     • understand their differences and properties;
     • conduct a comprehensive association analysis of quantitative variables using scatter plots, covariance and correlation, with the corresponding interpretations;
     • understand the use of contingency tables in dependence/independence analysis of qualitative variables.