This document provides a summary of key concepts from chapters on simple regression and correlation analysis. It defines regression analysis as determining the nature and strength of relationships between variables. Scatter plots are used to visualize these relationships. The regression line estimates the relationship between an independent and dependent variable. Correlation analysis describes the degree of linear relationship between variables using the coefficient of determination and coefficient of correlation. Examples are provided to demonstrate calculating the regression equation and correlation coefficient.
4. Correction of EXCEL Exercise 5
Location of Q1: L = (8+1)*25% = 2.25, so Q1 = 133.5
Location of Q3: L = (8+1)*75% = 6.75, so Q3 = 274.5
Interquartile Range = Q3 - Q1 = 274.5 - 133.5 = 141
5. Boxplot
Data: 1 2 2 4 5 7 8 9 12
Median = 5
Quartiles: lower half 1 2 2 4 gives Q1 = 2; upper half 7 8 9 12 gives Q3 = 8.5
Interquartile Range = Q3 - Q1 = 8.5 - 2 = 6.5
Related measures: Decile (1st D, 9th D), Percentile
http://cnx.org/content/m11192/latest/
How to interpret?
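The location formula L = (n+1)*p from the Excel exercise can also be applied to the nine boxplot values. The Python sketch below is illustrative only: the function name `quantile_by_location` is hypothetical, and linear interpolation between neighbouring ranks is an assumption about how fractional locations are handled.

```python
def quantile_by_location(values, p):
    """Return the p-quantile using the location L = (n + 1) * p."""
    data = sorted(values)
    n = len(data)
    location = (n + 1) * p        # 1-based position in the sorted data
    lower = int(location)         # rank at or just below the location
    fraction = location - lower   # distance toward the next rank
    if lower < 1:                 # location falls before the first value
        return float(data[0])
    if lower >= n:                # location falls at or after the last value
        return float(data[-1])
    # Interpolate between the two neighbouring sorted values (assumption).
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


boxplot_data = [1, 2, 2, 4, 5, 7, 8, 9, 12]
q1 = quantile_by_location(boxplot_data, 0.25)
median = quantile_by_location(boxplot_data, 0.50)
q3 = quantile_by_location(boxplot_data, 0.75)
iqr = q3 - q1
print(q1, median, q3, iqr)  # 2.0 5.0 8.5 6.5
```

For these values the result agrees with the quartiles on the slide (Q1 = 2, Q3 = 8.5).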
6. Boxplot
The distribution is skewed to the right because the mean is larger than the median.
Values shown: range € 20 to € 2000, Q1 = € 250, Median = € 350, Mean = € 450, Q3 = € 850
http://cnx.org/content/m11192/latest/
7. Data set 1: 0.8 1.0 1.0 1.2 1.2 1.3 1.5 1.7 2.0 2.0 2.1 2.2 4.0
Mean > Median: positively skewed
Data set 2: 2.0 3.2 3.6 3.7 4.0 4.2 4.2 4.5 4.5 4.6 4.8 5.0 5.0
Mean < Median: negatively skewed
http://qudata.com/online/statcalc/
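The mean-versus-median comparison for the two data sets can be checked with a few lines of Python. This is a minimal sketch; the function name `skew_direction` is hypothetical, and comparing mean and median is only the rule of thumb used on the slide, not a formal skewness statistic.

```python
from statistics import mean, median


def skew_direction(values):
    """Classify skew with the rule of thumb: compare mean and median."""
    m, med = mean(values), median(values)
    if m > med:
        return "positively skewed"
    if m < med:
        return "negatively skewed"
    return "zero skewness"


set_1 = [0.8, 1.0, 1.0, 1.2, 1.2, 1.3, 1.5, 1.7, 2.0, 2.0, 2.1, 2.2, 4.0]
set_2 = [2.0, 3.2, 3.6, 3.7, 4.0, 4.2, 4.2, 4.5, 4.5, 4.6, 4.8, 5.0, 5.0]

for name, values in (("set 1", set_1), ("set 2", set_2)):
    print(name, round(mean(values), 2), median(values), skew_direction(values))
```

The first set has a mean of about 1.69 against a median of 1.5, and the second a mean of about 4.1 against a median of 4.2.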
8. Zero skewness: mode = median = mean. This means that the data is symmetrically distributed.
17. Chapter 12: Simple Regression and Correlation. 2. Estimation Using the Regression Line: Scatter Diagrams
20. Estimation Using the Regression Line
Regression line: Ŷ = a + bX
Slope of the Best-Fitting Regression Line: b = (ΣXY - n*X̄*Ȳ) / (ΣX² - n*X̄²)
Intercept: a = Ȳ - bX̄
21. Estimation Using the Regression Line
Example: what is the relationship between the age of a truck and the annual repair expense?
With X̄ = 3, Ȳ = 6 and slope b = 0.75:
a = Ȳ - bX̄ = 6 - 0.75*3 = 3.75
Ŷ = a + bX = 3.75 + 0.75X
If the city has a truck that is 4 years old, the director could use the equation to predict Ŷ = 3.75 + 0.75*4 = 6.75, that is, $675 annually in repairs (repair expense expressed in hundreds of dollars).
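The truck calculation can be reproduced directly from the figures on the slide (X̄ = 3, Ȳ = 6, b = 0.75). The helper names `intercept` and `predict` below are hypothetical and serve only as a sketch of the arithmetic.

```python
def intercept(mean_x, mean_y, slope):
    """a = Y-bar - b * X-bar"""
    return mean_y - slope * mean_x


def predict(a, b, x):
    """Y-hat = a + b * X"""
    return a + b * x


b = 0.75                      # slope given in the truck example
a = intercept(3, 6, b)        # means X-bar = 3, Y-bar = 6 from the slide
repair_estimate = predict(a, b, 4)   # truck that is 4 years old
print(a, repair_estimate)     # 3.75 6.75
```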
23. Exercise
Step 3: Find ΣX, ΣY, ΣXY, ΣX² (n = 5).
ΣX = 311, Mean X̄ = 62.2
ΣY = 18.6, Mean Ȳ = 3.72
ΣXY = 1159.7
ΣX² = 19359
Step 4: Substitute these values in the slope formula.
Slope(b) = (ΣXY - n*X̄*Ȳ) / (ΣX² - n*X̄²) = (1159.7 - 5*62.2*3.72) / (19359 - 5*62.2*62.2) = 2.78 / 14.8 = 0.19 (rounded)
24. Exercise
Step 5: Now substitute in the intercept formula.
Intercept(a) = Ȳ - bX̄ = 3.72 - 0.19*62.2 = -8.098
Step 6: Then substitute these values in the regression equation formula.
Regression Equation: Ŷ = a + bX = -8.098 + 0.19X
Suppose we want to know the approximate Y value for X = 64. Substituting in the equation:
Ŷ = -8.098 + 0.19*64 = -8.098 + 12.16 = 4.062, or about 4.06
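The exercise can be re-run from the summary sums alone (n = 5, X̄ = 62.2, Ȳ = 3.72, ΣXY = 1159.7, ΣX² = 19359). In this sketch the function name `slope_from_sums` is hypothetical. It follows the slide in rounding b to 0.19 before computing the intercept, and also keeps the unrounded values for comparison: without rounding, the intercept comes out near -7.96 rather than -8.098, while the prediction at X = 64 still rounds to 4.06.

```python
def slope_from_sums(n, mean_x, mean_y, sum_xy, sum_x2):
    """b = (sum XY - n * X-bar * Y-bar) / (sum X^2 - n * X-bar^2)"""
    return (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)


n, mean_x, mean_y = 5, 62.2, 3.72
b_exact = slope_from_sums(n, mean_x, mean_y, sum_xy=1159.7, sum_x2=19359)

# As on the slide: round the slope first, then compute a and the estimate.
b_slide = round(b_exact, 2)
a_slide = mean_y - b_slide * mean_x
y_slide = a_slide + b_slide * 64

# Same steps without intermediate rounding, for comparison.
a_exact = mean_y - b_exact * mean_x
y_exact = a_exact + b_exact * 64

print(round(b_exact, 4), b_slide)
print(round(a_slide, 3), round(a_exact, 3))
print(round(y_slide, 2), round(y_exact, 2))
```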
25. Estimation Using the Regression Line
Least Squares Method: choose the line that minimizes the sum of the squares of the errors, Σeᵢ², which measures the goodness of fit of the line. Here eᵢ = Yᵢ - Ŷᵢ is the residual for observation i.
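To see what "minimize the sum of the squares of the errors" means in practice, the sketch below compares the sum of squared residuals for two candidate lines. The four data points are illustrative values chosen to be consistent with the truck example's summary figures (X̄ = 3, Ȳ = 6, b = 0.75); they are an assumption, not data quoted from the slides, and the function name is hypothetical.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of squared residuals e_i = Y_i - (a + b * X_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data (assumed): truck age in years, repair expense.
ages = [5, 3, 3, 1]
repairs = [7, 7, 6, 4]

sse_least_squares = sum_squared_errors(ages, repairs, a=3.75, b=0.75)
sse_other_line = sum_squared_errors(ages, repairs, a=4.0, b=0.6)

print(sse_least_squares)           # 1.5
print(round(sse_other_line, 2))    # 1.84
```

The line Ŷ = 3.75 + 0.75X gives the smaller total, as expected of the least squares line for these points.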
29. 3. Correlation Analysis
Correlation Analysis: describes the degree to which one variable is linearly related to another.
Coefficient of Determination (r²): measures the extent, or strength, of the association that exists between two variables.
Coefficient of Correlation (r): the square root of the coefficient of determination, taking the same sign as the slope b of the regression line.
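A short sketch of how r² and r could be computed for a fitted line follows. It uses the standard identity r² = 1 - SSE/SST for a least squares line with an intercept, which is an assumption here since the slide text does not spell out a formula. The data points and function names are illustrative, not taken from the slides.

```python
import math


def fit_line(xs, ys):
    """Least squares intercept a and slope b for Y-hat = a + b * X."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    a = mean_y - b * mean_x
    return a, b


def determination_and_correlation(xs, ys):
    """Return (r squared, r); r takes the sign of the slope."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - mean_y) ** 2 for y in ys)                   # total
    r2 = 1 - sse / sst
    r = math.copysign(math.sqrt(r2), b)
    return r2, r


# Illustrative data (assumed), not quoted from the slides.
xs = [5, 3, 3, 1]
ys = [7, 7, 6, 4]
r2, r = determination_and_correlation(xs, ys)
print(round(r2, 2), round(r, 3))  # 0.75 0.866
```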
35. Review Chapter 3: Describing Data
Which value of r indicates a stronger correlation than 0.40?
A. -0.30
B. -0.50
C. +0.38
D. 0
If all the points on a scatter diagram lie on a straight line, what is the standard error of estimate?
A. -1
B. +1
C. 0
D. Infinity
36. Review
In the least squares equation Ŷ = 10 + 20X, the value of 20 indicates
A. the Y intercept.
B. for each unit increase in X, Y increases by 20.
C. for each unit increase in Y, X increases by 20.
D. none of these.
37. Exercise
A sales manager for an advertising agency believes there is a relationship between the number of contacts and the amount of the sales. To verify this belief, the following data were collected: (data table not shown)
What is the Y-intercept of the linear equation?
A. -12.201
B. 2.1946
C. -2.1946
D. 12.201
Correlation and Cause
Just because two variables are correlated does not mean that one of them is the cause of the other. It could be the case, but it does not necessarily follow.
There is a strong positive correlation between the number of cigarettes that one smokes a day and one's chances of contracting lung cancer (measured as the number of cases of lung cancer per hundred people who smoke a given number of cigarettes). The percentage of heavy smokers who contract lung cancer is higher than the percentage of light smokers who develop the disease, and both figures are higher than the percentage of non-smokers who get lung cancer. In this case, the cigarettes are definitely causing the cancer.
There is a strong negative correlation between the total number of skiing holidays that people book for any month of the year and the total amount of ice cream that supermarkets sell for that month. This means that the more skiing holidays are booked, the less ice cream is sold. Is there a cause here? Are people spending so much money on ice cream that they can't afford skiing holidays? Is the fact that the ice cream is so cold putting people off skiing? Clearly not! The simple fact is that most people tend to book their skiing holidays in the winter, and they tend to buy ice cream in the summer.
Although a correlation between two variables doesn't mean that one of them causes the other, it can suggest a way of finding out what the true cause might be. There may be some underlying variable that is causing both of them. For instance, if a survey found a correlation between the time that people spend watching television and the amount of crime that people commit, it could be because unemployed people tend to sit around watching television and unemployed people are more likely to commit crime. If that were the case, then unemployment would be the true cause!
More explanation: http://www.ncsu.edu/labwrite/res/gt/gt-reg-home.html