SlideShare a Scribd company logo
1 of 30
Download to read offline
SIMPLE LINEAR
REGRESSION
Avjinder Singh Kaler and Kristi Mai
 In the first part of this section we find the equation of the straight line
that best fits the paired sample data. That equation algebraically
describes the relationship between two variables.
 The best-fitting straight line is called a regression line and its equation is
called the regression equation.
Regression Equation – given a collection of paired sample data, the regression
equation that algebraically describes the relationship between the two
variables 𝑥 and 𝑦 is 𝑦 = 𝑏0 + 𝑏1 𝑥
• The regression equation attempts to describe a relationship between two
variables
• Inherently, the equation algebraically describes how the values of one variable
are somehow associated with the values of the other variable
Regression line – the graph of the regression equation
• Also known as the “line of best fit” or the “least square line”
• The regression line fits the sample points best
Notice the 𝑦 in the sample regression equation!
This implies that we are predicting something!
We are predicting values for 𝑦 based upon true and observed values
of 𝑥.
1. The sample of paired data is a simple random sample of quantitative data
2. The pairs of data 𝑥, 𝑦 have a bivariate normal distribution, meaning the
following:
• Visual examination of the scatter plot(s) confirms that the sample points follow
an approximately straight line(s)
• Because results can be strongly affected by the presence of outliers, any outliers
should be removed if they are known to be errors (Note: Use caution when
removing data points)
Slope:
• 𝑏1 = 𝑟 ∗
𝑆 𝑦
𝑆 𝑥
where 𝑠 𝑦 and 𝑠 𝑥 are the standard deviations of
𝑦 values and 𝑥 values
Intercept:
• 𝑏0 = 𝑦 − 𝑏1 ∗ 𝑥
Software will typically be utilized to calculate these
coefficients
We would like to use the explanatory variable, x, shoe print length, to
predict the response variable, y, height.
The data are listed below:
Requirement Check:
1. The data are assumed to be a simple random sample.
2. The scatterplot showed a roughly straight-line pattern.
3. There are no outliers.
The use of technology is recommended for finding the equation of a
regression line.
Using StatCrunch, we obtain the following result:
So then, the regression equation can be expressed as:
ˆ 125 1.73y x 
Graph the regression equation on a scatterplot: ˆ 125 1.73y x 
1. Use the regression equation for predictions ONLY if the graph of the regression
line on the scatter plot confirms that the line fits the points reasonably well.
2. Use the equation for predictions ONLY if the data used for prediction does not
go much beyond the scope of the available sample data.
3. Use the equation for prediction ONLY if 𝑟 indicates that there is a significant
linear correlation indicated between the two variables, 𝑥 and 𝑦
Notice: If the regression equation does not appear to be useful for predictions,
the best predicted value of a 𝑦 variable is its point estimate [i.e. the sample
mean of the 𝑦 variable would be the best predicted value for that variable]
Marginal change – in working with two variables related by a regression
equation, the marginal change in a variable is the amount that the variable
changes when the other variable changes by exactly one unit
• The slope, 𝑏1, in the regression equation is the marginal change in 𝑦 when 𝑥
changes by one unit
For the 40 pairs of shoe print lengths and heights, the regression equation was:
The slope of 3.22 tells us that if we increase shoe print length by 1 cm, the
predicted height of a person increases by 3.22 cm.
ˆ 80.9 3.22y x 
Scatter Plot – plot of paired 𝑥, 𝑦 quantitative data with a dot representing
each pair of points
• Helpful in determining whether there is a relationship between two variables
Time Series Graph – quantitative data collected over time and plotted
accordingly with the horizontal axis representing some measure of time
• Outlier – in a scatter plot, an outlier is a point lying far away from the other
data points
• Influential point – a point that strongly affects the graph of the regression line
For the 40 pairs of shoe prints and heights, observe what happens if we include
this additional data point:
x = 35 cm and y = 25 cm
The additional point is an influential point because the graph of the regression
line because the graph of the regression line did change considerably.
The additional point is also an outlier because it is far from the other points.
Residual – for a pair of sample 𝑥 and 𝑦 values, the difference between the
observed sample value of 𝑦 (a true value observed) and the y-value that is
predicted by using the regression equation 𝑦 is the residual
• 𝑅𝑒𝑠𝑖𝑑𝑢𝑎𝑙 = 𝑂𝑏𝑠𝑒𝑟𝑣𝑒𝑑 − 𝑃𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑 = 𝑦 − 𝑦
• A residual represents a type of inherent prediction error
• The regression equation does not, typically, pass through all the observed data
values that we have
The Least Squares Property – a straight line satisfies this property if the sum of the
squares of the residuals is the smallest sum possible
Residual Plot – a scatter plot of the 𝑥, 𝑦 values after each of the y-values has
been replaced by the residual value, 𝑦 − 𝑦.
• That is, a residual plot is a graph of the points 𝑥, 𝑦 − 𝑦 .
When analyzing a residual plot, look for a pattern in the way the points
are configured, and use these criteria:
The residual plot should not have any obvious patterns (not even a
straight line pattern). This confirms that the scatterplot of the sample
data is a straight-line pattern.
The residual plot should not become thicker (or thinner) when viewed
from left to right. This confirms the requirement that for different fixed
values of x, the distributions of the corresponding y values all have the
same standard deviation.
The shoe print and height data are used to generate the following
residual plot:
The residual plot becomes thicker, which suggests that the requirement of
equal standard deviations is violated.
The following slides show three residual plots.
Let’s observe what is good or bad about the individual regression
models.
Regression model is a good model:
Distinct pattern: sample data may not follow a straight-line pattern.
Residual plot becoming thicker: equal standard deviations violated.
1. Construct a scatterplot and verify that the pattern of the points is
approximately a straight-line pattern without outliers. (If there are
outliers, consider their effects by comparing results that include the
outliers to results that exclude the outliers.)
2. Construct a residual plot and verify that there is no pattern (other
than a straight-line pattern) and also verify that the residual plot
does not become thicker (or thinner).
3. Use a histogram and/or normal quantile plot to confirm that the
values of the residuals have a distribution that is approximately
normal.
4. Consider any effects of a pattern over time.

More Related Content

What's hot

Multiple regression presentation
Multiple regression presentationMultiple regression presentation
Multiple regression presentation
Carlo Magno
 
Simple linear regression (final)
Simple linear regression (final)Simple linear regression (final)
Simple linear regression (final)
Harsh Upadhyay
 
Regression analysis ppt
Regression analysis pptRegression analysis ppt
Regression analysis ppt
Elkana Rorio
 

What's hot (20)

Presentation On Regression
Presentation On RegressionPresentation On Regression
Presentation On Regression
 
Multiple regression presentation
Multiple regression presentationMultiple regression presentation
Multiple regression presentation
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Simple linear regression (final)
Simple linear regression (final)Simple linear regression (final)
Simple linear regression (final)
 
Binary Logistic Regression
Binary Logistic RegressionBinary Logistic Regression
Binary Logistic Regression
 
Simple linear regression and correlation
Simple linear regression and correlationSimple linear regression and correlation
Simple linear regression and correlation
 
Multiple Linear Regression
Multiple Linear RegressionMultiple Linear Regression
Multiple Linear Regression
 
Multicollinearity
MulticollinearityMulticollinearity
Multicollinearity
 
Regression ppt
Regression pptRegression ppt
Regression ppt
 
Statistics-Regression analysis
Statistics-Regression analysisStatistics-Regression analysis
Statistics-Regression analysis
 
Regression
RegressionRegression
Regression
 
Linear regression theory
Linear regression theoryLinear regression theory
Linear regression theory
 
Basics of Regression analysis
 Basics of Regression analysis Basics of Regression analysis
Basics of Regression analysis
 
Regression analysis ppt
Regression analysis pptRegression analysis ppt
Regression analysis ppt
 
Simple & Multiple Regression Analysis
Simple & Multiple Regression AnalysisSimple & Multiple Regression Analysis
Simple & Multiple Regression Analysis
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 

Similar to Simple linear regression

FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docxFSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
budbarber38650
 
Exploring bivariate data
Exploring bivariate dataExploring bivariate data
Exploring bivariate data
Ulster BOCES
 
Simple lin regress_inference
Simple lin regress_inferenceSimple lin regress_inference
Simple lin regress_inference
Kemal İnciroğlu
 
IDS.pdf
IDS.pdfIDS.pdf

Similar to Simple linear regression (20)

Correlation in Statistics
Correlation in StatisticsCorrelation in Statistics
Correlation in Statistics
 
rugs koco.pptx
rugs koco.pptxrugs koco.pptx
rugs koco.pptx
 
Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptx
 
Simple egression.pptx
Simple egression.pptxSimple egression.pptx
Simple egression.pptx
 
Simple Linear Regression.pptx
Simple Linear Regression.pptxSimple Linear Regression.pptx
Simple Linear Regression.pptx
 
regression.pptx
regression.pptxregression.pptx
regression.pptx
 
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docxFSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
 
Exploring bivariate data
Exploring bivariate dataExploring bivariate data
Exploring bivariate data
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
Curve fitting
Curve fittingCurve fitting
Curve fitting
 
Curve fitting
Curve fittingCurve fitting
Curve fitting
 
Chapter05
Chapter05Chapter05
Chapter05
 
R nonlinear least square
R   nonlinear least squareR   nonlinear least square
R nonlinear least square
 
Linear Regression.pptx
Linear Regression.pptxLinear Regression.pptx
Linear Regression.pptx
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
 
Simple lin regress_inference
Simple lin regress_inferenceSimple lin regress_inference
Simple lin regress_inference
 
IDS.pdf
IDS.pdfIDS.pdf
IDS.pdf
 
Linear relations
   Linear relations   Linear relations
Linear relations
 
Regression
RegressionRegression
Regression
 

More from Avjinder (Avi) Kaler

More from Avjinder (Avi) Kaler (20)

Unleashing Real-World Simulations: A Python Tutorial by Avjinder Kaler
Unleashing Real-World Simulations: A Python Tutorial by Avjinder KalerUnleashing Real-World Simulations: A Python Tutorial by Avjinder Kaler
Unleashing Real-World Simulations: A Python Tutorial by Avjinder Kaler
 
Tutorial for Deep Learning Project with Keras
Tutorial for Deep Learning Project  with KerasTutorial for Deep Learning Project  with Keras
Tutorial for Deep Learning Project with Keras
 
Tutorial for DBSCAN Clustering in Machine Learning
Tutorial for DBSCAN Clustering in Machine LearningTutorial for DBSCAN Clustering in Machine Learning
Tutorial for DBSCAN Clustering in Machine Learning
 
Python Code for Classification Supervised Machine Learning.pdf
Python Code for Classification Supervised Machine Learning.pdfPython Code for Classification Supervised Machine Learning.pdf
Python Code for Classification Supervised Machine Learning.pdf
 
Sql tutorial for select, where, order by, null, insert functions
Sql tutorial for select, where, order by, null, insert functionsSql tutorial for select, where, order by, null, insert functions
Sql tutorial for select, where, order by, null, insert functions
 
Kaler et al 2018 euphytica
Kaler et al 2018 euphyticaKaler et al 2018 euphytica
Kaler et al 2018 euphytica
 
Association mapping identifies loci for canopy coverage in diverse soybean ge...
Association mapping identifies loci for canopy coverage in diverse soybean ge...Association mapping identifies loci for canopy coverage in diverse soybean ge...
Association mapping identifies loci for canopy coverage in diverse soybean ge...
 
Genome-Wide Association Mapping of Carbon Isotope and Oxygen Isotope Ratios i...
Genome-Wide Association Mapping of Carbon Isotope and Oxygen Isotope Ratios i...Genome-Wide Association Mapping of Carbon Isotope and Oxygen Isotope Ratios i...
Genome-Wide Association Mapping of Carbon Isotope and Oxygen Isotope Ratios i...
 
Genome-wide association mapping of canopy wilting in diverse soybean genotypes
Genome-wide association mapping of canopy wilting in diverse soybean genotypesGenome-wide association mapping of canopy wilting in diverse soybean genotypes
Genome-wide association mapping of canopy wilting in diverse soybean genotypes
 
Tutorial for Estimating Broad and Narrow Sense Heritability using R
Tutorial for Estimating Broad and Narrow Sense Heritability using RTutorial for Estimating Broad and Narrow Sense Heritability using R
Tutorial for Estimating Broad and Narrow Sense Heritability using R
 
Tutorial for Circular and Rectangular Manhattan plots
Tutorial for Circular and Rectangular Manhattan plotsTutorial for Circular and Rectangular Manhattan plots
Tutorial for Circular and Rectangular Manhattan plots
 
Genomic Selection with Bayesian Generalized Linear Regression model using R
Genomic Selection with Bayesian Generalized Linear Regression model using RGenomic Selection with Bayesian Generalized Linear Regression model using R
Genomic Selection with Bayesian Generalized Linear Regression model using R
 
Genome wide association mapping
Genome wide association mappingGenome wide association mapping
Genome wide association mapping
 
Nutrient availability response to sulfur amendment in histosols having variab...
Nutrient availability response to sulfur amendment in histosols having variab...Nutrient availability response to sulfur amendment in histosols having variab...
Nutrient availability response to sulfur amendment in histosols having variab...
 
Sugarcane yield and plant nutrient response to sulfur amended everglades hist...
Sugarcane yield and plant nutrient response to sulfur amended everglades hist...Sugarcane yield and plant nutrient response to sulfur amended everglades hist...
Sugarcane yield and plant nutrient response to sulfur amended everglades hist...
 
R code descriptive statistics of phenotypic data by Avjinder Kaler
R code descriptive statistics of phenotypic data by Avjinder KalerR code descriptive statistics of phenotypic data by Avjinder Kaler
R code descriptive statistics of phenotypic data by Avjinder Kaler
 
Population genetics
Population geneticsPopulation genetics
Population genetics
 
Quantitative genetics
Quantitative geneticsQuantitative genetics
Quantitative genetics
 
Abiotic stresses in plant
Abiotic stresses in plantAbiotic stresses in plant
Abiotic stresses in plant
 
Seed rate calculation for experiment
Seed rate calculation for experimentSeed rate calculation for experiment
Seed rate calculation for experiment
 

Recently uploaded

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 

Recently uploaded (20)

Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 

Simple linear regression

  • 2.  In the first part of this section we find the equation of the straight line that best fits the paired sample data. That equation algebraically describes the relationship between two variables.  The best-fitting straight line is called a regression line and its equation is called the regression equation.
  • 3. Regression Equation – given a collection of paired sample data, the regression equation that algebraically describes the relationship between the two variables 𝑥 and 𝑦 is 𝑦 = 𝑏0 + 𝑏1 𝑥 • The regression equation attempts to describe a relationship between two variables • Inherently, the equation algebraically describes how the values of one variable are somehow associated with the values of the other variable Regression line – the graph of the regression equation • Also known as the “line of best fit” or the “least square line” • The regression line fits the sample points best
  • 4. Notice the 𝑦 in the sample regression equation! This implies that we are predicting something! We are predicting values for 𝑦 based upon true and observed values of 𝑥.
  • 5. 1. The sample of paired data is a simple random sample of quantitative data 2. The pairs of data 𝑥, 𝑦 have a bivariate normal distribution, meaning the following: • Visual examination of the scatter plot(s) confirms that the sample points follow an approximately straight line(s) • Because results can be strongly affected by the presence of outliers, any outliers should be removed if they are known to be errors (Note: Use caution when removing data points)
  • 6. Slope: • 𝑏1 = 𝑟 ∗ 𝑆 𝑦 𝑆 𝑥 where 𝑠 𝑦 and 𝑠 𝑥 are the standard deviations of 𝑦 values and 𝑥 values Intercept: • 𝑏0 = 𝑦 − 𝑏1 ∗ 𝑥 Software will typically be utilized to calculate these coefficients
  • 7. We would like to use the explanatory variable, x, shoe print length, to predict the response variable, y, height. The data are listed below:
  • 8. Requirement Check: 1. The data are assumed to be a simple random sample. 2. The scatterplot showed a roughly straight-line pattern. 3. There are no outliers. The use of technology is recommended for finding the equation of a regression line.
  • 9. Using StatCrunch, we obtain the following result: So then, the regression equation can be expressed as: ˆ 125 1.73y x 
  • 10. Graph the regression equation on a scatterplot: ˆ 125 1.73y x 
  • 11. 1. Use the regression equation for predictions ONLY if the graph of the regression line on the scatter plot confirms that the line fits the points reasonably well. 2. Use the equation for predictions ONLY if the data used for prediction does not go much beyond the scope of the available sample data. 3. Use the equation for prediction ONLY if 𝑟 indicates that there is a significant linear correlation indicated between the two variables, 𝑥 and 𝑦 Notice: If the regression equation does not appear to be useful for predictions, the best predicted value of a 𝑦 variable is its point estimate [i.e. the sample mean of the 𝑦 variable would be the best predicted value for that variable]
  • 12.
  • 13. Marginal change – in working with two variables related by a regression equation, the marginal change in a variable is the amount that the variable changes when the other variable changes by exactly one unit • The slope, 𝑏1, in the regression equation is the marginal change in 𝑦 when 𝑥 changes by one unit
  • 14. For the 40 pairs of shoe print lengths and heights, the regression equation was: The slope of 3.22 tells us that if we increase shoe print length by 1 cm, the predicted height of a person increases by 3.22 cm. ˆ 80.9 3.22y x 
  • 15. Scatter Plot – plot of paired 𝑥, 𝑦 quantitative data with a dot representing each pair of points • Helpful in determining whether there is a relationship between two variables
  • 16. Time Series Graph – quantitative data collected over time and plotted accordingly with the horizontal axis representing some measure of time
  • 17. • Outlier – in a scatter plot, an outlier is a point lying far away from the other data points • Influential point – a point that strongly affects the graph of the regression line
  • 18. For the 40 pairs of shoe prints and heights, observe what happens if we include this additional data point: x = 35 cm and y = 25 cm
  • 19. The additional point is an influential point because the graph of the regression line because the graph of the regression line did change considerably. The additional point is also an outlier because it is far from the other points.
  • 20. Residual – for a pair of sample 𝑥 and 𝑦 values, the difference between the observed sample value of 𝑦 (a true value observed) and the y-value that is predicted by using the regression equation 𝑦 is the residual • 𝑅𝑒𝑠𝑖𝑑𝑢𝑎𝑙 = 𝑂𝑏𝑠𝑒𝑟𝑣𝑒𝑑 − 𝑃𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑 = 𝑦 − 𝑦 • A residual represents a type of inherent prediction error • The regression equation does not, typically, pass through all the observed data values that we have The Least Squares Property – a straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible
  • 21.
  • 22. Residual Plot – a scatter plot of the 𝑥, 𝑦 values after each of the y-values has been replaced by the residual value, 𝑦 − 𝑦. • That is, a residual plot is a graph of the points 𝑥, 𝑦 − 𝑦 .
  • 23. When analyzing a residual plot, look for a pattern in the way the points are configured, and use these criteria: The residual plot should not have any obvious patterns (not even a straight line pattern). This confirms that the scatterplot of the sample data is a straight-line pattern. The residual plot should not become thicker (or thinner) when viewed from left to right. This confirms the requirement that for different fixed values of x, the distributions of the corresponding y values all have the same standard deviation.
  • 24. The shoe print and height data are used to generate the following residual plot: The residual plot becomes thicker, which suggests that the requirement of equal standard deviations is violated.
  • 25. The following slides show three residual plots. Let’s observe what is good or bad about the individual regression models.
  • 26. Regression model is a good model:
  • 27. Distinct pattern: sample data may not follow a straight-line pattern.
  • 28. Residual plot becoming thicker: equal standard deviations violated.
  • 29. 1. Construct a scatterplot and verify that the pattern of the points is approximately a straight-line pattern without outliers. (If there are outliers, consider their effects by comparing results that include the outliers to results that exclude the outliers.) 2. Construct a residual plot and verify that there is no pattern (other than a straight-line pattern) and also verify that the residual plot does not become thicker (or thinner).
  • 30. 3. Use a histogram and/or normal quantile plot to confirm that the values of the residuals have a distribution that is approximately normal. 4. Consider any effects of a pattern over time.