Correlation and Regression
Dr. A. Antonyraj
Examples of Correlation
Is there an association between:
• Children's IQ and parents' IQ?
• Degree of social trust and number of memberships in voluntary associations?
• Urban growth and air quality violations?
• GRA funding and number of publications by Ph.D. students?
• Number of police patrols and number of crimes?
• Grade on an exam and time spent on the exam?
Correlation
• Correlation coefficient: a statistical index of the degree to which two variables are associated, or related.
• We can determine whether one variable is related to another by seeing whether scores on the two variables covary, that is, whether they vary together.
Scatterplot
• The relationship between any two variables can be portrayed graphically on an x-axis and a y-axis.
• Each subject i has a pair of scores (x_i, y_i). When the scores for an entire sample are plotted, the result is called a scatterplot.
Direction of the relationship
Variables can be positively or negatively correlated.
Positive correlation: as the value of one variable increases, the value of the other variable increases.
Negative correlation: as the value of one variable increases, the value of the other variable decreases.
Strength of the relationship
The magnitude of the correlation:
• is indicated by its numerical value, ignoring the sign
• expresses the strength of the linear relationship between the variables.
[Figure: four scatterplots illustrating r = 1.00, r = .17, r = .42, and r = .85]
Pearson's correlation coefficient
There are many kinds of correlation coefficients, but the most commonly used measure of correlation is Pearson's correlation coefficient (r).
• The Pearson r ranges from -1 to +1.
• The sign indicates the direction.
• The numerical value indicates the strength.
• Perfect correlation: -1 or +1
• No correlation: 0
• A correlation of zero indicates that the values are not linearly related.
• However, it is possible that they are related in a curvilinear fashion.
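As a minimal sketch of how r can be computed, the Python function below applies the usual definition: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the small data sets are made up for illustration and do not come from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson r: how strongly x and y covary, scaled to lie in [-1, +1]."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (shared variation).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (variation within each variable).
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores, chosen only to show the sign and size of r.
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])     # perfect positive relationship
r_down = pearson_r(x, [10, 8, 6, 4, 2])   # perfect negative relationship
r_mixed = pearson_r(x, [2, 4, 5, 4, 5])   # positive but imperfect

print(r_up, r_down, round(r_mixed, 3))
```

The sign of each result gives the direction and the absolute value gives the strength, matching the bullets above.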
Standardized relationship
• The Pearson r can be thought of as a standardized measure of the association between two variables.
• That is, a correlation of .64 between two variables represents the same strength of relationship as a correlation of .64 between two entirely different variables.
• The metric by which we gauge associations is a standard metric.
• Also, it turns out that correlation can be thought of as a relationship between two variables that have first been standardized, or converted to z scores.
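The z-score view can be sketched in a few lines of Python. Under the assumption that z scores are computed with the population standard deviation (dividing by n), r is simply the average product of paired z scores. The helper `z_scores` and the data are hypothetical.

```python
import math

def z_scores(values):
    """Convert raw scores to z scores using the population SD (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def r_from_z(x, y):
    """Pearson r as the mean of the products of paired z scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

x = [1, 2, 3, 4, 5]          # hypothetical scores
y = [2, 4, 5, 4, 5]
r = r_from_z(x, y)

# Changing the units of y (a linear rescaling) leaves r unchanged,
# which is what makes r a standard metric.
y_rescaled = [100 + 10 * v for v in y]
r_rescaled = r_from_z(x, y_rescaled)

print(round(r, 3), round(r_rescaled, 3))
```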
Correlation Represents a Linear Relationship
• Correlation involves a linear relationship.
• "Linear" refers to the fact that, when we graph our two variables and there is a correlation, the points tend to fall along a straight line.
• Correlation tells you how much two variables are linearly related, not necessarily how much they are related in general.
• In some cases two variables may have a strong, or even perfect, relationship, yet the relationship is not at all linear. In these cases, the correlation coefficient might be zero.
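A small, made-up example shows how a perfect but nonlinear relationship can produce r = 0. Here y is exactly x squared, with x values placed symmetrically around zero; the function name and data are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from deviation scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]      # y is completely determined by x

r_curved = pearson_r(x, y)   # the positive and negative cross-products cancel
print(r_curved)
```

Knowing x tells us y exactly, yet the linear index r reports no relationship, which is why a scatterplot should be inspected alongside the coefficient.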
Coefficient of Determination r²
• The percentage of shared variance is represented by the square of the correlation coefficient, r².
• Variance indicates the amount of variability in a set of data.
• If the two variables are correlated, that means that we can account for some of the variance in one variable by the other variable. For example, r = .50 gives r² = .25, so 25% of the variance is shared.
Statistical significance of r
• A correlation coefficient calculated on a sample is statistically significant if a value that large would have a very low probability of occurring when the correlation in the population is zero.
• In other words, to test r for significance, we test the null hypothesis that the correlation in the population is zero by computing a t statistic.
• H0: ρ = 0
• HA: ρ ≠ 0
(ρ denotes the correlation in the population.)
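The slides do not give the formula for the t statistic. A common textbook form, assumed here, is t = r·sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom. The sketch below applies it to r = .42 with a hypothetical sample size of 30; the function name is made up.

```python
import math

def t_for_r(r, n):
    """t statistic and degrees of freedom for testing H0: population correlation = 0.

    Assumes the standard formula t = r * sqrt(n - 2) / sqrt(1 - r**2).
    """
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# r = .42 with a hypothetical sample of n = 30 subjects.
t_value, df = t_for_r(0.42, 30)
print(round(t_value, 3), df)
```

The resulting t would then be compared with the critical value from a t table for the chosen significance level and these degrees of freedom.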
Some considerations in interpreting correlation
1. Correlation represents a linear relationship.
• Correlation tells you how much two variables are linearly related, not necessarily how much they are related in general.
• In some cases two variables may have a strong, or even perfect, relationship that is not linear. For example, there can be a curvilinear relationship.
Some considerations in interpreting correlation
2. Restricted range (Slide: Truncated)
• Correlation can be deceiving if the full information about each of the variables is not available. A correlation between two variables is smaller if the range of one or both variables is truncated.
• Because the full variation of one variable is not available, there is not enough information to see how the two variables covary.
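The effect of a truncated range can be illustrated with constructed data. In this sketch y equals x plus a fixed alternating error of ±10, and the correlation is computed once over the full range of x (0 to 99) and once over a narrow band (40 to 59). The data and names are hypothetical, chosen so the result is deterministic.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from deviation scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Full sample: x runs from 0 to 99, y = x plus an alternating error of +10 / -10.
x = list(range(100))
y = [xi + (10 if xi % 2 == 0 else -10) for xi in x]
r_full = pearson_r(x, y)

# Restricted sample: keep only cases with x between 40 and 59.
kept = [(xi, yi) for xi, yi in zip(x, y) if 40 <= xi <= 59]
x_cut = [p[0] for p in kept]
y_cut = [p[1] for p in kept]
r_restricted = pearson_r(x_cut, y_cut)

print(round(r_full, 2), round(r_restricted, 2))
```

The error is the same size in both samples; only the spread of x changes, and that alone is enough to shrink the correlation in the restricted sample.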
Some considerations in interpreting correlation
3. Outliers
• Outliers are scores that deviate markedly from the remainder of the data.
• On-line outliers artificially inflate the correlation coefficient.
• Off-line outliers artificially deflate the correlation coefficient.
On-line outlier
• An outlier that falls near where the regression line would normally fall would increase the size of the correlation coefficient.
• [Figure: scatterplot with an on-line outlier; r = .457]
Off-line outliers
• An outlier that falls some distance away from the original regression line would decrease the size of the correlation coefficient.
• [Figure: scatterplot with an off-line outlier; r = .336]
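The two kinds of outlier can be compared on a small invented data set. The sketch below computes r for six base points, then adds one extreme point that lies along the trend (on-line) and, separately, one point far from the trend (off-line). All values are hypothetical and are not the data behind the figures.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from deviation scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

base_x = [1, 2, 3, 4, 5, 6]
base_y = [2, 1, 4, 3, 6, 5]
r_base = pearson_r(base_x, base_y)

# On-line outlier: far from the other points but along the same trend.
r_online = pearson_r(base_x + [20], base_y + [20])

# Off-line outlier: a low x paired with a very high y, far from the trend.
r_offline = pearson_r(base_x + [1], base_y + [12])

print(round(r_base, 3), round(r_online, 3), round(r_offline, 3))
```

A single added case moves r up in one scenario and down in the other, which is why outliers should be checked before a coefficient is interpreted.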
Correlation and Causation
• The fact that two things go together does not necessarily mean that one causes the other.
• One variable can be strongly related to another, yet not cause it. Correlation does not imply causality.
• When there is a correlation between X and Y:
• Does X cause Y, does Y cause X, or both?
• Or is there a third variable Z causing both X and Y, so that X and Y are correlated?
Simple Linear Regression
• One objective of simple linear regression is to predict a person's score on a dependent variable from knowledge of their score on an independent variable.
• It is also used to examine the degree of linear relationship between an independent variable and a dependent variable.
Examples of Linear Regression
• Predict "productivity" of factory workers based on the "Test of Assembly Speed" score.
• Predict "GPA" of college students based on the "SAT" score.
• Examine the linear relationship between "blood cholesterol" and "fat intake".
Prediction
• A perfect correlation between two variables produces a straight line when plotted in a bivariate scatterplot.
• In this figure, every increase in the value of X is associated with an increase in Y, without any exceptions. If we wanted to predict values of Y based on a certain value of X, we would have no problem in doing so with this figure. A value of 2 for X should be associated with a value of 10 on the Y variable, as indicated by this graph.
Error of Prediction: "Unexplained Variance"
• Usually, prediction won't be so perfect. Most often, not all the points will fall perfectly on the line. There will be some error in the prediction.
• For each value of X, we know the approximate value of Y but not the exact value.
Unexplained Variance
• We can look at how far each point falls from the line by drawing a short vertical line from the point to the line, as shown below.
• If we wanted to summarize how much error in prediction we had overall, we could sum up the distances (or deviations) represented by all those short lines.
• The middle line is called the regression line.
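The Regression Equation
• The regression equation is simply a mathematical equation for a line. It is the equation that describes the regression line. In algebra, we represent the equation for a line with something like this:
y = a + bx
where a is the intercept (the value of y when x is 0) and b is the slope (the change in y for each one-unit increase in x).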
Sum of Squares Residual
• Summing up the deviations of the points gives us an overall idea of how much error in prediction there is.
• Unfortunately, this method does not work very well.
• If we choose a line that goes exactly through the middle of the points, about half of the points that fall off the line should be below the line and about half should be above. Some of the deviations will be negative and some will be positive, so they cancel and the sum of all of them will equal 0.
Sum of Squares Residual
• The (imaginary) scores that fall exactly on the regression line are called the predicted scores, and there is a predicted score for each value of X. The predicted scores are represented by $\hat{y}$ (sometimes referred to as "y-hat", because of the little hat, or as "y-predict").
• So the sum of the squared deviations from the predicted scores is represented by $\sum (y - \hat{y})^2$.
Sum of Squares Residual
• Each predicted score (the point on the line) is subtracted from the y score, and the difference is then squared. Then all the squared deviations are summed up.
• The sum of the squared deviations from the regression line (or the predicted points) is a summary of the error of prediction.
• Notice that this is a type of variation. It is the unexplained variation in the prediction of y when x is used to predict the y scores. Some books refer to this as the "sum of squares residual" because it is a measure of the residual variation.
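To make the difference between summing deviations and summing squared deviations concrete, here is a minimal Python sketch. The five data points and the line y = 2.2 + 0.6x are hypothetical values chosen for illustration, not figures from the slides.

```python
# Hypothetical data and a hypothetical line running through the middle of the points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6                      # assumed intercept and slope

predicted = [a + b * xi for xi in x]                      # y-hat for each x
residuals = [yi - pi for yi, pi in zip(y, predicted)]     # y minus y-hat

raw_sum = sum(residuals)                       # positives and negatives cancel
ss_residual = sum(e ** 2 for e in residuals)   # squaring removes the signs

print(round(raw_sum, 6), round(ss_residual, 6))
```

The raw sum is zero apart from floating-point rounding, so it says nothing about the amount of error, while the sum of squares stays positive and grows with every point that misses the line.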
Regression Line
• If we want to draw a line that goes perfectly through the middle of the points, we would choose the line that had the smallest squared deviations from the line. More precisely, we would use the smallest sum of squared deviations. This criterion for the best line is called the "Least Squares" criterion, or Ordinary Least Squares (OLS).
• We use the least squares criterion to pick the regression line. The regression line is sometimes called the "line of best fit" because it is the line that fits best when drawn through the points. It is the line that minimizes the squared distances of the actual scores from the predicted scores.
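The least squares criterion can be sketched as follows. The slides do not show the computing formulas, so the standard ones are assumed here: slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept a = ȳ − b·x̄. Function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_sq_residual(x, y, a, b):
    """Sum of squared deviations of the y scores from the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]      # hypothetical scores
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
best = sum_sq_residual(x, y, a, b)

# Any other line, such as a steeper or a shifted one, has a larger sum of squares.
steeper = sum_sq_residual(x, y, a, b + 0.2)
shifted = sum_sq_residual(x, y, a + 0.5, b)

print(round(a, 3), round(b, 3), round(best, 3), round(steeper, 3), round(shifted, 3))
```

Comparing the fitted line with two nearby alternatives is only a spot check, not a proof, but it illustrates what "line of best fit" means under this criterion.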
No relationship vs.
Strong relationship
•The regression line is flat when there is no ability to predict
whatsoever.
•The regression line is sloped at an angle when there is a
relationship.
Sum of Squares Regression: The Explained Variance
• The extent to which the regression line is sloped represents how well we can predict y scores based on x scores, and the extent to which the regression line is beneficial in predicting y scores over and above the mean of the y scores.
• To represent this, we could look at how much the predicted points (which fall on the regression line) deviate from the mean.
• This deviation is represented by the little vertical lines I've drawn in the figure below.
Formula for Sum of Squares Regression: Explained Variance
• The sum of the squared deviations of the predicted scores from the mean score, or $\sum (\hat{y} - \bar{y})^2$, represents the amount of variance explained in the y scores by the x scores.
Total Variation
• The total variation in the y scores is measured simply by the sum of the squared deviations of the y scores from the mean, $\sum (y - \bar{y})^2$.
Total Variation
The explained sum of squares and
unexplained sum of squares add up to equal
the total sum of squares. The variation of the
scores is either explained by x or not.
Total sum of squares = explained sum of
squares + unexplained sum of squares.
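The additivity of the sums of squares can be checked numerically. This sketch fits a least squares line to a small hypothetical data set (using the standard slope and intercept formulas, which are assumed rather than taken from the slides) and computes the three sums of squares.

```python
x = [1, 2, 3, 4, 5]      # hypothetical scores
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares slope and intercept.
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x
predicted = [a + b * xi for xi in x]

ss_total = sum((yi - mean_y) ** 2 for yi in y)                      # total variation
ss_regression = sum((pi - mean_y) ** 2 for pi in predicted)         # explained
ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))   # unexplained

print(round(ss_total, 3), round(ss_regression, 3), round(ss_residual, 3))
```

The identity holds for the least squares line; for an arbitrary line drawn through the points, the explained and unexplained parts need not add up to the total.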
R²
• The amount of variation explained by the regression line in regression analysis is equal to the amount of shared variation between the X and Y variables in correlation.
R²
• We can create a ratio of the amount of variance explained (sum of squares regression, or SSR) relative to the overall variation of the y variable (sum of squares total, or SST), which will give us r-square: $R^2 = SSR / SST$.
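As a hedged illustration of this ratio, the sketch below computes SSR / SST for a small hypothetical data set and compares it with the square of the Pearson r for the same data. The variable names are made up, and the least squares formulas are the standard ones rather than anything shown in the slides.

```python
import math

x = [1, 2, 3, 4, 5]      # hypothetical scores
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

# Regression side: explained variation relative to total variation.
b = sxy / sxx
a = mean_y - b * mean_x
ssr = sum(((a + b * xi) - mean_y) ** 2 for xi in x)
sst = syy
r_squared = ssr / sst

# Correlation side: the Pearson r for the same data.
r = sxy / math.sqrt(sxx * syy)

print(round(r_squared, 3), round(r ** 2, 3))
```

In simple regression with one predictor the two numbers agree, which is the sense in which explained variation in regression equals shared variation in correlation.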
SPSS Demo (Simple Regression)
Multiple Regression
• Multiple regression is an extension of simple linear regression.
• In multiple regression, a dependent variable is predicted by more than one independent variable.
• $Y = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
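A Hitchhiker's Guide to Analyses
• Dichotomous independent variable, dichotomous dependent variable: Chi-square, Logistic Regression, Phi, Cramer's V
• Dichotomous independent variable, continuous dependent variable: t-test, ANOVA, Regression, Point-biserial Correlation
• Continuous independent variable, dichotomous dependent variable: Logistic Regression, Point-biserial Correlation
• Continuous independent variable, continuous dependent variable: Regression, Correlation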

Correlation and regression

  • 2. Example of Correlation Is there an association between:  Children’s IQ and Parents’ IQ  Degree of social trust and number of membership in voluntary association ?  Urban growth and air quality violations?  GRA funding and number of publication by Ph.D. students  Number of police patrol and number of crime  Grade on exam and time on exam
  • 3. Correlation  Correlation coefficient: statistical index of the degree to which two variables are associated, or related.  We can determine whether one variable is related to another by seeing whether scores on the two variables covary---whether they vary together.
  • 4. Scatterplot  The relationship between any two variables can be portrayed graphically on an x- and y- axis.  Each subject i1 has (x1, y1). When score s for an entire sample are plotted, the result is called scatter plot.
  • 6. Direction of the relationship Variables can be positively or negatively correlated. Positive correlation: A value of one variable increase, value of other variable increase. Negative correlation: A value of one variable increase, value of other variable decrease.
  • 7.
  • 8. Strength of the relationship The magnitude of correlation:  Indicated by its numerical value  ignoring the sign  expresses the strength of the linear relationship between the variables.
  • 9. r =1.00 r =.17 r = .42 r =.85
  • 10. Pearson’s correlation coefficient There are many kinds of correlation coefficients but the most commonly used measure of correlation is the Pearson’s correlation coefficient. (r)  The Pearson r range between -1 to +1.  Sign indicate the direction.  The numerical value indicates the strength.  Perfect correlation : -1 or 1  No correlation: 0  A correlation of zero indicates the value are not linearly related.  However, it is possible they are related in curvilinear fashion.
  • 11. Standardized relationship: The Pearson r can be thought of as a standardized measure of the association between two variables. That is, a correlation of .64 between two variables represents the same strength of relationship as a correlation of .64 between two entirely different variables. The metric by which we gauge associations is a standard metric. Also, it turns out that correlation can be thought of as a relationship between two variables that have first been standardized, or converted to z scores.
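One way to see the "standardized" idea is to compute r as the average product of z scores and confirm that changing the units of a variable leaves r unchanged. This is only a sketch: it assumes population standard deviations (dividing by n), and the height and weight values are made up for illustration.

```python
import math

def z_scores(values):
    """Convert raw scores to z scores using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def r_from_z(xs, ys):
    """Pearson r as the mean of the products of paired z scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Hypothetical data: the same heights expressed in two different units.
height_in = [60, 62, 65, 68, 70, 72]
height_cm = [h * 2.54 for h in height_in]
weight_lb = [115, 120, 140, 155, 160, 180]

r_inches = r_from_z(height_in, weight_lb)
r_cm = r_from_z(height_cm, weight_lb)
print(r_inches, r_cm)  # the two values agree up to rounding error
```

Because standardizing removes the original units, the correlation is the same whether height is recorded in inches or centimetres.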
  • 12. Correlation Represents a Linear Relationship: Correlation involves a linear relationship. "Linear" refers to the fact that, when we graph two correlated variables, the points tend to fall along a straight line. Correlation tells you how much two variables are linearly related, not necessarily how much they are related in general. In some cases two variables may have a strong, or even perfect, relationship, yet the relationship is not at all linear. In these cases, the correlation coefficient might be zero.
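A small illustration of a perfect but non-linear relationship: below, y is determined exactly by x (y = x squared), yet the Pearson r is zero because the x values are symmetric around zero. The data and the helper `pearson_r` are assumptions made for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical curvilinear data: y is a perfect function of x, but not a line.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r_curve = pearson_r(x, y)
print(r_curve)  # zero: no linear relationship despite perfect predictability
```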
  • 14. Coefficient of Determination r²: The proportion of shared variance is represented by the square of the correlation coefficient, r². Variance indicates the amount of variability in a set of data. If the two variables are correlated, we can account for some of the variance in one variable by the other variable.
  • 16. Statistical significance of r: A correlation coefficient calculated on a sample is statistically significant if a value that large would have a very low probability of occurring when the correlation in the population is zero. In other words, to test r for significance, we test the null hypothesis that the correlation in the population is zero by computing a t statistic. H0: ρ = 0. HA: ρ ≠ 0 (where ρ is the population correlation).
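The slide does not give the formula for the t statistic. A minimal sketch, assuming the standard textbook formula t = r * sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom, is shown below; the sample values r = .50 and n = 27 are hypothetical.

```python
import math

def t_for_correlation(r, n):
    """Return (t, degrees of freedom) for testing H0: population correlation = 0.

    Assumes the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2),
    which is not stated on the slide. Undefined when r is exactly -1 or +1.
    """
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Hypothetical sample: r = .50 observed on n = 27 cases.
t_stat, df = t_for_correlation(0.50, 27)
print(round(t_stat, 3), df)
```

The resulting t is compared with the critical value from a t table for the given degrees of freedom. For df = 25 and a two-tailed test at the .05 level that value is about 2.06, so a t near 2.89 would lead to rejecting the null hypothesis.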
  • 17. Some considerations in interpreting correlation: 1. Correlation represents a linear relationship. Correlation tells you how much two variables are linearly related, not necessarily how much they are related in general. In some cases two variables may have a strong, or even perfect, relationship that is not linear. For example, there can be a curvilinear relationship.
  • 18. Some considerations in interpreting correlation: 2. Restricted range (Slide: Truncated). Correlation can be deceiving if the full information about each of the variables is not available. A correlation between two variables is smaller if the range of one or both variables is truncated. Because the full variation of one variable is not available, there is not enough information to see how the two variables covary.
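The effect of a restricted range can be sketched with simulated data. Everything here is an assumption for illustration: the data are random draws with a fixed seed, the true relationship is y = x + noise, and "truncation" means keeping only cases whose x score lies in a narrow band.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the sketch is reproducible
x = [random.gauss(0, 1) for _ in range(2000)]
y = [xi + random.gauss(0, 1) for xi in x]

r_full = pearson_r(x, y)

# Truncate the range: keep only cases with x between -0.5 and 0.5.
kept = [(xi, yi) for xi, yi in zip(x, y) if -0.5 < xi < 0.5]
r_truncated = pearson_r([p[0] for p in kept], [p[1] for p in kept])

print(round(r_full, 2), round(r_truncated, 2))
```

The underlying relationship is identical in both samples; only the spread of x differs, and the truncated sample yields the smaller correlation.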
  • 19. Some considerations in interpreting correlation: 3. Outliers. Outliers are scores that deviate markedly from the remainder of the data. On-line outliers artificially inflate the correlation coefficient. Off-line outliers artificially deflate the correlation coefficient.
  • 20. On-line outlier: An outlier that falls near where the regression line would normally fall tends to increase the size of the correlation coefficient. [Figure: scatterplot with an on-line outlier, r = .457]
  • 21. Off-line outliers: An outlier that falls some distance away from the original regression line tends to decrease the size of the correlation coefficient. [Figure: scatterplot with an off-line outlier, r = .336]
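The two kinds of outlier can be compared on a toy data set. This is a sketch with invented numbers, not the data behind the slide figures: one extra point is placed far out along the trend (on-line) and, separately, one far above the trend at the middle of the x range (off-line).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical base sample with a moderately strong positive trend.
base_x = [1, 2, 3, 4, 5, 6]
base_y = [2, 1, 4, 3, 6, 5]
r_base = pearson_r(base_x, base_y)

# On-line outlier: extreme on both variables, but in line with the trend.
r_online = pearson_r(base_x + [20], base_y + [20])

# Off-line outlier: typical on x, but far from the trend on y.
r_offline = pearson_r(base_x + [3.5], base_y + [12])

print(round(r_base, 3), round(r_online, 3), round(r_offline, 3))
```

In this example the on-line point pushes r close to 1, while the off-line point pulls it well below the base value.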
  • 22. Correlation and Causation: That two things go together does not necessarily mean that one causes the other. One variable can be strongly related to another, yet not cause it. Correlation does not imply causality. When there is a correlation between X and Y, does X cause Y, does Y cause X, or both? Or is there a third variable Z causing both X and Y, so that X and Y are correlated?
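The third-variable case can be illustrated with a simulation. This is a sketch under stated assumptions: Z is drawn at random, X and Y are each built as Z plus independent noise, and neither X nor Y influences the other. Because Z enters both with a coefficient of 1 in this made-up setup, subtracting Z removes its contribution exactly.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)  # fixed seed so the sketch is reproducible
z = [random.gauss(0, 1) for _ in range(5000)]
x = [zi + random.gauss(0, 1) for zi in z]   # X depends on Z only
y = [zi + random.gauss(0, 1) for zi in z]   # Y depends on Z only

r_xy = pearson_r(x, y)

# Remove Z's contribution from each variable and correlate what is left.
x_without_z = [xi - zi for xi, zi in zip(x, z)]
y_without_z = [yi - zi for yi, zi in zip(y, z)]
r_without_z = pearson_r(x_without_z, y_without_z)

print(round(r_xy, 2), round(r_without_z, 2))
```

X and Y are clearly correlated even though neither causes the other, and the correlation essentially disappears once Z is taken out.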
  • 23. Simple Linear Regression: One objective of simple linear regression is to predict a person’s score on a dependent variable from knowledge of their score on an independent variable. It is also used to examine the degree of linear relationship between an independent variable and a dependent variable.
  • 24. Example of Linear Regression: Predict the “productivity” of factory workers based on their “Test of Assembly Speed” score. Predict the “GPA” of college students based on their “SAT” score. Examine the linear relationship between “blood cholesterol” and “fat intake”.
  • 25. Prediction: A perfect correlation between two variables produces a straight line when plotted in a bivariate scatterplot. In this figure, every increase in the value of X is associated with an increase in Y, without exception. If we wanted to predict values of Y from a given value of X, we would have no problem doing so with this figure. A value of 2 for X should be associated with a value of 10 on the Y variable, as indicated by this graph.
  • 26. Error of Prediction (“Unexplained Variance”): Usually, prediction won't be so perfect. Most often, not all the points will fall perfectly on the line. There will be some error in the prediction. For each value of X, we know the approximate value of Y but not the exact value.
  • 27. Unexplained Variance: We can look at how far each point falls off the line by drawing a little line straight from the point to the line, as shown in the slide's figure. If we wanted to summarize how much error in prediction we had overall, we could sum up the distances (or deviations) represented by all those little lines. The middle line is called the regression line.
  • 28. The Regression Equation: The regression equation is simply a mathematical equation for a line. It is the equation that describes the regression line. In algebra, we represent the equation for a line with something like this: y = a + bx, where a is the intercept and b is the slope.
  • 29. Sum of Squares Residual: Summing up the deviations of the points gives us an overall idea of how much error in prediction there is. Unfortunately, this method does not work very well. If we choose a line that goes exactly through the middle of the points, about half of the points that fall off the line should be below it and about half should be above. Some of the deviations will be negative and some will be positive, and thus they cancel out and their sum will equal 0. Squaring each deviation before summing avoids this cancellation.
  • 30. Sum of Squares Residual: The (imaginary) scores that fall exactly on the regression line are called the predicted scores, and there is a predicted score for each value of X. The predicted scores are represented by ŷ (sometimes referred to as "y-hat", because of the little hat, or as "y-predict"). So the sum of the squared deviations from the predicted scores is represented by Σ(y − ŷ)².
  • 31. Sum of Squares Residual: Each predicted score (the point on the line) is subtracted from the corresponding y score, and the difference is squared. Then all the squared deviations are summed up. The sum of the squared deviations from the regression line (or the predicted points) is a summary of the error of prediction. Notice that this is a type of variation. It is the unexplained variation in the prediction of y when x is used to predict the y scores. Some books refer to this as the "sum of squares residual" because it is a measure of the residual variation.
  • 32. Regression Line: If we want to draw a line that goes perfectly through the middle of the points, we would choose the line with the smallest sum of squared deviations from the line. This criterion for the best line is called the "Least Squares" criterion, or Ordinary Least Squares (OLS). We use the least squares criterion to pick the regression line. The regression line is sometimes called the "line of best fit" because it is the line that fits best when drawn through the points. It is the line that minimizes the sum of squared distances of the actual scores from the predicted scores.
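The least squares idea can be sketched in a few lines. The slope and intercept formulas used here (b = Sxy / Sxx and a = mean of y minus b times mean of x) are the standard closed-form OLS solution, which the slides do not state; the function names and the five data points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_sq_residual(xs, ys, a, b):
    """Sum of squared deviations of the y scores from the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical scores.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
sse = sum_sq_residual(xs, ys, a, b)

# Any other line, such as one shifted up or tilted slightly, fits worse.
sse_shifted = sum_sq_residual(xs, ys, a + 0.5, b)
sse_tilted = sum_sq_residual(xs, ys, a, b + 0.1)

print(a, b, sum(residuals), sse, sse_shifted, sse_tilted)
```

The raw residuals of the fitted line sum to (approximately) zero, which is why they are squared before summing, and the fitted line has a smaller sum of squared residuals than either alternative line.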
  • 33. No relationship vs. Strong relationship: The regression line is flat when there is no ability to predict whatsoever. The regression line is sloped at an angle when there is a relationship.
  • 34. Sum of Squares Regression (The Explained Variance): The extent to which the regression line is sloped represents the amount we can predict y scores based on x scores, and the extent to which the regression line is beneficial in predicting y scores over and above the mean of the y scores. To represent this, we could look at how much the predicted points (which fall on the regression line) deviate from the mean. This deviation is represented by the little vertical lines drawn in the slide's figure.
  • 35. Formula for Sum of Squares Regression (Explained Variance): The squared deviations of the predicted scores from the mean score, summed as Σ(ŷ − ȳ)², represent the amount of variance explained in the y scores by the x scores.
  • 36. Total Variation: The total variation in the y scores is measured simply by the sum of the squared deviations of the y scores from the mean, Σ(y − ȳ)².
  • 37. Total Variation: The explained sum of squares and the unexplained sum of squares add up to the total sum of squares. The variation of the scores is either explained by x or not. Total sum of squares = explained sum of squares + unexplained sum of squares.
  • 38. R²: The amount of variation explained by the regression line in regression analysis is equal to the amount of shared variation between the X and Y variables in correlation.
  • 39. R²: We can create a ratio of the amount of variance explained (sum of squares regression, or SSR) relative to the overall variation of the y variable (sum of squares total, or SST), which gives us r-square: R² = SSR / SST.
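The sums of squares from the preceding slides can be tied together numerically. The sketch below uses hypothetical scores and the standard closed-form least squares fit (an assumption, since the slides rely on SPSS) to check that SST = SSR + SSE and that SSR / SST equals the squared Pearson correlation.

```python
import math

# Hypothetical scores.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)

# Least squares line and the predicted score for each case.
b = sxy / sxx
a = mean_y - b * mean_x
predicted = [a + b * x for x in xs]

ss_total = sum((y - mean_y) ** 2 for y in ys)                       # total variation
ss_regression = sum((p - mean_y) ** 2 for p in predicted)           # explained
ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))      # unexplained

r_squared = ss_regression / ss_total
pearson = sxy / math.sqrt(sxx * ss_total)   # ss_total is also the sum of squares of y

print(ss_total, ss_regression, ss_residual, r_squared, pearson ** 2)
```

For these scores the explained and unexplained parts add up to the total, and the ratio SSR / SST matches the square of r, which is the link between regression and correlation described on slides 38 and 39.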
  • 40. SPSS Demo (Simple Regression)
  • 41. Multiple Regression: Multiple regression is an extension of simple linear regression. In multiple regression, a dependent variable is predicted by more than one independent variable: Y = a + b1x1 + b2x2 + . . . + bkxk.
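To make the multiple regression equation concrete, here is a minimal pure-Python sketch that estimates a, b1, and b2 by solving the least squares normal equations. The approach (normal equations plus Gaussian elimination), the function names, and the data are all assumptions for illustration; in practice a statistics package such as SPSS performs this step. The y values are generated exactly from Y = 1 + 2x1 + 3x2, so the fit should recover those coefficients.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix so row operations apply to both sides at once.
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    # Back-substitution from the last row upward.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * solution[c] for c in range(row + 1, n))
        solution[row] = (m[row][n] - tail) / m[row][row]
    return solution

def fit_multiple_regression(predictor_rows, ys):
    """Return [a, b1, ..., bk] minimizing the sum of squared residuals."""
    design = [[1.0] + list(row) for row in predictor_rows]  # leading 1 = intercept
    k = len(design[0])
    # Normal equations: (X'X) * coefficients = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical predictors (x1, x2) and outcomes built from Y = 1 + 2*x1 + 3*x2.
predictor_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictor_rows]

a, b1, b2 = fit_multiple_regression(predictor_rows, ys)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

With real data the outcomes would not fall exactly on the plane, and the same procedure would return the coefficients that minimize the squared prediction error.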
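  • 42. A Hitchhiker’s Guide to Analyses (by type of independent and dependent variable): Dichotomous independent variable with dichotomous dependent variable: chi-square, logistic regression, phi, Cramer's V. Dichotomous independent variable with continuous dependent variable: t-test, ANOVA, regression, point-biserial correlation. Continuous independent variable with dichotomous dependent variable: logistic regression, point-biserial correlation. Continuous independent variable with continuous dependent variable: regression, correlation.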