CORRELATION & REGRESSION ANALYSIS Using SPSS
Dr Parag Shah | M.Sc., M.Phil., Ph.D. (Statistics)
www.paragstatistics.wordpress.com
Correlation
Correlation analysis is used to study the strength of the relationship between two or more quantitative variables. Correlation shows the degree of linear dependence between two variables.
Correlation does not imply causation.
If variables are not related by cause and effect but still show correlation, the correlation is called spurious or nonsense correlation.
Correlation
Correlation can be positive, negative or zero, depending on how the two variables change together.
If the two variables change in the same direction, the correlation is positive.
If the two variables change in opposite directions, the correlation is negative.
If a change in one variable is not associated with any systematic (linear) change in the other, the correlation is zero.
Correlation Coefficient
The correlation coefficient (r) measures the extent of correlation between two variables. It ranges from -1 to +1.
There are several types of correlation coefficient, but the most popular is Karl Pearson’s correlation coefficient.
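As a minimal sketch of how Pearson's coefficient can be computed from raw data, r is the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the sample values are illustrative assumptions, not taken from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y rises with x in the first call and falls in the second
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # perfect positive linear relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))  # perfect negative linear relationship
```

The coefficient is undefined when either variable is constant, in which case this sketch raises a `ZeroDivisionError`.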
Testing Correlation Coefficient
Null Hypothesis $H_0: \rho = 0$
[There is no significant linear correlation between the two variables]
Alternative Hypothesis $H_1: \rho \neq 0$
[There is a significant linear correlation between the two variables]
Test statistic: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
Under the null hypothesis, the test statistic t follows Student’s t distribution with $n - 2$ degrees of freedom.
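The test statistic above can be evaluated directly from r and n. The sketch below is illustrative only: the function name `correlation_t_statistic` is an assumption, and the inputs are the values reported in the body-temperature case study of this deck (r = 0.448, n = 100).

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, given sample correlation r and sample size n."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Values reported in the case study: r = 0.448 for n = 100 adults
t = correlation_t_statistic(0.448, 100)
df = 100 - 2
print(f"t = {t:.2f} on {df} degrees of freedom")
```

A t value of roughly 4.96 on 98 degrees of freedom is far in the tail of the t distribution, which is consistent with the very small p value that SPSS reports. If SciPy is available, a two-sided p value could be obtained with `2 * scipy.stats.t.sf(abs(t), df)`.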
Case Study
The body temperature (in °F) of 100 adults was measured along with their gender, age, and heart rate. Data: body_temp.xlsx.
Obtain the correlation coefficient between body temperature and heart rate, and check its significance.
Null & Alternative Hypothesis
Null Hypothesis $H_0: \rho = 0$
[There is no significant linear correlation between body temperature and heart rate]
Alternative Hypothesis $H_1: \rho \neq 0$
[There is a significant linear correlation between body temperature and heart rate]
Test Statistic t and p value
The correlation coefficient (r) between heart rate and temperature is 0.448.
Here the p value is displayed as 0.000 (i.e. p < 0.001), which is less than 0.05, so the null hypothesis is rejected.
Thus, there is a significant linear correlation between heart rate and temperature.
Regression
Regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables' or 'features').
Regression Analysis
Regression analysis helps you understand how the dependent variable changes when one of the independent variables varies, and allows you to determine mathematically which of those variables really have an impact.
Regression analysis includes several variations, such as linear, multiple linear, and nonlinear. The most common models are simple linear and multiple linear.
Types of Regression
Dependent variable | Independent variable(s) | Type of regression | Relationship between variables
One (scale) | One (scale) | Simple linear | Linear
One (scale) | Two or more (continuous / categorical) | Multiple linear | Linear
One (categorical, binary) | Two or more (continuous / categorical) | Logistic | Need not be linear
One (categorical) | Two or more (continuous / categorical) | Multinomial logistic | Need not be linear
Simple Regression
The simple linear regression model is used to predict one response (dependent) variable based on one predictor (independent) variable.
The linear regression model can be stated as follows:
$y_i = \beta_0 + \beta_1 x_i + e_i, \quad i = 1, 2, \ldots, n$
where
• $y_i$ is the value of the response variable,
• $x_i$ is the value of the predictor variable,
• $\beta_0$, $\beta_1$ are the parameters (regression coefficients),
• $e_i$ is a random error term with $E(e_i) = 0$ and $V(e_i) = \sigma^2$.
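Graphical representation
[Figure: the regression line $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ plotted on X and Y axes, with intercept $\beta_0$ and slope $\beta_1$, showing the observed value of Y for $X_i$, the predicted value of Y for $X_i$, and the random error $\varepsilon_i$ for this $X_i$ value.]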
Assumptions of Simple Regression
The four important assumptions for a simple linear regression model are:
• The regression model is Linear in the parameters.
• The errors are Independently distributed.
• The errors are Normally distributed.
• The errors have Equal variances, i.e. $V(e_i) = \sigma^2$ (Homoscedasticity).
Method
The line of best fit can be obtained by the method of least squares. It finds the line that minimizes the sum of squares of the vertical deviations from each data point to the line, i.e., $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
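For a single predictor, the least-squares criterion above has a well-known closed-form solution: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the intercept follows from the means. The sketch below is a minimal illustration; the function name `fit_simple_linear_regression` and the data are hypothetical and not from the slides.

```python
def fit_simple_linear_regression(x, y):
    """Return (b0, b1): the least-squares intercept and slope."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1

# Hypothetical data lying exactly on the line y = 1 + 2x
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
b0, b1 = fit_simple_linear_regression(x, y)
print(f"intercept = {b0}, slope = {b1}")
```

Because the sample points lie exactly on a line, the fitted intercept and slope recover that line; with noisy data the same formulas give the line with the smallest sum of squared vertical deviations.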
Total variation is made up of two parts:
$SST = SSR + SSE$
(Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)
$SST = \sum (Y_i - \bar{Y})^2$
$SSR = \sum (\hat{Y}_i - \bar{Y})^2$
$SSE = \sum (Y_i - \hat{Y}_i)^2$
where:
$\bar{Y}$ = Mean value of the dependent variable
$Y_i$ = Observed value of the dependent variable
$\hat{Y}_i$ = Predicted value of Y for the given $X_i$ value
• SST = total sum of squares (Total Variation)
  • Measures the variation of the $Y_i$ values around their mean $\bar{Y}$
• SSR = regression sum of squares (Explained Variation)
  • Variation attributable to the relationship between X and Y
• SSE = error sum of squares (Unexplained Variation)
  • Variation in Y attributable to factors other than X
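Measures of Variations
[Figure: scatter plot of Y against X showing, at a point $X_i$, the total variation $SST = \sum (Y_i - \bar{Y})^2$, the unexplained variation $SSE = \sum (Y_i - \hat{Y}_i)^2$ and the explained variation $SSR = \sum (\hat{Y}_i - \bar{Y})^2$ relative to the fitted line $\hat{Y}$ and the mean $\bar{Y}$.]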
Coefficient of Determination
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable.
The coefficient of determination is denoted as $R^2$:
$R^2 = \dfrac{SSR}{SST} = \dfrac{\text{Regression sum of squares}}{\text{Total sum of squares}}$
Note: $0 \le R^2 \le 1$
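To make the sums of squares and the coefficient of determination concrete, the following sketch fits a least-squares line to a small hypothetical data set and then computes SST, SSR, SSE and R². The function name `variation_decomposition` and the data are illustrative assumptions, not from the slides.

```python
def variation_decomposition(y, y_hat):
    """Return (SST, SSR, SSE, R2) from observed values y and fitted values y_hat."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    ssr = sum((fi - mean_y) ** 2 for fi in y_hat)          # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    return sst, ssr, sse, ssr / sst

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least-squares fit, so that the identity SST = SSR + SSE holds
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
b0 = mean_y - b1 * mean_x
y_hat = [b0 + b1 * xi for xi in x]

sst, ssr, sse, r2 = variation_decomposition(y, y_hat)
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, R2 = {r2:.2f}")
```

The decomposition SST = SSR + SSE holds here because the fitted values come from a least-squares line with an intercept; for arbitrary predictions the two parts need not add up to the total.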
Adjusted R Square
The adjusted R-squared is a modified version of R-squared that adjusts for the number of predictors in a regression model, so that predictors which are not significant do not inflate it.
R-squared never decreases when you add an independent variable to the model. The adjusted R-squared value increases only when the new term improves the model fit more than expected by chance alone, and it actually decreases when the term doesn’t improve the model fit by a sufficient amount.
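The slides do not show the formula, but the standard definition is adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the sample size and p the number of predictors. The sketch below applies it to the rounded R² values reported in Case Study 2 of this deck (n = 20, with 6 and then 3 predictors); the function name `adjusted_r_squared` is an assumption, and the results are approximate because the inputs are rounded.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R-squared for a model with p predictors fitted to n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Rounded values from Case Study 2: 20 patients
full = adjusted_r_squared(0.995, n=20, p=6)     # all six predictors
reduced = adjusted_r_squared(0.993, n=20, p=3)  # age, weight and bsa only
print(f"full model: {full:.4f}, reduced model: {reduced:.4f}")
```

The penalty factor (n - 1)/(n - p - 1) grows with the number of predictors, which is why adding a term that barely raises R² can lower the adjusted value.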
Multiple Regression
The multiple linear regression model is used to predict a response (dependent) variable based on two or more predictor (independent) variables.
The multiple linear regression model can be stated as follows:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + e_i, \quad i = 1, 2, \ldots, n$
where
• $y_i$ is the $i$th value of the response variable,
• $x_{ij}$ is the $i$th observation of the $j$th predictor variable,
• $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are the parameters (regression coefficients),
• $e_i$ is a random error term with $E(e_i) = 0$ and $V(e_i) = \sigma^2$.
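The slides fit this model in SPSS. As a rough illustration of what such a fit computes, the sketch below estimates the coefficients by solving the normal equations (X'X)b = X'y with Gaussian elimination, using only the Python standard library. The function name `fit_multiple_linear_regression` and the data are hypothetical, and solving the normal equations directly is a simplification that is fine for small, well-conditioned examples but is not necessarily how SPSS computes its estimates.

```python
def fit_multiple_linear_regression(rows, y):
    """Least-squares coefficients [b0, b1, ..., bp] via the normal equations.

    rows: list of predictor tuples (x_i1, ..., x_ip); y: list of responses.
    """
    # Design matrix with a leading 1 in each row for the intercept
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, k):
            factor = A[row][col] / A[col][col]
            for c in range(col, k + 1):
                A[row][c] -= factor * A[col][c]
    # Back substitution
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (A[i][k] - s) / A[i][i]
    return beta

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
print([round(b, 6) for b in fit_multiple_linear_regression(rows, y)])
```

The error raised for linearly dependent predictors is the extreme case of multicollinearity, where the coefficients cannot be estimated uniquely.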
Case Study 1
The body temperature (in °F) of 100 adults was measured along with their gender, age, and heart rate. The data is stored in the body_temp.xlsx file.
Build a linear regression model for body temperature using heart rate as a predictor.
Regression
Model Summary
Multiple R = Correlation Coefficient = 0.45
R Square = Coefficient of Determination = 0.20
R Square = 0.20 shows that 20% of the variation in temperature is explained by heart rate.
ANOVA
p value ≈ 0 < 0.05.
So, there is enough evidence that the fitted regression model is significant.
The regression model predicts the dependent variable, temperature, significantly well.
Regression Coefficients
$H_0: \beta_1 = 0$ [Regression coefficient for heart rate is not significant]
$H_1: \beta_1 \neq 0$ [Regression coefficient for heart rate is significant]
The p value of the regression coefficient of heart rate ≈ 0 < 0.05, so $H_0$ is rejected.
So, the regression coefficient of heart rate is significant.
Regression Model:
Temperature = 92.391 + 0.081 × Heart Rate
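As a small illustration of how the fitted equation is used for prediction, the sketch below plugs a heart rate into the model reported in the slides. The function name `predict_temperature` and the example heart rate of 70 are assumptions, as is the unit of beats per minute, which the slides do not state for this data set.

```python
def predict_temperature(heart_rate):
    """Predicted body temperature (deg F) from the fitted model in the slides."""
    intercept = 92.391  # constant term reported in the slides
    slope = 0.081       # coefficient of heart rate reported in the slides
    return intercept + slope * heart_rate

# Hypothetical heart rate of 70 (assumed to be in beats per minute)
print(round(predict_temperature(70), 3))
```

The slope means that each additional unit of heart rate is associated with a predicted rise of about 0.081 °F. Predictions are only meaningful for heart rates within the range observed in the data.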
Checking Assumptions
• The regression model is Linear in the parameters.
• The errors are Independently distributed.
• The errors are Normally distributed.
• The errors have Equal variances, i.e. $V(e_i) = \sigma^2$ (Homoscedasticity).
Linearity Assumption
Assumption - Errors are Independently distributed
The value of the Durbin-Watson statistic is 1.804, which is close to 2.
So, the assumption that errors are independently distributed is met.
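SPSS reports the Durbin-Watson statistic directly. For reference, it is defined as the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4, and values near 2 suggest no first-order autocorrelation. The sketch below computes it for hypothetical residual sequences; the function name `durbin_watson` and the data are illustrative assumptions.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals
print(durbin_watson([1, -1, 1, -1]))  # alternating signs: negative autocorrelation
print(durbin_watson([1, 1, -1, -1]))  # runs of equal sign: positive autocorrelation
```

Values well below 2 point to positive autocorrelation and values well above 2 to negative autocorrelation, which is why a result close to 2 is taken as support for independent errors.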
Normality & Homoscedasticity Assumptions
Normality Assumption
Points are very close to the diagonal line, so the residuals of the temperature model are approximately normally distributed.
Homoscedasticity Assumption
The residual plot does not show an obvious pattern: points are equally distributed above and below zero on the X axis, and to the left and right of zero on the Y axis.
So the homoscedasticity assumption is met.
Case Study 2
The data were collected on a simple random sample of 20 patients with hypertension. The dataset is in arterialBp.csv.
The variables are:
Y = mean arterial blood pressure (mm Hg)
X1 = age (years), X2 = weight (kg)
X3 = body surface area (sq. m)
X4 = duration of hypertension (years)
X5 = basal pulse (beats/min), X6 = measure of stress
Fit an appropriate regression equation.
Regression
Model Summary
Multiple R = Correlation Coefficient = 0.997
R Square = Coefficient of Determination = 0.995
R Square = 0.995 shows that 99.5% of the variation in blood pressure is explained by age, weight, bsa, duration of hypertension, pulse and stress.
ANOVA
p value ≈ 0 < 0.05.
So, there is enough evidence that the fitted regression model is significant.
The regression model predicts the dependent variable, blood pressure, significantly well.
Regression Coefficients
Running the regression again after removing the insignificant variables: hyper, pulse and stress
Model Summary
Multiple R = Correlation Coefficient = 0.997
R Square = Coefficient of Determination = 0.993
R Square = 0.993 shows that 99.3% of the variation in blood pressure is explained by age, weight and bsa.
ANOVA
p value ≈ 0 < 0.05.
So, there is enough evidence that the fitted regression model is significant.
The regression model predicts the dependent variable, blood pressure, significantly well.
Regression Coefficients
Regression Model:
Bp = -13.401 + 0.718 × Age + 0.896 × Weight + 4.553 × bsa
Checking Assumptions
• The regression model is Linear in the parameters.
• The errors are Independently distributed.
• The errors are Normally distributed.
• The errors have Equal variances, i.e. $V(e_i) = \sigma^2$ (Homoscedasticity).
• There is no Multicollinearity (no significant correlation between independent variables).
Linearity Assumptions
Normality & Homoscedasticity Assumptions
Normality Assumption
Points are very close to the diagonal line, so the residuals of the Bp model are approximately normally distributed.
Homoscedasticity Assumption
The residual plot does not show an obvious pattern: points are equally distributed above and below zero on the X axis, and to the left and right of zero on the Y axis.
So the homoscedasticity assumption is met.
Assumption - Errors are Independently distributed
The value of the Durbin-Watson statistic is 1.537, which is reasonably close to 2.
So, the assumption that errors are independently distributed is met.
Multicollinearity Assumptions
The Variance Inflation Factor (VIF) for every variable lies between 1 and 10, so there is no multicollinearity, i.e. the independent variables do not have significant correlation between them.
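SPSS reports the VIF for each predictor. For reference, the standard definition is VIF_j = 1 / (1 - R_j²), where R_j² is the R² obtained by regressing predictor j on all the other predictors. The sketch below shows how the VIF grows as a predictor becomes more predictable from the others; the function name `variance_inflation_factor` and the R² values are illustrative assumptions, not output from the case study.

```python
def variance_inflation_factor(r_squared_j):
    """VIF for a predictor, given the R-squared of regressing it on the other predictors."""
    if not 0 <= r_squared_j < 1:
        raise ValueError("r_squared_j must be in [0, 1)")
    return 1 / (1 - r_squared_j)

# Hypothetical R-squared values for a predictor regressed on the others
for r2 in (0.0, 0.5, 0.9):
    print(f"R2_j = {r2:.1f} -> VIF = {variance_inflation_factor(r2):.1f}")
```

A VIF of 1 means the predictor is uncorrelated with the others, and the cutoff of 10 used in the slides corresponds to an R_j² of 0.9.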
THANK YOU

Correlation & Regression Analysis using SPSS

  • 1. CORRELATION & REGRESSION ANALYSIS Using SPSS Dr Parag Shah | M.Sc., M.Phil., Ph.D. (Statistics) www.paragstatistics.wordpress.com
  • 2. Correlation Correlation analysis is used to study the strength of the relationship between two or more quantitative variables. The correlation coefficient shows the degree of linear dependence between two variables. Correlation does not imply causation. If variables show correlation but are not related by cause and effect, the correlation is called spurious or nonsense correlation.
  • 3. Correlation Correlation can be positive, negative or zero, depending on how the two variables change together. If the two variables change in the same direction, the correlation is positive. If they change in opposite directions, the correlation is negative. If a change in one variable is not accompanied by any systematic change in the other, the correlation is zero.
  • 4. Correlation Coefficient Correlation coefficient (r) is the measure of extent of correlation between two variables. There are several types of correlation coefficient but the most popular is Karl Pearson’s correlation coefficient.
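The slide names Karl Pearson's coefficient without giving its formula. A minimal sketch of the usual definition, r = S_xy / sqrt(S_xx · S_yy), is shown below. The function name `pearson_r` and the data values are hypothetical and are not taken from body_temp.xlsx.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical illustration values, not the case-study data
heart_rate = [68, 72, 75, 80, 84]
temperature = [97.9, 98.2, 98.1, 98.6, 98.8]
r = pearson_r(heart_rate, temperature)
print(round(r, 3))
```

The result always lies between -1 and +1; the sign gives the direction of the linear relationship and the magnitude its strength.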
  • 5. Testing Correlation Coefficient Null Hypothesis H0: 𝜌 = 0 [There is no significant linear correlation between two variables] Alternative Hypothesis H1: 𝜌 ≠ 0 [There is significant linear correlation between two variables] Test statistic: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$ Under H0, the test statistic t follows Student’s t distribution with $n - 2$ degrees of freedom.
  • 6. Case Study The body temperature (in °F) of 100 adults was measured along with their gender, age, and heart rate. Data: body_temp.xlsx. Obtain the correlation coefficient between body temperature and heart rate. Also check its significance.
  • 7. Null & Alternative Hypothesis Null Hypothesis H0: 𝜌 = 0 [There is no significant linear correlation between body temperature and heart rate] Alternative Hypothesis H1: 𝜌≠ 0 [There is significant linear correlation between body temperature and heart rate]
  • 10. Test Statistics t and p value The correlation coefficient (r) between heart rate and temperature is 0.448. The p value is reported as 0.000 (that is, p < 0.001), which is less than 0.05, so the null hypothesis is rejected. Thus, there is a significant linear correlation between heart rate and temperature.
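As a rough check on this result, the test statistic from slide 5 can be recomputed from the reported r = 0.448 and n = 100. This is a sketch using only the two reported numbers, not the SPSS output itself; the function name is hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.448   # correlation reported on the slide
n = 100     # number of adults in the case study
t = correlation_t_statistic(r, n)
df = n - 2
print(f"t = {t:.2f} on {df} degrees of freedom")
```

This gives t of about 4.96 on 98 degrees of freedom, far beyond the two-sided 5% critical value of roughly 1.98, which agrees with the very small p value shown by SPSS.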
  • 11. Regression Regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables' or 'features').
  • 12. Regression Analysis Regression analysis helps you understand how the dependent variable changes when one of the independent variables varies, and lets you determine mathematically which of those variables really has an impact. Regression analysis includes several variations, such as linear, multiple linear, and nonlinear. The most common models are simple linear and multiple linear.
  • 13. Types of Regression
      Dependent variable | Independent variable(s) | Type of regression | Relationship between variables
      One (scale) | One (scale) | Simple linear | Linear
      One (scale) | Two or more (continuous / categorical) | Multiple linear | Linear
      One (categorical, binary) | Two or more (continuous / categorical) | Logistic | Need not be linear
      One (categorical) | Two or more (continuous / categorical) | Multinomial logistic | Need not be linear
  • 14. Simple Regression The simple linear regression model is used to predict one response (dependent) variable based on one predictor (independent) variable. The linear regression model can be stated as follows: $y_i = \beta_0 + \beta_1 x_i + e_i, \quad i = 1, 2, \dots, n$, where
      • $y_i$ is the value of the response variable,
      • $x_i$ is the value of the predictor variable,
      • $\beta_0, \beta_1$ are the parameters (regression coefficients),
      • $e_i$ is the random error term with $E(e_i) = 0$ and $V(e_i) = \sigma^2$.
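  • 15. Graphical representation [Figure: plot of Y against X with the fitted line $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, marking the intercept $\beta_0$, the slope $\beta_1$, the observed and predicted values of Y at $X_i$, and the random error $\varepsilon_i$ for that $X_i$ value.]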
  • 16. Assumptions of Simple Regression The four important assumptions for a simple linear regression model are:
      • The regression model is linear in the parameters.
      • The errors are independently distributed.
      • The errors are normally distributed.
      • The errors have equal variances, i.e. $V(e_i) = \sigma^2$ (homoscedasticity).
  • 17. Method The line of best fit can be obtained by the method of least squares. It finds the line of best fit for the observed data by minimizing the sum of squares of the vertical deviations from each data point to the line, i.e., $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
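For the simple linear model, minimizing this sum of squares has a well-known closed-form solution: slope b1 = S_xy / S_xx and intercept b0 = ȳ − b1·x̄. A minimal sketch follows; the function name and the data are hypothetical, and the data are chosen to lie exactly on y = 1 + 2x so the result is easy to check.

```python
def fit_simple_regression(x, y):
    """Return (b0, b1) minimising the sum of squared vertical deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = s_xy / s_xx            # slope
    b0 = mean_y - b1 * mean_x   # intercept
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
b0, b1 = fit_simple_regression(x, y)
print(b0, b1)
```

Because the points are exactly linear, the fit recovers an intercept of 1 and a slope of 2; with real data the line passes through the point (x̄, ȳ) but not through every observation.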
  • 18. Measures of Variations Total variation is made up of two parts: $SST = SSR + SSE$ (Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares), with $SST = \sum (Y_i - \bar{Y})^2$, $SSR = \sum (\hat{Y}_i - \bar{Y})^2$ and $SSE = \sum (Y_i - \hat{Y}_i)^2$, where
      • $\bar{Y}$ = mean value of the dependent variable,
      • $Y_i$ = observed value of the dependent variable,
      • $\hat{Y}_i$ = predicted value of Y for the given $X_i$ value.
      • SST = total sum of squares (total variation): measures the variation of the $Y_i$ values around their mean $\bar{Y}$.
      • SSR = regression sum of squares (explained variation): variation attributable to the relationship between X and Y.
      • SSE = error sum of squares (unexplained variation): variation in Y attributable to factors other than X.
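  • 19. Measures of Variations [Figure: for an observation $(X_i, Y_i)$, the vertical distances between $Y_i$, the fitted value $\hat{Y}_i$ and the mean $\bar{Y}$, illustrating $SST = \sum (Y_i - \bar{Y})^2$, $SSE = \sum (Y_i - \hat{Y}_i)^2$ and $SSR = \sum (\hat{Y}_i - \bar{Y})^2$.]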
  • 20. Coefficient of Determination The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. It is denoted $R^2$: $R^2 = \dfrac{SSR}{SST} = \dfrac{\text{Regression sum of squares}}{\text{Total sum of squares}}$ Note: $0 \le R^2 \le 1$.
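The decomposition SST = SSR + SSE and the ratio R² = SSR / SST can be verified numerically. The sketch below uses a small hypothetical data set whose least-squares line is 2.2 + 0.6x; the function name and the data are illustrative only.

```python
def variation_measures(y, y_hat):
    """Return (SST, SSR, SSE, R2) for observed y and fitted y_hat."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    ssr = sum((fi - mean_y) ** 2 for fi in y_hat)           # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained variation
    return sst, ssr, sse, ssr / sst

# Hypothetical data; fitted values come from the least-squares line 2.2 + 0.6x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]
sst, ssr, sse, r2 = variation_measures(y, y_hat)
print(round(sst, 3), round(ssr, 3), round(sse, 3), round(r2, 3))
```

Here SST = 6, SSR = 3.6 and SSE = 2.4, so R² = 0.6: the line explains 60% of the variation in y. The identity SST = SSR + SSE holds only when the fitted values come from a least-squares fit that includes an intercept.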
  • 21. Adjusted R Square The adjusted R-squared is a modified version of R-squared that adjusts for the number of predictors in a regression model. R-squared never decreases when you add an independent variable to the model, even an irrelevant one. The adjusted R-squared increases only when the new term improves the model fit more than would be expected by chance alone, and it decreases when the term does not improve the fit by a sufficient amount.
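The slide does not state the formula. The commonly used definition is adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the sample size and p the number of predictors. A small sketch with illustrative, hypothetical values:

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Illustrative values: one predictor, then a second predictor that barely helps
one_predictor = adjusted_r_squared(0.200, n=100, p=1)
two_predictors = adjusted_r_squared(0.201, n=100, p=2)
print(round(one_predictor, 4), round(two_predictors, 4))
```

In this example R² rises slightly from 0.200 to 0.201 when the second predictor is added, yet the adjusted value falls from about 0.192 to about 0.185, which is the penalty the slide describes.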
  • 22. Multiple Regression The multiple linear regression model is used to predict a response (dependent) variable based on two or more predictor (independent) variables. The multiple linear regression model can be stated as follows: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + e_i, \quad i = 1, 2, \dots, n$, where
      • $y_i$ is the $i$th value of the response variable,
      • $x_{ij}$ is the $i$th observation of the $j$th predictor variable,
      • $\beta_0, \beta_1, \beta_2, \dots, \beta_p$ are the parameters (regression coefficients),
      • $e_i$ is the random error term with $E(e_i) = 0$ and $V(e_i) = \sigma^2$.
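The coefficients of this model are usually estimated by least squares, which amounts to solving the normal equations (XᵀX)β = Xᵀy. The following is a sketch in plain Python, not what SPSS does internally; the helper names and the data are hypothetical, and the data are generated exactly from y = 1 + 2·x1 + 3·x2 so the answer is known.

```python
def solve_linear_system(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - known) / m[i][i]
    return beta

def fit_multiple_regression(rows, y):
    """Least-squares estimates (b0, b1, ..., bp) from the normal equations."""
    design = [[1.0] + list(row) for row in rows]   # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = fit_multiple_regression(rows, y)
print([round(c, 6) for c in coefficients])
```

Solving the normal equations directly is fine for a small illustration; statistical packages generally use more numerically stable decompositions.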
  • 23. Case Study 1 The body temperature (in °F) of 100 adults was measured along with their gender, age, and heart rate. The data are stored in the file body_temp.xlsx. Build a linear regression model for body temperature using heart rate as a predictor.
  • 26. Model Summary Multiple R = Correlation Coefficient = 0.45 R Square = Coefficient of Determination = 0.20 R Square = 0.20 shows that 20% of the variation in temperature is explained by heart rate.
  • 27. ANOVA The p value is reported as 0.000 (that is, p < 0.001), which is less than 0.05. So, there is enough evidence that the fitted regression model is significant. The regression model predicts the dependent variable, temperature, significantly well.
  • 28. Regression Coefficients H0: 𝛽1 = 0 [Regression coefficient for Heart Rate is not significant] H1: 𝛽1 ≠ 0 [Regression coefficient for Heart Rate is significant] The p value of the regression coefficient of Heart Rate is reported as 0.000 (that is, p < 0.001), which is less than 0.05, so H0 is rejected. So, the regression coefficient of Heart Rate is significant. Regression Model: Temperature = 92.391 + 0.081 × Heart Rate
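The fitted equation can be used for prediction. A short sketch follows; the function name and the input of 75 beats per minute are hypothetical choices, while the coefficients are those reported on the slide.

```python
def predict_temperature(heart_rate):
    """Predicted body temperature (°F) from the fitted model on the slide."""
    return 92.391 + 0.081 * heart_rate

# Hypothetical input: an adult with a heart rate of 75 beats per minute
print(round(predict_temperature(75), 3))
```

For a heart rate of 75 the predicted temperature is about 98.47 °F. The slope means that each additional beat per minute is associated with an increase of 0.081 °F in predicted temperature.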
  • 29. Checking Assumptions
      • The regression model is linear in the parameters.
      • The errors are independently distributed.
      • The errors are normally distributed.
      • The errors have equal variances, i.e. $V(e_i) = \sigma^2$ (homoscedasticity).
  • 33. Assumption - Errors are Independently distributed The value of the Durbin-Watson statistic is 1.804, which is close to 2. So, the assumption that the errors are independently distributed is met.
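The Durbin-Watson statistic is computed from the residuals taken in observation order: d = Σ(e_t − e_{t−1})² / Σ e_t². Values near 2 suggest no first-order autocorrelation, values towards 0 positive autocorrelation, and values towards 4 negative autocorrelation. A minimal sketch with hypothetical residuals:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals, not the case-study output
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]
print(round(durbin_watson(residuals), 3))
```

The statistic always lies between 0 and 4, which is why a value close to 2 is read as support for independent errors.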
  • 35. Normality Assumptions The points are very close to the diagonal line, so the residuals of the temperature model are approximately normally distributed.
  • 36. Homoscedastic Assumptions The residual plot does not show an obvious pattern: the points are evenly distributed above and below zero on the X axis, and to the left and right of zero on the Y axis. So the homoscedasticity assumption is met.
  • 37. Case Study 2 The data were collected on a simple random sample of 20 patients with hypertension. The dataset is in arterialBp.csv. The variables are
      • Y = mean arterial blood pressure (mm Hg)
      • X1 = age (years)
      • X2 = weight (kg)
      • X3 = body surface area (sq. m)
      • X4 = duration of hypertension (years)
      • X5 = basal pulse (beats/min)
      • X6 = measure of stress
    Fit an appropriate regression equation.
  • 41. Model Summary Multiple R = Correlation Coefficient = 0.997 R Square = Coefficient of Determination = 0.995 R Square = 0.995 shows that 99.5% of the variation in blood pressure is explained by age, weight, bsa, duration of hypertension, pulse and stress.
  • 42. ANOVA The p value is reported as 0.000 (that is, p < 0.001), which is less than 0.05. So, there is enough evidence that the fitted regression model is significant. The regression model predicts the dependent variable, blood pressure, significantly well.
  • 43. Regression Coefficients Running the regression again after removing the insignificant variables: hyper, pulse and stress
  • 44. Model Summary Multiple R = Correlation Coefficient = 0.997 R Square = Coefficient of Determination = 0.993 R Square = 0.993 shows that 99.3% of the variation in blood pressure is explained by age, weight and bsa.
  • 45. ANOVA The p value is reported as 0.000 (that is, p < 0.001), which is less than 0.05. So, there is enough evidence that the fitted regression model is significant. The regression model predicts the dependent variable, blood pressure, significantly well.
  • 46. Regression Coefficients Regression Model: Bp = -13.401 + 0.718 * Age + 0.896 * weight + 4.553 * bsa
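As with the simple model, the fitted equation can be applied to a new patient. In the sketch below the coefficients are those reported on the slide, while the dictionary layout, the function name and the patient values are hypothetical.

```python
# Coefficients reported on the slide for the reduced model
INTERCEPT = -13.401
COEFFICIENTS = {"age": 0.718, "weight": 0.896, "bsa": 4.553}

def predict_bp(patient):
    """Predicted mean arterial blood pressure (mm Hg) for one patient."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS
    )

# Hypothetical patient: 50 years old, 90 kg, body surface area 2.0 sq. m
patient = {"age": 50, "weight": 90, "bsa": 2.0}
print(round(predict_bp(patient), 3))
```

For this hypothetical patient the predicted mean arterial pressure is about 112.2 mm Hg. Each coefficient is the change in predicted blood pressure for a one-unit increase in that variable with the other two held fixed.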
  • 47. Checking Assumptions
      • The regression model is linear in the parameters.
      • The errors are independently distributed.
      • The errors are normally distributed.
      • The errors have equal variances, i.e. $V(e_i) = \sigma^2$ (homoscedasticity).
      • There is no multicollinearity (no significant correlation between independent variables).
  • 52. Normality Assumptions The points are very close to the diagonal line, so the residuals of the Bp model are approximately normally distributed.
  • 53. Homoscedastic Assumptions The residual plot does not show an obvious pattern: the points are evenly distributed above and below zero on the X axis, and to the left and right of zero on the Y axis. So the homoscedasticity assumption is met.
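  • 55. Assumption - Errors are Independently distributed The value of the Durbin-Watson statistic is 1.537, which is reasonably close to 2 (a common rule of thumb accepts values between about 1.5 and 2.5). So, the assumption that the errors are independently distributed is taken as met.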
  • 57. Multicollinearity Assumptions The Variance Inflation Factor (VIF) values for all variables lie between 1 and 10, so there is no serious multicollinearity, i.e. the independent variables do not have significant correlation between them.
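The VIF of predictor j is 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the other predictors. Equivalently, the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix, which is what the sketch below computes. All function names and data values are hypothetical and are not taken from arterialBp.csv.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length predictor columns."""
    def centred(col):
        mean = sum(col) / len(col)
        return [v - mean for v in col]
    dev = [centred(col) for col in columns]
    norm = [math.sqrt(sum(v * v for v in d)) for d in dev]
    k = len(columns)
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (norm[i] * norm[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment with the identity matrix: [matrix | I]
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n:] for row in m]

def variance_inflation_factors(columns):
    """VIF for each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical predictor columns
age = [45, 50, 52, 47, 60, 55]
weight = [85, 92, 90, 88, 101, 95]
bsa = [1.8, 2.1, 1.9, 2.0, 2.2, 1.95]
vifs = variance_inflation_factors([age, weight, bsa])
print([round(v, 2) for v in vifs])
```

A VIF of 1 means a predictor is uncorrelated with the others; the larger the value, the more its coefficient's variance is inflated by overlap with the other predictors. The threshold of 10 used on the slide is a common rule of thumb rather than a formal test.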
  • 58. THANK YOU Dr Parag Shah | M.Sc., M.Phil., Ph.D. ( Statistics) www.paragstatistics.wordpress.com