Multicollinearity
          Venkat Reddy
Note
•   This presentation is just the lecture notes from the corporate training on
    Regression Analysis
•   The best way to treat this is as a high-level summary; the actual session went more
    in depth and contained other information.
•   Most of this material was written as informal notes, not intended for publication
•   Please send your questions/comments/corrections to
    venkat@trenwiseanalytics.com or 21.venkat@gmail.com
•   Please check my website for the latest version of this document
                                                      -Venkat Reddy
Contents


• What is “Multicollinearity”?
• Causes
• Detection
• Effects
• Redemption
What is “Multicollinearity”?

• Multicollinearity (or intercorrelation) exists when at least some of the
  predictor variables are correlated among themselves.
• It refers to a “linear” relation between the predictors. Predictors are usually
  related to some extent, so it is a matter of degree.
Multicollinearity-Illustration
• When correlation among X’s is low, OLS has lots of
  information to estimate b. This gives us confidence in our
  estimates of b.
• What is the definition of regression coefficient by the way?
• When correlation among X’s is high, OLS has very little
  information to estimate b. This makes us relatively
  uncertain about our estimate of b.
[Figure: two diagrams of Y, X1 and X2, one beside each case]
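The contrast between the two cases can be seen in a small simulation. This is a minimal sketch, not part of the original session: the sample size, the true coefficients, the two correlation levels and the helper name `simulate_slopes` are all assumptions chosen for illustration. It draws many samples, fits OLS on each, and records how much the estimate of the X1 coefficient moves around.

```python
import numpy as np

def simulate_slopes(rho, n=100, reps=500, seed=0):
    """Return OLS estimates of the X1 coefficient over repeated samples.

    rho is the population correlation between X1 and X2 (an assumed setup).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    true_b = np.array([1.0, 2.0, -1.0])  # intercept, X1, X2 (arbitrary values)
    estimates = np.empty(reps)
    for r in range(reps):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        X = np.column_stack([np.ones(n), X])   # add intercept column
        y = X @ true_b + rng.normal(size=n)    # unit-variance noise
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates[r] = b[1]
    return estimates

low = simulate_slopes(rho=0.1)
high = simulate_slopes(rho=0.95)

# Spread of the estimates around the true value 2.0 in each case
print("std of b1, low correlation: ", low.std())
print("std of b1, high correlation:", high.std())
```

In both cases the estimates centre on the true value; only the spread differs, which is the sense in which high correlation leaves OLS with "very little information".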
Perfect Multicollinearity

• Recall that to estimate b, the matrix (X’X)^-1 had to exist
• What is the OLS estimate of b (or beta)?
• This meant that the matrix X had to be of full column rank
• That is, none of the X’s could be a perfect linear function of any combination
  of the other X’s
• If one of them is, then b is undefined. But this is very rare
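The OLS estimate the slide asks about is b = (X’X)^-1 X’y, which is why the inverse has to exist. The following is a minimal sketch of the rank condition, using made-up data: the variable names, the coefficients and the construction of `x3` as an exact linear function of the other two predictors are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2.0 * x1 - 3.0 * x2  # perfect linear function of x1 and x2

# Design matrices with an intercept column
X_ok = np.column_stack([np.ones(n), x1, x2])
X_bad = np.column_stack([np.ones(n), x1, x2, x3])

rank_ok = np.linalg.matrix_rank(X_ok)    # equals the number of columns
rank_bad = np.linalg.matrix_rank(X_bad)  # one less than the number of columns

# With full column rank, the normal equations X'X b = X'y have a unique solution
y = 1.0 + 0.5 * x1 - 0.25 * x2 + rng.normal(scale=0.1, size=n)
b = np.linalg.solve(X_ok.T @ X_ok, X_ok.T @ y)

print("columns / rank (ok): ", X_ok.shape[1], rank_ok)
print("columns / rank (bad):", X_bad.shape[1], rank_bad)
print("OLS estimate b:", b)
```

For `X_bad` the matrix X’X is singular, so no unique b exists; infinitely many coefficient vectors fit the data equally well.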
Causes of Multicollinearity


• Statistical model specification: adding polynomial terms or trend indicators.
• Too many variables in the model – X’s measure the same conceptual
  variable.
• Data collection methods employed.
How to detect Multicollinearity

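•   A high F statistic or R^2 leads us to reject the joint hypothesis that all of the
    slope coefficients are zero, but the individual t-statistics are low. (why?)
•   VIF_k = 1/(1 - R_k^2), where R_k^2 is the R^2 from regressing the k-th predictor
    on the other predictors.
•   One can compute the condition number. That is, the ratio of the largest to the
    smallest characteristic root (eigenvalue) of the matrix X'X.
•   This may not always be useful as the standard errors of the estimates depend on
    the ratios of elements of the characteristic vectors to the roots.
•   High pairwise sample correlation coefficients are sufficient but not necessary for
    multicollinearity: a predictor can be close to a linear combination of several
    others without being highly correlated with any single one of them.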
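Two of the diagnostics above can be computed directly. This is a rough sketch under stated assumptions: the helper names `vif` and `condition_number` and the simulated data are hypothetical, and the condition number follows the definition on the slide (a plain ratio of roots of X'X, on unscaled columns). Many references scale the columns first and take the square root of this ratio.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (predictors only, no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress predictor j on an intercept plus all other predictors
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

def condition_number(X):
    """Ratio of the largest to the smallest eigenvalue of X'X."""
    roots = np.linalg.eigvalsh(X.T @ X)  # returned in ascending order
    return roots[-1] / roots[0]

# Made-up data: x2 is nearly a copy of x1, x3 is unrelated to both
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
cond = condition_number(X)
print("VIFs:", vifs)
print("condition number of X'X:", cond)
```

The first two VIFs come out large and the third stays close to 1, which singles out x1 and x2 as the collinear pair.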
Effects of Multicollinearity

• Even in the presence of (imperfect) multicollinearity, OLS is BLUE and consistent.
• Standard errors of the estimates tend to be large.
• Large standard errors mean wide confidence intervals. Large standard
  errors mean small observed test statistics. The researcher will fail to reject
  too many null hypotheses. The probability of a type II error is large.
• Estimates of standard errors and parameters tend to be sensitive to changes
  in the data and the specification of the model.
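The size of the standard errors can be tied back to the VIF. The expression below is the standard textbook result for OLS with an intercept, not something stated on the slides; the symbols \sigma^2 (error variance) and \bar{x}_k (sample mean of the k-th predictor) are introduced here for illustration.

```latex
\operatorname{Var}(b_k)
  = \frac{\sigma^2}{\left(1 - R_k^2\right)\sum_{i=1}^{n}\left(x_{ik} - \bar{x}_k\right)^2}
  = \mathrm{VIF}_k \cdot \frac{\sigma^2}{\sum_{i=1}^{n}\left(x_{ik} - \bar{x}_k\right)^2}
```

As R_k^2 approaches 1, the variance of b_k grows without bound, which is why individual t-statistics can be low even when the joint F statistic is high.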
Multicollinearity Redemption

•   Principal components estimator: This involves using a few weighted averages
    (linear combinations) of the regressors, rather than all of the regressors.
•   Ridge regression technique: This involves putting extra weight on the main
    diagonal of X'X so that it produces more precise (lower-variance) estimates.
    This is a biased estimator.
•   Drop the troublesome RHS variables. (This raises the question of specification
    error.)
•   Use additional data sources. This does not mean more of the same. It means
    pooling cross section and time series.
•   Transform the data. For example, inversion or differencing.
•   Use prior information or restrictions on the coefficients.
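The ridge idea of adding weight to the main diagonal of X'X can be sketched in a few lines. This is a minimal illustration, not the method used in the session: the function name `ridge`, the penalty value 1.0 and the simulated data are assumptions, and the example omits the intercept and the standardisation of predictors that ridge regression normally involves.

```python
import numpy as np

def ridge(X, y, lam):
    """Solve (X'X + lam * I) b = X'y; lam = 0 gives ordinary least squares."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Made-up data with two nearly collinear predictors and no intercept
rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)

b_ols = ridge(X, y, lam=0.0)
b_ridge = ridge(X, y, lam=1.0)  # penalty chosen arbitrarily for the sketch

print("OLS coefficients:  ", b_ols)
print("ridge coefficients:", b_ridge)
```

The ridge coefficients are shrunk toward zero relative to OLS. That shrinkage is the source of both the bias and the reduced variance mentioned above; in practice the penalty is usually chosen by cross-validation.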
