Linear models with R
Steve Hoang, PhD
UVA R Users Meetup
April 25, 2018
preamble
• Likely a wide range of expertise in
the audience
• This is a deep topic, and I’ll only
scratch the surface. Warning: if
you’re on the far left-hand side of the
expertise distribution, this talk will be just
enough for you to be a danger to
yourself and others.
• The goal is to provide tools for
interpreting LMs, and a basic
vocabulary for pursuing deeper
topics.
overview
• What are LMs?
• Fitting and interpreting LMs
• Transforming data
• Hypothesis testing
• Mixed-effect models
what is a linear model?
[Figure panels: regression, ANOVA, multiple regression, multi-way ANOVA]
what is a linear model?
• mtcars: Let’s pretend that we would like to model 1/4 mile
time (Y, the “response”) as a function of horsepower (X, the
“predictor”) plus random noise
$Y = f(X) + \varepsilon$
The LM: $y_i = \beta_0 + x_i \beta_1 + \varepsilon_i$
• Now, our task becomes a search for the
parameters that minimize the sum of the
squared residuals
• The R function that does this magic is lm()
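As a minimal sketch of that fit, assuming the mtcars columns qsec (1/4 mile time) and hp (horsepower) are the response and predictor the slides have in mind. The object names fit, b and rss are only illustrative.

```r
# Fit 1/4 mile time as a linear function of horsepower (built-in mtcars data).
fit <- lm(qsec ~ hp, data = mtcars)

# Least-squares estimates of the intercept and the slope.
b <- coef(fit)
print(b)

# The quantity lm() minimizes: the sum of the squared residuals.
rss <- sum(resid(fit)^2)
print(rss)

# Sanity check: nudging the slope away from the estimate cannot lower the RSS.
rss_nudged <- sum((mtcars$qsec - (b[1] + (b[2] + 0.001) * mtcars$hp))^2)
stopifnot(rss_nudged >= rss)
```

The formula qsec ~ hp reads "qsec modeled by hp"; the intercept is included by default.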
• In $y_i = \beta_0 + x_i \beta_1 + \varepsilon_i$, $\beta_0$ is the intercept, $\beta_1$ is the slope, and the $\varepsilon_i$ are the residuals
• In matrix notation: $Y = X\beta + \varepsilon$
• Quick note: the “linear” in linear model means that the model is linear in the
parameters, not necessarily in the predictors
– $y = \beta_0 + \log(x)\beta_1 + \varepsilon$ ✔ valid
– $y = \beta_0 + x^{\beta_1} + \varepsilon$ ✗ not valid
– $y = \beta_0 + (x^2 + \tanh(x))\beta_1 + \varepsilon$ ✔ valid
regression and ANOVA: two flavors, one function: lm()
$Y = X\beta + \varepsilon$
• $X$: the “design matrix”, accessible through model.matrix()
• $\beta$: the estimated parameters, accessible through coef() or coefficients()
• $\varepsilon$: the residuals, accessible through resid() or residuals()
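A small sketch of those three accessors, reusing the mtcars fit of qsec on hp as an assumed running example. It pulls out each piece and checks that they reassemble the response.

```r
fit <- lm(qsec ~ hp, data = mtcars)

X <- model.matrix(fit)  # design matrix: a column of 1s plus the hp column
b <- coef(fit)          # estimated parameters
e <- resid(fit)         # residuals

# Reassemble the response from its parts: Y = Xb + e
y_rebuilt <- drop(X %*% b) + e
stopifnot(isTRUE(all.equal(unname(y_rebuilt), mtcars$qsec)))

head(X, 3)
```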
regression
[Figure: model output annotated with its four parts: function call, summary stats for residuals, summary stats for fitted coefficients, global model statistics]
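The four annotated parts match what summary() prints for an lm fit, so the sketch below assumes that is the output shown on the slides. The mtcars model is again only an assumed example.

```r
fit <- lm(qsec ~ hp, data = mtcars)
s <- summary(fit)
print(s)

# The same four parts, pulled out of the summary object one at a time.
s$call                 # the function call
quantile(s$residuals)  # the numbers behind the residual summary
s$coefficients         # estimate, std. error, t value and p-value per coefficient
s$r.squared            # global model statistics: R-squared ...
s$fstatistic           # ... and the overall F statistic
```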
ANOVA
[Figure: ANOVA fit with coefficients annotated as “0”, “coef 1”, “coef 2”, “coef 3”; a second version of the fit is labeled “no intercept”]
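A hedged sketch of an ANOVA-style fit with and without an intercept. The slides do not say which data their ANOVA example used, so treating cyl in mtcars as a three-level factor is purely an assumption for illustration.

```r
# Number of cylinders as a categorical predictor (assumed example).
mt <- transform(mtcars, cyl = factor(cyl))

# Default parameterization: the intercept is the mean of the first level
# (4 cylinders); the other coefficients are differences from that baseline.
fit_int <- lm(qsec ~ cyl, data = mt)
coef(fit_int)

# No intercept: each coefficient is simply that group's mean.
fit_noint <- lm(qsec ~ 0 + cyl, data = mt)
coef(fit_noint)

group_means <- tapply(mt$qsec, mt$cyl, mean)
stopifnot(isTRUE(all.equal(unname(coef(fit_noint)), as.vector(group_means))))

# Classical ANOVA table for the same model.
anova(fit_int)
```

Both fits give the same fitted values; only the meaning of the coefficients changes.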
The broom package tidies your LMs
• Summarize model outputs into
tidy data frames: tidy()
• Quickly view model-scale
summaries: glance()
• See the original data augmented
with model statistics: augment()
• There’s more to broom, so have a
look for yourself.
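A short sketch of the three broom verbs, assuming the broom package is installed (the code is skipped otherwise) and reusing the assumed mtcars fit.

```r
if (requireNamespace("broom", quietly = TRUE)) {
  fit <- lm(qsec ~ hp, data = mtcars)

  coef_table <- broom::tidy(fit)     # one row per coefficient
  model_row  <- broom::glance(fit)   # one row of model-level statistics
  per_obs    <- broom::augment(fit)  # data plus fitted values, residuals, ...

  print(coef_table)
  print(model_row)
  print(head(per_obs))
}
```

Because each result is a data frame, it can go straight into plotting or further data manipulation.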
some things to be aware of
• LMs make several assumptions about your data; look
them up. You want to be sure your data meet those
assumptions reasonably well.
– Homoscedasticity (constant variance) and normality of the residuals are the only
assumptions we will discuss.
• Look into “generalized linear models” (GLMs) and/or
quantile regression for non-normally distributed
data.
[Figure annotation: NOT HOMOSCEDASTIC!]
testing for heteroscedasticity
The ‘car’ package (Companion to Applied Regression) is
your friend.
Use car::ncvTest() to check for heteroscedasticity
using the Breusch-Pagan test (ncv = Non-Constant
Variance).
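A minimal sketch of that check, assuming the car package is installed (the code is skipped otherwise). The mtcars model is an assumed example, not the one from the slides.

```r
if (requireNamespace("car", quietly = TRUE)) {
  fit <- lm(qsec ~ hp, data = mtcars)

  # Score test of constant error variance against variance that
  # changes with the fitted values.
  bp <- car::ncvTest(fit)
  print(bp)

  # A small p-value is evidence against homoscedasticity.
  bp$p
}
```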
variance-stabilizing transformations
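• Variance-stabilizing
transformations make it so
that the variance of Y no longer
depends on its mean
value.
• Take the Poisson
distribution: its mean is
equal to its variance, so the spread grows with the mean. The
square root is the variance-stabilizing
transformation of
a Poisson RV: the variance of the square root is roughly 1/4 regardless of the mean (for means that are not too small).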
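A quick simulation sketch of the Poisson claim. The means and sample size are made-up values chosen only for illustration.

```r
set.seed(1)
means <- c(10, 40, 160, 640)

# Variance of raw Poisson draws grows with the mean ...
raw_var <- sapply(means, function(m) var(rpois(1e5, m)))

# ... while the variance of their square roots stays near 1/4.
sqrt_var <- sapply(means, function(m) var(sqrt(rpois(1e5, m))))

round(rbind(mean = means, raw_var = raw_var, sqrt_var = sqrt_var), 3)
```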
the Box-Cox transformation
• Helps alleviate non-normality
and heteroscedasticity of
residuals
• Find a lambda that normalizes
the data (maximum likelihood
estimation)
$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\
\log(y) & \text{if } \lambda = 0
\end{cases}
$$
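One way to find lambda by maximum likelihood is MASS::boxcox(), which profiles the log-likelihood over a grid. The slides do not say which function they used, so treat this as a sketch under that assumption. The helper box_cox() and the mtcars model are hypothetical names and data for illustration.

```r
fit <- lm(qsec ~ hp, data = mtcars)

# Profile the log-likelihood over a grid of lambda values (no plot drawn).
bc <- MASS::boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat

# Box-Cox transformation; a tolerance stands in for "lambda == 0" because
# grid values near zero are not exactly zero in floating point.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Refit on the transformed response.
fit_bc <- lm(box_cox(qsec, lambda_hat) ~ hp, data = mtcars)
coef(fit_bc)
```

The response must be strictly positive for the transformation to be defined.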
transformations for “curvy” data
• You can often use linear models to fit “curvy” data; you
just need to transform the predictors, the responses, or
both.
exponential model:
$\log(Y) = X\beta + \varepsilon$
$Y = e^{X\beta + \varepsilon}$
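A sketch of the exponential model on simulated data. The true intercept (1), slope (0.5) and noise level are made-up values, used only so the fit has something known to recover.

```r
set.seed(42)
x <- seq(0, 5, length.out = 100)

# Simulated "curvy" data: Y = exp(b0 + b1 * x + e).
y <- exp(1 + 0.5 * x + rnorm(100, sd = 0.2))

# The model is linear on the log scale, so lm() applies directly.
fit_log <- lm(log(y) ~ x)
coef(fit_log)

# Back-transform the fitted values to the original scale.
y_hat <- exp(fitted(fit_log))
```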
additional thoughts
• Not everything can be transformed to be normal / homoscedastic,
and not everything necessarily needs to be.
– Consider nonparametric methods or GLMs.
– ANOVA is somewhat robust to heteroscedasticity when n and/or effect
size is relatively large.
• Use QQ plots to assess normality – qqnorm(); also the Shapiro-Wilk test
– shapiro.test()
• The poly() function in conjunction with lm() can be used to fit
n-degree polynomials.
– Generally want to use raw = FALSE with poly() (the default), which gives orthogonal polynomial terms
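A sketch of those three tools on the residuals of an assumed mtcars fit. The QQ plot is computed without drawing so the snippet also runs non-interactively.

```r
fit <- lm(qsec ~ hp, data = mtcars)

# QQ data: theoretical vs. sample quantiles of the residuals.
# Drop plot.it = FALSE to draw the plot; qqline() then adds a reference line.
qq <- qqnorm(resid(fit), plot.it = FALSE)
cor(qq$x, qq$y)  # values close to 1 suggest roughly normal residuals

# Shapiro-Wilk test: a small p-value suggests non-normal residuals.
sw <- shapiro.test(resid(fit))
sw$p.value

# Quadratic fit with orthogonal polynomial terms (raw = FALSE is the default).
fit_poly <- lm(qsec ~ poly(hp, 2), data = mtcars)

# Raw powers give the same fitted values; only the parameterization differs.
fit_raw <- lm(qsec ~ poly(hp, 2, raw = TRUE), data = mtcars)
stopifnot(isTRUE(all.equal(fitted(fit_poly), fitted(fit_raw))))
```

Orthogonal terms are uncorrelated with each other, which keeps the coefficient estimates numerically stable as the degree grows.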
multiple comparisons problem
[Figure annotation: p-value = 0.04]
handling multiple comparisons
• The p.adjust() function is useful
– method = "bonferroni" controls the “familywise error rate”
(FWER)
– method = "BH" controls the “false discovery rate” (FDR)
• The multcomp package provides a general framework
for simultaneous hypothesis testing
– Simultaneous Inference in General Parametric Models,
Hothorn et al., Biometrical Journal, 2008.
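A small sketch of p.adjust() with both methods. The raw p-values are made up for illustration.

```r
p_raw <- c(0.001, 0.008, 0.04, 0.04, 0.2)  # made-up p-values from 5 tests

# Bonferroni: multiply each p-value by the number of tests (capped at 1).
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Benjamini-Hochberg: less conservative, controls the false discovery rate.
p_bh <- p.adjust(p_raw, method = "BH")

cbind(p_raw, p_bonf, p_bh)
```

Note that the method names are case-sensitive strings: "bonferroni" is lowercase, "BH" is uppercase.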
the multcomp package
[Figure annotation: p-value = 0.2]
• Can specify contrasts with shortcuts,
e.g., "Dunnett" and
"Tukey"
• Can specify contrasts as strings,
e.g., "tx 7 - ctl = 0"
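A sketch of both ways of specifying contrasts, assuming the multcomp package is installed (the code is skipped otherwise). The built-in PlantGrowth data, with group levels ctrl, trt1 and trt2, stands in for the treatment/control data on the slides.

```r
if (requireNamespace("multcomp", quietly = TRUE)) {
  fit <- lm(weight ~ group, data = PlantGrowth)

  # Shortcut: compare every treatment group against the control level.
  dunnett <- multcomp::glht(fit, linfct = multcomp::mcp(group = "Dunnett"))
  print(summary(dunnett))  # p-values adjusted for the simultaneous tests

  # The same kind of contrast written out as a string.
  custom <- multcomp::glht(fit, linfct = multcomp::mcp(group = "trt2 - ctrl = 0"))
  print(summary(custom))
}
```

Swapping "Dunnett" for "Tukey" would request all pairwise comparisons instead.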
multcomp example: superadditivity
• Are any of the drugs synergistic?
Do any of them antagonize each
other?
lots of glaring omissions
• Experimental designs
• Interaction terms
• Model parameterization
• Variable selection
• Confidence intervals
• ANCOVA models
• Random effects vs fixed effects
• Much more…
resources
• MOOCs: Lots of good LM courses out there
• Books:
– Linear models with R – Julian Faraway
– Extending the linear model with R – Julian Faraway
– Mixed-Effects Models in S and S-PLUS – Jose Pinheiro & Doug
Bates
– Mixed-Effects models and Extensions in Ecology with R – Alain
Zuur
• http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html
– Ben Bolker’s GLMM FAQ (author of lme4)
