Linear Regression with R
1: Prepare data/specify model/read results

2012-12-07 @HSPH
Kazuki Yoshida, M.D.
MPH-CLE student

The group website is at:
http://rpubs.com/kaz_yos/useR_at_HSPH
Previously in this group
- Introduction
- Reading Data into R (1)
- Reading Data into R (2)
- Descriptive, continuous
- Descriptive, categorical
- Deducer
- Graphics
- Groupwise, continuous

Menu
- Linear regression
Ingredients

Statistics
- Data preparation
- Model formula

Programming
- within()
- factor(), relevel()
- lm()
- formula = Y ~ X1 + X2
- summary()
- anova(), car::Anova()
Open RStudio
Create a new script and save it.
We will use the lowbwt dataset (lowbwt.dat) used in BIO213, available from:
http://www.umass.edu/statdata/statdata/data/
http://www.umass.edu/statdata/statdata/data/lowbwt.txt
http://www.umass.edu/statdata/statdata/data/lowbwt.dat
Load dataset from web

lbw <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat",
                  header = TRUE, skip = 4)

- skip = 4: skip the first 4 rows of the file
- header = TRUE: pick up the variable names from the first row read
"Fix" dataset

lbw[c(10, 39), "BWT"] <- c(2655, 3035)

Replace the BWT values in the 10th and 39th rows to make the dataset identical to the BIO213 dataset.
Lower case variable names

names(lbw) <- tolower(names(lbw))

Convert the variable names to lower case, then put them back as the variable names.
See overview
library(gpairs)
gpairs(lbw)
Recoding
Changing and creating variables

dataset <- within(dataset, {
    _variable manipulations_
})

- dataset <- : name of the newly created dataset (here replacing the original)
- within(dataset, ...): take the dataset
- { ... }: perform the variable manipulations. Variables can be specified by name only; there is no need for dataset$var_name.
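A minimal sketch of the within() pattern on a hypothetical toy data frame (the names toy, ht_cm, wt_kg, bmi, ht_m and bmi2 are made up for illustration, not part of lowbwt). It contrasts the dataset$var_name style with referring to variables by name inside within().

```r
## Hypothetical toy data (not the lowbwt dataset)
toy <- data.frame(ht_cm = c(150, 165, 180),
                  wt_kg = c(50, 70, 90))

## Without within(): every variable needs the toy$ prefix
toy$bmi <- toy$wt_kg / (toy$ht_cm / 100)^2

## With within(): variables are referred to by name only;
## within() returns a modified copy, so assign it to keep the result
toy2 <- within(toy, {
    ht_m <- ht_cm / 100
    bmi2 <- wt_kg / ht_m^2
})

print(toy2)  # new columns are appended after the original ones
```

Note that the original toy is left unchanged; only the assignment (here to toy2, in the slides back to lbw) keeps the new variables.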
lbw <- within(lbw, {

     ## Relabel race
     race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

     ## Categorize ftv (frequency of visit)
     ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
     ftv.cat <- relevel(ftv.cat, ref = "Normal")

     ## Dichotomize ptl
     preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))

})
Numeric to categorical: element by element

From the within(lbw, { ... }) block:

     race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

Categorize race and label: 1 to White, 2 to Black, 3 to Other.
The 1st level will be the reference.
Explained more in depth
factor() to create a categorical variable

  lbw <- within(lbw, {

       ## Relabel race
       race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

  })

- race.cat <- : create a new variable named race.cat
- factor(race, ...): take the race variable
- levels = 1:3: order the levels 1, 2, 3, making 1 the reference level
- labels = c("White","Black","Other"): label levels 1, 2, 3 as White, Black, Other
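A minimal sketch of what levels and labels do, on a hypothetical vector of race codes (not the lowbwt data). It also shows that a code missing from levels becomes NA, and which dummy columns lm() builds from the factor under R's default treatment contrasts.

```r
## Hypothetical race codes: 1 = White, 2 = Black, 3 = Other
race <- c(1, 3, 2, 1, 2)
race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))

levels(race.cat)        # level order follows 'levels'; the 1st (White) is the reference
as.character(race.cat)  # each code replaced by its label

## A code not listed in 'levels' becomes NA
factor(c(1, 4), levels = 1:3, labels = c("White", "Black", "Other"))

## Dummy (indicator) columns that lm() builds with the default treatment contrasts:
## one per non-reference level
colnames(model.matrix(~ race.cat))
```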
Numeric to categorical: range to element

From the within(lbw, { ... }) block:

     ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))

The 1st level (None) will be the reference.

How breaks work: each interval is closed on the right.
(-Inf, 0]  ->  None    (ftv = 0)
(0, 2]     ->  Normal  (ftv = 1, 2)
(2, Inf]   ->  Many    (ftv = 3, 4, 5, 6, ...)
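A minimal sketch of the break rule on hypothetical visit counts 0 to 6. With cut()'s default right = TRUE the intervals are closed on the right; switching to right = FALSE (shown only for contrast, not used in the slides) moves the boundary values into the next category.

```r
ftv <- 0:6   # hypothetical visit counts
ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf),
               labels = c("None", "Normal", "Many"))

## right = TRUE (the default): (-Inf, 0] -> None, (0, 2] -> Normal, (2, Inf] -> Many
data.frame(ftv, ftv.cat)

## right = FALSE uses [a, b) intervals instead: 0 and 1 -> Normal, 2 and up -> Many
cut(ftv, breaks = c(-Inf, 0, 2, Inf),
    labels = c("None", "Normal", "Many"), right = FALSE)
```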
Reset reference level

From the within(lbw, { ... }) block:

     ftv.cat <- relevel(ftv.cat, ref = "Normal")

Change the reference level of the ftv.cat variable from None to Normal.
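A minimal sketch, on hypothetical counts 0 to 6, showing that relevel() only moves the chosen level to the front; no observation changes category.

```r
ftv.cat <- cut(0:6, breaks = c(-Inf, 0, 2, Inf),
               labels = c("None", "Normal", "Many"))
levels(ftv.cat)    # "None" "Normal" "Many": None is the reference

ftv.cat2 <- relevel(ftv.cat, ref = "Normal")
levels(ftv.cat2)   # "Normal" "None" "Many": Normal moved to the front

## Only the level order changes; every observation keeps its category
identical(as.character(ftv.cat), as.character(ftv.cat2))
```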
Numeric to Boolean to Category

From the within(lbw, { ... }) block:

     preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))

- ptl >= 1 creates a TRUE/FALSE vector.
- ptl < 1 becomes FALSE, then "0" (level FALSE gets label "0").
- ptl >= 1 becomes TRUE, then "1+" (level TRUE gets label "1+").
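A minimal sketch of the two steps on a hypothetical ptl vector: the comparison yields a logical vector, and factor() maps FALSE and TRUE to the labels "0" and "1+".

```r
ptl <- c(0, 1, 0, 2, 3)   # hypothetical counts of previous premature labors
ptl >= 1                  # FALSE TRUE FALSE TRUE TRUE

preterm <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
table(ptl, preterm)       # every ptl of 1 or more falls in "1+"
```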
Binary 0,1 to No,Yes

One-by-one method:

lbw <- within(lbw, {

     ## Categorize smoke ht ui
     smoke <- factor(smoke, levels = 0:1, labels = c("No","Yes"))
     ht    <- factor(ht,    levels = 0:1, labels = c("No","Yes"))
     ui    <- factor(ui,    levels = 0:1, labels = c("No","Yes"))
})

Loop method:

## Alternative to above (use one method, not both: running factor() again on
## variables already converted to No/Yes would turn every value into NA)
lbw[, c("smoke","ht","ui")] <-
  lapply(lbw[, c("smoke","ht","ui")],
         function(var) {
             factor(var, levels = 0:1, labels = c("No","Yes"))
         })
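A minimal sketch of the loop method on a hypothetical 0/1 data frame (toy01 is a made-up stand-in for lbw). lapply() returns a list of factors, and assigning that list to the selected columns replaces all three at once.

```r
## Hypothetical 0/1 data standing in for lbw
toy01 <- data.frame(smoke = c(0, 1, 1), ht = c(0, 0, 1), ui = c(1, 0, 0))
vars <- c("smoke", "ht", "ui")

## Each column is recoded by the same function; the list of results replaces the columns
toy01[, vars] <- lapply(toy01[, vars], function(var) {
    factor(var, levels = 0:1, labels = c("No", "Yes"))
})
str(toy01)
```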
Model formula

outcome ~ predictor1 + predictor2 + predictor3

SAS equivalent:
model outcome = predictor1 predictor2 predictor3;
In the case of a t-test

age ~ zyg

- age: the continuous variable to be compared (the variable to be explained)
- zyg: the grouping variable that separates the groups (the variable used to explain)
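A minimal sketch, on simulated hypothetical data (age and zyg here are made up, with zyg as two twin groups MZ and DZ), showing that the same formula works in t.test() and lm(): the equal-variance t-test and the regression on a two-level factor give the same t statistic up to sign and the same p-value.

```r
set.seed(1)
## Hypothetical data: 'age' compared between two 'zyg' groups
dat <- data.frame(
    age = c(rnorm(10, mean = 40, sd = 5), rnorm(10, mean = 45, sd = 5)),
    zyg = factor(rep(c("MZ", "DZ"), each = 10), levels = c("MZ", "DZ"))
)

tt  <- t.test(age ~ zyg, data = dat, var.equal = TRUE)  # classic two-sample t-test
fit <- lm(age ~ zyg, data = dat)                         # same formula in lm()

tt$statistic                             # based on mean(MZ) - mean(DZ)
coef(summary(fit))["zygDZ", "t value"]   # same magnitude; lm() estimates DZ - MZ
```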
Linear sum

Y ~ X1 + X2

- .      all variables except for the outcome
- + X2   add the X2 term
- - 1    remove the intercept
- X1:X2  interaction term between X1 and X2
- X1*X2  main effects and interaction term
Interaction term

Y ~ X1 + X2 + X1:X2
(X1 + X2: main effects; X1:X2: interaction)

Y ~ X1 * X2
(main effects & interaction: the same model, written more compactly)
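A minimal sketch, on hypothetical toy data, checking that the two interaction formulas expand to the same terms and therefore fit the same model.

```r
## Hypothetical toy data
d <- data.frame(Y  = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.2),
                X1 = c(1, 2, 3, 4, 5, 6),
                X2 = c(2, 1, 4, 3, 6, 5))

f_star <- Y ~ X1 * X2
f_long <- Y ~ X1 + X2 + X1:X2

attr(terms(f_star), "term.labels")   # "X1" "X2" "X1:X2"
attr(terms(f_long), "term.labels")   # "X1" "X2" "X1:X2"

## Both formulas give identical coefficients
all.equal(coef(lm(f_star, data = d)), coef(lm(f_long, data = d)))
```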
On-the-fly variable manipulation

  Y ~ X1 + I(X2 * X3)

I() inhibits formula interpretation, so the operators inside it do ordinary math. A new variable (X2 times X3) is created on the fly and used in the model.
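A minimal sketch, on hypothetical toy data, contrasting "*" inside I() (plain multiplication, one new column) with "*" as the formula operator (main effects plus interaction).

```r
## Hypothetical toy data
d <- data.frame(Y  = c(3.1, 4.0, 6.2, 7.9, 9.8, 12.1),
                X1 = c(1, 2, 3, 4, 5, 6),
                X2 = c(1, 1, 2, 2, 3, 3),
                X3 = c(2, 3, 2, 3, 2, 3))

## Inside I(), "*" is ordinary multiplication: one new column X2 times X3
mm_I <- model.matrix(Y ~ X1 + I(X2 * X3), data = d)
colnames(mm_I)

## Without I(), "*" is the formula operator: main effects plus interaction
colnames(model.matrix(Y ~ X1 + X2 * X3, data = d))
```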
Fit a model


lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui +
              ftv.cat + race.cat + preterm,
              data = lbw)
See model object

   lm.full

- Call: the command is repeated
- Coefficients: the coefficient for each variable
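See summary

summary(lm.full)

- Call: the command is repeated
- Residuals: the residual distribution
- Coefficients: t value = Estimate / Std. Error (Coef/SE = t)
- Dummy variables are created for the categorical variables
- Model fit: R^2 and adjusted R^2
- Model F-test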
- ftv.catNone: people with no 1st-trimester visits compared with people with a Normal number (1 or 2) of 1st-trimester visits (reference level)
- ftv.catMany: people with many (3 or more) 1st-trimester visits compared with the Normal group (reference level)
- race.catBlack: Black people compared with White people (reference level)
- race.catOther: Other people compared with White people (reference level)
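A minimal sketch on simulated hypothetical data (sim, and its age, race.cat and bwt columns, are made up and only mimic lbw) showing the dummy-variable row names that summary() reports and that the t value column is the estimate divided by its standard error.

```r
set.seed(42)
n <- 60
## Hypothetical simulated data standing in for lbw
sim <- data.frame(
    age = round(runif(n, min = 18, max = 40)),
    race.cat = factor(sample(c("White", "Black", "Other"), n, replace = TRUE),
                      levels = c("White", "Black", "Other"))
)
## Hypothetical outcome built from the predictors plus noise
sim$bwt <- 2800 + 10 * sim$age - 250 * (sim$race.cat == "Black") + rnorm(n, sd = 400)

fit <- lm(bwt ~ age + race.cat, data = sim)
cf  <- coef(summary(fit))
rownames(cf)   # one dummy per non-reference level: race.catBlack, race.catOther

## "t value" is the estimate divided by its standard error
all.equal(cf[, "t value"], cf[, "Estimate"] / cf[, "Std. Error"])
```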
Confidence intervals

confint(lm.full)

The two columns are the lower (2.5 %) and upper (97.5 %) boundaries of the 95% confidence interval for each coefficient.
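A minimal sketch, using R's built-in mtcars data as a stand-in for lbw, reproducing confint() by hand: estimate plus or minus the t quantile (residual degrees of freedom) times the standard error.

```r
## mtcars is a stand-in; the model is for illustration only
fit <- lm(mpg ~ wt + hp, data = mtcars)
ci  <- confint(fit, level = 0.95)

est   <- coef(fit)
se    <- sqrt(diag(vcov(fit)))
tcrit <- qt(0.975, df = df.residual(fit))
manual <- cbind(est - tcrit * se, est + tcrit * se)

ci
all.equal(unname(ci), unname(manual))
```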
ANOVA table (type I)

anova(lm.full)

- Df: degrees of freedom
- Sum Sq: sequential SS
- Mean Sq: mean SS = SS/DF
- F value: F = Mean SS / Mean SS of residual
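A minimal sketch, using mtcars as a stand-in for lbw, checking the two relations in the table: Mean Sq = Sum Sq / Df, and F = Mean Sq of a term / Mean Sq of the residuals.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # mtcars is a stand-in
tab <- anova(fit)
tab

ms_resid <- tab["Residuals", "Mean Sq"]
## Mean Sq = Sum Sq / Df
all.equal(tab[["Mean Sq"]], tab[["Sum Sq"]] / tab[["Df"]])
## F = Mean Sq of the term / Mean Sq of the residuals (rows 1-2 are wt and hp)
all.equal(tab[["F value"]][1:2], tab[["Mean Sq"]][1:2] / ms_resid)
```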
Type I = Sequential SS
[Figure: overlapping circles for 1 age, 2 lwt, 3 smoke. In type I, the 1st variable gets all of its SS, the 2nd gets all but the overlap between 1 and 2, and the last gets only the remaining SS.]
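A minimal sketch, using mtcars as a stand-in (wt and hp are correlated there), showing that type I SS depend on the order of entry: the same variable gets more SS when entered first than when entered last, while the sequential SS always add up to the same model SS.

```r
fit_wt_first <- lm(mpg ~ wt + hp, data = mtcars)
fit_wt_last  <- lm(mpg ~ hp + wt, data = mtcars)

anova(fit_wt_first)["wt", "Sum Sq"]  # wt entered 1st: gets all of its SS
anova(fit_wt_last)["wt", "Sum Sq"]   # wt entered last: only what hp left over (smaller here)

## The sequential SS still add up to the same total model SS
ss_first <- anova(fit_wt_first)[["Sum Sq"]]
ss_last  <- anova(fit_wt_last)[["Sum Sq"]]
all.equal(sum(ss_first[1:2]), sum(ss_last[1:2]))
```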
ANOVA table (type III)

library(car)
Anova(lm.full, type = 3)

- Sum Sq: marginal SS
- Df: degrees of freedom; multi-category variables are tested as one term
- F value: F = Mean SS / Mean SS of residual
Type III = Marginal SS
[Figure: overlapping circles for 1 age, 2 lwt, 3 smoke. In type III, each variable (1st, 2nd, and last alike) gets only its marginal SS, the part not shared with the others.]
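A minimal sketch of marginal SS using only base R, with mtcars as a stand-in. It uses drop1() rather than car::Anova(); for a model without interactions, the two give the same SS for each term. The check shows that a term's marginal SS equals its type I SS when it is entered last, and that a 3-level factor is tested as one term with 2 degrees of freedom.

```r
fit  <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)
marg <- drop1(fit, test = "F")   # drops each term from the full model in turn
marg

## Marginal SS of wt = its sequential (type I) SS when wt is entered last
fit_wt_last <- lm(mpg ~ hp + factor(cyl) + wt, data = mtcars)
all.equal(marg["wt", "Sum of Sq"], anova(fit_wt_last)["wt", "Sum Sq"])

## factor(cyl) (3 levels) is tested as one term with 2 df
marg["factor(cyl)", "Df"]
```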
Comparison: Type I vs. Type III
[Figure: the type I and type III diagrams side by side]
Effect plot

library(effects)
plot(allEffects(lm.full), ylim = c(2000,4000))

- ylim = c(2000,4000) fixes the Y-axis values for all plots.
- Each panel shows the effect of a variable with the other covariates set at their average.
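A minimal sketch of the idea behind an effect display, using mtcars as a stand-in and only numeric covariates: predictions over a grid of one variable with the other covariate held at its mean. This is not the effects package's implementation (which, for example, handles factor covariates differently); grid and pred are hypothetical names.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

## Hypothetical grid: wt varies, hp is held at its sample mean
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 5),
                   hp = mean(mtcars$hp))
grid$pred <- predict(fit, newdata = grid)
grid   # a straight line whose slope is the wt coefficient

## With every covariate at its mean, an OLS fit with an intercept predicts the mean outcome
predict(fit, newdata = data.frame(wt = mean(mtcars$wt), hp = mean(mtcars$hp)))
mean(mtcars$mpg)
```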
Interaction
This model is for demonstration purposes.

  lm.full.int <- lm(bwt ~ age*lwt + smoke +
    ht + ui + age*ftv.cat + race.cat*preterm,
    data = lbw)

- age*lwt: continuous * continuous
- age*ftv.cat: continuous * categorical
- race.cat*preterm: categorical * categorical
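A minimal sketch of a continuous * categorical interaction, using mtcars as a stand-in (wt continuous, am as a 2-level factor). The interaction coefficient is the difference in slopes between groups; in this fully interacted model with no other covariates, the group-specific slopes match separate per-group fits.

```r
fit <- lm(mpg ~ wt * factor(am), data = mtcars)
cf  <- coef(fit)

slope_am0 <- unname(cf["wt"])                          # slope in the reference group
slope_am1 <- unname(cf["wt"] + cf["wt:factor(am)1"])   # reference slope + interaction

## Same slopes as fitting each am group separately
sep0 <- unname(coef(lm(mpg ~ wt, data = subset(mtcars, am == 0)))["wt"])
sep1 <- unname(coef(lm(mpg ~ wt, data = subset(mtcars, am == 1)))["wt"])
c(slope_am0, sep0)
c(slope_am1, sep1)
```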
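Anova(lm.full.int, type = 3)

- Sum Sq: marginal SS
- Df: degrees of freedom
- The interaction terms appear as their own rows.
- F value: F = Mean SS / Mean SS of residual

Type III tests depend on how the factors are coded: with R's default treatment contrasts, the main-effect rows of a model with interactions test each variable's effect where the interacting variable is 0 or at its reference level.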
plot(effect("age:lwt", lm.full.int))
Continuous * Continuous: the age effect is shown at several lwt levels

plot(effect("age:ftv.cat", lm.full.int), multiline = TRUE)
Continuous * Categorical

plot(effect("race.cat*preterm", lm.full.int),
     x.var = "preterm", z.var = "race.cat", multiline = TRUE)
Categorical * Categorical
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 

Linear regression with R 1

  • 1. Linear Regression with R 1: Prepare data/specify model/read results. 2012-12-07 @HSPH. Kazuki Yoshida, M.D., MPH-CLE student
  • 2. Group Website is at: http://rpubs.com/kaz_yos/useR_at_HSPH
  • 3. Previously in this group: Introduction; Reading Data into R (1); Reading Data into R (2); Descriptive, continuous; Descriptive, categorical; Deducer; Graphics; Groupwise, continuous
  • 4. Menu n Linear regression
  • 5. Ingredients. Statistics: data preparation; model formula. Programming: within(); factor(), relevel(); lm(); formula = Y ~ X1 + X2; summary(); anova(), car::Anova()
  • 7. Create a new script and save it.
  • 9. We will use the lowbwt dataset (lowbwt.dat) used in BIO213: http://www.umass.edu/statdata/statdata/data/lowbwt.txt and http://www.umass.edu/statdata/statdata/data/lowbwt.dat
  • 10. Load dataset from web: lbw <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat", header = TRUE, skip = 4). skip = 4 skips the first 4 rows; header = TRUE picks up the variable names.
  • 11. “Fix” dataset: lbw[c(10, 39), "BWT"] <- c(2655, 3035) replaces the BWT values in the 10th and 39th rows to make the dataset identical to the BIO213 dataset.
  • 12. Lower-case variable names: names(lbw) <- tolower(names(lbw)) converts the variable names to lower case and puts them back as the variable names.
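  • 17. dataset <- within(dataset, { _variable manipulations_ }). within() takes a dataset, performs the variable manipulations inside the braces, and returns the result, which is assigned to the name of the newly created dataset (here replacing the original). Inside the braces you can refer to variables by name only; there is no need for dataset$var_name.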
  • 18.
      lbw <- within(lbw, {
          ## Relabel race
          race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))
          ## Categorize ftv (frequency of visit)
          ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None", "Normal", "Many"))
          ftv.cat <- relevel(ftv.cat, ref = "Normal")
          ## Dichotomize ptl
          preterm <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
      })
  • 19. Numeric to categorical, element by element: race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other")) categorizes race and labels it: 1 to White, 2 to Black, 3 to Other. The 1st level will be the reference.
  • 20. Explained more in depth: factor() creates a categorical variable. lbw <- within(lbw, { race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other")) }) takes the race variable and creates a new variable named race.cat. levels = 1:3 orders the levels as 1, 2, 3 and makes 1 the reference level; labels names levels 1, 2, 3 as White, Black, Other.
  • 21. Numeric to categorical, range to element: ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None", "Normal", "Many")). How breaks work: each interval is open on the left and closed on the right, so (-Inf, 0] becomes None (0 visits), (0, 2] becomes Normal (1 to 2 visits), and (2, Inf] becomes Many (3 or more visits). The 1st level will be the reference.
  • 22. Reset reference level: ftv.cat <- relevel(ftv.cat, ref = "Normal") changes the reference level of the ftv.cat variable from None to Normal.
  • 23. Numeric to Boolean to category: preterm <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+")). ptl >= 1 creates a vector of TRUE/FALSE values; levels are matched to labels in order, so ptl < 1 becomes FALSE and then "0", and ptl >= 1 becomes TRUE and then "1+".
  • 24. Binary 0,1 to No,Yes
      One-by-one method:
      lbw <- within(lbw, {
          ## Categorize smoke ht ui
          smoke <- factor(smoke, levels = 0:1, labels = c("No", "Yes"))
          ht    <- factor(ht,    levels = 0:1, labels = c("No", "Yes"))
          ui    <- factor(ui,    levels = 0:1, labels = c("No", "Yes"))
      })
      Loop method (alternative to the above; run one or the other, because recoding variables that are already No/Yes factors turns them into NA):
      lbw[, c("smoke", "ht", "ui")] <- lapply(lbw[, c("smoke", "ht", "ui")], function(var) {
          factor(var, levels = 0:1, labels = c("No", "Yes"))
      })
  • 26. formula: outcome ~ predictor1 + predictor2 + predictor3. SAS equivalent: model outcome = predictor1 predictor2 predictor3;
  • 27. In the case of a t-test: age ~ zyg. The left side (age) is the continuous variable to be compared, i.e., the variable to be explained; the right side (zyg) is the grouping variable that separates the groups, i.e., the variable used to explain.
  • 28. Linear sum: Y ~ X1 + X2
  • 29. n . All variables except for the outcome n + X2 Add X2 term n - 1 Remove intercept n X1:X2 Interaction term between X1 and X2 n X1*X2 Main effects and interaction term
  • 30. Interaction term: Y ~ X1 + X2 + X1:X2 (X1 + X2 are the main effects; X1:X2 is the interaction)
  • 31. Interaction term: Y ~ X1 * X2 (main effects and interaction; equivalent to Y ~ X1 + X2 + X1:X2)
  • 32. On-the-fly variable manipulation: Y ~ X1 + I(X2 * X3). I() inhibits formula interpretation, so * is treated as ordinary multiplication; a new variable (X2 times X3) is created on the fly and used in the model.
  • 33. Fit a model: lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)
  • 34. See model object lm.full
  • 35. Printed output: Call (the command repeated) and the coefficient for each variable
  • 37. Output annotations: Call (the command repeated); residual distribution; t value = coefficient / SE; dummy variables created for the categorical variables; model R^2 and adjusted R^2; overall F-test
  • 38. ftv.catNone: people with no 1st-trimester visits compared to people with a normal number of 1st-trimester visits (reference level). ftv.catMany: people with many 1st-trimester visits compared to people with a normal number of 1st-trimester visits (reference level).
  • 39. race.catBlack: Black people compared to White people (reference level). race.catOther: Other people compared to White people (reference level).
  • 41. Confidence intervals: the two columns are the lower and upper boundaries.
  • 42. ANOVA table (type I): anova(lm.full)
  • 43. ANOVA table (type I) columns: degrees of freedom; sequential SS; Mean SS = SS / DF; F = Mean SS / Mean SS of residual
  • 44. Type I = Sequential SS (diagram of overlapping SS for 1 age, 2 lwt, 3 smoke): the 1st term gets all of its SS, including any overlap with later terms; each later term gets only the SS remaining after the terms entered before it.
  • 45. ANOVA table (type III): library(car); Anova(lm.full, type = 3)
  • 46. ANOVA table (type III) columns: marginal SS; degrees of freedom (multi-category variables are tested as one term); F = Mean SS / Mean SS of residual
  • 47. Type III = Marginal SS (diagram of overlapping SS for 1 age, 2 lwt, 3 smoke): every term gets only its marginal SS, i.e., what remains after all the other terms, so overlapping SS goes to no term. In type I, only the last term gets its marginal SS.
  • 48. Comparison: Type I vs. Type III
  • 49. Effect plot: library(effects); plot(allEffects(lm.full), ylim = c(2000, 4000)). ylim fixes the Y-axis range for all plots.
  • 50. Effect of each variable with the other covariates set at their averages
  • 52. Interaction model (for demonstration purposes only): lm.full.int <- lm(bwt ~ age*lwt + smoke + ht + ui + age*ftv.cat + race.cat*preterm, data = lbw). age*lwt is continuous * continuous, age*ftv.cat is continuous * categorical, and race.cat*preterm is categorical * categorical.
  • 54. Table annotations: marginal SS; degrees of freedom; interaction terms; F = Mean SS / Mean SS of residual
  • 55. Continuous * Continuous: plot(effect("age:lwt", lm.full.int)). The panels are labeled by lwt level.
  • 56. Continuous * Categorical: plot(effect("age:ftv.cat", lm.full.int), multiline = TRUE)
  • 57. Categorical * Categorical: plot(effect(c("race.cat*preterm"), lm.full.int), x.var = "preterm", z.var = "race.cat", multiline = TRUE)
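As a minimal sketch of the within() pattern on slide 17, here is a tiny made-up data frame. The names df, a, b, ab.sum and a.doubled are hypothetical and not part of lowbwt.

```r
# A tiny made-up data frame (illustrative values only, not from lowbwt)
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# within() evaluates the braces with the columns visible by name and
# returns a modified copy, which is assigned back to df
df <- within(df, {
  ab.sum <- a + b      # no need to write df$a + df$b
  a.doubled <- a * 2
})

print(df)
```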
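A small sketch of slides 19 and 20, using a made-up race vector instead of lbw. It shows that levels fixes the order of the categories (and therefore the reference level), that labels renames the levels in that same order, and that a code not listed in levels becomes NA.

```r
# Made-up race codes for illustration (1 = White, 2 = Black, 3 = Other)
race <- c(1, 3, 2, 1, 2)

# levels = 1:3 fixes the level order; labels renames the levels in that order.
# The first level ("White") is the one lm() uses as the reference category.
race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))
print(race.cat)
print(levels(race.cat))

# A code not listed in levels (here 4) becomes NA rather than a new category
unknown <- factor(c(1, 4), levels = 1:3, labels = c("White", "Black", "Other"))
print(unknown)
```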
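The break rules on slide 21 can be checked on a handful of made-up visit counts. cut() uses right-closed intervals by default (right = TRUE), so a value equal to a break, such as 0 or 2, falls into the interval that ends at that break.

```r
# Made-up first-trimester visit counts for illustration
ftv <- c(0, 1, 2, 3, 6)

# cut() builds right-closed intervals from the breaks:
# (-Inf, 0] -> "None", (0, 2] -> "Normal", (2, Inf] -> "Many"
ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf),
               labels = c("None", "Normal", "Many"))

print(data.frame(ftv, ftv.cat))
```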
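A sketch of slide 22 on a made-up ftv.cat vector. It also prints the dummy-variable column names that model.matrix() builds under R's default treatment contrasts, which is where names such as ftv.catNone and ftv.catMany in the lm() output (slide 38) come from: the reference level gets no column of its own.

```r
# Made-up categorical values for illustration
ftv.cat <- factor(c("None", "Normal", "Many", "Normal"),
                  levels = c("None", "Normal", "Many"))
print(levels(ftv.cat))   # None is first, so it is the reference

# relevel() moves "Normal" to the front and keeps the others in order
ftv.cat <- relevel(ftv.cat, ref = "Normal")
print(levels(ftv.cat))   # "Normal" "None" "Many"

# Treatment contrasts: one dummy column per non-reference level,
# named variable name + level name
dummy.names <- colnames(model.matrix(~ ftv.cat))
print(dummy.names)
```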
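Slide 23's numeric-to-Boolean-to-category step, traced on made-up ptl values. The intermediate name is.preterm is hypothetical and only there to show the logical vector before it is turned into a factor.

```r
ptl <- c(0, 1, 0, 2, 3)   # made-up counts for illustration

is.preterm <- ptl >= 1     # logical vector: FALSE TRUE FALSE TRUE TRUE

# levels = c(FALSE, TRUE) is matched to labels in order: FALSE -> "0", TRUE -> "1+"
preterm <- factor(is.preterm, levels = c(FALSE, TRUE), labels = c("0", "1+"))
print(preterm)
```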
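A compact variant of the loop method on slide 24, assuming made-up 0/1 columns in a hypothetical data frame called dat. Instead of an anonymous function, it passes factor itself to lapply(), which forwards the extra levels and labels arguments to each call.

```r
# Made-up 0/1 indicator columns for illustration
dat <- data.frame(smoke = c(0, 1, 1), ht = c(0, 0, 1), ui = c(1, 0, 0))
vars <- c("smoke", "ht", "ui")

# lapply() calls factor(column, levels = 0:1, labels = c("No", "Yes"))
# for each selected column; the result list is assigned back by name
dat[vars] <- lapply(dat[vars], factor, levels = 0:1, labels = c("No", "Yes"))
str(dat)
```

This behaves the same as the anonymous-function version on the slide; which form to use is mostly a matter of taste.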
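The formula operators on slides 29 to 31 can be inspected without fitting anything: terms() parses a formula and reports the terms it expands to. The data frame df below is made up and only serves to expand the dot.

```r
# Made-up data, only needed so that "." has columns to expand to
df <- data.frame(y = c(1, 2, 4, 3), x1 = c(1, 2, 3, 4), x2 = c(2, 1, 2, 1))

# X1*X2 expands to main effects plus the interaction
star.terms <- attr(terms(y ~ x1 * x2), "term.labels")
long.terms <- attr(terms(y ~ x1 + x2 + x1:x2), "term.labels")
print(star.terms)

# "." means every column of data except the outcome
dot.terms <- attr(terms(y ~ ., data = df), "term.labels")
print(dot.terms)

# "- 1" removes the intercept (the intercept attribute becomes 0)
no.int <- attr(terms(y ~ x1 + x2 - 1), "intercept")
print(no.int)
```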
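A sketch for slide 32 on a hypothetical made-up data frame d. It contrasts * as a formula operator with * inside I(), and checks that the I() version fits the same coefficients as a product column (the hypothetical x23) computed beforehand.

```r
d <- data.frame(y  = c(1, 3, 2, 5, 4),
                x2 = c(1, 2, 3, 4, 5),
                x3 = c(2, 1, 2, 1, 3))

# Without I(), * is the formula operator: main effects + interaction
print(attr(terms(y ~ x2 * x3), "term.labels"))   # "x2" "x3" "x2:x3"

# With I(), * is ordinary multiplication: a single on-the-fly variable
i.terms <- attr(terms(y ~ I(x2 * x3)), "term.labels")
print(i.terms)                                   # "I(x2 * x3)"

# Fitting with I() matches fitting on a pre-computed product column
fit.I <- lm(y ~ I(x2 * x3), data = d)
d$x23 <- d$x2 * d$x3
fit.pre <- lm(y ~ x23, data = d)
print(all.equal(unname(coef(fit.I)), unname(coef(fit.pre))))
```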
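To keep this sketch of slides 33 to 41 self-contained (no download), it swaps in R's built-in mtcars data as a stand-in for lbw; the model and the names mt and cyl.cat are illustrative only. It fits a model with a recoded factor, prints the object and its summary, and reproduces confint() by hand as estimate plus or minus a t quantile times the standard error.

```r
# mtcars ships with R; used here only as a stand-in for lbw
mt <- within(mtcars, {
  cyl.cat <- factor(cyl, levels = c(4, 6, 8), labels = c("Four", "Six", "Eight"))
})
fit <- lm(mpg ~ wt + hp + cyl.cat, data = mt)

print(fit)            # call and coefficients
print(summary(fit))   # SEs, t = coef / SE, R^2, adjusted R^2, overall F-test

# 95% confidence intervals: one row per coefficient, lower and upper columns
ci <- confint(fit)
print(ci)

# The same numbers by hand
se <- sqrt(diag(vcov(fit)))
tq <- qt(0.975, df = df.residual(fit))
print(cbind(lower = coef(fit) - tq * se, upper = coef(fit) + tq * se))
```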
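Slide 44's point that type I SS are sequential can be seen by fitting the same two predictors in both orders. This sketch again uses the built-in mtcars data as a stand-in for lbw. Because wt and hp are correlated, the SS they share is credited to whichever enters first, while the total SS explained by the two together does not change.

```r
fit.wt.first <- lm(mpg ~ wt + hp, data = mtcars)
fit.hp.first <- lm(mpg ~ hp + wt, data = mtcars)

a1 <- anova(fit.wt.first)   # type I (sequential) table
a2 <- anova(fit.hp.first)
print(a1)
print(a2)

# wt's sequential SS depends on whether it enters first or second
print(c(wt.first = a1["wt", "Sum Sq"], wt.second = a2["wt", "Sum Sq"]))

# ...but the two terms together explain the same total SS either way
print(c(sum(a1[c("wt", "hp"), "Sum Sq"]), sum(a2[c("wt", "hp"), "Sum Sq"])))
```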
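car::Anova() is an add-on package. As a base-R stand-in (swapped in here, not the method on slides 45 to 47), drop1() with test = "F" removes each term in turn and F-tests the loss of fit, giving marginal SS with a multi-category variable tested as one term. For a main-effects-only model like this mtcars example, its F tests should match the per-term type III tests; drop1() prints no intercept row, and by default it will not drop a main effect that is involved in an interaction, so it is not a substitute for type III tests in a model like lm.full.int.

```r
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# Each row compares the full model with the model lacking that term;
# factor(cyl) is dropped as a whole, so its 2 dummy variables are tested together
d1 <- drop1(fit, test = "F")
print(d1)

# The marginal SS for wt is the increase in residual SS when only wt is removed
ss.wt <- deviance(update(fit, . ~ . - wt)) - deviance(fit)
print(ss.wt)
```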
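The interaction model on slide 52 can be parsed without any data, since terms() only reads the formula. It lists main effects first and interactions after them; age appears once as a main effect even though it enters two products.

```r
f <- bwt ~ age*lwt + smoke + ht + ui + age*ftv.cat + race.cat*preterm

# terms() expands each * into main effects + interaction and
# orders the terms by degree (main effects, then two-way interactions)
labs <- attr(terms(f), "term.labels")
print(labs)
```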