Logistic Regression in Case-Control Study using R: A Statistical Tool
Satish Gupta
What is R?
- The R statistical programming language is free, open-source software.
- The language is very powerful for writing programs.
- Many statistical functions are built in.
- Contributed packages extend its functionality to cutting-edge research methods.
Getting Started
- Go to www.r-project.org
- Downloads: CRAN (Comprehensive R Archive Network)
- Set your mirror: choose a location close to you.
- Select your platform: Windows, macOS or Linux/UNIX.
Basic operators and calculations
Comparison operators
- equal: ==
- not equal: !=
- greater/less than: > <
- greater/less than or equal: >= <=
Example: 1 == 1 # Returns TRUE
Basic operators and calculations
Logical operators
- AND: &
x <- 1:10; y <- 10:1 # Creates the sample vectors 'x' and 'y'.
x > y & x > 5 # Returns TRUE where both comparisons return TRUE.
- OR: |
x == y | x != y # Returns TRUE where at least one comparison is TRUE.
- NOT: !
!(x > y) # '!' returns the negation (opposite) of a logical vector.
Basic operators and calculations
Calculations
- Four basic arithmetic operators: addition, subtraction, multiplication and division
1 + 1; 1 - 1; 1 * 1; 1 / 1 # Returns the results of basic arithmetic calculations.
- Calculations on vectors
x <- 1:10; sum(x); mean(x); sd(x); sqrt(x) # Calculates the sum, mean, standard deviation and square roots of x.
x <- 1:10; y <- 1:10; x + y # Adds the vectors x and y element by element.
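A runnable version of the operator and arithmetic examples on these slides, using base R only. The expected values in the comments follow directly from x = 1:10 and y = 10:1; the variable names (both, either, negated, and so on) are illustrative.

```r
# Comparison and logical operators on two sample vectors
x <- 1:10
y <- 10:1

1 == 1                     # TRUE
both    <- x > y & x > 5   # element-wise AND: TRUE only for x = 6..10
either  <- x == y | x != y # element-wise OR: TRUE everywhere
negated <- !(x > y)        # NOT: TRUE for x = 1..5

# Summary functions on a vector
s     <- sum(x)    # 55
m     <- mean(x)   # 5.5
sdx   <- sd(x)     # sample standard deviation, about 3.03
roots <- sqrt(x)   # element-wise square roots

# Element-wise addition of two vectors of equal length
z <- 1:10 + 1:10   # 2 4 6 ... 20
```

Note that `&` and `|` work element by element on whole vectors; `&&` and `||` are meant for single TRUE/FALSE values, for example inside `if ()`.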
R-Graphics
R provides comprehensive graphics utilities for visualizing and exploring scientific data, including:
- Scatter plots
- Line plots
- Bar plots
- Pie charts
- Heatmaps
- Venn diagrams
- Density plots
- Box plots
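A minimal sketch of most of the listed plot types with base R graphics, written to temporary PDF files so it also runs without a screen. The random data are purely illustrative. Venn diagrams are not part of base graphics and need a contributed package, so they are left out here.

```r
# Illustrative random data
set.seed(1)
v <- rnorm(100)

# Six plot types on one page (2 rows x 3 columns)
out <- tempfile(fileext = ".pdf")
pdf(out)
par(mfrow = c(2, 3))
plot(v, v^2, main = "Scatter plot")
plot(sort(v), type = "l", main = "Line plot")
barplot(c(a = 3, b = 5, c = 2), main = "Bar plot")
pie(c(a = 3, b = 5, c = 2), main = "Pie chart")
plot(density(v), main = "Density plot")
boxplot(v, main = "Box plot")
dev.off()

# heatmap() sets up its own layout, so it gets a separate file
out_hm <- tempfile(fileext = ".pdf")
pdf(out_hm)
heatmap(matrix(rnorm(25), nrow = 5), main = "Heatmap")
dev.off()
```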
Data handling in R
- Load data: mydata = read.csv("/path/mydata.csv")
- View the data: print(mydata), or View(mydata) for a spreadsheet-style viewer
- See the first rows: head(mydata)
- Select specific rows and columns: mydata[1:10, 1:3]
- Check the class of an object: class(mydata)
- Convert to another class: newdata = as.matrix(mydata)
- Summarize the data: summary(mydata)
- Select (KEEP) variables (columns):
newdata = mydata[c(1, 3:5)]
Data handling in R
- Select observations (rows):
newdata = subset(mydata, age >= 20 | age < 10, select = c(ID, weight))
newdata = subset(mydata, sex == "Male" & age > 25, select = weight:income)
- Exclude (DROP) variables (columns):
newdata = mydata[c(-3, -5)]
mydata$v3 = NULL
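The data-handling commands above can be tried without a CSV file on a small made-up data frame. In this sketch, the columns ID, sex, age, weight and income and all their values are hypothetical stand-ins for mydata.

```r
# Hypothetical toy data standing in for mydata
mydata <- data.frame(
  ID     = 1:6,
  sex    = c("Male", "Female", "Male", "Female", "Male", "Male"),
  age    = c(8, 22, 30, 45, 19, 27),
  weight = c(25, 60, 80, 65, 70, 85),
  income = c(0, 20, 35, 50, 10, 40)
)

head(mydata, 3)     # first three rows
class(mydata)       # "data.frame"
summary(mydata)     # per-column summaries
mydata[1:2, 1:3]    # rows 1-2, columns 1-3

# Selecting observations: age >= 20 OR age < 10 keeps IDs 1, 2, 3, 4, 6
kids_or_adults <- subset(mydata, age >= 20 | age < 10, select = c(ID, weight))

# Men older than 25, keeping the columns from weight to income
older_men <- subset(mydata, sex == "Male" & age > 25, select = weight:income)

# Dropping columns by position (3 = age, 5 = income)
dropped <- mydata[c(-3, -5)]

# Dropping a column by name, on a copy so mydata stays intact
trimmed <- mydata
trimmed$income <- NULL
```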
R-Library
- R's functionality is organised into packages, and many are available for specialised analyses, including genetic and genomic data.
- Depending on where a package is published, it can be installed from one of two main sources:
Using CRAN (Comprehensive R Archive Network):
install.packages("package_name")
Using Bioconductor (current releases use the BiocManager package; the older biocLite.R script is deprecated):
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("package_name")
R-Library
- To load and explore a package:
library() # Lists all packages installed on the system.
library(genetics) # Loads the package for genetics data analysis.
library(help = genetics) # Lists all functions/objects of the "genetics" package.
?function_name # Opens the documentation of a function.
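A common pattern is to install a package only when it is missing and then load it. This sketch uses "survival" as the example because it provides the clogit() function used later for conditional logistic regression. It normally ships with R as a recommended package, so the install branch is usually skipped.

```r
# Install a CRAN package only if it is missing, then attach it
pkg <- "survival"
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}
library(pkg, character.only = TRUE)

# A few of the objects the attached package provides
head(ls("package:survival"))
```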
What is Logistic Regression?
- Logistic regression describes the relationship between a dichotomous (binary) response variable and a set of explanatory variables.
- It is used instead of linear regression because the dependent variable (DV) is discrete, so its relationship with a predictor is non-linear on the probability scale; the logit transformation makes it linear on the log-odds scale.
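Logistic Regression
- A general model:

$$\operatorname{logit}(p_{\text{disease}}) = \log\left(\frac{p_{\text{disease}}}{1 - p_{\text{disease}}}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_J X_J$$

Where:
- $p_{\text{disease}}$ is the probability that an individual has a particular disease.
- $\beta_0$ is the intercept.
- $\beta_1, \beta_2, \ldots, \beta_J$ are the coefficients (effects) of the genetic factors.
- $X_1, X_2, \ldots, X_J$ are the variables representing the genetic factors.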
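In the general model above, each coefficient is a log odds ratio, so exp(β1) is the odds ratio for a one-unit change in X1. The sketch below checks this on a hypothetical 2x2 case-control table (the counts are made up, not taken from the lung cancer data). For a single binary predictor, the glm() estimate should reproduce the cross-product ratio ad/bc.

```r
# Hypothetical 2x2 case-control counts:
#             cases  controls
# exposed        30        20
# unexposed      10        40
dat <- data.frame(
  status  = c(rep(1, 30), rep(0, 20), rep(1, 10), rep(0, 40)),
  exposed = c(rep(1, 50), rep(0, 50))
)

fit <- glm(status ~ exposed, family = binomial, data = dat)
b <- coef(fit)

# beta_0: log-odds of being a case among the unexposed in this sample, log(10/40)
# beta_1: log odds ratio, so exp(beta_1) is the odds ratio
or_glm   <- exp(b[["exposed"]])
or_table <- (30 * 40) / (20 * 10)   # cross-product ratio ad/bc = 6

# logit(p) = log(p / (1 - p)) is qlogis(); its inverse is plogis()
p_exposed <- plogis(b[["(Intercept)"]] + b[["exposed"]])   # 30/50 = 0.6
```

In a case-control sample the intercept reflects the sampling ratio of cases to controls, not the population risk. The odds ratio, however, remains interpretable.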
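Assumptions
- Logistic regression does not assume that the independent variables are normally distributed, that they are linearly related to the outcome itself, or that variances are equal across groups.
- It does assume independent observations (or, in matched designs, correct conditioning on the matched sets) and a linear relationship between continuous predictors and the log-odds of the outcome.
- Because it imposes fewer requirements, it is preferred over discriminant analysis when the data do not satisfy those assumptions.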
Questions logistic regression can answer
- What is the relative importance of each predictor variable?
- How does each predictor variable affect the outcome?
- Does a predictor variable make the model better, worse or have no effect?
- Are there interactions among predictors?
- Does adding interactions among predictors (continuous or categorical) improve the model?
- What is the strength of association between the outcome variable and a set of predictors?
- In model comparison a non-significant difference is often the desired result (for example, a simpler model fits as well as a more complex one), so the strength of association is reported even for non-significant effects.
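Whether adding an interaction improves the model is usually judged with a likelihood-ratio test between nested models. This minimal sketch uses simulated data. The variables x1, x2 and y and the effect sizes are hypothetical.

```r
set.seed(42)
n  <- 400
x1 <- rbinom(n, 1, 0.5)   # e.g. an exposure indicator (hypothetical)
x2 <- rbinom(n, 1, 0.4)   # e.g. a genotype carrier indicator (hypothetical)

# Simulated outcome with an interaction on the log-odds scale
lp <- -1 + 0.5 * x1 + 0.3 * x2 + 0.8 * x1 * x2
y  <- rbinom(n, 1, plogis(lp))

main_only <- glm(y ~ x1 + x2, family = binomial)
with_int  <- glm(y ~ x1 * x2, family = binomial)   # same as x1 + x2 + x1:x2

# Likelihood-ratio (deviance) test: does the x1:x2 term improve the fit?
lr <- anova(main_only, with_int, test = "Chisq")
lr

# Odds ratios from the interaction model
exp(coef(with_int))
```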
Types of Logistic Regression
- Unconditional logistic regression
- Conditional logistic regression
Rules of thumb:
- Use conditional logistic regression if matching has been done, and unconditional logistic regression if there has been no matching.
- When in doubt, use conditional logistic regression. It gives valid estimates whether or not the matching was needed, whereas the unconditional method applied to matched data can overestimate the odds ratio.
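The overestimation mentioned in the rules of thumb can be seen on a small hypothetical 1:1 matched data set. For matched pairs, the conditional odds ratio equals the ratio of the two kinds of discordant pairs, here 8/4 = 2. An unconditional model with a separate intercept for every pair is known to estimate roughly the square of that value (exactly 4 in this toy example). The data below are invented for illustration and use survival::clogit.

```r
library(survival)   # provides clogit() and strata()

# Hypothetical 1:1 matched data: 20 pairs, binary exposure.
# Pair types (case exposure, control exposure):
#   8 pairs (1, 0), 4 pairs (0, 1), 5 pairs (1, 1), 3 pairs (0, 0)
case_x    <- c(rep(1, 8), rep(0, 4), rep(1, 5), rep(0, 3))
control_x <- c(rep(0, 8), rep(1, 4), rep(1, 5), rep(0, 3))
pairs <- data.frame(
  Matset  = rep(1:20, times = 2),
  Status  = rep(c(1, 0), each = 20),
  exposed = c(case_x, control_x)
)

# Conditional logistic regression: one stratum per matched set
cond <- clogit(Status ~ exposed + strata(Matset), data = pairs)
or_cond <- exp(coef(cond)[["exposed"]])       # discordant-pair ratio 8/4 = 2

# Unconditional model with one intercept per pair (inappropriate here)
uncond <- glm(Status ~ exposed + factor(Matset), family = binomial, data = pairs)
or_uncond <- exp(coef(uncond)[["exposed"]])   # about 2^2 = 4: inflated
```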
Data Format
(Status: 1 = case, 0 = control; Matset: matched-set identifier)

Status  Matset  Se_Quartiles  GPX1  GPX4  SEP15  TXN2
1       1       <60           CT    TT    AG     AG
0       1       >60 – 70      CC    CC    GG     GG
1       2       <60           TT    CC    AG     AA
0       2       >70 – 80      CC    CT    GG     GG
1       3       >80           CC    CC    AA     AA
0       3       >60 – 70      CT    TT    GG     GG
1       4       <60           CC    CC    AA     AG
0       4       >70 – 80      TT    TT    GG     GG
1       5       >80           CC    CC    AG     AA
0       5       <60           CC    CC    GG     GG
1       6       >70 – 80      CT    TT    AA     AA
0       6       >80           CC    CC    GG     AG
1       7       >60 – 70      TT    CC    AA     AG
Data and Library loading
- Load the data into R (lung cancer data from PLoS One 2013, 8(3):e59051; the file is tab-delimited):
lung = read.csv("/path/lung.csv", sep = "\t", header = TRUE)
- Load the library and set the data for analysis:
library(epicalc)
use(lung) # makes lung the default data set, available as .data
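This self-contained sketch of the loading step uses only base R. It replaces the real file with inline text built from the first twelve rows of the Data Format excerpt, with category labels simplified to ASCII (for example "60-70"). It also shows that sep = "\t" needs the backslash. The factor levels are set explicitly so that "<60" and "CC" become the reference categories, because otherwise the level order is alphabetical and locale-dependent. With the real file, replace text = txt with the file path.

```r
# First twelve rows of the Data Format excerpt (matched sets 1-6), tab-delimited
txt <- paste(
  "Status\tMatset\tSe_Quartiles\tGPX1",
  "1\t1\t<60\tCT",   "0\t1\t60-70\tCC",
  "1\t2\t<60\tTT",   "0\t2\t70-80\tCC",
  "1\t3\t>80\tCC",   "0\t3\t60-70\tCT",
  "1\t4\t<60\tCC",   "0\t4\t70-80\tTT",
  "1\t5\t>80\tCC",   "0\t5\t<60\tCC",
  "1\t6\t70-80\tCT", "0\t6\t>80\tCC",
  sep = "\n"
)
lung <- read.csv(text = txt, sep = "\t", header = TRUE)

# Explicit level order: the first level is the reference category
lung$Se_Quartiles <- factor(lung$Se_Quartiles, levels = c("<60", "60-70", "70-80", ">80"))
lung$GPX1 <- factor(lung$GPX1, levels = c("CC", "CT", "TT"))

# Genotype counts by case (1) / control (0) status
tab <- table(lung$Status, lung$GPX1)
tab
```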
Data Analysis
- Performing conditional logistic regression (case vs. control):
clogit_lung = clogit(Status ~ Se_Quartiles + strata(Matset), data = .data)
clogistic.display(clogit_lung)

                        OR (95% CI)          P (Wald's test)   P (LR-test)
Quartiles: ref. = <60                                          <0.001
  >60 – 70              0.4 (0.15 – 1.09)    0.074
  >70 – 80              0.11 (0.03 – 0.33)   <0.001
  >80                   0.10 (0.03 – 0.34)   <0.001
Data Analysis
- Performing conditional logistic regression (case vs. control):
clogit_lung = clogit(Status ~ GPX1 + strata(Matset), data = .data)
clogistic.display(clogit_lung)

                        OR (95% CI)          P (Wald's test)   P (LR-test)
GPX1: ref. = CC                                                0.032
  CT                    0.44 (0.22 – 0.86)   0.017
  TT                    0.42 (0.13 – 1.38)   0.151
Data Analysis
- Performing conditional logistic regression (case vs. control):
clogit_lung = clogit(Status ~ Se_Quartiles + GPX1 + strata(Matset), data = .data)
clogistic.display(clogit_lung)

                        crude OR (95% CI)    adj. OR (95% CI)     P (Wald's test)   P (LR-test)
Quartiles: ref. = <60                                                               <0.001
  >60 – 70              0.4 (0.15 – 1.09)    0.32 (0.11 – 0.96)   0.042
  >70 – 80              0.11 (0.03 – 0.33)   0.09 (0.02 – 0.3)    <0.001
  >80                   0.1 (0.03 – 0.34)    0.05 (0.01 – 0.23)   <0.001
GPX1: ref. = CC                                                                     0.006
  CT                    0.44 (0.22 – 0.86)   0.26 (0.11 – 0.65)   0.004
  TT                    0.42 (0.13 – 1.38)   0.44 (0.09 – 2.18)   0.313

(Se_Quartiles: environmental factor; GPX1: genetic factor)
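clogistic.display() comes from epicalc, which may not install on current R versions. The same columns (OR, 95% CI, Wald p-value, LR-test p-value) can be pulled from a survival::clogit fit with base R, as in the sketch below. The matched data are hypothetical (20 pairs, with 8 and 4 discordant pairs), so the conditional OR is exactly 8/4 = 2. Note that summary()$logtest is a global test of all terms in the model; for a per-variable LR test with several variables, nested clogit fits can be compared with anova().

```r
library(survival)

# Hypothetical 1:1 matched pairs (case exposure, control exposure):
#   8 pairs (1, 0), 4 pairs (0, 1), 5 pairs (1, 1), 3 pairs (0, 0)
case_x    <- c(rep(1, 8), rep(0, 4), rep(1, 5), rep(0, 3))
control_x <- c(rep(0, 8), rep(1, 4), rep(1, 5), rep(0, 3))
pairs <- data.frame(
  Matset  = rep(1:20, times = 2),
  Status  = rep(c(1, 0), each = 20),
  exposed = c(case_x, control_x)
)

cond <- clogit(Status ~ exposed + strata(Matset), data = pairs)
s <- summary(cond)

# Build a small results table similar to clogistic.display()
res <- data.frame(
  OR     = s$conf.int[, "exp(coef)"],
  lower  = s$conf.int[, "lower .95"],   # Wald 95% CI
  upper  = s$conf.int[, "upper .95"],
  p_wald = s$coefficients[, "Pr(>|z|)"]
)
res$p_lr <- s$logtest[["pvalue"]]        # LR test against the null model
round(res, 3)
```

Here the 95% CI (roughly 0.6 to 6.6) includes 1, so despite an OR of 2 this toy association is not statistically significant.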
Data Analysis
- Performing unconditional logistic regression (case vs. control):
ulogit_lung = glm(Status ~ Se_Quartiles, family = binomial, data = .data)
logistic.display(ulogit_lung)

                        OR (95% CI)          P (Wald's test)   P (LR-test)
Quartiles: ref. = <60                                          <0.001
  >60 – 70              0.41 (0.17 – 1.02)   0.054
  >70 – 80              0.13 (0.05 – 0.34)   <0.001
  >80                   0.17 (0.07 – 0.42)   <0.001
Data Analysis
- Performing unconditional logistic regression (case vs. control):
ulogit_lung = glm(Status ~ GPX1, family = binomial, data = .data)
logistic.display(ulogit_lung)

                        OR (95% CI)          P (Wald's test)   P (LR-test)
GPX1: ref. = CC                                                0.034
  CT                    0.45 (0.24 – 0.85)   0.014
  TT                    0.44 (0.14 – 1.36)   0.156
Data Analysis
- Performing unconditional logistic regression (case vs. control):
ulogit_lung = glm(Status ~ Se_Quartiles + GPX1, family = binomial, data = .data)
logistic.display(ulogit_lung)

                        crude OR (95% CI)    adj. OR (95% CI)     P (Wald's test)   P (LR-test)
Quartiles: ref. = <60                                                               <0.001
  >60 – 70              0.41 (0.17 – 1.02)   0.43 (0.17 – 1.08)   0.074
  >70 – 80              0.13 (0.05 – 0.34)   0.13 (0.05 – 0.34)   <0.001
  >80                   0.17 (0.07 – 0.42)   0.15 (0.06 – 0.39)   <0.001
GPX1: ref. = CC                                                                     0.024
  CT                    0.45 (0.24 – 0.85)   0.40 (0.20 – 0.80)   0.01
  TT                    0.44 (0.14 – 1.36)   0.42 (0.12 – 1.41)   0.161
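For an unconditional model, the columns of logistic.display() can be approximated with base R. The sketch uses exp() of the estimates and their Wald limits (confint.default) for the odds ratios and 95% CIs, the coefficient table for the Wald p-values, and drop1() for a likelihood-ratio test of each variable as a whole. The data frame d and its effect sizes are simulated and hypothetical; they are not the published estimates. Note that confint() on a glm would instead give profile-likelihood intervals.

```r
set.seed(1)
n <- 300
d <- data.frame(
  Se   = factor(sample(c("<60", "60-70", "70-80", ">80"), n, replace = TRUE),
                levels = c("<60", "60-70", "70-80", ">80")),
  GPX1 = factor(sample(c("CC", "CT", "TT"), n, replace = TRUE, prob = c(0.5, 0.35, 0.15)),
                levels = c("CC", "CT", "TT"))
)
# Simulated outcome with hypothetical effects
lp <- -0.2 - 0.8 * (d$Se != "<60") - 0.6 * (d$GPX1 == "CT")
d$Status <- rbinom(n, 1, plogis(lp))

adj <- glm(Status ~ Se + GPX1, family = binomial, data = d)

# Adjusted odds ratios with Wald 95% CIs (intercept row removed)
or_tab <- exp(cbind(OR = coef(adj), confint.default(adj)))[-1, ]

# Wald p-value for each category
p_wald <- summary(adj)$coefficients[-1, "Pr(>|z|)"]

# LR test for each variable, dropping one term at a time
p_lr <- drop1(adj, test = "Chisq")

round(cbind(or_tab, p_wald), 3)
p_lr
```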
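Something More
- Changing the default reference category:
GPX1 = relevel(GPX1, ref = "TT")
pack() # epicalc: puts the modified variable back into .data
- Saving the result:
result = clogistic.display(clogit_lung)
write.csv(result$table, file = "path/result.csv") # write.csv always uses commas; a 'sep' argument is ignored
write.table(result$table, file = "path/result.xls", sep = "\t") # tab-delimited text that Excel can open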
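This base R sketch shows relevel() and the two ways of saving a table. The toy genotype vector is hypothetical. The OR values are copied from the crude conditional GPX1 table above, purely to have something to write. Temporary files stand in for the "path/..." placeholders.

```r
# Toy genotype factor; "CC" is the first (reference) level
GPX1 <- factor(c("CC", "CT", "TT", "CC", "CT"), levels = c("CC", "CT", "TT"))
GPX1_tt <- relevel(GPX1, ref = "TT")   # TT becomes the reference category
levels(GPX1_tt)                        # "TT" "CC" "CT"

# A small results table (values from the crude GPX1 ORs)
res <- data.frame(term = c("CT", "TT"), OR = c(0.44, 0.42))

# write.csv() always writes comma-separated values
csv_file <- tempfile(fileext = ".csv")
write.csv(res, file = csv_file, row.names = FALSE)

# For a tab-delimited file use write.table() with sep = "\t"
tsv_file <- tempfile(fileext = ".txt")
write.table(res, file = tsv_file, sep = "\t", row.names = FALSE, quote = FALSE)
back <- read.delim(tsv_file)
```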
Summary: regression models
- Regression models can describe the average effect of predictors on outcomes in your data set.
- They indicate how compatible an observed effect is with chance alone (p-values and confidence intervals).
- They can look at each predictor "adjusting for" the others (estimating what would happen if all the others were held constant).
Thanks to,
Prof. Virasakdi Chongsuvivatwong
Epidemiology Unit,
Faculty of Medicine,
Prince of Songkla University, Thailand

Speaker notes

  1. Coefficients are estimated by maximum likelihood estimation (MLE).
  2. To test hypotheses in logistic regression, we used the likelihood-ratio test and the Wald test.
  3. For a difference in means, a confidence interval that includes 0 indicates no statistically significant difference between the two population means at the chosen confidence level. The width of the interval shows how uncertain the estimate is; a very wide interval may indicate that more data should be collected before anything definite can be said. For an odds ratio the null value is 1.0: a confidence interval that includes 1.0 means that the association between the exposure and the outcome could have arisen by chance alone and is not statistically significant.
  4. family = binomial specifies the variance and link functions for glm(): the variance is binomial and the link is the logit function (the default link for this family).