Categorical Data Analysis
KRISHNAKUMAR D
AVMC&H
Textbook: "An Introduction to Categorical Data Analysis" by Alan Agresti
Scales Of Measurement
• Four scales of measurement:
  • Nominal: categories with no order (e.g. gender)
  • Ordinal: ordered categories (e.g. income status)
  • Interval: equal intervals with no true zero (e.g. temperature in °C)
  • Ratio: equal intervals with a true zero (e.g. height)
Categorical Data: Analysis Strategies
• Hypothesis testing: Is there any association?
  • Chi-square test, Fisher's exact test, etc.
  • Chapters 1, 2, 3
• Modeling: What is the nature of the association?
  • Logistic regression, log-linear models
  • Chapters 4, 5, 6, 7
What is categorical data?
The measurement scale for the response consists of a number of categories.

Variable               Measurement scale
Farm system            Organic, non-organic
Education              Good, average, poor
Food texture           Very soft, soft, hard, very hard
Nutrition status       Grade 1, 2, 3
KAP (public health)    "Yes" or "No"
Data Analysis considered:
• Response variable(s) (dependent or Y variable): categorical
• Explanatory variable(s) (independent or X variables): may be categorical, continuous, or both
Example: Does diabetes (categorical response) depend on the explanatory variables?
  Sex (categorical)
  Age (continuous)
Example:
Y (dependent) = Diabetes (present, absent; or normal, mild, moderate, severe)
X's (independent) = income, education, gender, age, sedentary lifestyle, heredity, etc.
Important Note
• Methods designed for nominal variables give the same results no matter how the categories are listed.
• Methods for ordinal variables utilize the category ordering. Whether we list the categories from low to high or from high to low is irrelevant in terms of substantive conclusions, but results would change if the categories were reordered in any other way.
• Methods designed for ordinal variables cannot be used with nominal variables, because nominal categories have no ordering.
• However, methods designed for nominal variables can be used with nominal or ordinal variables.
• When used with ordinal variables, they ignore the ordering, which can result in a serious loss of power.
• Measurement hierarchy: nominal < ordinal < interval
Probability Distributions
• For a continuous response variable: normal distribution
• For a categorical response variable: binomial distribution or multinomial distribution
Binomial Distribution
• n Bernoulli trials, with two possible outcomes for each (success, failure)
• π = P(success), 1 − π = P(failure) for each trial
• Y = number of successes out of n trials
• Trials are independent
Y has the binomial distribution
$$P(y) = \frac{n!}{y!\,(n-y)!}\,\pi^{y}(1-\pi)^{n-y}, \quad y = 0, 1, 2, \ldots, n$$
Example: Binomial Distribution
• Vote (Democrat, Republican)
• Suppose π = prob(Democrat) = 0.50.
For n = 3 persons, let y = number of Democratic votes. Then
p(0) = 0.125
p(1) = 0.375
p(2) = 0.375
p(3) = 0.125
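The four probabilities in the voting example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name binomial_pmf is an illustrative choice, not something taken from the slides.

```python
from math import comb

def binomial_pmf(y, n, pi):
    """P(Y = y) for Y ~ Binomial(n, pi): C(n, y) * pi^y * (1 - pi)^(n - y)."""
    return comb(n, y) * pi**y * (1 - pi)**(n - y)

# Voting example from the slide: n = 3 persons, pi = P(Democrat) = 0.50
n, pi = 3, 0.5
probs = [binomial_pmf(y, n, pi) for y in range(n + 1)]
for y, p in enumerate(probs):
    print(f"p({y}) = {p:.3f}")
```

The probabilities sum to 1, as they must for a distribution over y = 0, 1, 2, 3.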
Multinomial distribution
• When each trial has more than two possible outcomes, the counts of outcomes in the various categories have a multinomial distribution.
• Let c denote the number of outcome categories.
• The binomial distribution is the special case with c = 2 categories.
Properties of the Multinomial Experiment
1. The experiment consists of n identical trials.
2. There are k possible outcomes to each trial (k plays the role of c above). These outcomes are called classes, categories, or cells.
3. The probabilities of the k outcomes, denoted by p1, p2, …, pk, remain the same from trial to trial, where p1 + p2 + … + pk = 1.
4. The trials are independent.
5. The random variables of interest are the cell counts n1, n2, …, nk, the numbers of observations that fall in each of the k classes.
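The properties above lead to the standard multinomial probability n!/(n1! n2! … nk!) · p1^n1 · p2^n2 · … · pk^nk for a given set of cell counts. The formula itself is not written on the slides, so the sketch below should be read as an illustration under that assumption; the cell probabilities 0.2, 0.3, 0.5 are made up for the example.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing the given cell counts in n = sum(counts) trials."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # exact integer division at every step
    return coef * prod(p**c for p, c in zip(probs, counts))

# Hypothetical example with k = 3 categories and n = 3 trials
three_cells = multinomial_pmf([1, 1, 1], [0.2, 0.3, 0.5])
print(three_cells)

# With 2 categories the multinomial reduces to the binomial:
# 2 Democratic votes out of 3 with pi = 0.5, matching p(2) in the voting example
two_cells = multinomial_pmf([2, 1], [0.5, 0.5])
print(two_cells)
```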
Statistical Inference for a proportion
• The parameters of a binomial or multinomial distribution are estimated from the sample data.
• The method of estimation is maximum likelihood estimation (ML estimation).
• The likelihood function (denoted by l) is the probability of the observed data, expressed as a function of the parameter value.
Contd…
Example:
Consider a binomial case with n = 2 trials and observed y = 1.
• The likelihood function l(π) is defined for π between 0 and 1.
• If π = 0, the probability of getting y = 1 is l(0) = 0.
• If π = 0.5, the probability of getting y = 1 is l(0.5) = 0.5.
Maximum Likelihood
• The maximum likelihood (ML) estimate is the parameter value at which the likelihood function takes its maximum.
• Example: l(π) = 2π(1 − π) is maximized at π̂ = 0.5,
• i.e., y = 1 in n = 2 trials is most likely if π = 0.5. The ML estimate of π is π̂ = 0.50.
• In general, the ML estimate of π is the sample proportion p = y/n.
[Figure: binomial likelihood functions for y = 0 successes and y = 6 successes in n = 10 trials]
The result y = 6 in n = 10 trials is more likely to occur when π = 0.60 than when π equals any other value.
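The likelihood examples above (y = 1 in n = 2 trials, and y = 0 or y = 6 in n = 10 trials) can be reproduced numerically. The sketch below evaluates the binomial likelihood on a grid of π values and picks the maximum; the grid search and the helper names are assumptions made for illustration, since in practice the closed form y/n would be used directly.

```python
from math import comb

def likelihood(pi, y, n):
    """Binomial likelihood l(pi) for y successes in n trials."""
    return comb(n, y) * pi**y * (1 - pi)**(n - y)

def ml_estimate_on_grid(y, n, steps=1000):
    """Approximate the ML estimate by searching a grid of pi values in [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda pi: likelihood(pi, y, n))

# The three cases that appear on the slides
for y, n in [(1, 2), (0, 10), (6, 10)]:
    print(f"y={y}, n={n}: grid maximum at {ml_estimate_on_grid(y, n)}, y/n = {y / n}")
```

In each case the grid maximum agrees with the sample proportion y/n, which is the general ML estimate stated above.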
Significance Test for binomial parameter
• A significance test merely indicates whether a particular value for a parameter is plausible.
• The ML estimator of the binomial parameter π is the sample proportion p = y/n.
Confidence interval and significance tests
• Three different methods to find a CI and a test statistic:
  • Wald method
  • Likelihood-ratio method
  • Score method
Wald Test
• Let β̂ be the ML estimator of a parameter β. Then the Wald test statistic for testing H0: β = β0 is given by
$$z = \frac{\hat{\beta} - \beta_0}{SE}$$
where SE is the standard error of the ML estimate. Under H0, z has approximately a standard normal distribution, and z² has approximately a chi-squared distribution with d.f. = 1.
• The z or chi-squared test using this test statistic is called a Wald test.
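For a binomial proportion, the Wald statistic takes the parameter to be π, the estimate to be p = y/n, and the standard error to be sqrt(p(1 − p)/n) evaluated at the estimate. The sketch below applies it to y = 6 successes in n = 10 trials with a hypothetical null value π0 = 0.5; the null value and the function name are assumptions for illustration only.

```python
from math import erfc, sqrt

def wald_test_proportion(y, n, pi0):
    """Wald z statistic and two-sided P-value for H0: pi = pi0.

    The standard error is evaluated at the ML estimate p = y/n,
    so the statistic is undefined when p is exactly 0 or 1 (SE = 0).
    """
    p = y / n
    se = sqrt(p * (1 - p) / n)
    z = (p - pi0) / se
    # Two-sided normal P-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    # This equals the chi-squared (d.f. = 1) P-value for z squared.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

z, p_value = wald_test_proportion(y=6, n=10, pi0=0.5)
print(f"z = {z:.3f}, z^2 = {z**2:.3f}, P-value = {p_value:.3f}")
```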
Likelihood Ratio Test
This alternative test uses the likelihood function
through the ratio of two maximizations of it:
1. the maximum over the possible parameter values
that assume the null hypothesis,
2. the maximum over the larger set of possible
parameter values, permitting the null or the
alternative hypothesis to be true.
Contd..
Let l0 denote the maximized value of the likelihood function under the null hypothesis, and let l1 denote the maximized value more generally.
For instance, when there is a single parameter β, l0 is the likelihood function calculated at β0, and l1 is the likelihood function calculated at the ML estimate β̂.
Then l1 is always at least as large as l0, because l1 refers to maximizing over a larger set of possible parameter values.
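The slide stops at the definitions of l0 and l1. The usual likelihood-ratio test statistic built from them is −2 log(l0/l1), which is compared with a chi-squared distribution with d.f. = 1 when there is a single parameter. The sketch below computes it for a binomial proportion; the data (y = 6, n = 10) and the null value π0 = 0.5 are illustrative assumptions.

```python
from math import comb, erfc, log, sqrt

def binom_likelihood(pi, y, n):
    """Binomial likelihood for y successes in n trials."""
    return comb(n, y) * pi**y * (1 - pi)**(n - y)

def lr_test_proportion(y, n, pi0):
    """Likelihood-ratio statistic -2 log(l0 / l1) for H0: pi = pi0."""
    l0 = binom_likelihood(pi0, y, n)    # likelihood at the null value
    l1 = binom_likelihood(y / n, y, n)  # likelihood at the ML estimate y/n
    stat = max(-2 * log(l0 / l1), 0.0)  # guard against tiny negative rounding
    # Chi-squared (d.f. = 1) upper-tail probability: P(|Z| > sqrt(stat))
    p_value = erfc(sqrt(stat / 2))
    return l0, l1, stat, p_value

l0, l1, stat, p_value = lr_test_proportion(y=6, n=10, pi0=0.5)
print(f"l0 = {l0:.4f}, l1 = {l1:.4f}, statistic = {stat:.3f}, P-value = {p_value:.3f}")
```

Because l1 is never smaller than l0, the ratio l0/l1 is at most 1 and the statistic is nonnegative.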
Remarks
• For ordinary regression models assuming a normal distribution for Y, the three tests provide identical results.
• In other cases, for large samples they have similar behaviour when H0 is true.
• The Wald CI often performs poorly in categorical data analysis unless n is quite large.
• For inference about proportions, the score method tends to perform better than the Wald method, in terms of having actual error rates closer to the advertised levels.
• In practice, Wald inference is popular because of its simplicity and the ease of forming it from software output.
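The difference between the Wald and score intervals for a proportion shows up clearly in small samples. The sketch below compares the two, using the standard closed form of the score interval (often called the Wilson interval), which is not written out on the slides. The data, y = 9 successes in n = 10 trials, are a hypothetical example chosen to make the contrast visible.

```python
from math import sqrt
from statistics import NormalDist

def wald_ci(y, n, level=0.95):
    """Wald interval: p +/- z * sqrt(p(1 - p)/n), with SE evaluated at p = y/n."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p = y / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

def score_ci(y, n, level=0.95):
    """Score (Wilson) interval: all pi0 with |p - pi0| / sqrt(pi0(1 - pi0)/n) <= z."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p = y / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

wald = wald_ci(9, 10)
score = score_ci(9, 10)
print(f"Wald 95% CI:  ({wald[0]:.3f}, {wald[1]:.3f})")
print(f"Score 95% CI: ({score[0]:.3f}, {score[1]:.3f})")
```

With these data the Wald upper limit exceeds 1, which is impossible for a proportion, while the score interval stays inside (0, 1).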
Thank you
