Understanding inferential statistics

Understanding Inferential Statistics: An Overview of Important Concepts

Important Definitions
• Population
– The complete set of individuals or objects that the investigator is interested in studying
• Sample
– A subset of the population that is actually being studied
• Variable
– A characteristic of an individual or object that can have different values (as opposed to a constant)
• Independent variable
– The variable that is systematically manipulated or measured by the investigator to determine its impact on the outcome

Important Definitions (continued)
• Dependent variable
– The outcome variable of interest
• Data
– The measurements that are collected by the investigator
• Statistic
– Summary measure of a sample
• Parameter
– Summary measure of a population

Two branches of the science of statistics
• Descriptive Statistics
• Inferential Statistics

Descriptive Statistics
• Concerned with describing or characterizing the obtained sample data
• Uses summary measures, typically measures of central tendency and spread

Descriptive Statistics (continued)
• Measures of central tendency include the mean, median, and mode
• Measures of spread include the range, variance, and standard deviation
• These summary measures of obtained sample data are called statistics
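
As an illustration of the summary measures listed above, here is a minimal Python sketch using only the standard library. The function name `describe` and the `scores` values are made up for this example and are not part of the original slides.

```python
import statistics


def describe(sample):
    """Return common descriptive statistics for a list of numbers."""
    return {
        # Measures of central tendency
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "mode": statistics.mode(sample),
        # Measures of spread
        "range": max(sample) - min(sample),
        "variance": statistics.variance(sample),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(sample),      # square root of the sample variance
    }


# Hypothetical sample data, for illustration only.
scores = [2, 4, 4, 5, 7, 8]
summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Because these numbers are computed from a sample, they are statistics in the sense defined earlier, not population parameters.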

Inferential Statistics
• Involves using obtained sample statistics to estimate the corresponding population parameters
• The most common inference is using a sample mean to estimate a population mean (surveys, opinion polls)
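
The idea of using a sample mean to estimate a population mean can be sketched with a small simulation. Everything here is an assumption made for illustration: the population is artificial (10,000 values drawn from a normal distribution with mean 50 and standard deviation 10), and `estimate_population_mean` is a hypothetical helper name.

```python
import random
import statistics


def estimate_population_mean(population, sample_size, seed=0):
    """Draw a simple random sample and return its mean (a statistic)."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # sampling without replacement
    return statistics.mean(sample)


# Artificial population, for illustration only.
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]

parameter = statistics.mean(population)                             # population mean
statistic = estimate_population_mean(population, sample_size=200)   # sample mean

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

In a real study the parameter is unknown; the simulation only computes it so the estimate can be compared against it.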

Planning a study
• Suppose you were interested in determining whether treatment X has an effect on outcome Y. Several issues need to be addressed so that a sound inference can be made from the study result.

• What is the population?
• How will you select a sample that is representative of that population?
– There are many ways to produce a sample, but not all of them will lead to sound inference

Sampling Strategies
• Probability samples result when subjects have a known probability of entering the sample
– Simple random sampling
– Stratified sampling
– Cluster sampling

• Non-probability samples result when subjects do not have a known probability of entering the sample
– Quota sampling
– Convenience sampling

• Probability samples can be made to be representative of a population
• Non-probability samples may or may not be representative of a population, so it may be difficult to convince someone that the sample results apply to any larger population
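
Two of the probability sampling strategies named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `people` list, the `region` field, and the function names are hypothetical, and the stratified version uses proportional allocation, which is one common choice among several.

```python
import random
from collections import defaultdict


def simple_random_sample(units, n, rng):
    """Every unit has the same known chance of being selected."""
    return rng.sample(units, n)


def stratified_sample(units, stratum_of, fraction, rng):
    """Sample the same fraction from each stratum (proportional allocation)."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical sampling frame: 80 people in the north, 20 in the south.
rng = random.Random(1)
people = [{"id": i, "region": "north" if i < 80 else "south"} for i in range(100)]

srs = simple_random_sample(people, 10, rng)
strat = stratified_sample(people, lambda p: p["region"], 0.10, rng)

print(len(srs), "people in the simple random sample")
print(sum(p["region"] == "north" for p in strat), "north,",
      sum(p["region"] == "south" for p in strat), "south in the stratified sample")
```

The stratified sample always contains both regions in proportion to their size, whereas a simple random sample of 10 could, by chance, contain few or no people from the smaller region.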

Planning a study: Validity Issues
• Internal validity
– The extent to which the observed effect on the dependent variable is actually caused by the independent variable
– Depends on carefully controlling other potential causes of an effect
– Excessive control may result in artificial circumstances

• External validity
– The extent to which one would expect the results from a study to be duplicated in the real world, that is, in the larger population
– Depends on the representativeness of the sample
– Also depends on the artificiality of the study

• There is always a tension between maximizing internal and external validity
• Efficacy studies
– Studies designed to determine the maximum effectiveness of a treatment under ideal conditions (emphasis on internal validity)
• Effectiveness studies
– Studies designed to determine the likely effect of a treatment in the real world (emphasis on external validity)

• Clinical trials are generally designed to be efficacy trials: highly controlled situations that maximize internal validity
• We want to design a study to test the effect of treatment X on outcome Y, and try to make sure that any difference in Y is due to X

• The simplest design would involve two groups, an experimental group and a control group, that are created through random assignment. In addition, neither the subjects nor the experimenter knows the group assignment (double blind)
• Two groups address the possibility of change in Y occurring regardless of treatment X
• Random assignment addresses the possibility that the two groups were different to begin with
• Blinding addresses the possibility that patient or experimenter expectations play a role in the outcome

• At the end of this study you observe a difference in outcome Y between the experimental group and the control group.
• All of the effort in designing the study with strict control is for one reason: at the end of the study you want only two plausible explanations for the observed outcome
– Chance
– Real effect of treatment X

• The reason you want only these two explanations is that if you can rule out chance, you can conclude that treatment X must have been the reason for the difference in outcome Y
• All inferential statistical tests are used to estimate the probability of the observed outcome assuming chance alone is the reason for the difference.
• If there are multiple competing explanations for the observed result, then ruling out chance offers little information about the effectiveness of treatment X
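
The random assignment step of the two-group design described above can be sketched as follows. The function name `randomly_assign` and the subject IDs are hypothetical, and blinding is a matter of study procedure that this code does not model.

```python
import random


def randomly_assign(subject_ids, seed=None):
    """Shuffle the subjects and split them into two equally sized groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)  # every ordering is equally likely
    half = len(shuffled) // 2
    return {
        "experimental": shuffled[:half],  # will receive treatment X
        "control": shuffled[half:],       # will not receive treatment X
    }


# Twenty hypothetical subjects, numbered 1 to 20.
groups = randomly_assign(range(1, 21), seed=7)
print("experimental:", sorted(groups["experimental"]))
print("control:     ", sorted(groups["control"]))
```

Because group membership is decided by chance alone, any pre-existing difference between the groups is itself a chance difference, which is exactly what the statistical test is designed to account for.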

Two ways of using inferential statistics
• Hypothesis testing: answering the question of whether or not treatment X may have no effect on outcome Y
• Point estimation: determining what the likely effect of treatment X is on outcome Y

Hypothesis Testing
• The goal of hypothesis testing is somewhat twisted: it is to disprove something you don’t believe
• In this case you are trying to disprove that treatment X has no effect on outcome Y
• You start out with two hypotheses

• Null Hypothesis (H0)
– Treatment X has no effect on outcome Y
• Alternative Hypothesis (HA)
– Treatment X has an effect on outcome Y

• If the trial has been carefully controlled, there are only two explanations for a difference between treatment groups: efficacy of X, and chance
• Assuming that the null hypothesis is correct, we can use a statistical test to calculate the probability that the observed difference would have occurred. This probability is known as the p-value (the observed significance level) of the test.

• P-value
– The probability of the observed outcome (or one more extreme), assuming that chance alone was involved in creating the outcome. In other words, assuming the null hypothesis is correct, what is the probability that we would have seen the observed outcome?
– This is only meaningful if chance is the only competing plausible explanation.

• If the p-value is small, meaning the observed outcome would have been unlikely, we reject the idea that chance played the only role in the observed difference between groups and conclude that treatment X does in fact have an effect on outcome Y
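
One way to make "the probability of the observed outcome, assuming chance alone" concrete is a permutation test: if treatment X has no effect, the group labels are arbitrary, so they can be reshuffled many times to see how often chance alone produces a difference at least as large as the one observed. This is a sketch of that one technique, not the specific test the slides have in mind; the function name and the data are invented for illustration.

```python
import random


def permutation_p_value(treated, control, n_permutations=5000, seed=0):
    """Two-sided p-value for a difference in means, estimated by reshuffling labels."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # under the null hypothesis, labels are interchangeable
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            at_least_as_extreme += 1

    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical outcome Y measurements, for illustration only.
treated = [12.1, 13.4, 11.8, 14.2, 13.9, 12.7]
control = [10.2, 11.1, 9.8, 10.9, 11.5, 10.4]

p_value = permutation_p_value(treated, control)
print(f"estimated p-value: {p_value:.4f}")
```

A small estimated p-value here means that reshuffled labels rarely reproduce a difference as large as the observed one, which is the sense in which chance becomes an implausible explanation.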
• How small is small?

Decision vs. reality (commonly used error rates in parentheses):

Decision     | Reality: H0 is true          | Reality: H0 is false
Retain H0    | Correct decision             | Type II error (β; .2, .1)
Reject H0    | Type I error (α; .05, .01)   | Correct decision

• Rules of thumb for effect sizes (standardized mean differences):
– Small = .2
– Medium = .5
– Large = .8
• So, if you want an 80% chance of detecting a medium effect, using an α value of .05,
$N = 4(1.96 + 0.84)^2 / 0.5^2 \approx 125.4$, which rounds up to about 126 in total, or 63 in each group. Here 1.96 is the standard normal value for a two-sided α of .05 and 0.84 is the standard normal value for 80% power (β = .2).
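
The sample size rule of thumb above can be checked with a short Python sketch. It assumes the same formula, N = 4(z_alpha + z_beta)^2 / d^2 for two equal groups and a two-sided test, and uses `statistics.NormalDist` from the standard library to look up the normal quantiles instead of the rounded values 1.96 and 0.84. The function name `total_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def total_sample_size(effect_size, alpha=0.05, power=0.80):
    """Approximate total N for comparing two equal groups (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return 4 * (z_alpha + z_beta) ** 2 / effect_size ** 2


n_total = total_sample_size(0.5)  # medium effect
print(f"total N: {math.ceil(n_total)}, per group: {math.ceil(n_total / 2)}")

# The same calculation for the small and large rule-of-thumb effect sizes.
for label, d in [("small", 0.2), ("large", 0.8)]:
    print(label, math.ceil(total_sample_size(d)))
```

Because the effect size is squared in the denominator, halving the effect you want to detect roughly quadruples the required sample size.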

Point Estimation
• Hypothesis testing can only tell you whether or not the effect of X is zero; it does not tell you how large or small the effect is.
• Important: a p-value is not an indication of the size of an effect; it depends greatly on sample size
• If you want an estimate of the actual effect, you need confidence intervals

• Confidence intervals give you an idea of what the actual effect is likely to be in the population of interest
• The most common confidence interval is 95% and gives an upper and lower bound on what the effect is likely to be.
• The size of the interval depends on the sample size, variability of the measure, and the degree of confidence you want that the interval contains the true effect.

• Many people prefer confidence intervals to hypothesis testing, because confidence intervals contain more information
• Not only can you tell whether the effect could be zero (is zero contained in the interval of possible effect values?) but you also have the entire range of possible values the effect could be
• So, a confidence interval gives you all the information of a hypothesis test and a whole lot more.
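
As a rough illustration of the points above, the following sketch computes a confidence interval for a difference in group means. It assumes a large-sample normal approximation; with small samples like the invented ones here, a t-based interval would be somewhat wider. The function name and data are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def mean_difference_ci(group_a, group_b, confidence=0.95):
    """Normal-approximation confidence interval for mean(group_a) - mean(group_b)."""
    diff = mean(group_a) - mean(group_b)
    # Standard error: depends on the variability of the measure and the sample sizes.
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    # Critical value: grows with the degree of confidence requested.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * se, diff + z * se


# Hypothetical outcome Y measurements, for illustration only.
treated = [12.1, 13.4, 11.8, 14.2, 13.9, 12.7]
control = [10.2, 11.1, 9.8, 10.9, 11.5, 10.4]

low95, high95 = mean_difference_ci(treated, control, confidence=0.95)
low99, high99 = mean_difference_ci(treated, control, confidence=0.99)
print(f"95% CI: ({low95:.2f}, {high95:.2f})")
print(f"99% CI: ({low99:.2f}, {high99:.2f})")
print("zero inside 95% CI:", low95 <= 0 <= high95)
```

The interval both answers the hypothesis-testing question (does it contain zero?) and shows the range of effect sizes compatible with the data, and asking for more confidence widens it.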

Choosing the right test
• Typically one is interested in comparing group means.
• If the outcome is continuous and there is one independent variable:
– Two groups: t-test
– Three or more groups: ANOVA

• If the outcome is continuous and there is more than one independent variable:
– ANOVA, if all independent variables are categorical
– ANCOVA or multiple linear regression, if some independent variables are continuous

• If the outcome is binary:
– Logistic regression
• If the outcome is time until a specified event:
– Survival analysis (Cox proportional hazards regression)
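
The decision rules above can be written down as a small lookup function. This is only a restatement of the slide's rules of thumb in code; the function name, argument names, and outcome labels are invented for this sketch, and real test selection also depends on design details the slides do not cover.

```python
def choose_test(outcome, n_predictors=1, n_groups=2, any_continuous_predictor=False):
    """Suggest a test following the rules of thumb listed above."""
    if outcome == "binary":
        return "logistic regression"
    if outcome == "time-to-event":
        return "survival analysis (Cox proportional hazards regression)"
    if outcome != "continuous":
        raise ValueError(f"unsupported outcome type: {outcome!r}")

    # Continuous outcome from here on.
    if n_predictors == 1:
        if n_groups < 2:
            raise ValueError("need at least two groups to compare")
        return "t-test" if n_groups == 2 else "ANOVA"
    if any_continuous_predictor:
        return "ANCOVA or multiple linear regression"
    return "ANOVA"  # several predictors, all categorical


print(choose_test("continuous", n_groups=2))
print(choose_test("continuous", n_groups=4))
print(choose_test("continuous", n_predictors=3, any_continuous_predictor=True))
print(choose_test("binary"))
```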

Parametric vs. Non-parametric tests
• Parametric tests are tests that use a known probability distribution to assess the p-value of the outcome.
• Most outcomes do fairly closely follow a known probability distribution, and many tests are robust to violations of distributional assumptions, so the assigned p-value will be fairly accurate in many situations
• For unique situations, such as specialized outcomes or very skewed distributions, one can generate one's own probability distribution to calculate a p-value (jackknife, bootstrap, etc.). Computers make these techniques very fast and easy
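
To give a sense of what "generating your own probability distribution" looks like, here is a minimal bootstrap sketch: the sample is resampled with replacement many times, and the statistic of interest is recomputed each time. The example builds a percentile interval for the median of a skewed sample rather than a p-value; the data, function names, and number of resamples are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_distribution(sample, statistic, n_resamples=2000, seed=0):
    """Recompute a statistic on many resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return sorted(statistic(rng.choices(sample, k=n)) for _ in range(n_resamples))


def percentile_interval(distribution, confidence=0.95):
    """Read an interval off the sorted bootstrap distribution."""
    tail = (1 - confidence) / 2
    lower = distribution[int(tail * len(distribution))]
    upper = distribution[int((1 - tail) * len(distribution)) - 1]
    return lower, upper


# Hypothetical, strongly right-skewed measurements (for illustration only).
waits = [1, 1, 2, 2, 3, 3, 4, 5, 8, 13, 21, 40]

medians = bootstrap_distribution(waits, statistics.median)
low, high = percentile_interval(medians)
print(f"sample median: {statistics.median(waits)}")
print(f"95% bootstrap percentile interval for the median: ({low}, {high})")
```

No distributional formula is needed: the empirical distribution of resampled medians stands in for the known probability distribution that a parametric test would rely on.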
