Reading Birnbaum's (1962) paper, by Li Chenlu

On The Foundations Of Statistical Inference

by ALLAN BIRNBAUM

LI Chenlu

2013.01.22

Contents

1 Introduction
2 Part 1
Statistical Evidence
The Principle of Sufficiency
The Principle of Conditionality
The Likelihood Principle
3 Part 2
Binary Experiments
Finite Parameter Spaces
More General Parameter Spaces
Bayesian Methods: An Interpretation of the Principle of Insufficient Reason
4 Conclusion

Introduction

The paper studies the likelihood principle (LP) and how the likelihood function can be used to measure the evidence in the data about an unknown parameter.
• The main aim of the paper is to show, and to discuss the implications of, the fact that the LP is a consequence of the concepts of conditional frames of reference and sufficiency.
• The second aim of the paper is to describe how and why these principles are appropriate ways to characterize statistical evidence in parametric models for inference purposes.

Part 1

1 Statistical Evidence
2 The Principle of Sufficiency
3 The Principle of Conditionality
4 The Likelihood Principle


An experiment E is defined as E = {Ω, S, f(x, θ)}, where f is a density, θ is the unknown parameter, Ω is the parameter space, and S is the sample space of outcomes x of E. The likelihood function determined by an observed outcome x is L_x(θ) = f(x, θ).

Birnbaum states that the central purpose of the paper is to clarify the essential structure and properties of statistical evidence, termed the evidential meaning of (E, x) and denoted by Ev(E, x), in various instances.
Ev(E, x) is the evidence about θ supplied by x and E.


The Principle of Sufficiency (S)
Let E be any experiment, with sample space {x}, and let t(x) be any sufficient statistic. Let E' denote the derived experiment, having the same parameter space, such that when any outcome x of E is observed the corresponding outcome t = t(x) of E' is observed. Then for each x, Ev(E, x) = Ev(E', t), where t = t(x).

∗ If t(x) is a sufficient statistic for θ, then any inference about θ should depend on the sample x only through the value t(x).


If x is any specified outcome of any specified experiment E, the likelihood function determined by x is the function of θ given by c f(x, θ), where c is any positive constant.
If for some positive constant c we have f(x, θ) = c g(y, θ) for all θ, then x and y are said to determine the same likelihood function (x and y may be outcomes of different experiments, with densities f and g).
If two outcomes x, x' of one experiment determine the same likelihood function, f(x, θ) = c f(x', θ) for all θ, then there exists a sufficient statistic t such that t(x) = t(x').


Lemma 1
If two outcomes x, x' of any experiment E determine the same likelihood function, then they have the same evidential meaning:
Ev(E, x) = Ev(E, x')


Definition of a mixture experiment
An experiment E is called a mixture, with components {E_h}, if it is mathematically equivalent to a two-stage experiment of the following form:

1 An observation h is taken on a random variable H having a fixed and known distribution G (G does not depend on unknown parameter values).
2 The corresponding component experiment E_h is carried out, yielding an outcome x_h.
Thus each outcome of E is a pair (E_h, x_h).


The Principle of Conditionality (C)
If an experiment E is a mixture G of components {E_h}, with possible outcomes (E_h, x_h), then

Ev(E, (E_h, x_h)) = Ev(E_h, x_h)

That is, the evidential meaning of any outcome (E_h, x_h) of any experiment E having a mixture structure is the same as the evidential meaning of the corresponding outcome x_h of the corresponding component experiment E_h, ignoring otherwise the over-all structure of the original experiment E.


Example
Suppose that two instruments (h = 1 or 2) are available for use in an experiment, with respective probabilities p1 = 0.73 and p2 = 0.27 of being selected for use. Each instrument gives an observation y = 1 or y = 0.

Consider the assertion Ev(E, (E1, 1)) = Ev(E1, 1). Accepting the experimental conditions, suppose that E leads to selection of the first instrument (h = 1).
In the hypothetical situation in which use of the first instrument had been assured from the start, the experimenter would be prepared to report either (E1, 0) or (E1, 1) as a complete description of the statistical evidence obtained.


Example

For purposes of informative inference, if y = 1 is observed with the first instrument, then the report (E1, 1) seems to be an appropriate and complete description of the statistical evidence obtained. The "more complete" report (E, (E1, 1)) seems to differ from it only by the addition of recognizably redundant elements, irrelevant to the evidential meaning and evidential interpretation of this outcome of E.


The Likelihood Principle (L)
If E and E' are any two experiments with a common parameter space, and if x and y are any respective outcomes which determine likelihood functions satisfying f(x, θ) = c g(y, θ) for some positive constant c = c(x, y) and all θ, then Ev(E, x) = Ev(E', y).

That is, the evidential meaning Ev(E, x) of any outcome x of any experiment E is characterized completely by the likelihood function c f(x, θ), and is otherwise independent of the structure of (E, x).
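A standard textbook illustration of (L), not taken from the slides, compares binomial and negative binomial sampling. The sketch below assumes 3 successes in 12 Bernoulli trials, recorded either with the number of trials fixed at 12 or by sampling until the 3rd success. The function names and the numbers are illustrative assumptions. It checks numerically that the two likelihood functions are proportional on a grid of θ values.

```python
from math import comb, isclose

def binomial_lik(theta, n=12, x=3):
    """f(x, theta): x successes in a fixed number n of trials."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def neg_binomial_lik(theta, r=3, n=12):
    """g(y, theta): the r-th success occurs exactly on trial n."""
    return comb(n - 1, r - 1) * theta**r * (1 - theta)**(n - r)

# Compare the two likelihoods on a grid of parameter values (assumed grid).
thetas = [i / 20 for i in range(1, 20)]
ratios = [binomial_lik(t) / neg_binomial_lik(t) for t in thetas]

# If the ratio is the same constant c for every theta, the two outcomes
# determine the same likelihood function in the sense of the slides.
c = ratios[0]
same_likelihood = all(isclose(r, c) for r in ratios)
print(c, same_likelihood)
```

Under (L) the two outcomes therefore carry the same evidential meaning, even though the sample spaces of the two experiments differ.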


Lemma 2
(S) and (C) ⇐⇒ (L)

Proof of ⇐:
• That (L) implies (C) follows immediately from the fact that in all cases the likelihood functions determined respectively by (E, (E_h, x_h)) and (E_h, x_h) are proportional.
• That (L) implies (S) follows immediately from Lemma 1.
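The proportionality used in the first bullet can be checked numerically. The sketch below reuses the selection probabilities 0.73 and 0.27 from the two-instrument example; the component models for the two instruments are assumptions made only for illustration.

```python
from math import isclose

# Known selection probabilities G (taken from the two-instrument example).
P_SELECT = {1: 0.73, 2: 0.27}

def component_density(h, y, theta):
    """Assumed component models: instrument 1 is precise, instrument 2 is noisy."""
    p_one = theta if h == 1 else 0.25 + 0.5 * theta
    return p_one if y == 1 else 1 - p_one

def mixture_density(h, y, theta):
    """Density of the outcome (E_h, y) in the mixture experiment E."""
    return P_SELECT[h] * component_density(h, y, theta)

# The ratio of the mixture likelihood to the component likelihood should be
# the constant p_h, which does not involve theta.
thetas = [i / 10 for i in range(1, 10)]
ratios = [mixture_density(1, 1, t) / component_density(1, 1, t) for t in thetas]
proportional = all(isclose(r, P_SELECT[1]) for r in ratios)
print(proportional)
```

Because G does not depend on θ, the factor p_h plays the role of the constant c in the definition of "the same likelihood function".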

Lemma 2
(S) and (C) ⇐⇒ (L)

Proof of ⇒:
Let E and E' denote any two experiments having the same parameter space Ω = {θ}, represented by probability density functions f(x, θ) and g(y, θ) on their respective sample spaces S = {x} and S' = {y}. Consider the mixture experiment E* whose components are just E and E', taken with equal probabilities. Let z denote the sample point of E*, and let C denote any set of points z; then C = A ∪ B, where A ⊂ S and B ⊂ S', and

\[ \mathrm{Prob}(Z \in C \mid \theta) = \tfrac{1}{2}\,\mathrm{Prob}(A \mid \theta, E) + \tfrac{1}{2}\,\mathrm{Prob}(B \mid \theta, E'). \]

Let the probability density function representing E* be denoted by

\[ h(z, \theta) = \begin{cases} \tfrac{1}{2}\, f(x, \theta) & \text{if } z = x \in S, \\ \tfrac{1}{2}\, g(y, \theta) & \text{if } z = y \in S'. \end{cases} \]

From (C), it follows that
Ev(E*, (E, x)) = Ev(E, x) for each x ∈ S,
Ev(E*, (E', y)) = Ev(E', y) for each y ∈ S'.   (a)


Proof of ⇒ (continued):
Let x and y be any two outcomes of E and E', respectively, which determine the same likelihood function: f(x, θ) = c g(y, θ) for all θ, where c is some positive constant. Then h(x, θ) = c h(y, θ) for all θ; that is, the two outcomes (E, x) and (E', y) of the mixture experiment E* determine the same likelihood function. It then follows from (S) and Lemma 1 that
Ev(E*, (E, x)) = Ev(E*, (E', y)).   (b)
From (a) and (b) it follows that

Ev(E, x) = Ev(E', y).

The consequence states that any two outcomes x, y of any two experiments E, E' (with the same parameter space) have the same evidential meaning if they determine the same likelihood function.


Impact of the principle
• The implication ⇒ is the most important part of the equivalence, because it means that if you do not accept (L), you have to discard either (S) or (C), two widely accepted principles.

• The most important consequence of (L) seems to be that evidential measures based on a specific experimental frame of reference (like p-values and confidence levels) are somewhat unsatisfactory.

• In other words, (L) eliminates the need to consider the sample space, or any part of it, once the data are observed. Lemma 2 truly was a "breakthrough" in the foundations of statistical inference and made (L) stand on its own ground, independent of a Bayesian argument.

Part 2

1 Binary Experiments
2 Finite Parameter Spaces
3 More General Parameter Spaces
4 Bayesian Methods

Binary Experiments

Let Ω = {θ1, θ2}. In this case, (L) means that all the information lies in the likelihood ratio λ(x) = f(x, θ2)/f(x, θ1).
The question is now what evidential meaning we can attach to the number λ(x).
To answer this, Birnbaum first considers a binary experiment in which the sample space has only two points, denoted (+) and (-), and such that p(+|θ1) = p(-|θ2) = α for some α ≤ 1/2. Such an experiment is called a symmetric simple binary experiment and is characterized by the "error" probability α.

Binary Experiments

For such an experiment, λ(+) = (1 - α)/α ≥ 1, so that α = 1/(1 + λ(+)), and λ(-) = α/(1 - α) ≤ 1. The important point now is that, according to (L), two experiments whose outcomes have the same value of λ have the same evidential meaning about the value of θ.
Therefore, the evidential meaning of λ(x) ≥ 1 from any binary experiment E is the same as the evidential meaning of the (+) outcome from a symmetric simple binary experiment with α(x) = 1/(1 + λ(x)). α(x) is called the intrinsic significance level and is a measure of evidence that satisfies (L).
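The formula α(x) = 1/(1 + λ(x)) can be written as a small helper. This is a minimal sketch; the function name and the example density values are assumptions.

```python
def intrinsic_significance_level(f_theta1, f_theta2):
    """alpha(x) = 1 / (1 + lambda(x)), with lambda(x) = f(x, theta2) / f(x, theta1)."""
    if f_theta1 <= 0 or f_theta2 <= 0:
        raise ValueError("both likelihood values must be positive")
    lam = f_theta2 / f_theta1
    if lam < 1:
        # The slides define alpha(x) for lambda(x) >= 1 only.
        raise ValueError("expected lambda(x) >= 1; swap the labels of theta1 and theta2")
    return 1.0 / (1.0 + lam)

# Assumed example: the (+) outcome of a symmetric simple binary experiment
# with error probability 0.2, so f(+, theta1) = 0.2 and f(+, theta2) = 0.8.
alpha = intrinsic_significance_level(0.2, 0.8)
print(alpha)
```

For the (+) outcome of a symmetric simple binary experiment the helper returns the experiment's own error probability, which is the calibration the slide describes.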


If E is any experiment with a parameter space containing only a finite number k of points, these can be labeled θ = i = 1, 2, ..., k. Any observed outcome x of E determines a likelihood function L(i) = c f(x, i), i = 1, ..., k. We can assume that $\sum_{i=1}^{k} L(i) = 1$.
Any experiment E with a finite sample space, labeled j = 1, ..., m, and a finite parameter space is represented by a stochastic matrix

\[ E = (p_{ij}) = \begin{pmatrix} p_{11} & \cdots & p_{1m} \\ \vdots & \ddots & \vdots \\ p_{k1} & \cdots & p_{km} \end{pmatrix}, \]

where $\sum_{j=1}^{m} p_{ij} = 1$ and $p_{ij}$ = Prob[j | i] for each i, j. Here the i-th row is the discrete probability distribution $p_{ij}$ given by parameter value i, and the j-th column is proportional to the likelihood function L(i) = L(i | j) = c $p_{ij}$, i = 1, ..., k, determined by outcome j.
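Reading a likelihood function off a stochastic matrix amounts to taking a column and rescaling it to sum to 1. A minimal sketch follows; the 2 x 3 matrix is a made-up example and the function name is an assumption.

```python
def likelihood_from_outcome(matrix, j):
    """Normalized likelihood L(i | j) from column j (0-based) of a stochastic matrix."""
    column = [row[j] for row in matrix]
    total = sum(column)
    if total == 0:
        raise ValueError("outcome j has probability zero under every parameter value")
    return [p / total for p in column]

# Hypothetical experiment with k = 2 parameter values (rows)
# and m = 3 outcomes (columns); each row sums to 1.
E = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
]
L = likelihood_from_outcome(E, 0)
print(L)
```

The normalizing constant c is 1 divided by the column sum, so it depends on the outcome j but not on the parameter value i.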

Finite Parameter Spaces: Qualitative evidential interpretation

Example 1:
Consider an experiment with only two sample points, j = 1, 2.
Given a likelihood function L(i), we can define Prob[j = 1 | i] = L(i) and Prob[j = 2 | i] = 1 - L(i), for i = 1, ..., k.
For example, the likelihood function L(i) = 1/3, i = 1, 2, 3, represents the possible outcome j = 1 of the experiment

\[ E = \begin{pmatrix} 1/3 & 2/3 \\ 1/3 & 2/3 \\ 1/3 & 2/3 \end{pmatrix}. \]

• Since this experiment gives the same distribution on the two-point sample space under each hypothesis, it is completely uninformative. According to the likelihood principle, we can therefore conclude that the given likelihood function has a simple evidential interpretation, regardless of the structure of the experiment from which it arose: it represents a completely uninformative outcome.


Example 2:
Consider the likelihood function (1/2, 1/2, 0), that is, L(1) = L(2) = 1/2 and L(3) = 0, on the 3-point parameter space i = 1, 2, 3.
This represents the possible outcome j = 1 of the experiment

\[ E = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \\ 0 & 1 \end{pmatrix}. \]

• This outcome of E is impossible under i = 3, and hence supports without risk of error the conclusion that i ≠ 3.
• E prescribes identical distributions under i = 1 and i = 2, and hence the experiment E, and each of its possible outcomes, is completely uninformative as between i = 1 and i = 2.


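Example 3: Some likelihood functions on a given parameter space can be compared and ordered in a natural way.
Consider the likelihood functions (0.8, 0.1, 0.1) and (0.45, 0.275, 0.275). The interpretation that the first is more informative than the second is supported as follows. The first represents the outcome j = 1 of the experiment

\[ E = \begin{pmatrix} 0.8 & 0.2 \\ 0.1 & 0.9 \\ 0.1 & 0.9 \end{pmatrix} = (p_{ij}). \]

When outcome j = 2 of E is observed, we report w = 1 with probability 1/2 and w = 2 with probability 1/2. When outcome j = 1 of E is observed, the report w = 1 is given. The reports w then follow the experiment

\[ E' = \begin{pmatrix} 0.9 & 0.1 \\ 0.55 & 0.45 \\ 0.55 & 0.45 \end{pmatrix} = (p'_{iw}), \]

whose outcome w = 1 determines the second likelihood function, since (0.9, 0.55, 0.55) is proportional to (0.45, 0.275, 0.275).

The experiment E' is less informative than E, because it is obtained from E by a randomization that does not depend on i.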
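The arithmetic behind Example 3 can be reproduced by composing the first experiment with the reporting rule. In the sketch below the names `compose`, `GARBLE`, and `E_prime` are assumptions; the numbers are those of the example.

```python
from math import isclose

# First experiment: rows are parameter values i, columns are outcomes j.
E = [[0.8, 0.2], [0.1, 0.9], [0.1, 0.9]]

# Reporting rule: row j gives the probabilities of reporting w = 1, 2.
# Outcome j = 1 is always reported as w = 1; j = 2 is reported as w = 1 or 2
# with probability 1/2 each.
GARBLE = [[1.0, 0.0], [0.5, 0.5]]

def compose(experiment, channel):
    """Distribution of the report w under each parameter value i."""
    return [
        [sum(row[j] * channel[j][w] for j in range(len(channel)))
         for w in range(len(channel[0]))]
        for row in experiment
    ]

def normalized_column(matrix, col):
    """Likelihood function determined by one outcome, rescaled to sum to 1."""
    column = [row[col] for row in matrix]
    total = sum(column)
    return [p / total for p in column]

E_prime = compose(E, GARBLE)
first = normalized_column(E, 0)         # likelihood of j = 1 in E
second = normalized_column(E_prime, 0)  # likelihood of w = 1 in the derived experiment
print(E_prime, first, second)
```

The derived matrix matches the second matrix of the example, and its first column rescales to (0.45, 0.275, 0.275).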

Finite Parameter Spaces: Intrinsic confidence methods

Example 4
Consider the likelihood function (0.9, 0.09, 0.01) defined on the parameter space i = 1, 2, 3. This represents the possible outcome j = 1 of the experiment

\[ E = \begin{pmatrix} 0.9 & 0.01 & 0.09 \\ 0.09 & 0.9 & 0.01 \\ 0.01 & 0.09 & 0.9 \end{pmatrix} = (p_{ij}). \]

• In this experiment, a confidence set estimator of the parameter i is given by taking, for each possible outcome j, the two values of i having the greatest likelihoods L(i | j).
• We can verify that under each value of i, the probability is 0.99 that the confidence set determined in this way contains i; that is, the sets have confidence coefficient 0.99.
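The coverage claim in Example 4 can be checked directly from the matrix. In this sketch the helper names are assumptions and indices are 0-based, so parameter value i = 1 on the slide is index 0 here.

```python
E = [
    [0.9, 0.01, 0.09],
    [0.09, 0.9, 0.01],
    [0.01, 0.09, 0.9],
]

def top_two_set(matrix, j):
    """Indices i of the two largest likelihoods L(i | j) in column j."""
    column = [row[j] for row in matrix]
    ranked = sorted(range(len(column)), key=lambda i: column[i], reverse=True)
    return set(ranked[:2])

def coverage(matrix, i):
    """Probability, under parameter value i, that the reported set contains i."""
    return sum(p for j, p in enumerate(matrix[i]) if i in top_two_set(matrix, j))

coverages = [coverage(E, i) for i in range(3)]
print(coverages)
```

Each entry of `coverages` is the sum of the two largest probabilities in the corresponding row, 0.9 + 0.09.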

Finite Parameter Spaces: Intrinsic confidence methods

The general form of the intrinsic confidence methods
Consider any likelihood function L(i) defined on a finite parameter space i = 1, ..., k and such that $\sum_{i=1}^{k} L(i) = 1$.
If there is a unique least likely value i1 of i, let c1 = 1 - L(i1). Then the remaining (k - 1) parameter points will be called an intrinsic confidence set with intrinsic confidence coefficient c1. If there is a pair of values of i, say i1, i2, with likelihoods strictly smaller than those of the remaining (k - 2) points, call the latter set of points an intrinsic confidence set, with intrinsic confidence level c2 = 1 - L(i1) - L(i2); and so on. The corresponding constructed experiment, whose outcome j = 1 determines L(i), is

\[ E = \begin{pmatrix} L(1) & L(k) & L(k-1) & \cdots \\ L(2) & L(1) & L(k) & \cdots \\ L(3) & L(2) & L(1) & \cdots \\ \vdots & \vdots & \vdots & \\ L(k) & L(k-1) & L(k-2) & \cdots \end{pmatrix} = (p_{ij}). \]
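The general construction can be sketched as two small functions: one builds the cyclic matrix from a normalized likelihood function, the other lists the intrinsic confidence sets with their coefficients. Function names are assumptions and indices are 0-based; the input reuses the likelihood function (0.9, 0.09, 0.01) from Example 4.

```python
def circulant_experiment(L):
    """Matrix with p_ij = L[(i - j) mod k], so that column 0 equals L."""
    k = len(L)
    return [[L[(i - j) % k] for j in range(k)] for i in range(k)]

def intrinsic_confidence_sets(L):
    """Pairs (retained indices, intrinsic confidence coefficient)."""
    order = sorted(range(len(L)), key=lambda i: L[i])  # least likely first
    results = []
    dropped_mass = 0.0
    for n_dropped in range(1, len(L)):
        dropped_mass += L[order[n_dropped - 1]]
        # A set is reported only when every dropped point is strictly less
        # likely than every retained point, as the definition requires.
        if L[order[n_dropped - 1]] < L[order[n_dropped]]:
            results.append((set(order[n_dropped:]), 1.0 - dropped_mass))
    return results

L = [0.9, 0.09, 0.01]
E = circulant_experiment(L)
sets = intrinsic_confidence_sets(L)
print(E)
print(sets)
```

Every row of the cyclic matrix is a rearrangement of L, so each row sums to 1 and the matrix is a valid experiment.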


On the finite parameter space:
• For finite parameter spaces, significance levels, confidence sets, and confidence levels can be based on the observed L_x(θ), hence satisfying (L). They are defined as the regular such methods and concepts for a constructed experiment with a likelihood function identical to L_x(θ).

• Therefore, in the case of finite parameter spaces, a clear and logical evidential interpretation of the likelihood function can be given through intrinsic methods and concepts.


More General Parameter Spaces

• This section deals mainly with the case where Ω is the real line. Given E, x, and L_x(θ), a hypothetical experiment E' consisting of a single observation of Y with density g(y, θ) = c L_x(θ - y) is then constructed.
• Then (E, x) has the same likelihood function as (E', 0), and (L) implies that the same inference should be used in (E, x) as in (E', 0). For example, if a regular (1 - α) confidence interval in E' is used, then this interval estimate (for y = 0) should also be the one used for (E, x), and is called a (1 - α) intrinsic confidence interval for (E, x).
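In the constructed location experiment, θ - Y has density c L_x(u) whatever the value of θ, so an interval [a, b] holding a fraction 1 - α of the normalized likelihood gives a regular confidence interval [Y + a, Y + b], which equals [a, b] at y = 0. The sketch below computes such an interval on a grid. The equal-tailed choice, the grid bounds, and the normal-shaped likelihood are all assumptions made for illustration, and the likelihood is assumed to be integrable.

```python
from math import exp

def intrinsic_interval(lik, lo, hi, level=0.95, n=20001):
    """Equal-tailed interval holding `level` of the normalized likelihood on [lo, hi]."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    weights = [lik(t) for t in grid]
    total = sum(weights)
    tail = (1.0 - level) / 2.0
    cum, lower, upper = 0.0, None, None
    for t, w in zip(grid, weights):
        cum += w / total
        if lower is None and cum >= tail:
            lower = t
        if upper is None and cum >= 1.0 - tail:
            upper = t
    return lower, upper

# Assumed example: a likelihood shaped like a normal curve centred at 2.0
# with scale 0.5 (for instance, from a normal mean with known variance).
def lik(theta):
    return exp(-0.5 * ((theta - 2.0) / 0.5) ** 2)

lower, upper = intrinsic_interval(lik, -3.0, 7.0)
print(lower, upper)
```

For this likelihood the result is close to 2.0 ± 1.96 × 0.5, the familiar 95% interval for a normal mean.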


As a general comment, Birnbaum emphasizes that intrinsic methods and concepts can, in light of (L), be nothing more than methods of expressing evidential meaning already implicit in L_x(θ) itself.
In the discussion, Birnbaum does not recommend intrinsic methods as statistical methods in practice. The value of these methods is conceptual, and the main use of intrinsic concepts is to show that likelihood functions as such are evidentially meaningful.

Bayesian Methods: An Interpretation of the Principle of Insufficient Reason

Birnbaum views the Bayes approach as not directed to informative inference, but rather as a way to determine an appropriate final synthesis of available information, based on prior information and data. It is observed that in determining the posterior distribution, the contribution of the data and E is L_x(θ) only, so the Bayes approach implies (L).
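That the posterior depends on the data only through L_x(θ) can be seen numerically: two proportional likelihoods give the same posterior under the same prior. The sketch below uses a discrete grid, a flat prior, and two sampling schemes (12 trials fixed in advance versus sampling until the 3rd success, both ending with 3 successes in 12 trials); all of these choices are assumptions for illustration.

```python
from math import comb, isclose

def posterior(lik, prior, grid):
    """Discrete posterior on `grid`: prior times likelihood, renormalized."""
    unnorm = [prior(t) * lik(t) for t in grid]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def flat_prior(theta):
    return 1.0  # assumed prior; any other prior could be used for both runs

def binomial_lik(theta):
    """3 successes in a fixed number of 12 trials."""
    return comb(12, 3) * theta**3 * (1 - theta)**9

def neg_binomial_lik(theta):
    """The 3rd success occurs on trial 12."""
    return comb(11, 2) * theta**3 * (1 - theta)**9

grid = [i / 100 for i in range(1, 100)]
post_a = posterior(binomial_lik, flat_prior, grid)
post_b = posterior(neg_binomial_lik, flat_prior, grid)
same_posterior = all(isclose(a, b) for a, b in zip(post_a, post_b))
print(same_posterior)
```

The constant factor that distinguishes the two likelihoods cancels in the normalization step, which is why the sampling scheme leaves no trace in the posterior.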

Conclusion

• Birnbaum's main result, that the LP follows from the sufficiency and conditionality principles that most statisticians accept, must be regarded as one of the deepest theorems of theoretical statistics, yet the proof is unbelievably simple.
• The result had a decisive influence on how many statisticians came to view the likelihood function as a basic quantity in statistical analysis.
• It has also affected in a general way how we view the science of statistics. Birnbaum introduced principles of equivalence within and between experiments, showing various relationships between these principles. This made it possible to discuss the different concepts from alternative viewpoints.

References

Allan Birnbaum, "On the foundations of statistical inference: binary experiment", Institute of Mathematical Sciences, New York University
Daniel Steel, "Bayesian confirmation theory and the likelihood principle", Michigan State University
R. Royall, "Statistical Evidence: A Likelihood Paradigm", Chapman and Hall, London
Jan F. Bjørnstad, "Breakthroughs in Statistics Volume I: Foundations and Basic Theory", The University of Trondheim

The end! Thank you!
