Lecture 4: NBERMetrics.
Ariel Pakes
July 17, 2012




On The Use of Moment Inequalities in Discrete Choice Models.


From the point of view of consumer theory, the work on estimating preference parameters from moment inequalities is a follow-up to the theory literature on "revealed preference" begun by Paul Samuelson (1938, Economica); see the review by Hal Varian (2005, in Samuelsonian Economics and the 21st Century). The difference between the theory treatment of revealed preference and the approach I will sketch here is our interest in estimation, and therefore our attention to the sources of disturbances that enter the revealed preference approach when it is brought to data. The approach used here dates to Pakes, Porter, Ho and Ishii (2011, working paper), henceforth PPHI, and Pakes (2011, Econometrica).
It differs from standard empirical models of choice by working directly with the inequalities which define optimal behavior. I.e. we see how far we can get by just assuming that we know a counterfactual which agents considered and discarded, and then assuming that, at least on average, the utility from the actual choice should be larger than the utility from the counterfactual. Only certain values of the parameter vector will make this statement true, and we accept any parameter values which ensure that the average difference is positive.
Typically, if one value satisfies the inequalities, so will values close to it. So there will be many values of the parameter vector that satisfy the inequalities, and we will obtain a "set" of acceptable values. If the model is correct, what we will know is that the true value is in the set (at least asymptotically).
I have found it easiest to explain what is going on in this approach by starting out with a simple single-agent example.

Single Agent Example (due to M. Katz, 2007). Estimate the costs shoppers assign to driving to a supermarket. This is a topic of importance to the analysis of zoning regulations, public transportation projects, and the like. Moreover, it has proven difficult to analyze empirically with standard choice models because of the complexity of the choice set facing consumers: all possible bundles of goods at all possible supermarkets.
   We shall see that though large choice sets make discrete choice analysis difficult, the larger the choice set the easier it is to use the moment inequality approach.
Assume that the agents' utility functions are additively separable functions of:
  • utility from the basket of goods bought,
  • expenditure on that basket, and
  • drive time to the supermarket.
Let d_i = (b_i, s_i) be the vector of decisions made by the agent, b_i = b(d_i) the basket of goods bought, s_i = s(d_i) the store chosen, and z_i the individual's characteristics. Then

   U(d_i, z_i, θ) = W(b_i, z_i, θ_b) − e(b_i, s_i) − θ_i dt(s_i, z_i),

where W(·) is the utility from the bundle of goods bought, e(·) provides expenditure, dt(·) provides drive time, and I have used the free normalization that comes from the equivalence of affine transforms on expenditure (so the cost of drive time is in dollars).
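
As a concrete illustration, here is a minimal sketch of this additively separable utility in Python. The numbers and the way W(·), e(·), and dt(·) enter are hypothetical placeholders, not part of the lecture:

```python
def utility(basket_value, expenditure, drive_time_hours, theta_i):
    """U = W(b, z) - e(b, s) - theta_i * dt(s, z).

    Expenditure is in dollars, so the normalization puts the drive time
    coefficient theta_i in dollars per hour as well.
    """
    return basket_value - expenditure - theta_i * drive_time_hours

# Hypothetical shopper: a basket worth $120 in utility terms, costing $95,
# at a store 0.4 hours from home, with time valued at $14/hour.
print(utility(120.0, 95.0, 0.4, 14.0))  # 19.4
```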

Standard Discrete Choice Models for This Problem. There are really two aspects that make a behavioral (or structural) model of this decision difficult, and two approaches to the analysis.
   A standard full information structural model would assume that the agent knows the prices of all bundles of goods at all stores, and makes its choice by
  • computing the basket that maximizes utility at each store, and substituting that into U(·), and then
  • maximizing over stores.
This has both behavioral and computational aspects which are troubling.

  • Behaviorally, it endows the agent with both an incredible amount of information and massive computational abilities.
  • Computationally, it requires specification and computation of the value of a huge choice set and, if U(b_i, θ) is specified in a reasonably rich way, simultaneous estimation of a large number of parameters. Both of these make obtaining consistent estimates a difficult task.
   Probably a more realistic way of looking at the problem would be through the lens of a two-period model.
  • In the first period the agent chooses which store to travel to, perhaps without full knowledge of the prices at each store.
  • In the second period the agent, having gone to the store chosen, chooses the bundle of goods it wishes to buy from the goods available at that store.
This does not ameliorate the computational complexity of the problem (see below). However, it does get rid of the behavioral problem; on the other hand, it replaces it with a specification problem. That is, to proceed in this way the econometrician
  • needs to specify the agent's prior probability for each possible price at each store, and
  • then compute the integral, over the possible price vectors at each store, of the utility from the bundle that would be chosen at those prices.
Econometricians never see priors, and seldom have knowledge of what the agent can condition on when it formulates its prior. So the allowance for incomplete information replaces the behavioral problem with a specification problem. Moreover, if anything, it adds to the computational complexity of the analysis by requiring the computation of expectations.¹

Behavioral Model vs. Descriptive Summary Statistics. I have focused here on estimating the parameters of the behavioral model. I did this because the interest was in a parameter vector which was then to be used for policy analysis (which presumably involves counterfactuals). If we were not interested in the behavioral parameters, but rather were interested in summarizing the data on store choice in an intuitive way (say, a way that subsequent research might use to build stylized models for the choice), there is a form of a standard discrete choice model that makes sense (at least for the single agent problems we are concerned with today). We would
  • group choices of baskets of goods somehow,
  • project the utility from the basket down on "variables of interest", which leaves a residual which is orthogonal to those variables, and
  • evaluate the possible baskets at different stores.
Then if we were willing to assume the distribution of the residual satisfied some parametric form we would be back to the familiar discrete choice model. However
  • the coefficients could not be relied upon to provide an adequate approximation to what would happen were we to change a variable of interest (we come back to an example of this below), and
  • it is not clear what one is doing in using a more complicated discrete choice model, for example a random coefficients model, in this context.

¹ One way of getting around these problems is to obtain survey measures of expectations (see the Manski, 2001, review in Econometrica), but this is often impractical.

The Revealed Preference Approach. We compare the utility from the choice the individual made to that of an alternative feasible choice. Our theoretical assumption is that the agent expects this difference to be positive.

The analogue to the "sample design" question here is:

   Which alternative should the econometrician choose?

The answer depends on the parameter of interest. Given this interest, the larger the choice set, the easier it will be to find an alternative that isolates that parameter. This is the sense in which increasing the size of the choice set cannot but help the moment inequality analysis.

Our interest is in analyzing the cost of drive time, so we chose an alternative which will allow us to analyze that without making the assumptions required to estimate W(b, z, θ).




For a particular d_i, choose d′(d_i, z_i) to be the purchase of
  • the same basket of goods,
  • at a store which is further away from the consumer's home than the store the consumer shopped at.
Notice that choosing the same basket at the alternative store is dominated by choosing the optimal basket for that store, which, at least before going to the store, is revealed to be inferior to choosing the optimal basket for the store chosen. So transitivity of preferences gives us the desired inequality.

With this alternative we need not specify the utility from different baskets of goods; i.e. it allows us to hold fixed the dimension of the choice that generated the problem with the size of the choice set, and investigate the impact of the dimension of interest (travel time) in isolation. Notice also that
  • we need not specify the whole choice set, which is typically another difficult specification issue, and
  • the alternative chosen differs with the agent's choice and characteristics (a concrete sketch of the construction follows).
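
To make the construction concrete, here is a minimal sketch in Python of building the counterfactual from trip-level data. All the arrays are simulated placeholders; in particular, pricing the chosen basket at every store is an assumption made for illustration, not a feature of Katz's data set. The later sketches reuse rng, d_e, and d_dt from this block.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trip-level data: drive times (hours) from each home to each
# store, the store actually chosen, and the chosen basket priced at every store.
n_shoppers, n_stores = 1000, 8
drive_time = rng.uniform(0.05, 1.0, size=(n_shoppers, n_stores))
chosen = np.argmin(drive_time + rng.normal(0.0, 0.2, (n_shoppers, n_stores)), axis=1)
basket_cost = rng.uniform(60.0, 140.0, size=(n_shoppers, n_stores))

rows = np.arange(n_shoppers)
# Counterfactual d'(d_i, z_i): the same basket, bought at a store that is
# farther from home than the chosen store (here: the nearest such store).
farther = np.where(drive_time > drive_time[rows, chosen][:, None], drive_time, np.inf)
alt = np.argmin(farther, axis=1)

# Differences between the chosen and counterfactual choices.
d_e = basket_cost[rows, chosen] - basket_cost[rows, alt]   # Delta e
d_dt = drive_time[rows, chosen] - drive_time[rows, alt]    # Delta dt <= 0

# Drop shoppers whose chosen store was already the farthest one.
valid = np.isfinite(farther[rows, alt])
d_e, d_dt = d_e[valid], d_dt[valid]
```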

The Analysis. Let E(·) be the agent's expectation operator, and for any function f(x, d), let

   ∆f(x, d, d′) ≡ f(x, d) − f(x, d′).

Then the behavioral assumption we will make is that if d_i was chosen and d′(d_i, z_i) was feasible when that decision was made,

   E[∆U(d_i, d′(d_i, z_i), z_i, θ)|J_i]
      = −E[∆e(d_i, d′(d_i, z_i))|J_i] − θ_i E[∆dt(d_i, d′(d_i, z_i))|J_i] ≥ 0.

Notice that
  • I have not assumed that the agent's perceptions of prices are "correct" in any sense; I will come back to what is needed here below.
  • I have not had to specify the agent's priors or the information those priors rely on, and so have avoided the difficult specification issue referred to above.

The Inequalities.
We develop inequalities to be used in estimation for two separate
cases.

Case 1: θ_i = θ_0. More generally, all determinants of the cost of drive time are captured by variables the econometrician observes and includes in the specification (in terms of our previous notation, there is no random coefficient on drive time). Assume that

   (1/N) Σ_i E[∆e(d_i, d′(d_i, z_i))] − (1/N) Σ_i ∆e(d_i, d′(d_i, z_i)) →_P 0,

   (1/N) Σ_i E[∆dt(d_i, d′(d_i, z_i))] − (1/N) Σ_i ∆dt(d_i, d′(d_i, z_i)) →_P 0,

where →_P denotes convergence in probability. This would be true if, for example, agents were correct on average. Recall that
our assumption on the properties of the choice implied that

   −E[∆e(d_i, d′(d_i, z_i))] − θ_0 E[∆dt(d_i, d′(d_i, z_i))] ≥ 0,

so the assumption made above that agents do not err on average gives us

   − [ Σ_i ∆e(d_i, d′(d_i, z_i)) ] / [ Σ_i ∆dt(d_i, d′(d_i, z_i)) ] →_P θ̲ ≤ θ_0.

If we had also taken an alternative store which was closer to the individual (say, alternative d′′), then

   − [ Σ_i ∆e(d_i, d′′(d_i, z_i)) ] / [ Σ_i ∆dt(d_i, d′′(d_i, z_i)) ] →_P θ̄ ≥ θ_0,

and we would have consistent estimates of [θ̲, θ̄] which bound θ_0.
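
A minimal sketch of the Case 1 calculation, reusing the hypothetical d_e and d_dt arrays from the earlier sketch; building a closer-store alternative the same way would deliver the upper bound:

```python
# Case 1: ratio of averages. With the farther alternative, sum(d_dt) < 0,
# so -sum(d_e)/sum(d_dt) converges to a lower bound for theta_0.
theta_lower = -d_e.sum() / d_dt.sum()
print(f"Case 1 lower bound for theta_0: {theta_lower:.3f}")
```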

Case 2: θ_i = θ_0 + ν_i, E[ν_i] = 0. This case allows for a determinant of the cost of drive time (ν_i) that is known to the agent at the time the agent makes its decision (since the agent conditions on it when it makes its decision), but is not known to the econometrician. It corresponds to the breakdown used in the prior lecture of a coefficient which is a function of observables and an unobservable term (or the random coefficient); i.e.

   θ_i = z_i β_z + ν_i.

Of course, were we to actually introduce additional observable determinants of θ_i we would typically want to introduce additional inequalities as well. We come back to this below, but for simplicity we now concentrate on estimating an (unconditional) mean, i.e. θ_0. Though this specification looks the same as that in the last lecture, the assumptions on the random terms made here, and the conclusions we can draw from our estimator, will be different.

Now, provided dt(d_i) and dt(d′(d_i, z_i)) are known to the agent when it makes its decision,

   −E[∆e(d_i, d′(d_i, z_i))] − (θ_0 + ν_i) ∆dt(d_i, d′(d_i, z_i)) ≥ 0,

which, since ∆dt(d_i, d′(d_i, z_i)) ≤ 0, implies

   E[ −∆e(d_i, d′(d_i, z_i)) / ∆dt(d_i, d′(d_i, z_i)) ] − (θ_0 + ν_i) ≤ 0.

So, provided agents' expectations on expenditures are not "systematically" biased,

   (1/N) Σ_i [ −∆e(d_i, d′(d_i, z_i)) / ∆dt(d_i, d′(d_i, z_i)) ] →_P θ̲ ≤ θ_0,

and we could get a lower bound as above.
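
A minimal sketch of the Case 2 calculation with the same hypothetical arrays; the only change from Case 1 is averaging the per-shopper ratios instead of taking a ratio of averages:

```python
# Case 2: average of ratios. Each shopper's ratio -d_e/d_dt is, in
# expectation, at most theta_0 + nu_i; with E[nu_i] = 0, its sample
# average is a consistent lower bound for theta_0.
theta_lower_c2 = (-d_e / d_dt).mean()
print(f"Case 2 lower bound for theta_0: {theta_lower_c2:.3f}")
```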

There are two points about this latter derivation that should be kept in mind.
  • First, we did not need to assume that ν_i is independent of z_i. This would also have been true in the richer model where we assumed θ_i = z_i β_z + ν_i. This is particularly useful in models for the purchase of a good that has a subsequent cost of use, like a car or an air conditioner. Then utilization (or expected utilization) would be a right-hand-side variable, and one would think that unobservable determinants of the importance of efficiency in use in product choice would be correlated with the determinants of utilization (in the car example, consumers who care more about miles per gallon are likely to be consumers who expect to drive further).
  • Relatedly, we did not have to assume a distribution for the ν_i. The flip side of this is that neither have we developed an estimator for the variance of the random term. Though with additional assumptions we could develop such an estimator, in this lecture I am not going to do that (though variance in the observable determinants of the aversion will be considered below).


Case 1 vs. Case 2.
  • Case 1 estimates using the ratio of averages, while Case 2 estimates using the average of the ratios. Both are trivially easy to compute.
  • Case 2 allows for unobserved heterogeneity in the coefficient of interest and does not need to specify what the distribution of that unobservable is. Case 1 ignores the possibility of unobserved heterogeneity in tastes.
  • If the unobserved determinant of drive time costs (ν_i) is correlated with drive time (dt), then the Case 1 and Case 2 estimators should differ. If not, they should be the same. So there is a test for whether any unobserved differences in preferences are correlated with the "independent" variable (a sketch follows this list).
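
The informal version of that test, again with the hypothetical arrays; a formal version would need a standard error for the difference between the two estimators:

```python
theta_c1 = -d_e.sum() / d_dt.sum()   # Case 1: ratio of averages
theta_c2 = (-d_e / d_dt).mean()      # Case 2: average of ratios
print(f"Case 1: {theta_c1:.3f}  Case 2: {theta_c2:.3f}")
# A large gap suggests nu_i is correlated with drive time differences;
# near-equality suggests it is not.
```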

Empirical Results.

Data. Nielsen Homescan Panel, 2004, plus data on store characteristics from Trade Dimensions. The sample consists of families from Massachusetts.


Discrete Choice Comparison Model. The multinomial model divides observations into expenditure classes, and then uses a typical expenditure bundle for that class to form the expenditure level (the "price index" for each outlet). Other x's are drive time, store characteristics, and individual characteristics. Note that
  • the prices for the expenditure class need not reflect the prices of the goods the individual actually is interested in (so there is an error in price, and it is likely to be negatively correlated with price itself),
  • it assumes that the agents knew the goods available in the store and their prices exactly when they decided which store to choose (i.e. it does not allow for expectational error), and
  • it does not allow for unobserved heterogeneity in the effects of drive time. We could allow for a random coefficient on drive time, but then we would need a conditional distribution for the drive time coefficient.
For these reasons, as well as the aggregation, its estimates are the estimates of a model which generates "summary statistics" of the kind discussed above (colloquially referred to as a reduced-form model), and should not be interpreted as causal.

Focus. The specification allows the drive time coefficient to vary with household characteristics. The focus is on the average of the drive time coefficient for the median characteristics (about forty coefficients: chain dummies, outlet size, employees, amenities, ...).

Multinomial Model: the median cost of drive time per hour was $240 (when the median wage in this region is $17). Also, several coefficients have the "wrong" sign or order (nearness to a subway stop, several amenities, and chain dummies).

Inequality Estimators. The inequality estimators were obtained from differences between the chosen store and four different counterfactual store choices (chosen to reflect price and distance differences with the chosen store). Each comparison was interacted with positive functions of twenty-six "instruments", producing over a hundred moment inequalities. We come back to a more formal treatment of instruments below, but for now I simply note that what we require of each instrument, say h(x), is that it be
  • known to the agent when it made its decision (so when the agent makes its decision the agent can condition on the instrument's value),
  • uncorrelated with the sources of the unobservable that the agent knew when it made its decision (in our case ν_i),
  • and non-negative.
If these conditions are satisfied and

   E[ −∆e(d_i, d′(d_i, z_i)) / ∆dt(d_i, d′(d_i, z_i)) ] − (θ_0 + ν_i) ≤ 0,

then so is

   E[ h(x_i) ( −∆e(d_i, d′(d_i, z_i)) / ∆dt(d_i, d′(d_i, z_i)) − (θ_0 + ν_i) ) ] ≤ 0.

So, again, provided agents' expectations are not "systematically" biased,

   (1/N) Σ_i h(x_i) [ −∆e(d_i, d′(d_i, z_i)) / ∆dt(d_i, d′(d_i, z_i)) ] →_P θ̲ ≤ θ_0,

and we have generated an additional moment inequality.
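
A minimal sketch of stacking instrumented moments, continuing the hypothetical arrays from the earlier sketches. The two instruments here (a constant and a scaled, made-up income variable) stand in for Katz's twenty-six, and the moments are written in the m(·) ≥ 0 convention used in the estimation section below:

```python
import numpy as np

# Hypothetical positive instruments, known to the agent at decision time.
income = rng.uniform(20.0, 120.0, size=d_e.shape[0])
H = np.column_stack([np.ones_like(income), income / income.mean()])

def moments(theta, d_e, d_dt, H):
    """Sample moments m(P_n, theta), one per instrument.

    Each is the average of h(x_i) * (theta - (-d_e/d_dt)); the model
    implies all of them are nonnegative in expectation at theta = theta_0.
    """
    base = theta - (-d_e / d_dt)
    return (H * base[:, None]).mean(axis=0)

print(moments(0.2, d_e, d_dt, H))
```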
   As is not unusual for problems with many more inequalities than bounds to estimate, the inequality estimation routine generated point (rather than interval) estimates for the coefficients of interest (there was no value of the parameter vector that satisfied all of the moment inequalities). However, tests that we come back to below indicated that one could accept the null that this result was due to sampling error. The numbers beside the estimates are confidence intervals with the interpretation we are used to: under the null, the confidence interval will capture the true parameter value in 95% of samples. The confidence intervals presented here are from a very conservative procedure and could be tightened.

  • Inequality estimates with θ_i = θ_0: .204 [.126, .255] ⇒ $4/hour.
  • Inequality estimates with θ_i = θ_0 + ν_i: .544 [.257, .666] ⇒ $14/hour,
and other coefficients straighten out.
   Apparently the unobserved component of the coefficient of drive time is negatively correlated with observed drive time differences.
   Katz only estimated the mean of the drive time coefficient conditional on observable demographics, and then used it, plus the distribution of demographics, to evaluate policy counterfactuals. If one were after more details on the W(·) function, one would consider different counterfactuals (and presumably additional functional form restrictions). If one observed the same household over time and was willing to assume stability of preference parameters over time, many other options are available, some mimicking what we did with second choice data in the last lecture.

Estimation, Testing, and C.I.’s: An Introduction.

The finding that there is no value of the parameter vector that satisfies all the inequalities is not unusual in moment inequality problems with many inequalities. Consider the one-parameter case.
   When there are many moment inequalities there are many upper and lower bounds for that parameter. The estimation routine forms an interval estimate from the least upper and the greatest lower bound. In finite samples the moments distribute normally about their true values, and as a result there will be a negative bias in choosing the smallest of the upper bounds due to finite sample sampling error. Similarly, there will be a positive bias in choosing the largest of the lower bounds. It is easy for these two to cross, even if the model is correctly specified. So there is an interest in building a test which distinguishes between the possibility that we obtain a point because of sampling error, and the possibility that we obtain a point because the model is misspecified.
   We begin, however, by setting up the estimation problem a bit more formally. For more on what follows see Chernozhukov, Hong and Tamer (Econometrica, 2007), Andrews and Soares (Econometrica, 2010), and the articles cited above.
   Our model delivers the condition that

   E[∆U(d_i, d′(d_i, z_i), z_i, θ_0) ⊗ h(x_i)] ≡ E[m(d_i, z_i, x_i, θ)] ≥ 0, at θ = θ_0,

where both ∆U(·) and h(·), and hence m(·), may be vectors and ⊗ is the Kronecker product operator.
Estimator. Essentially we form a sample analogue which penalizes values of θ that do not satisfy these conditions, but accepts all those that do. More formally, the sample moments are

   m(P_n, θ) = (1/n) Σ_i m(d_i, z_i, x_i, θ),

and their variance is

   Σ(P_n, θ) = Var(m(z_i, d_i, x_i, θ)).


The corresponding population moments are (m(P, θ), Σ(P, θ)), and our model assumes

   m(P, θ_0) ≥ 0.

The set of values of θ that satisfy the inequalities is denoted

   Θ_0 = {θ : m(P, θ) ≥ 0},

and called the identified set. This is all we can hope to estimate.

Estimator. For now I am going to assume that Σ_0, the covariance matrix of the moments, is the identity matrix. This will lead to an estimator which can be improved upon by dividing each moment by an estimate of its standard error and repeating what I do here, but that makes the notation more difficult to follow.
   Recall that our model does not distinguish between values of θ that make the moments positive. All the model says is that at θ_0 the moments are positive (at least in the limit). So all we want to do is penalize values of θ that lead to negative values of the moments. More formally, if f(·)⁻ ≡ min(0, f(·)) and, for any vector f, ‖f‖ denotes the sum of squares of its elements, then our estimator is

   Θ_n = arg min_θ ‖m(P_n, θ)⁻‖.

This could be a set or, as in Katz's case, it could be a single parameter value. Several papers prove that Θ_n converges to Θ_0 in an appropriate norm.
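
As a one-dimensional illustration of this estimator, here is a sketch reusing the hypothetical moments function from above; the grid search just stands in for whatever minimizer one prefers:

```python
import numpy as np

def objective(theta, d_e, d_dt, H):
    """||m(P_n, theta)^-||: sum of squared negative parts of the moments."""
    m = moments(theta, d_e, d_dt, H)
    neg = np.minimum(m, 0.0)
    return float(neg @ neg)

# Theta_n is the set of grid points that (numerically) attain the minimum.
grid = np.linspace(0.0, 1.0, 201)
vals = np.array([objective(t, d_e, d_dt, H) for t in grid])
theta_n = grid[vals <= vals.min() + 1e-12]
print(theta_n)
```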



Measures of Precision. There are several different ways of conceptualizing measures of the precision of your (set) estimator. We could attempt to
  • get a confidence set for the set, i.e. a set which would cover the identified set 95% of the time (starts with Chernozhukov, Hong and Tamer, Econometrica, 2007);
  • get a confidence set for the point θ_0 (starts with Imbens and Manski, Econometrica, 2004); or
  • get confidence intervals for intervals defined by a particular direction in the parameter space; the simplest case is directions defined by each component of the parameter space, so we get CI's which cover the different components of the parameter vector 95% of the time (analogous to reporting standard errors component by component; see PPHI, 2011).
There are also several ways to obtain CI's for each of the different concepts. I am going to give a conceptually simple method for getting a confidence set for the point θ_0. The procedure I will go over will also be simple computationally when the θ vector is low dimensional. For high-dimensional θ vectors you will often have to go to PPHI. For precision improvements on the techniques described here see Andrews and Guggenberger (Econometric Theory, 2009), and the literature cited there.

           Confidence Sets for the Point, θ_0.

   The procedure here tests each "possible value" of θ to see if it can reasonably be assumed to satisfy the moment inequalities. To do this we have to start out by defining a grid of possible values, say {Θ_p} = {θ_{p,j}, j = 1, ..., J_p}. It is because we have to do the following procedure at every one of the points in the grid that this procedure often becomes impractical when there is a large dimensional parameter vector.
   To accept a value θ_p, the value of the objective function when evaluated at θ_p (or ‖m(θ_p, P_n)⁻‖) must be within sampling error of zero. The problem is that the distribution of this object is not analytic. However, if we are willing to assume a value for the mean of m(θ_p, P_n), then it is easy to simulate a good approximation to the distribution of ‖m(θ_p, P_n)⁻‖, as the central limit theorem ensures that m(θ_p, P_n) distributes normally about that mean with a variance that is well approximated by Var(m(θ_p, P_n)).
   Assuming temporarily that the mean of m(θ_p, P_n) is 0, we construct the distribution of ‖m(θ_p, P_n)⁻‖ under that assumption. Take random draws from a normal distribution with mean zero and variance-covariance Var(m(θ_p, P_n)). For each random draw, say m̃(θ_p), form the Euclidean norm of its negative part, that is compute ‖m̃(θ_p)⁻‖, and keep track of the distribution of these numbers. Let c(θ_p) be the (1 − α) quantile of this distribution; c(θ_p) is the critical value for our test. Note that it is greater than the critical value that would obtain from a similar procedure under an assumption that the expectation of m(θ_p, P_n) is any positive number.
   Now go back to the original sample and compute the actual value of ‖m(θ_p, P_n)⁻‖. If it is less than c(θ_p), θ_p is accepted into the estimate of the confidence set for θ_0. If not, it is rejected.

Because we have used a critical value that is larger than the critical value for any other mean assumption that is consistent with the model, the probability of rejecting the null when it is true is less than α no matter the true value of the mean of m(θ_p, P_n). If there is no value of θ_p that is accepted, we reject the null that the model is correctly specified.
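
A minimal sketch of this acceptance test over a grid, continuing the hypothetical objects from the earlier sketches; α, the number of simulation draws, and the grid are illustrative choices:

```python
import numpy as np

def conf_set(grid, d_e, d_dt, H, alpha=0.05, n_sim=2000):
    """Keep each grid point theta_p whose ||m(theta_p, P_n)^-|| is below
    the critical value c(theta_p) simulated under the mean-zero assumption."""
    n = d_e.shape[0]
    accepted = []
    for theta in grid:
        mi = H * (theta - (-d_e / d_dt))[:, None]  # per-shopper moments
        cov = np.cov(mi, rowvar=False) / n         # Var of the sample mean
        draws = rng.multivariate_normal(np.zeros(mi.shape[1]), cov, size=n_sim)
        stats = (np.minimum(draws, 0.0) ** 2).sum(axis=1)
        c = np.quantile(stats, 1.0 - alpha)        # critical value c(theta_p)
        m = mi.mean(axis=0)
        if (np.minimum(m, 0.0) ** 2).sum() <= c:
            accepted.append(theta)
    return np.array(accepted)

print(conf_set(np.linspace(0.0, 1.0, 101), d_e, d_dt, H))
```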

Moment Inequalities vs. Standard Discrete Choice.

The limits of the moment inequality method are still being actively explored. Its advantages are that
  • it typically requires weaker behavioral assumptions than the standard theory. In particular, it need not specify information sets and prior probabilities when it is likely that the agent does not have perfect information on the properties of the choice set, and it allows agents to make errors, provided they are not consistently wrong. That is, it has a way of controlling for expectational errors provided they are mean zero;
  • something I have not stressed here, but which is important in many applications: the fact that moment inequalities only use averages implies that they also control for zero-mean measurement errors (see Ho and Pakes, 2012, working paper);
  • it does not require specification of the entire choice set, just one counterfactual choice that the agent knew was feasible when it made the choice that it did make.

The biggest disadvantage of moment inequalities is that
  • it has limited ability to handle variables that the agent does know when it makes its choice, but the econometrician does not observe.
In the Katz example we allowed for two types of unobservables that the agent knew when it made its decision, but we did not observe.
  • One was the class of factors that determined the utility from the bundle of goods bought. In our notation, we did not need to specify W(b_i, z_i, θ), so any determinant of that quantity could be unobserved and not impede the estimation algorithm.
  • In Case 2 we also did not observe a determinant of the value of drive time, our ν_i.
   We were able to account for the determinants of the utility from the bundle of goods bought because of:
  • the assumed additivity of the utility function in that bundle, and
  • the fact that we had a very large choice set, a choice set that allowed us to hold that bundle constant at the same time as changing the dimension of the choice whose impact we wanted to investigate.
   We were able to account for an unobserved determinant of the importance of drive time that the agent knew but the econometrician did not (our θ_i) because
  • we could find a counterfactual which was linear in the effect of θ_i for every individual in the sample.
We then estimated the average effect of these variables.
   Had we only been able to find such a transform for a subset of the consumers in the sample, say those that satisfy a particular condition, we would have had to deal with a selected sample of θ_i, and we would not have been able to estimate the mean of interest (at least not without either further assumptions or a wider bound than provided here).

A More General Model. A more general model would have the agent maximizing

   E[U(d_i, z_i, θ)] = E[W̃(b_i, z_i, θ_b)] − E[ẽ(b_i, s_i)] − θ_i dt(s_i, z_i) + ε_{i,s},

where
  • a tilde over a variable indicates that it is random when the decision of which store to go to is taken, and
  • ε_{i,s} is a disturbance which differs by store that is known to the agent as it makes its store choice but not to the econometrician.
We observe the actual W(·) (not E[W̃(b_i, z_i, θ_b)]) and e(·) (not E[ẽ(b_i, s_i)]).

The difference can be caused by either zero-mean expectational error or measurement error.
  • The discrete choice model does not allow for these differences between the observed right-hand-side variables and their expectations. It does, however, allow for a parametric form for the distribution of ε_{i,s}.
  • The inequality model presented here does not allow for the ε_{i,s}, but does allow for the differences between the observed right-hand-side variables and their expectations.
This means that when you are using discrete choice you should pay particular attention to (i) the possibility of differences between observed values of right-hand-side variables and the perceptions that motivated the choice, and (ii) functional forms for the distribution of unobservables. If you are using this version of moment inequalities, you should pay particular attention to observable determinants of store characteristics; for example, if the agent went to Whole Foods, you might want to look for a Whole Foods store (rather than any store) that is farther away (at least to see if that makes a difference).

Note. There is active research directed at maintaining the weaker assumptions of the moment inequality approach while accommodating choice-specific unobservables (see Dickstein and Morales, 2012; Pakes and Porter, 2012). This will clearly make the bounds wider, but the extent of that is unclear. Indeed, none of this will change the fact that the moment inequality methods will not give you a point estimate of a parameter, while standard discrete choice will. So if the standard discrete choice method gives an acceptable approximation for the problem at hand, it is preferred.

Weitere ähnliche Inhalte

Was ist angesagt?

Chapter 8 Marketing Research Malhotra
Chapter 8 Marketing Research MalhotraChapter 8 Marketing Research Malhotra
Chapter 8 Marketing Research Malhotra
AADITYA TANTIA
 
marketing research & applications on SPSS
marketing research & applications on SPSSmarketing research & applications on SPSS
marketing research & applications on SPSS
ANSHU TIWARI
 
Chapter 11 Marketing Reserach , Malhotra
Chapter 11 Marketing Reserach , MalhotraChapter 11 Marketing Reserach , Malhotra
Chapter 11 Marketing Reserach , Malhotra
AADITYA TANTIA
 

Was ist angesagt? (20)

Demand Estimation
Demand EstimationDemand Estimation
Demand Estimation
 
Repurposing predictive tools for causal research
Repurposing predictive tools for causal researchRepurposing predictive tools for causal research
Repurposing predictive tools for causal research
 
Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...
 
Statistical Inference Part II: Types of Sampling Distribution
Statistical Inference Part II: Types of Sampling DistributionStatistical Inference Part II: Types of Sampling Distribution
Statistical Inference Part II: Types of Sampling Distribution
 
Evaluation measures for models assessment over imbalanced data sets
Evaluation measures for models assessment over imbalanced data setsEvaluation measures for models assessment over imbalanced data sets
Evaluation measures for models assessment over imbalanced data sets
 
pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)
 
Mba2216 week 11 data analysis part 03 appendix
Mba2216 week 11 data analysis part 03 appendixMba2216 week 11 data analysis part 03 appendix
Mba2216 week 11 data analysis part 03 appendix
 
Classification and decision tree classifier machine learning
Classification and decision tree classifier machine learningClassification and decision tree classifier machine learning
Classification and decision tree classifier machine learning
 
Multivariate
MultivariateMultivariate
Multivariate
 
Chapter 8 Marketing Research Malhotra
Chapter 8 Marketing Research MalhotraChapter 8 Marketing Research Malhotra
Chapter 8 Marketing Research Malhotra
 
Cluster analysis in prespective to Marketing Research
Cluster analysis in prespective to Marketing ResearchCluster analysis in prespective to Marketing Research
Cluster analysis in prespective to Marketing Research
 
Chap019
Chap019Chap019
Chap019
 
FSRM 582 Project
FSRM 582 ProjectFSRM 582 Project
FSRM 582 Project
 
Factor analysis
Factor analysis Factor analysis
Factor analysis
 
marketing research & applications on SPSS
marketing research & applications on SPSSmarketing research & applications on SPSS
marketing research & applications on SPSS
 
Feature selection
Feature selectionFeature selection
Feature selection
 
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
 
Chapter 11 Marketing Reserach , Malhotra
Chapter 11 Marketing Reserach , MalhotraChapter 11 Marketing Reserach , Malhotra
Chapter 11 Marketing Reserach , Malhotra
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
 
Maxmizing Profits with the Improvement in Product Composition - ICIEOM - Mer...
Maxmizing Profits with the Improvement in Product Composition  - ICIEOM - Mer...Maxmizing Profits with the Improvement in Product Composition  - ICIEOM - Mer...
Maxmizing Profits with the Improvement in Product Composition - ICIEOM - Mer...
 

Andere mochten auch

Measurement of Consumer Welfare
Measurement of Consumer WelfareMeasurement of Consumer Welfare
Measurement of Consumer Welfare
NBER
 
Applications and Choice of IVs.
Applications and Choice of IVs.Applications and Choice of IVs.
Applications and Choice of IVs.
NBER
 
Estimation of Static Discrete Choice Models Using Market Level Data
Estimation of Static Discrete Choice Models Using Market Level DataEstimation of Static Discrete Choice Models Using Market Level Data
Estimation of Static Discrete Choice Models Using Market Level Data
NBER
 
Acemoglu lecture4
Acemoglu lecture4Acemoglu lecture4
Acemoglu lecture4
NBER
 
Jackson nber-slides2014 lecture3
Jackson nber-slides2014 lecture3Jackson nber-slides2014 lecture3
Jackson nber-slides2014 lecture3
NBER
 
Acemoglu lecture2
Acemoglu lecture2Acemoglu lecture2
Acemoglu lecture2
NBER
 
Chapter 1 nonlinear
Chapter 1 nonlinearChapter 1 nonlinear
Chapter 1 nonlinear
NBER
 
Lecture on solving1
Lecture on solving1Lecture on solving1
Lecture on solving1
NBER
 
Csvfrictions
CsvfrictionsCsvfrictions
Csvfrictions
NBER
 
Lecture on nk [compatibility mode]
Lecture on nk [compatibility mode]Lecture on nk [compatibility mode]
Lecture on nk [compatibility mode]
NBER
 
Dynare exercise
Dynare exerciseDynare exercise
Dynare exercise
NBER
 
Optimalpolicyhandout
OptimalpolicyhandoutOptimalpolicyhandout
Optimalpolicyhandout
NBER
 
Chapter 5 heterogeneous
Chapter 5 heterogeneousChapter 5 heterogeneous
Chapter 5 heterogeneous
NBER
 

Andere mochten auch (20)

Measurement of Consumer Welfare
Measurement of Consumer WelfareMeasurement of Consumer Welfare
Measurement of Consumer Welfare
 
Applications and Choice of IVs.
Applications and Choice of IVs.Applications and Choice of IVs.
Applications and Choice of IVs.
 
Estimation of Static Discrete Choice Models Using Market Level Data
Estimation of Static Discrete Choice Models Using Market Level DataEstimation of Static Discrete Choice Models Using Market Level Data
Estimation of Static Discrete Choice Models Using Market Level Data
 
Big Data Analysis
Big Data AnalysisBig Data Analysis
Big Data Analysis
 
Nber slides11 lecture2
Nber slides11 lecture2Nber slides11 lecture2
Nber slides11 lecture2
 
Introduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsIntroduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and Algorithms
 
Nuts and bolts
Nuts and boltsNuts and bolts
Nuts and bolts
 
Applications: Prediction
Applications: PredictionApplications: Prediction
Applications: Prediction
 
High-Dimensional Methods: Examples for Inference on Structural Effects
High-Dimensional Methods: Examples for Inference on Structural EffectsHigh-Dimensional Methods: Examples for Inference on Structural Effects
High-Dimensional Methods: Examples for Inference on Structural Effects
 
Acemoglu lecture4
Acemoglu lecture4Acemoglu lecture4
Acemoglu lecture4
 
Jackson nber-slides2014 lecture3
Jackson nber-slides2014 lecture3Jackson nber-slides2014 lecture3
Jackson nber-slides2014 lecture3
 
Acemoglu lecture2
Acemoglu lecture2Acemoglu lecture2
Acemoglu lecture2
 
Jackson nber-slides2014 lecture1
Jackson nber-slides2014 lecture1Jackson nber-slides2014 lecture1
Jackson nber-slides2014 lecture1
 
Chapter 1 nonlinear
Chapter 1 nonlinearChapter 1 nonlinear
Chapter 1 nonlinear
 
Lecture on solving1
Lecture on solving1Lecture on solving1
Lecture on solving1
 
Csvfrictions
CsvfrictionsCsvfrictions
Csvfrictions
 
Lecture on nk [compatibility mode]
Lecture on nk [compatibility mode]Lecture on nk [compatibility mode]
Lecture on nk [compatibility mode]
 
Dynare exercise
Dynare exerciseDynare exercise
Dynare exercise
 
Optimalpolicyhandout
OptimalpolicyhandoutOptimalpolicyhandout
Optimalpolicyhandout
 
Chapter 5 heterogeneous
Chapter 5 heterogeneousChapter 5 heterogeneous
Chapter 5 heterogeneous
 

Ähnlich wie Lecture 4: NBERMetrics

Module_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptx
Module_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptxModule_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptx
Module_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptx
HarshitGoel87
 
5 introduction to decision making methods
5 introduction to decision making methods 5 introduction to decision making methods
5 introduction to decision making methods
avira1980
 
1. F A Using S P S S1 (Saq.Sav) Q Ti A
1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A
1. F A Using S P S S1 (Saq.Sav) Q Ti A
Zoha Qureshi
 
Factor analysis using SPSS
Factor analysis using SPSSFactor analysis using SPSS
Factor analysis using SPSS
Remas Mohamed
 
Single Resource Revenue Management Problems withDependent De.docx
Single Resource Revenue Management Problems withDependent De.docxSingle Resource Revenue Management Problems withDependent De.docx
Single Resource Revenue Management Problems withDependent De.docx
budabrooks46239
 
Assignment oprations research luv
Assignment oprations research luvAssignment oprations research luv
Assignment oprations research luv
Ashok Sharma
 

Ähnlich wie Lecture 4: NBERMetrics (20)

Berg k. Continuous learning methods in two-buyer pricing
Berg k. Continuous learning methods in two-buyer pricingBerg k. Continuous learning methods in two-buyer pricing
Berg k. Continuous learning methods in two-buyer pricing
 
Real Estate Data Set
Real Estate Data SetReal Estate Data Set
Real Estate Data Set
 
Module_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptx
Module_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptxModule_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptx
Module_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptx
 
Fundamentals of Quantitative Analysis
Fundamentals of Quantitative AnalysisFundamentals of Quantitative Analysis
Fundamentals of Quantitative Analysis
 
5 introduction to decision making methods
5 introduction to decision making methods 5 introduction to decision making methods
5 introduction to decision making methods
 
10.1.1.86.6292
10.1.1.86.629210.1.1.86.6292
10.1.1.86.6292
 
Introduction to decision making methods
Introduction to decision making methodsIntroduction to decision making methods
Introduction to decision making methods
 
Market Basket Analysis of bakery Shop
Market Basket Analysis of bakery ShopMarket Basket Analysis of bakery Shop
Market Basket Analysis of bakery Shop
 
Factor Analysis.ppt
Factor Analysis.pptFactor Analysis.ppt
Factor Analysis.ppt
 
Pentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BIPentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BI
 
Demand forecasting
Demand forecastingDemand forecasting
Demand forecasting
 
1. F A Using S P S S1 (Saq.Sav) Q Ti A
1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A
1. F A Using S P S S1 (Saq.Sav) Q Ti A
 
Factor analysis using SPSS
Factor analysis using SPSSFactor analysis using SPSS
Factor analysis using SPSS
 
Single Resource Revenue Management Problems withDependent De.docx
Single Resource Revenue Management Problems withDependent De.docxSingle Resource Revenue Management Problems withDependent De.docx
Single Resource Revenue Management Problems withDependent De.docx
 
Masket Basket Analysis
Masket Basket AnalysisMasket Basket Analysis
Masket Basket Analysis
 
A Recommender System Sensitive to Intransitive Choice and Preference Reversals
A Recommender System Sensitive to Intransitive Choice and Preference ReversalsA Recommender System Sensitive to Intransitive Choice and Preference Reversals
A Recommender System Sensitive to Intransitive Choice and Preference Reversals
 
Factor analysis using spss 2005
Factor analysis using spss 2005Factor analysis using spss 2005
Factor analysis using spss 2005
 
Assignment oprations research luv
Assignment oprations research luvAssignment oprations research luv
Assignment oprations research luv
 
SN- Lecture 4
SN- Lecture 4SN- Lecture 4
SN- Lecture 4
 
Axioms
AxiomsAxioms
Axioms
 

Mehr von NBER

FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATESFISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
NBER
 
Business in the United States Who Owns it and How Much Tax They Pay
Business in the United States Who Owns it and How Much Tax They PayBusiness in the United States Who Owns it and How Much Tax They Pay
Business in the United States Who Owns it and How Much Tax They Pay
NBER
 
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
NBER
 
The Distributional E ffects of U.S. Clean Energy Tax Credits
The Distributional Effects of U.S. Clean Energy Tax CreditsThe Distributional Effects of U.S. Clean Energy Tax Credits
The Distributional E ffects of U.S. Clean Energy Tax Credits
NBER
 
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
NBER
 

Mehr von NBER (11)

FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATESFISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
 
Business in the United States Who Owns it and How Much Tax They Pay
Business in the United States Who Owns it and How Much Tax They PayBusiness in the United States Who Owns it and How Much Tax They Pay
Business in the United States Who Owns it and How Much Tax They Pay
 
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
 
The Distributional E ffects of U.S. Clean Energy Tax Credits
The Distributional Effects of U.S. Clean Energy Tax CreditsThe Distributional Effects of U.S. Clean Energy Tax Credits
The Distributional E ffects of U.S. Clean Energy Tax Credits
 
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
 
Recommenders, Topics, and Text
Recommenders, Topics, and TextRecommenders, Topics, and Text
Recommenders, Topics, and Text
 
The NBER Working Paper Series at 20,000 - Joshua Gans
The NBER Working Paper Series at 20,000 - Joshua GansThe NBER Working Paper Series at 20,000 - Joshua Gans
The NBER Working Paper Series at 20,000 - Joshua Gans
 
The NBER Working Paper Series at 20,000 - Claudia Goldin
The NBER Working Paper Series at 20,000 - Claudia GoldinThe NBER Working Paper Series at 20,000 - Claudia Goldin
The NBER Working Paper Series at 20,000 - Claudia Goldin
 
The NBER Working Paper Series at 20,000 - James Poterba
The NBER Working Paper Series at 20,000 - James PoterbaThe NBER Working Paper Series at 20,000 - James Poterba
The NBER Working Paper Series at 20,000 - James Poterba
 
The NBER Working Paper Series at 20,000 - Scott Stern
The NBER Working Paper Series at 20,000 - Scott SternThe NBER Working Paper Series at 20,000 - Scott Stern
The NBER Working Paper Series at 20,000 - Scott Stern
 
The NBER Working Paper Series at 20,000 - Glenn Ellison
The NBER Working Paper Series at 20,000 - Glenn EllisonThe NBER Working Paper Series at 20,000 - Glenn Ellison
The NBER Working Paper Series at 20,000 - Glenn Ellison
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Kürzlich hochgeladen (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Lecture 4: NBERMetrics

  • 1. Lecture 4: NBERMetrics. Ariel Pakes July 17, 2012 On The Use of Moment Inequalities in Discrete Choice Models. From the point of view of consumer theory the work on es- timating preference parameters from moment inequalities is a follow up to theory literature on “revealed preference” begun by Paul Samuelson (1938, Economica); see the review by Hal Varian (2005, in Samuelsonian Economics and the 21st Cen- tury.). The difference between the theory treatment of revealed preference, and the approach I will sketch here is our interest in estimation and therefore our attention to the sources of dis- turbances that enter the revealed preference approach when it is brought to data. The approach used here dates to Pakes, Porter, Ho and Ishii (2011, working paper ) henceforth PPHI, and Pakes (2011, Econometrica). It differs from standard empirical models of choice by work- ing directly with the inequalities which define optimal behavior. I.e. we see how far we can get by just assuming that we know a 1
  • 2. counterfactual which agents considered and discarded, and then assuming that, at least on average, the utility from the actual choice should be larger than the utility from the counterfac- tual. Only certain values of the parameter vector will make this statement true, and we accept any parameter values which insure that the average difference is positive. Typically if one value satisfies the inequalities, so will values close to it. So there will be many values of the parameter vec- tor that satisfy the inequalities and we will obtain a “set” of acceptable values. If the model is correct, what we will know is that the true value is in the set (at least asymptotically). I have found it easiest to explain what is going on in this approach by starting out a simple single agent example. Single Agent Example: Due to M. Katz (2007) Es- timate the costs shoppers assign to driving to a supermarket. This is a topic of importance to the analysis of zoning regula- tions, public transportation projects, and the like. Moreover, it has proven difficult to analyze empirically with standard choice models because of the complexity of the choice set facing con- sumers: all possible bundles of goods at all possible supermar- kets. We shall see that though large choice sets make discrete choice analysis difficult, the larger the choice set the easier it is to use the moment inequality approach. Assume that the agents’ utility functions are additively sep- arable functions of; • utility from the basket of goods bought, 2
  • 3. • expenditure on that basket, and • drive time to the supermarket. Let di = (bi, si) the vector of decisions made by the agent, bi = b(di) be the basket of goods bought, si = s(di) is the store chosen, and zi are individual characteristics U (di, zi, θ) = W (bi, zi, θb) − e(bi, si) − θidt(si, zi), where W (·) is the utility from the bundle of goods bought, e(·) provides expenditure, dt(·) provides drive time, and I have used the free normalization that comes from the equivalence of affine transforms on expenditure (so the cost of drive time are in dollars). Standard Discrete Choice Model’s For This Prob- lem. There are really two aspects that make a behavioral (or structural) model of this decision difficult, and two approaches to the analysis. A standard full information structural model would assume that the agent knows the prices of all bundles of goods at all stores, and makes its choice by • computing the basket that maximizes the utility at each store, and substituting that into the U (·), and then • maximizing over stores. This has both behavioral and computational aspects which are troubling. 3
• Behaviorally, it endows the agent with both an incredible amount of information and massive computational abilities.
• Computationally, it requires specification and computation of the value of a huge choice set and, if U(bi, θ) is specified in a reasonably rich way, simultaneous estimation of a large number of parameters.

Both of these make obtaining consistent estimates a difficult task.

Probably a more realistic way of looking at the problem would be through the lens of a two-period model.

• In the first period the agent chooses which store to travel to, perhaps without full knowledge of the prices at each store.
• In the second period the agent, having gone to the chosen store, chooses the bundle of goods it wishes to buy from the goods available at that store.

This does not ameliorate the computational complexity of the problem (see below). However, it does get rid of the behavioral problem. On the other hand, it replaces the behavioral problem with a specification problem. That is, to proceed in this way the econometrician

• needs to specify the agent's prior probability for each possible price at each store, and
• then compute the integral over the bundle chosen given each possible price vector at each store.

Econometricians never see priors, and seldom have knowledge of what the agent can condition on when it formulates its prior.
So the allowance for incomplete information replaces the behavioral problem with a specification problem. Moreover, if anything, it adds to the computational complexity of the analysis by requiring the computation of expectations.¹

Behavioral Model vs. Descriptive Summary Statistics. I have focused here on estimating the parameters of the behavioral model. I did this because the interest was in a parameter vector which was then to be used for policy analysis (which presumably involves counterfactuals). If we were not interested in the behavioral parameters, but rather in summarizing the data on store choice in an intuitive way (say, a way that subsequent research might use to build stylized models of the choice), there is a form of the standard discrete choice model that makes sense (at least for the single agent problems we are concerned with today). We would

• group choices of baskets of goods somehow,
• project the utility from the basket down onto “variables of interest,” which leaves a residual that is orthogonal to those variables, and
• evaluate the possible baskets at different stores.

Then, if we were willing to assume the distribution of the residual satisfied some parametric form, we would be back to the familiar discrete choice model.

¹ One way of getting around these problems is to obtain survey measures of expectations (see the Manski, 2001, review in Econometrica), but this is often impractical.
However,

• the coefficients could not be relied upon to provide an adequate approximation to what would happen were we to change a variable of interest (we come back to an example of this below), and
• it is not clear what one is doing in using a more complicated discrete choice model, for example a random coefficients model, in this context.

The Revealed Preference Approach. We compare the utility from the choice the individual made to that of an alternative feasible choice. Our theoretical assumption is that the agent expects this difference to be positive. The analogue to the “sample design” question here is: which alternative should the econometrician choose? The answer depends on the parameter of interest, and the larger the choice set, the easier it will be to find an alternative that isolates that parameter. This is the sense in which increasing the size of the choice set cannot but help the moment inequality analysis.

Our interest is in analyzing the cost of drive time, so we choose an alternative which allows us to analyze that cost without making the assumptions required to estimate W(b, z, θ).
For a particular di, choose d′(di, zi) to be the purchase of

• the same basket of goods,
• at a store which is farther away from the consumer’s home than the store the consumer shopped at.

Notice that choosing the same basket at the alternative store is dominated by choosing the optimal basket for that store which, at least before going to the store, is revealed to be inferior to choosing the optimal basket for the store actually chosen. So transitivity of preferences gives us the desired inequality. With this alternative we need not specify the utility from different baskets of goods; i.e. it allows us to hold fixed the dimension of the choice that generated the problem with the size of the choice set, and investigate the impact of the dimension of interest (travel time) in isolation.

Notice also that

• we need not specify the whole choice set, which is typically another difficult specification issue, and
• the alternative chosen differs with the agent’s choice and characteristics.
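To make the construction concrete, the sketch below shows, in Python, how such a counterfactual might be built from shopping-trip records. Everything named here (the trip layout and the price_of_basket and drive_time helpers) is a hypothetical stand-in rather than Katz’s actual data or code; the substantive point is only that the alternative repurchases the observed basket at a farther store.

```python
def counterfactual_store(trip, stores, price_of_basket, drive_time):
    """Build a farther-store counterfactual d'(d_i, z_i) for one shopping trip.

    trip            : dict with keys 'basket', 'store', 'home' (hypothetical layout)
    stores          : iterable of store identifiers
    price_of_basket : (basket, store) -> expenditure on that basket at that store
    drive_time      : (store, home)  -> drive time in minutes
    """
    dt_chosen = drive_time(trip['store'], trip['home'])
    # Feasible alternatives: stores strictly farther from home than the chosen one.
    farther = [s for s in stores if drive_time(s, trip['home']) > dt_chosen]
    if not farther:
        return None  # no valid counterfactual for this trip
    # One illustrative selection rule: the nearest of the farther stores.
    alt = min(farther, key=lambda s: drive_time(s, trip['home']))
    return {
        # ∆e(d, d') = e(chosen) − e(alternative), same basket at both stores
        'delta_e': price_of_basket(trip['basket'], trip['store'])
                   - price_of_basket(trip['basket'], alt),
        # ∆dt(d, d') = dt(chosen) − dt(alternative) ≤ 0 by construction
        'delta_dt': dt_chosen - drive_time(alt, trip['home']),
    }
```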
The Analysis. Let E(·) be the agent’s expectation operator and, for any function f(x, d), let ∆f(x, d, d′) ≡ f(x, d) − f(x, d′). Then the behavioral assumption we will make is that if di was chosen and d′(di, zi) was feasible when that decision was made,

E[∆U(di, d′(di, zi), zi, θ)|Ji] = −E[∆e(di, d′(di, zi))|Ji] − θi E[∆dt(di, d′(di, zi))|Ji] ≥ 0.

Notice that

• I have not assumed that the agent’s perceptions of prices are “correct” in any sense; I will come back to what I need here below.
• I have not had to specify the agent’s priors or the information those priors rely on, and so have avoided the difficult specification issue referred to above.

The Inequalities. We develop inequalities to be used in estimation for two separate cases.

Case 1: θi = θ0. More generally, all determinants of the cost of drive time are captured by variables the econometrician observes and includes in the specification (in terms of our previous notation, there is no random coefficient on drive time). Assume that

(1/N) Σi E[∆e(di, d′(di, zi))] − (1/N) Σi ∆e(di, d′(di, zi)) →P 0, and
(1/N) Σi E[∆dt(di, d′(di, zi))] − (1/N) Σi ∆dt(di, d′(di, zi)) →P 0,

where →P denotes convergence in probability. This would be true if, for example, agents were correct on average.
Recall that our assumption on the properties of the choice implied that

−E[∆e(di, d′(di, zi))] − θ0 E[∆dt(di, d′(di, zi))] ≥ 0,

so the assumption made above that agents do not err on average gives us

−[Σi ∆e(di, d′(di, zi))] / [Σi ∆dt(di, d′(di, zi))] →P θ̲ ≤ θ0.

If we had also taken an alternative store which was closer to the individual (say alternative d″), then

−[Σi ∆e(di, d″(di, zi))] / [Σi ∆dt(di, d″(di, zi))] →P θ̄ ≥ θ0,

and we would have consistent estimates of [θ̲, θ̄] which bound θ0.
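Both bounds are ratios of sample averages and so are trivially easy to compute. A minimal sketch, assuming the difference arrays have already been built (for instance with the hypothetical helper above); the _far arrays compare the chosen store with farther stores, the _near arrays with closer ones:

```python
import numpy as np

def case1_bounds(de_far, ddt_far, de_near, ddt_near):
    """Case 1 (no random coefficient): ratio-of-averages bounds on θ0.

    de_*  : arrays of ∆e  = e(chosen) − e(alternative)
    ddt_* : arrays of ∆dt = dt(chosen) − dt(alternative);
            negative for farther alternatives, positive for closer ones.
    """
    theta_lower = -np.mean(de_far) / np.mean(ddt_far)    # plim ≤ θ0
    theta_upper = -np.mean(de_near) / np.mean(ddt_near)  # plim ≥ θ0
    return theta_lower, theta_upper
```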
Case 2: θi = θ0 + νi, E[νi] = 0. This case allows for a determinant of the cost of drive time (νi) that is known to the agent at the time it makes its decision (since the agent conditions on it when deciding), but is not known to the econometrician. It corresponds to the breakdown used in the prior lecture of a coefficient into a function of observables and an unobservable term (the random coefficient), i.e. θi = ziβz + νi. Of course, were we to actually introduce additional observable determinants of θi, we would typically also want to introduce additional inequalities. We come back to this below, but for simplicity we now concentrate on estimating an (unconditional) mean, i.e. θ0.

Though this specification looks the same as that in the last lecture, the assumptions on the random terms made here and the conclusions we can draw from our estimator will be different. Now, provided dt(di) and dt(d′(di, zi)) are known to the agent when it makes its decision,

−E[∆e(di, d′(di, zi))] − (θ0 + νi) ∆dt(di, d′(di, zi)) ≥ 0,

which, since ∆dt(di, d′(di, zi)) ≤ 0, implies

E[−∆e(di, d′(di, zi)) / ∆dt(di, d′(di, zi))] − (θ0 + νi) ≤ 0.

So, provided agents’ expectations on expenditures are not “systematically” biased,

(1/N) Σi [−∆e(di, d′(di, zi)) / ∆dt(di, d′(di, zi))] →P θ̲ ≤ θ0,

and we could get a lower bound as above.
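The Case 2 bound replaces the ratio of averages with the average of per-trip ratios; a minimal sketch under the same hypothetical array conventions:

```python
import numpy as np

def case2_lower_bound(de_far, ddt_far):
    """Case 2 (θi = θ0 + νi with E[νi] = 0): average-of-ratios lower bound.
    Uses farther-store comparisons only, so that ∆dt < 0 for every trip."""
    return np.mean(-de_far / ddt_far)  # plim ≤ θ0
```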
There are two points about this latter derivation that should be kept in mind.

• First, we did not need to assume that νi is independent of zi. This would also have been true in a richer model where we assumed θi = ziβz + νi. This is particularly useful in models for the purchase of a good that has a subsequent cost of use, like a car or an air conditioner. There, utilization (or expected utilization) would be a right-hand-side variable, and one would expect unobservable determinants of the importance of efficiency in use in product choice to be correlated with the determinants of utilization (in the car example, consumers who care more about miles per gallon are likely to be consumers who expect to drive farther).
• Relatedly, we did not have to assume a distribution for the νi. The flip side of this is that neither have we developed an estimator for the variance of the random term. Though with additional assumptions we could develop such an estimator, I am not going to do that in this lecture (variance in the observable determinants of the aversion to drive time will, however, be considered below).

Case 1 vs. Case 2.

• Case 1 estimates using the ratio of averages, while Case 2 estimates using the average of the ratios. Both are trivially easy to compute.
• Case 2 allows for unobserved heterogeneity in the coefficient of interest and does not need to specify the distribution of that unobservable. Case 1 ignores the possibility of unobserved heterogeneity in tastes.
• If the unobserved determinant of drive time costs (νi) is correlated with drive time (dt), then the Case 1 and Case 2 estimators should differ; if not, they should be the same. So there is a test for whether any unobserved differences in preferences are correlated with the “independent” variable (see the sketch below).
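As a usage illustration of that check, one can simply compare the two lower bounds computed by the hypothetical functions above (a formal comparison would account for sampling error in both estimates):

```python
lb_case1, _ = case1_bounds(de_far, ddt_far, de_near, ddt_near)
lb_case2 = case2_lower_bound(de_far, ddt_far)

# A marked gap suggests νi is correlated with the drive-time differences;
# similar values suggest it is not.
print(f"Case 1 (ratio of averages): {lb_case1:.3f}")
print(f"Case 2 (average of ratios): {lb_case2:.3f}")
```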
Empirical Results.

Data. Nielsen Homescan Panel, 2004, and data on store characteristics from Trade Dimensions. The sample is families from Massachusetts.

Discrete Choice Comparison Model. The multinomial model divides observations into expenditure classes, and then uses a typical expenditure bundle for each class to form the expenditure level (the “price index”) for each outlet. The other x’s are drive time, store characteristics, and individual characteristics. Note that

• the prices for the expenditure class need not reflect the prices of the goods the individual is actually interested in (so there is an error in price, and it is likely to be negatively correlated with price itself),
• it assumes that the agents knew the goods available in the store and their prices exactly when they decided which store to choose (i.e. it does not allow for expectational error), and
• it does not allow for unobserved heterogeneity in the effect of drive time. We could allow for a random coefficient on drive time, but then we would need a conditional distribution for the drive time coefficient.

For these reasons, as well as the aggregation, its estimates are the estimates of a model which generates “summary statistics” as discussed above (colloquially referred to as a reduced form model), and they should not be interpreted as causal.
Focus. The specification allows the drive time coefficient to vary with household characteristics. The focus is on the average of the drive time coefficient at the median characteristics (about forty coefficients: chain dummies, outlet size, employees, amenities, ...).

Multinomial model: the median cost of drive time per hour was $240 (when the median wage in this region is $17). Also, several coefficients have the “wrong” sign or order (nearness to a subway stop, several amenities, and the chain dummies).

Inequality estimators. The inequality estimators were obtained from differences between the chosen store and four different counterfactual store choices (chosen to reflect price and distance differences with the chosen store). Each comparison was interacted with positive functions of twenty-six “instruments,” producing over a hundred moment inequalities. We come back to a more formal treatment of instruments below, but for now I simply note that what we require of each instrument, say h(x), is that it be

• known to the agent when it made its decision (so that when the agent makes its decision it can condition on the instrument’s value),
• uncorrelated with the sources of the unobservable that the agent knew when it made its decision (in our case νi), and
• non-negative.

If these conditions are satisfied and

E[−∆e(di, d′(di, zi)) / ∆dt(di, d′(di, zi)) − (θ0 + νi)] ≤ 0,

then also

E[h(xi) (−∆e(di, d′(di, zi)) / ∆dt(di, d′(di, zi)) − (θ0 + νi))] ≤ 0.

So, again, provided agents’ expectations are not “systematically” biased (and normalizing h to have mean one, so that the νi term averages out),

(1/N) Σi h(xi) [−∆e(di, d′(di, zi)) / ∆dt(di, d′(di, zi))] →P θ̲h ≤ θ0,

and we have generated an additional moment inequality.
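A minimal sketch of stacking these instrumented moments, assuming per-trip difference arrays as before and a matrix of instrument values whose columns are non-negative and scaled to have sample mean one (all names hypothetical):

```python
import numpy as np

def instrumented_moments(de, ddt, h, theta):
    """Sample moments (1/N) Σi h_k(xi) (−∆e_i/∆dt_i − θ), one per instrument k.
    The model implies each population moment is ≤ 0 at θ = θ0.

    de, ddt : length-N arrays of ∆e and ∆dt (farther-store comparisons, ∆dt < 0)
    h       : N × K matrix of non-negative instruments, columns with mean one
    """
    deviations = -de / ddt - theta       # per-trip −∆e/∆dt net of θ
    return h.T @ deviations / len(de)    # K-vector of sample moments
```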
As is not unusual for problems with many more inequalities than bounds to estimate, the inequality estimation routine generated point (rather than interval) estimates for the coefficients of interest (there was no value of the parameter vector that satisfied all of the moment inequalities). However, tests that we come back to below indicated that one could accept the null that this result was due to sampling error. The numbers beside the estimates are confidence intervals with the same interpretation as the confidence intervals we are used to: under the null, the confidence interval will capture the true parameter value in 95% of samples. The confidence intervals presented here are from a very conservative procedure and could be tightened.

• Inequality estimates with θi = θ0: .204 [.126, .255] ⇒ $4/hour.
• Inequality estimates with θi = θ0 + νi: .544 [.257, .666] ⇒ $14/hour, and the other coefficients straighten out.

Apparently the unobserved component of the coefficient on drive time is negatively correlated with observed drive time differences.

Katz only estimated the mean of the drive time coefficient conditional on observable demographics, and then used it, plus the distribution of demographics, to evaluate policy counterfactuals. If one were after more detail on the W(·) function, one would consider different counterfactuals (and presumably additional functional form restrictions). If one observed the same household over time and were willing to assume stability of preference parameters over time, many other options become available, some mimicking what we did with second choice data in the last lecture.

Estimation, Testing, and C.I.’s: An Introduction. The finding that there is no value of the parameter vector that satisfies all the inequalities is not unusual in moment inequality problems with many inequalities. Consider the one-parameter case. When there are many moment inequalities there are many upper and lower bounds for that parameter. The estimation routine forms an interval estimate from the least upper and the greatest lower bound. In finite samples the moments distribute normally about their true values, and as a result there will be a negative bias in choosing the smallest of the upper bounds due to finite-sample sampling error.
Similarly, there will be a positive bias in choosing the largest of the lower bounds. It is easy for these two to cross, even if the model is correctly specified. So there is an interest in building a test which distinguishes between the possibility that we obtain a point because of sampling error and the possibility that we obtain a point because the model is misspecified.

We begin, however, by setting up the estimation problem a bit more formally. For more on what follows see Chernozhukov, Hong and Tamer (Econometrica, 2007), Andrews and Soares (Econometrica, 2010), and the articles cited above. Our model delivers the condition that

E[∆U(di, d′(di, zi), zi, θ0) ⊗ h(xi)] ≡ E[m(di, zi, xi, θ)] ≥ 0 at θ = θ0,

where both ∆U(·) and h(·), and hence m(·), may be vectors, and ⊗ is the Kronecker product operator.

Estimator. Essentially we form a sample analog which penalizes values of θ that do not satisfy these conditions, but accepts all those that do. More formally, the sample moments are

m(Pn, θ) = (1/n) Σi m(di, zi, xi, θ)

and their variance is

Σ(Pn, θ) = Var(m(di, zi, xi, θ)).
The corresponding population moments are (m(P, θ), Σ(P, θ)), and our model assumes m(P, θ0) ≥ 0. The set of values of θ that satisfy the inequalities is denoted

Θ0 = {θ : m(P, θ) ≥ 0}

and called the identified set. This is all we can hope to estimate.

Estimator. For now I am going to assume that Σ0, the covariance matrix of the moments, is the identity matrix. This leads to an estimator which can be improved upon by dividing each moment by an estimate of its standard error and repeating what I do here, but that makes the notation more difficult to follow.

Recall that our model does not distinguish between values of θ that make the moments positive. All the model says is that at θ0 the moments are positive (at least in the limit). So all we want to do is penalize values of θ that lead to negative values of the moments. More formally, if f(·)− ≡ min(0, f(·)) and, for any vector f, ‖f‖ denotes the sum of squares of its elements, then our estimator is

Θn = arg minθ ‖m(Pn, θ)−‖.

This could be a set or, as in Katz’s case, a single parameter value. Several papers prove that Θn converges to Θ0 in an appropriate norm.
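A minimal sketch of this objective for a scalar θ, under the identity-weighting simplification just described and using the hypothetical instrumented moments above (signed so that the model predicts non-negative moments at θ0 for farther-store comparisons; closer-store comparisons would be stacked with the opposite sign):

```python
import numpy as np

def objective(theta, de, ddt, h):
    """‖m(Pn, θ)−‖²: the sum of squared negative parts of the sample moments.
    Moments here are (1/N) Σi h_k(xi) (θ − (−∆e_i/∆dt_i)), which the model
    predicts to be ≥ 0 at θ = θ0 for farther-store comparisons."""
    m = h.T @ (theta + de / ddt) / len(de)   # θ − (−∆e/∆dt) = θ + ∆e/∆dt
    return np.sum(np.minimum(m, 0.0) ** 2)

def estimate_set(theta_grid, de, ddt, h, tol=1e-12):
    """Θn: the grid points attaining the minimum of the objective
    (an interval of values, or a single point if the inequalities cross).
    theta_grid : 1-D numpy array of candidate parameter values."""
    values = np.array([objective(t, de, ddt, h) for t in theta_grid])
    return theta_grid[values <= values.min() + tol]
```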
Measures of Precision. There are several different ways of conceptualizing the precision of your (set) estimator. We could attempt to

• get a confidence set for the set, i.e. a set which would cover the identified set 95% of the time (this starts with Chernozhukov, Hong and Tamer, Econometrica, 2007);
• get a confidence set for the point θ0 (starts with Imbens and Manski, Econometrica, 2004); or
• get confidence intervals for intervals defined by a particular direction in the parameter space; the simplest case is the directions defined by each component of the parameter space, giving CI’s which cover the different components of the parameter vector 95% of the time (analogous to reporting standard errors component by component; see PPHI, 2011).

There are also several ways to obtain CI’s for each of the different concepts. I am going to give a conceptually simple method for getting a confidence set for the point θ0. The procedure I will go over is also computationally simple when the θ vector is low dimensional; for high dimensional θ vectors you will often have to go to PPHI. For precision improvements on the techniques described here see Andrews and Guggenberger (Econometric Theory, 2009) and the literature cited there.

Confidence Sets for the Point θ0. The procedure here tests each “possible value” of θ to see if it can reasonably be assumed to satisfy the moment inequalities.
To do this we start by defining a grid of possible values, say {Θp} = {θp,j, j = 1, . . . , Jp}. It is because the following procedure must be carried out at every one of the points in the grid that this approach often becomes impractical when the parameter vector is high dimensional.

To accept a value θp, the value of the objective function evaluated at θp (that is, ‖m(θp, Pn)−‖) must be within sampling error of zero. The problem is that the distribution of this object is not analytic. However, if we are willing to assume a value for the mean of m(θp, Pn), then it is easy to simulate a good approximation to the distribution of ‖m(θp, Pn)−‖, as the central limit theorem ensures that m(θp, Pn) distributes normally about that mean with a variance that is well approximated by Var(m(θp, Pn)).

Assuming temporarily that the mean of m(θp, Pn) is zero, we construct the distribution of ‖m(θp, Pn)−‖ under that assumption. Take random draws from a normal distribution with mean zero and variance-covariance Var(m(θp, Pn)). For each random draw, say m̃(θp), form the Euclidean norm of its negative part, that is, compute ‖m̃(θp)−‖, and keep track of the distribution of these numbers. Let c(θp) be the (1 − α) quantile of this distribution; c(θp) is the critical value for our test. Note that it is greater than the critical value that would be obtained from a similar procedure under the assumption that the expectation of m(θp, Pn) is any positive number.

Now go back to the original sample and compute the actual value of ‖m(θp, Pn)−‖. If it is less than c(θp), θp is accepted into the estimate of the confidence set for θ0; if not, it is rejected. Because we have used a critical value that is larger than the critical value for any other mean assumption that is consistent with the model, the probability of rejecting the null when it is true is less than α no matter the true value of the mean of m(θp, Pn). If there is no value of θp that is accepted, we reject the null that the model is correctly specified.
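A minimal sketch of the simulated critical value and the acceptance test for one grid point (α, the number of draws, and the argument layout are illustrative):

```python
import numpy as np

def accept_theta(m_bar, V_mean, alpha=0.05, draws=5000, seed=None):
    """Test one grid point θp for inclusion in the confidence set for θ0.

    m_bar  : K-vector of sample moments m(θp, Pn)
    V_mean : K×K estimated covariance matrix of that sample-moment vector
    """
    rng = np.random.default_rng(seed)
    # Simulate ‖m̃−‖² for m̃ ~ N(0, V_mean): the mean-zero case gives the
    # largest critical value consistent with the model, hence a conservative test.
    sims = rng.multivariate_normal(np.zeros(len(m_bar)), V_mean, size=draws)
    sim_stats = np.sum(np.minimum(sims, 0.0) ** 2, axis=1)
    c = np.quantile(sim_stats, 1.0 - alpha)       # critical value c(θp)
    stat = np.sum(np.minimum(m_bar, 0.0) ** 2)    # ‖m(θp, Pn)−‖²
    return stat <= c

# The confidence set is {θp in the grid : accept_theta(...)}; if no grid
# point is accepted, reject the null that the model is correctly specified.
```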
Moment Inequalities vs. Standard Discrete Choice. The limits of the moment inequality methods are still being actively explored. The advantages are that

• It typically requires weaker behavioral assumptions than the standard theory. In particular, it need not specify information sets and prior probabilities when it is likely that the agent does not have perfect information on the properties of the choice set, and it allows agents to make errors, provided they are not consistently wrong. That is, it has a way of controlling for expectational errors provided they are mean zero.
• Something I have not stressed here, but which is important in many applications, is that because moment inequalities only use averages, the approach also controls for zero-mean measurement error (see Ho and Pakes, 2012, working paper).
• It does not require specification of the entire choice set; just one counterfactual choice that the agent knew was feasible when it made the choice that it did make.
The biggest disadvantage of moment inequalities is that the method has limited ability to handle variables that the agent does know when it makes its choice but the econometrician does not observe. In the Katz example we allowed for two types of such unobservables.

• One was the class of factors that determined the utility from the bundle of goods bought. In our notation, we did not need to specify W(bi, zi, θ), so any determinant of that quantity could be unobserved without impeding the estimation algorithm.
• In Case 2 we also did not observe a determinant of the value of drive time, our νi.

We were able to account for the determinants of the utility from the bundle of goods bought because of

• the assumed additivity of the utility function in that bundle, and
• the fact that we had a very large choice set, a choice set that allowed us to hold that bundle constant at the same time as we changed the dimension of the choice whose impact we wanted to investigate.

We were able to account for an unobserved determinant of the importance of drive time that the agent knew but the econometrician did not (our νi) because
we could find a counterfactual which was linear in the effect of θi for every individual in the sample. We then estimated the average effect of these variables. Had we only been able to find such a transform for a subset of the consumers in the sample, say those that satisfy a particular condition, we would have had to deal with a selected sample of θi, and we would not have been able to estimate the mean of interest (at least not without either further assumptions or a wider bound than provided here).

A More General Model. A more general model would have the agent maximizing

E[U(di, zi, θ)] = E[W̃(bi, zi, θb)] − E[ẽ(bi, si)] − θi dt(si, zi) + εi,s,

where

• a tilde over a variable indicates that it is random when the decision of which store to go to is taken, and
• εi,s is a disturbance which differs by store and is known to the agent as it makes its store choice, but not to the econometrician.

We observe the realized W(·) (not E[W̃(bi, zi, θb)]) and e(·) (not E[ẽ(bi, si)]). The difference can be caused by either zero-mean expectational error or measurement error.
• The discrete choice model does not allow for these differences between the observed right-hand-side variables and their expectations. It does, however, allow a parametric form for the distribution of the εi,s.
• The inequality model presented here does not allow for the εi,s, but does allow for the differences between the observed right-hand-side variables and their expectations.

This means that when you are using discrete choice you should pay particular attention to (i) the possibility of differences between observed values of the right-hand-side variables and the perceptions that motivated the choice, and (ii) the functional forms for the distribution of unobservables. If you are using this version of moment inequalities, you should pay particular attention to observable determinants of store characteristics; for example, if the agent went to Whole Foods, you might want to look for a Whole Foods store (rather than any store) that is farther away (at least to see if that makes a difference).

Note. There is active research directed at maintaining the weaker assumptions of the moment inequality approach while accommodating choice-specific unobservables (see Dickstein and Morales, 2012, and Pakes and Porter, 2012). This will clearly make the bounds wider, but the extent of that is unclear. None of this will change the fact that the moment inequality methods will not give you a point estimate of a parameter, while standard discrete choice will. So if standard discrete choice gives an acceptable approximation for the problem at hand, it is preferred.