On the problem of bias amplification
        of the instrumental calibration estimator
                with missing survey data

                                    Éric LESAGE

                            Laboratoire de statistique d’enquête
                                      CREST-ENSAI
        Joint work with David HAZIZA (Université de Montréal and CREST-ENSAI)


                                  31 January 2013


                         Séminaire de Statistique ENSAE-ENSAI
                                    CREST, Paris

Éric LESAGE   (CREST-ENSAI)       CREST(ENSAE-ENSAI)               31 janvier 2013   1 / 37
Outline

1   Introduction
2   Underlying models
3   Bias amplification of the instrumental calibration estimators
4   Simulation study

Introduction


Context
     Nonresponse is a major problem in surveys.
     In the presence of nonresponse, the usual complete-data estimators may
     be biased when respondents and nonrespondents differ with
     respect to the survey variables.
     A weighting approach that has received a lot of attention recently is
     the so-called single-step approach, which uses calibration.
     See Deville (1998, 2002), Sautory (2003), Särndal and Lundström
     (2005), Kott (2006, 2009, 2012), among others.

Issue
     We examine the properties of instrument vector calibration estimators,
     where the instrumental variables (related to the response propensity)
     are available for the responding units only.
     More specifically, we illustrate the problem of bias amplification.




     Consider a finite population U of size N.
     The objective is to estimate the population total $t_y = \sum_{k \in U} y_k$ of a
     variable of interest y (e.g., incomes).
     A sample s of size n is selected from U according to a given
     sampling design p(s).
     A complete-data estimator of $t_y$ is the expansion estimator

                    $$\hat t_\pi = \sum_{k \in s} d_k y_k,$$

     where
              $d_k = 1/\pi_k$ denotes the design weight attached to unit k
              and $\pi_k = P(k \in s)$ denotes its first-order inclusion probability.

     In the presence of unit nonresponse, only a subset $s_r$ of s is observed,
     which makes it impossible to compute $\hat t_\pi$.
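As a minimal sketch (not part of the original slides), the expansion estimator can be computed and checked for design-unbiasedness by Monte Carlo. Poisson sampling is assumed here only for convenience, and the population values and inclusion probabilities are hypothetical. With nonresponse, $y_k$ is unknown for $k \in s \setminus s_r$, so the sum below could not be formed.

```python
import numpy as np

def expansion_estimator(y_s, pi_s):
    """Expansion estimator: sum over the sample of d_k * y_k, with d_k = 1 / pi_k."""
    return float(np.sum(y_s / pi_s))

rng = np.random.default_rng(2013)
N = 1_000
y = rng.normal(10.0, 2.0, size=N)        # hypothetical values of the variable of interest
pi = rng.uniform(0.1, 0.5, size=N)       # hypothetical first-order inclusion probabilities
t_y = y.sum()

# Poisson sampling (assumed design): unit k enters s independently with probability pi_k
estimates = []
for _ in range(2_000):
    in_s = rng.random(N) < pi
    estimates.append(expansion_estimator(y[in_s], pi[in_s]))

rel_bias = (np.mean(estimates) - t_y) / t_y
print(f"Monte Carlo relative bias of the expansion estimator: {100 * rel_bias:.2f}%")
```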


     To define a nonresponse-adjusted estimator of $t_y$, we assume that
              a vector of auxiliary variables x is available for $k \in s_r$;
              the vector of population totals $\mathbf{t}_x = \sum_{k \in U} \mathbf{x}_k$ is known.
              In practice, the x-vector is often defined by survey managers, who wish
              to ensure consistency between survey-weighted estimates and known
              population totals for some important variables (e.g., age and sex).

     In addition, we assume that
              a vector of instrumental variables z is available for $k \in s_r$;
              z has the same dimension as x;
              the z-vector needs only to be available for the respondents.
              The instrumental variables are believed to be associated with the
              propensity of units to respond to the survey.

     Let Rk be a response indicator attached to unit k such that

                           Rk = 1 if unit k is a respondent
                           Rk = 0 otherwise.



Instrumental calibration estimator $\hat t_C$
     We consider an instrumental calibration estimator (Deville (1998,
     2002)) of the form

                             $$\hat t_C = \sum_{k \in s} w_k R_k y_k,$$

     where

                             $$w_k = d_k F(\boldsymbol{\lambda}_r^\top \mathbf{z}_k),$$

     and F(·) is a monotonic, twice-differentiable function.
     $F(\boldsymbol{\lambda}_r^\top \mathbf{z}_k)$ is a weighting adjustment factor, which is essentially an
     estimate of the inverse of the response probability for unit k.
     The weights $w_k$ are constructed so that the calibration constraints

                             $$\sum_{k \in s_r} w_k \mathbf{x}_k = \sum_{k \in U} \mathbf{x}_k$$

     are satisfied.
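For linear weighting, $F(u) = 1 + u$, the calibration constraints are linear in $\boldsymbol{\lambda}_r$, so the weights can be obtained by solving one $p \times p$ linear system. The sketch below is an illustration under stated assumptions, not the authors' code: the function name `linear_instrumental_weights`, the census setting ($d_k = 1$) and the logistic response mechanism are all hypothetical.

```python
import numpy as np

def linear_instrumental_weights(d, x, z, t_x):
    """Weights w_k = d_k * (1 + lam' z_k) solving sum_r w_k x_k = t_x.

    d   : (n_r,)   design weights of the respondents
    x   : (n_r, p) calibration variables of the respondents
    z   : (n_r, p) instrumental variables of the respondents (same p as x)
    t_x : (p,)     known population totals of x
    """
    dx = d[:, None] * x
    # The constraints are linear in lam:  (sum_r d_k x_k z_k') lam = t_x - sum_r d_k x_k
    lam = np.linalg.solve(dx.T @ z, t_x - dx.sum(axis=0))
    return d * (1.0 + z @ lam)

# Hypothetical census (d_k = 1) with response driven by z
rng = np.random.default_rng(1)
N = 1_000
z_pop = rng.uniform(-np.sqrt(3), np.sqrt(3), N)
x_pop = 0.7 * z_pop + np.sqrt(1 - 0.7**2) * rng.normal(size=N)
r = rng.random(N) < 1 / (1 + np.exp(-z_pop))     # response indicators R_k

X = np.column_stack([np.ones(N), x_pop])          # rows x_k = (1, x_k)
Z = np.column_stack([np.ones(N), z_pop])          # rows z_k = (1, z_k)
t_x = X.sum(axis=0)                               # (N, t_x)

w = linear_instrumental_weights(np.ones(r.sum()), X[r], Z[r], t_x)
print("calibration constraints satisfied:", np.allclose(X[r].T @ w, t_x))
```

Solving the system directly, rather than inverting the matrix, is the usual numerically safer choice.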


Remarks:



     Linear weighting is a special case, for which the weights are given by

                              $$w_k = d_k \left(1 + \boldsymbol{\lambda}_r^\top \mathbf{z}_k\right).$$

     When x is used in the calibration instead of z, we obtain a usual
     calibration estimator with $w_k = d_k F(\boldsymbol{\lambda}_r^\top \mathbf{x}_k)$.






Error decomposition


     The total error of $\hat t_C$ can be expressed as

          $$\hat t_C - t_y = \underbrace{(\hat t_\pi - t_y)}_{\text{sampling error}} + \underbrace{(\hat t_C - \hat t_\pi)}_{\text{nonresponse error}}.$$

     Since the sampling error does not depend on nonresponse, we focus on
     the nonresponse error in the sequel.

     Without loss of generality, we consider the case of a census, s = U, so
     that the sampling error, $\hat t_\pi - t_y$, is equal to zero.






First approach: correct specification of the y model

     Regardless of the choice of F(·), the instrument vector calibration estimator $\hat t_C$
     estimates $t_y$ perfectly if the variable of interest y is perfectly explained
     by the x-vector, i.e.,

                                     $$y_k = \mathbf{x}_k^\top \boldsymbol{\beta} \quad \text{for all } k \in U$$

     for some vector β, since the calibration constraints then give
     $\hat t_C = \sum_{k \in s_r} w_k \mathbf{x}_k^\top \boldsymbol{\beta} = \mathbf{t}_x^\top \boldsymbol{\beta} = t_y$.
     Hence, we expect $\hat t_C$ to exhibit a small bias if the y-variable and the
     x-vector are strongly linearly related.

     However, in multipurpose surveys, the number of variables of interest
     is typically large (possibly a few hundred); it is therefore unrealistic
     to presume that the x-vector is linearly related to all y-variables, and
     some estimates could suffer from bias.
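The exactness claim on this slide can be checked numerically. The sketch below is an illustration, not the authors' code: it uses linear weighting only (the slide states the result for any F), a hypothetical β, and a deliberately selective, hypothetical response mechanism; the helper name `linear_instrumental_weights` is ours.

```python
import numpy as np

def linear_instrumental_weights(d, x, z, t_x):
    """Linear instrumental calibration weights w_k = d_k (1 + lam' z_k) with sum_r w_k x_k = t_x."""
    dx = d[:, None] * x
    lam = np.linalg.solve(dx.T @ z, t_x - dx.sum(axis=0))
    return d * (1.0 + z @ lam)

rng = np.random.default_rng(7)
N = 2_000
z_pop = rng.normal(size=N)
x_pop = 0.5 * z_pop + rng.normal(size=N)
X = np.column_stack([np.ones(N), x_pop])     # x_k = (1, x_k)'
Z = np.column_stack([np.ones(N), z_pop])     # z_k = (1, z_k)'
beta = np.array([3.0, -1.5])                 # hypothetical coefficients
y = X @ beta                                 # y_k = x_k' beta exactly, for all k
t_y = y.sum()

# Strongly selective (hypothetical) response mechanism
r = rng.random(N) < 1 / (1 + np.exp(-2.0 * z_pop))
w = linear_instrumental_weights(np.ones(r.sum()), X[r], Z[r], X.sum(axis=0))
t_C = w @ y[r]
print(f"t_y = {t_y:.4f}, t_C = {t_C:.4f}")
```

Because $\sum_{k \in s_r} w_k \mathbf{x}_k = \mathbf{t}_x$, the two totals agree up to floating-point rounding, however unbalanced the respondent set is.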




Second approach: estimation of the response propensity


     For linear weighting, Särndal and Lundström (2005, Chapter 9)
     showed that $\hat t_C$ is asymptotically unbiased for $t_y$, for every y-variable,
     provided that the response probability of unit k, $p_k$, is such that

                    $$p_k^{-1} = 1 + \boldsymbol{\lambda}^\top \mathbf{z}_k \quad \text{for all } k \in U, \qquad (1)$$

              for a vector of unknown constants λ;
              see also Kott and Liao (2012) for a discussion of nonlinear weighting.
     However, in practice, it is not clear how to validate the form of the
     relationship in (1), since the z-vector is available for the respondents
     only.
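To illustrate the result around (1) under stated assumptions, the sketch below generates response probabilities that satisfy (1) exactly, with a hypothetical λ = (1.5, 0.8) and $\mathbf{z}_k = (1, z_k)^\top$, and compares the linear instrumental calibration estimator with the naive respondent mean for a linear and a nonlinear y-variable. The distributions, λ, the proxy x and the number of replicates are all assumptions made for the illustration.

```python
import numpy as np

def linear_instrumental_weights(d, x, z, t_x):
    """Linear instrumental calibration weights w_k = d_k (1 + lam' z_k) with sum_r w_k x_k = t_x."""
    dx = d[:, None] * x
    lam = np.linalg.solve(dx.T @ z, t_x - dx.sum(axis=0))
    return d * (1.0 + z @ lam)

rng = np.random.default_rng(2005)
N, K = 2_000, 200
lam_true = np.array([1.5, 0.8])        # hypothetical lambda satisfying (1)
rb_cal = {"y1": [], "y2": []}          # relative errors of the calibrated estimator
rb_naive = []                          # relative errors of N * (respondent mean), y1 only

for _ in range(K):
    z = rng.uniform(-np.sqrt(3), np.sqrt(3), N)
    x = 0.7 * z + np.sqrt(1 - 0.7**2) * rng.normal(size=N)
    Z = np.column_stack([np.ones(N), z])             # z_k = (1, z_k)'
    X = np.column_stack([np.ones(N), x])             # x_k = (1, x_k)'
    p = 1.0 / (1.0 + Z @ lam_true)                   # 1 / p_k = 1 + lambda' z_k, as in (1)
    r = rng.random(N) < p
    w = linear_instrumental_weights(np.ones(r.sum()), X[r], Z[r], X.sum(axis=0))
    ys = {"y1": 10 + 2 * z + rng.normal(size=N),     # linear in z
          "y2": z**2 + rng.normal(size=N)}           # nonlinear in z
    for name, y in ys.items():
        rb_cal[name].append((w @ y[r] - y.sum()) / y.sum())
    y1 = ys["y1"]
    rb_naive.append((N * y1[r].mean() - y1.sum()) / y1.sum())

for name, v in rb_cal.items():
    print(f"calibrated, {name}: {100 * np.mean(v):+.2f}%")
print(f"naive, y1: {100 * np.mean(rb_naive):+.2f}%")
```

Under these assumptions the calibrated estimator should be close to unbiased for both y-variables, while the respondent mean is clearly biased for y1.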







     The purpose of this presentation is to examine the so-called problem
     of bias amplification in the context of instrument vector calibration.

     In epidemiological studies, it has been found that including
     instrumental variables in the set of conditioning variables
     can increase unmeasured confounding bias; see Bhattacharya and
     Vogt (2007), Wooldridge (2009), Pearl (2010) and Myers et al.
     (2011).

     We argue that the same is true in the context of instrument vector
     calibration.
     Some preliminary studies in this direction can be found in Lesage
     (2012) and Osier (2012).




Underlying models


Superpopulation model

     Let $(y_k, z_k)^\top$ be a realisation of the vector of random variables
     $(Y_k, Z_k)^\top$, $k \in U$.
     Without loss of generality, we assume that
              $E(Z_k) = 0$
              and $V(Z_k) = 1$.

     Further, we assume that the relationship between Y and Z can be
     modeled using

                            $$Y_k = \beta_0 + \beta_1 Z_k + \varepsilon^y_k$$

      such that

                            $$E(\varepsilon^y_k \mid Z_k) = 0.$$


              This model is often called a prediction model or outcome regression
              model.


Nonresponse model

     We also assume the following nonresponse model:

                    $$R_k = \gamma_0 + \gamma_1 Z_k + \varepsilon^R_k, \qquad E(\varepsilon^R_k \mid Z_k) = 0.$$

     We assume that y is not a direct explanatory variable of nonresponse:

                    $$\mathrm{cov}(Y_k, R_k \mid Z_k) = 0.$$

     Remarks
              The nonresponse model states that the response indicators Rk are
              linearly related to Z. Although this relationship may seem awkward, it
              will be useful to study the problem of bias amplification.
              A more realistic nonresponse model, namely the logistic model, is
              considered in the empirical study.


Figure: Graph of the variables y, z and R (Z is linked to Y through β1 and to R through γ1).

Bias amplification of the instrumental calibration estimators


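  Naive estimator

          We consider the naive estimator

              $$\hat t_{naive} = N \times \frac{\sum_{k \in U} y_k R_k}{\sum_{k \in U} R_k}.$$

          Since $\mathrm{cov}(Y_k, R_k) = \beta_1 \gamma_1 V(Z_k) = \beta_1 \gamma_1$ and $E(R_k) = \gamma_0$, we have

              $$\frac{\hat t_{naive}}{N} - \frac{t_y}{N} = \frac{\mathrm{cov}(Y_k, R_k)}{\gamma_0} + o_P(1) = \frac{\beta_1 \gamma_1}{\gamma_0} + o_P(1).$$

          Example: with γ0 = 0.5, γ1 = √3/10, β0 = 10 and β1 = 2, the asymptotic relative bias is

              $$\frac{\gamma_1 \beta_1}{\gamma_0 \beta_0} = 6.9\%.$$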
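The 6.9% figure can be checked both by arithmetic and by a quick simulation. The slides do not specify the distributions, so this sketch assumes $Z_k$ uniform on $[-\sqrt{3}, \sqrt{3}]$ (mean 0, variance 1), $\varepsilon^y_k \sim N(0, 1)$, and $R_k \sim \text{Bernoulli}(\gamma_0 + \gamma_1 Z_k)$, which satisfies $E(\varepsilon^R_k \mid Z_k) = 0$ and keeps the response probability in [0.2, 0.8].

```python
import numpy as np

gamma0, gamma1 = 0.5, np.sqrt(3) / 10
beta0, beta1 = 10.0, 2.0

# Asymptotic relative bias of the naive estimator: gamma1 * beta1 / (gamma0 * beta0)
rb_theory = gamma1 * beta1 / (gamma0 * beta0)

# Monte Carlo check under the assumed distributions described above
rng = np.random.default_rng(31)
N = 200_000
z = rng.uniform(-np.sqrt(3), np.sqrt(3), N)
y = beta0 + beta1 * z + rng.normal(size=N)
r = rng.random(N) < gamma0 + gamma1 * z       # P(R_k = 1 | Z_k) lies in [0.2, 0.8]
t_naive = N * y[r].sum() / r.sum()
rb_sim = (t_naive - y.sum()) / y.sum()

print(f"theory: {100 * rb_theory:.1f}%   simulated: {100 * rb_sim:.1f}%")
```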
                                                    γ0 β0


  Instrument vector calibration estimators
          We suppose that a proxy variable of z, denoted x, is available.

   Definition
   A proxy variable of z, in a nonresponse context, is a variable x such that:
      1   x is an auxiliary variable whose population total $t_x$ is known;
      2   $\mathrm{cor}(X_k, Z_k) \neq 0$;
      3   $\mathrm{cov}(X_k, Z_k \mid R_k = 1) \neq 0$.

          We assume that the relationship between X and Z can be modeled
          using

              $$X_k = \alpha_0 + \alpha_1 Z_k + \varepsilon^x_k, \qquad E(\varepsilon^x_k \mid Z_k) = 0, \qquad V(\varepsilon^x_k) = \sigma_x^2 = 1 - \alpha_1^2.$$

          Remarks: $V(X_k) = 1$ and $\mathrm{cor}(X_k, Z_k) = \alpha_1$.
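Both remarks follow directly from the model, using $V(Z_k) = 1$ and the fact that $E(\varepsilon^x_k \mid Z_k) = 0$ implies $\mathrm{cov}(\varepsilon^x_k, Z_k) = 0$:

```latex
\begin{aligned}
V(X_k) &= \alpha_1^2 V(Z_k) + V(\varepsilon^x_k) = \alpha_1^2 + (1 - \alpha_1^2) = 1,\\
\mathrm{cov}(X_k, Z_k) &= \alpha_1 V(Z_k) + \mathrm{cov}(\varepsilon^x_k, Z_k) = \alpha_1,\\
\mathrm{cor}(X_k, Z_k) &= \frac{\mathrm{cov}(X_k, Z_k)}{\sqrt{V(X_k)\,V(Z_k)}} = \alpha_1.
\end{aligned}
```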


  Instrument vector calibration estimators




          We also assume that x is not a direct explanatory variable of
          nonresponse:

                                 $$\mathrm{cov}(X_k, R_k \mid Z_k) = 0.$$





Figure: Graph of the variables y, z, x and R (Z is linked to Y through β1, to R through γ1 and to X through α1).



  Instrument vector calibration estimators

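          We consider the instrument vector calibration estimator with linear
          weighting

              $$\hat t_C = \mathbf{t}_x^\top \Big(\sum_{k \in s_r} \mathbf{z}_k \mathbf{x}_k^\top\Big)^{-1} \sum_{k \in s_r} \mathbf{z}_k y_k,$$

          where
                  $\mathbf{x}_k = (1, x_k)^\top$;
                  $\mathbf{z}_k = (1, z_k)^\top$;
                  $\mathbf{t}_x = (N, t_x)^\top$.

          Since $\mathrm{cov}(X_k, R_k \mid Z_k) = 0$, we have

              $$\frac{\hat t_C}{N} - \frac{t_y}{N} = o_P(1).$$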
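A minimal sketch of this estimator, solving $\big(\sum_{s_r} \mathbf{z}_k \mathbf{x}_k^\top\big)\mathbf{b} = \sum_{s_r} \mathbf{z}_k y_k$ and returning $\mathbf{t}_x^\top \mathbf{b}$, is given below. The simulated setting is an assumption made for illustration: no variable u (so $\mathrm{cov}(X_k, R_k \mid Z_k) = 0$), $\alpha_0 = 0$, a hypothetical $\alpha_1 = 0.7$, uniform $Z_k$, normal errors, and Bernoulli response with probability $\gamma_0 + \gamma_1 Z_k$ using the example values of the naive-estimator slide. The function name is ours.

```python
import numpy as np

def instrument_calibration_total(x, z, y, r, t_x):
    """t_C = t_x' (sum_r z_k x_k')^{-1} sum_r z_k y_k, with x_k = (1, x_k)' and z_k = (1, z_k)'."""
    n_r = int(r.sum())
    X = np.column_stack([np.ones(n_r), x[r]])
    Z = np.column_stack([np.ones(n_r), z[r]])
    # Solve (sum_r z_k x_k') b = sum_r z_k y_k instead of forming the inverse
    b = np.linalg.solve(Z.T @ X, Z.T @ y[r])
    return float(t_x @ b)

rng = np.random.default_rng(20)
N = 200_000
gamma0, gamma1 = 0.5, np.sqrt(3) / 10
alpha1 = 0.7                                   # hypothetical proxy strength
z = rng.uniform(-np.sqrt(3), np.sqrt(3), N)
y = 10 + 2 * z + rng.normal(size=N)
x = alpha1 * z + np.sqrt(1 - alpha1**2) * rng.normal(size=N)   # alpha0 = 0, no u
r = rng.random(N) < gamma0 + gamma1 * z        # response depends on z only

t_x = np.array([N, x.sum()])                   # (N, t_x)'
t_C = instrument_calibration_total(x, z, y, r, t_x)
t_naive = N * y[r].mean()
print(f"instrument vector calibration: {100 * (t_C / y.sum() - 1):+.2f}%")
print(f"naive:                         {100 * (t_naive / y.sum() - 1):+.2f}%")
```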
                                                N    N


  What if...




          It is not trivial to verify whether $\mathrm{cov}(X_k, R_k \mid Z_k) = 0$, since the variable z
          is available only for the respondents.

          What if $\mathrm{cov}(X_k, R_k \mid Z_k) \neq 0$?





Figure: Graph of the variables y, z, x, u and R (Z is linked to Y through β1, to X through α1 and to R through γ1; U is linked to X through α2 and to R through γ2), with

                    $$\mathrm{cov}(R_k, X_k \mid Z_k) = \alpha_2 \gamma_2.$$





          We now assume that there exists an unobserved variable u, independent of
          z and y, that is an explanatory variable in the nonresponse model.
          Without loss of generality, we assume that
                  $E(U_k \mid Z_k, Y_k) = 0$
                  and $V(U_k \mid Z_k, Y_k) = 1$.

          The nonresponse model is rewritten as

                    $$R_k = \gamma_0 + \gamma_1 Z_k + \gamma_2 U_k + \varepsilon^R_k, \qquad E(\varepsilon^R_k \mid Z_k, U_k) = 0.$$

          We still assume that

                    $$\mathrm{cov}(Y_k, R_k \mid Z_k) = 0.$$







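          Moreover, we assume that the variable x is linked to the variable u:

                    $$X_k = \alpha_0 + \alpha_1 Z_k + \alpha_2 U_k + \varepsilon^x_k,$$

          with
                  $E(\varepsilon^x_k \mid Z_k, U_k) = 0$;
                  $V(\varepsilon^x_k) = \sigma_x^2 = 1 - \alpha_1^2 - \alpha_2^2$;
                  $E(\varepsilon^R_k \varepsilon^x_k \mid Z_k, U_k) = 0$.

          Then, since given $Z_k$ the variables $R_k$ and $X_k$ share only $U_k$, with $V(U_k \mid Z_k) = 1$,
          we have $\mathrm{cov}(R_k, X_k \mid Z_k) = \alpha_2 \gamma_2$.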
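The identity $\mathrm{cov}(R_k, X_k \mid Z_k) = \alpha_2 \gamma_2$ can be checked by simulation. Because both conditional means are linear in $Z_k$ here, the conditional covariance equals the covariance of the residuals after regressing $R_k$ and $X_k$ on $(1, Z_k)$. The sketch assumes uniform $Z_k$ and $U_k$ on $[-\sqrt{3}, \sqrt{3}]$, normal $\varepsilon^x_k$, Bernoulli response, hypothetical $\alpha$ values, and $\gamma_2 = 0.2/\sqrt{3}$, a value not given on the slides (with it, the response probability stays in [0, 1]).

```python
import numpy as np

def residuals_on(design, v):
    """Residuals of an OLS regression of v on the columns of `design`."""
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef

rng = np.random.default_rng(25)
N = 500_000
gamma0, gamma1, gamma2 = 0.5, np.sqrt(3) / 10, 0.2 / np.sqrt(3)   # gamma2 is an assumption
alpha0, alpha1, alpha2 = 0.0, 0.5, 0.3                            # hypothetical values
z = rng.uniform(-np.sqrt(3), np.sqrt(3), N)
u = rng.uniform(-np.sqrt(3), np.sqrt(3), N)
x = alpha0 + alpha1 * z + alpha2 * u + np.sqrt(1 - alpha1**2 - alpha2**2) * rng.normal(size=N)
r = (rng.random(N) < gamma0 + gamma1 * z + gamma2 * u).astype(float)

# Residuals have mean zero (intercept included), so their mean product is a covariance
D = np.column_stack([np.ones(N), z])
cov_rx_given_z = np.mean(residuals_on(D, r) * residuals_on(D, x))
print(f"simulated: {cov_rx_given_z:.4f}   alpha2 * gamma2: {alpha2 * gamma2:.4f}")
```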







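   We have:

      $$\frac{\hat t_C}{N} - \frac{t_y}{N} = -\frac{\beta_1}{\alpha_1}\,\mathrm{cov}(R_k, X_k \mid Z_k)\,\frac{1}{E(R_k)}\,\frac{E(Z_k X_k \mid R_k = 1)}{\mathrm{cov}(Z_k, X_k \mid R_k = 1)} + o_P(1)$$

      $$= -\frac{\beta_1 \alpha_2 \gamma_2}{\gamma_0 \alpha_1}\,\frac{\alpha_1 + \alpha_0 \frac{\gamma_1}{\gamma_0}}{\alpha_1 - \frac{\gamma_1}{\gamma_0}\left(\alpha_1 \frac{\gamma_1}{\gamma_0} + \alpha_2 \frac{\gamma_2}{\gamma_0}\right)} + o_P(1).$$

          If $\alpha_2 \gamma_2 \neq 0$, the instrument vector calibration estimator is "biased";
          the "bias" is amplified if α1 is small (i.e., a weak proxy).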
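The bias formula on this slide can be evaluated directly. The slides do not state γ2; this sketch assumes γ2 = 0.2/√3 ≈ 0.115, a value for which, together with the example values γ0 = 0.5, γ1 = √3/10, β0 = 10 and β1 = 2 of the naive-estimator slide and α0 = 0, the formula (divided by $t_y/N \approx \beta_0$ and expressed in %) reproduces the table of the following slide to the displayed precision. The function name is ours.

```python
import math

def iv_calibration_relative_bias(alpha1, alpha2, alpha0=0.0, gamma0=0.5,
                                 gamma1=math.sqrt(3) / 10, gamma2=0.2 / math.sqrt(3),
                                 beta0=10.0, beta1=2.0):
    """Asymptotic relative bias (in %) of the instrument vector calibration estimator:
    the bias formula divided by t_y / N, which tends to beta0.
    gamma2 = 0.2 / sqrt(3) is an assumed value (not given on the slides)."""
    g1, g2 = gamma1 / gamma0, gamma2 / gamma0
    denom = alpha1 - g1 * (alpha1 * g1 + alpha2 * g2)
    bias = -(beta1 * alpha2 * gamma2) / (gamma0 * alpha1) * (alpha1 + alpha0 * g1) / denom
    return 100 * bias / beta0 + 0.0   # "+ 0.0" turns -0.0 into 0.0 when alpha2 = 0

for a1 in (0.7, 0.3, 0.1):
    row = [iv_calibration_relative_bias(a1, a2) for a2 in (0.0, 0.1, 0.3, 0.7)]
    print(f"alpha1 = {a1}: " + "  ".join(f"{v:7.1f}" for v in row))
```

The denominator shrinks as α1 decreases, which is exactly the amplification mechanism: for α1 = 0.1 and α2 = 0.7 the relative bias is about −101%.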


  Bias amplification for a weak proxy with the instrument vector
  calibration estimator

                         α1     α2 = 0   α2 = 0.1   α2 = 0.3   α2 = 0.7
                         0.7    0        -0.8       -2.3       -5.8
                         0.3    0        -1.8       -5.8       -15.5
                         0.1    0        -5.8       -21.7      -101
   Table: Asymptotic relative bias (in %) of the instrument vector calibration estimator for
   different values of the parameters α1 and α2

  Figure: Graph of the variables y, z, x, u and R (Z is linked to Y through β1, to X through α1 and to R through γ1; U is linked to X through α2 and to R through γ2).



  Usual calibration estimators



          We have seen that instrument vector calibration can lead to
          estimators with large biases.

          Would a simple calibration protect against such a risk of bias amplification?

              $$\frac{\hat t_C}{N} = \frac{\mathbf{t}_x^\top}{N} \Big(\sum_{k \in s_r} \mathbf{x}_k \mathbf{x}_k^\top\Big)^{-1} \sum_{k \in s_r} \mathbf{x}_k y_k. \qquad (2)$$
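The instrument vector estimator and the simple calibration estimator (2) differ only in which vector multiplies $\mathbf{x}_k^\top$ and $y_k$ inside the sums ($\mathbf{z}_k$ or $\mathbf{x}_k$), so one routine covers both. The sketch below simulates a large census with a weak proxy linked to u (α1 = 0.3, α2 = 0.7). Assumptions not on the slides: $Z_k$ and $U_k$ uniform on $[-\sqrt{3}, \sqrt{3}]$, normal errors, $R_k \sim \text{Bernoulli}(\gamma_0 + \gamma_1 Z_k + \gamma_2 U_k)$ and γ2 = 0.2/√3. Under these assumptions the two relative errors should be close to the asymptotic values of about −15.5% and 5.7% implied by the bias formulas in this section.

```python
import numpy as np

def calibration_total(x, inst, y, r, t_x):
    """t_x' (sum_r inst_k x_k')^{-1} sum_r inst_k y_k with x_k = (1, x_k)' and inst_k = (1, inst_k)'.
    inst = z gives the instrument vector estimator; inst = x gives the simple one (2)."""
    n_r = int(r.sum())
    X = np.column_stack([np.ones(n_r), x[r]])
    W = np.column_stack([np.ones(n_r), inst[r]])
    return float(t_x @ np.linalg.solve(W.T @ X, W.T @ y[r]))

rng = np.random.default_rng(28)
N = 1_000_000
gamma0, gamma1, gamma2 = 0.5, np.sqrt(3) / 10, 0.2 / np.sqrt(3)   # gamma2 is an assumption
alpha1, alpha2 = 0.3, 0.7                                         # weak proxy linked to u
z = rng.uniform(-np.sqrt(3), np.sqrt(3), N)
u = rng.uniform(-np.sqrt(3), np.sqrt(3), N)
y = 10 + 2 * z + rng.normal(size=N)
x = alpha1 * z + alpha2 * u + np.sqrt(1 - alpha1**2 - alpha2**2) * rng.normal(size=N)
r = rng.random(N) < gamma0 + gamma1 * z + gamma2 * u

t_x = np.array([N, x.sum()])
rb_iv = 100 * (calibration_total(x, z, y, r, t_x) / y.sum() - 1)
rb_simple = 100 * (calibration_total(x, x, y, r, t_x) / y.sum() - 1)
print(f"instrument vector calibration: {rb_iv:.1f}%   simple calibration: {rb_simple:.1f}%")
```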








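          The simple calibration estimator is asymptotically biased:

              $$\frac{\hat t_C - t_y}{t_y} = \frac{\beta_1}{\gamma_0 \beta_0}\,\frac{\gamma_1 \sigma_x^2 - \alpha_2(\alpha_1 \gamma_2 - \alpha_2 \gamma_1) + B}{1 - \left(\alpha_1 \frac{\gamma_1}{\gamma_0} + \alpha_2 \frac{\gamma_2}{\gamma_0}\right)^2} + o_P(1),$$

          where $B = \alpha_0\left[\alpha_2 \gamma_2 (\alpha_1 \gamma_1 + \alpha_2 \gamma_2) - \gamma_1\left(1 - (\alpha_1^2 + \alpha_2^2)\right)\right]$ is a null
          term when α0 = 0.

          Nevertheless, it offers protection against bias amplification.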






  No bias amplification for a weak proxy with the simple
  calibration estimator

          The usual calibration estimator has a bias similar to that of the naive
          estimator.
          This bias is not amplified as the correlation α1 between x and z
          decreases.

                         α1     α2 = 0   α2 = 0.1   α2 = 0.3   α2 = 0.7
                         0.7    3.8      3.5        2.8        1.5
                         0.3    6.4      6.3        6.1        5.7
                         0.1    6.9      6.8        6.8        6.8
   Table: Asymptotic relative bias (in %) of the simple calibration estimator for different
   values of the parameters α1 and α2
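The simple-calibration bias formula can be evaluated in the same way, in the case α0 = 0 (so B = 0). As before, γ2 is not given on the slides; the sketch assumes γ2 = 0.2/√3, which, with γ0 = 0.5, γ1 = √3/10, β0 = 10 and β1 = 2, reproduces this table to the displayed precision. The function name is ours.

```python
import math

def simple_calibration_relative_bias(alpha1, alpha2, gamma0=0.5, gamma1=math.sqrt(3) / 10,
                                     gamma2=0.2 / math.sqrt(3), beta0=10.0, beta1=2.0):
    """Asymptotic relative bias (in %) of the simple calibration estimator, case alpha0 = 0
    (so B = 0). gamma2 = 0.2 / sqrt(3) is an assumed value (not given on the slides)."""
    sigma2_x = 1 - alpha1**2 - alpha2**2
    m = alpha1 * gamma1 / gamma0 + alpha2 * gamma2 / gamma0
    num = gamma1 * sigma2_x - alpha2 * (alpha1 * gamma2 - alpha2 * gamma1)
    return 100 * beta1 / (gamma0 * beta0) * num / (1 - m**2)

for a1 in (0.7, 0.3, 0.1):
    row = [simple_calibration_relative_bias(a1, a2) for a2 in (0.0, 0.1, 0.3, 0.7)]
    print(f"alpha1 = {a1}: " + "  ".join(f"{v:5.1f}" for v in row))

# With alpha1 = alpha2 = 0, x carries no information and the bias equals the naive one
print(f"{simple_calibration_relative_bias(0.0, 0.0):.2f}")   # 6.93
```

Unlike the instrument vector case, the denominator here is bounded away from zero, so a weak proxy cannot blow the bias up; at worst it falls back to the naive 6.9%.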



Simulation study


Simulation study

     We generated a population U of size N = 1 000 consisting of
              a variable of interest Y,
              several proxy variables, denoted $X^{(\alpha_1,\alpha_2)}$, where
              α1 ∈ {0.2, 0.3, 0.5, 0.7} and α2 ∈ {0, 0.1, 0.3, 0.5},
              an instrumental variable Z,
              and an unobserved variable U.
     First, the variables Z and U were generated from a uniform
     distribution on $[-\sqrt{3}, \sqrt{3}]$, which yields a mean of zero and a
     variance of 1.
     Then, given the z-values, the y-values were generated according to the
     linear regression model

              $$Y_k = 10 + 2 z_k + \varepsilon^y_k,$$

              where $\varepsilon^y_k$ is normally distributed with mean 0 and variance 1.
              The resulting coefficient of determination was equal to 79.2%.





     Finally, the values of the proxy variables $x^{(\alpha_1,\alpha_2)}$ were generated according to
     the linear regression models

              $$X_k^{(\alpha_1,\alpha_2)} = \alpha_1 z_k + \alpha_2 u_k + \sigma_{(\alpha_1,\alpha_2)}\,\varepsilon_k^{(\alpha_1,\alpha_2)},$$

              where $\sigma^2_{(\alpha_1,\alpha_2)} = 1 - \alpha_1^2 - \alpha_2^2$
              and $\varepsilon_k^{(\alpha_1,\alpha_2)}$ was normally distributed with mean 0 and variance 1.

     In order to focus on the nonresponse error, we considered the census
     case, i.e., n = N = 1 000.

     Each unit k was assigned a response probability $p_k$ given by

                                   $$\mathrm{logit}(p_k) = 1.5 z_k + u_k.$$

     Then, the response indicators $R_k$, $k \in U$, were generated
     independently from a Bernoulli distribution with parameter $p_k$.

     This whole process was repeated K = 10 000 times, leading to
     K = 10 000 sets of respondents.
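One replicate of this design can be sketched as follows. This is an illustration, not the authors' code: the function names `one_replicate` and `t_C` are hypothetical, and the slides do not say whether "the whole process" regenerates the population at each of the K replicates; the sketch assumes it does, so each call draws a new population and a new set of respondents.

```python
import numpy as np

ALPHA1 = (0.2, 0.3, 0.5, 0.7)
ALPHA2 = (0.0, 0.1, 0.3, 0.5)

def one_replicate(rng, N=1_000):
    """One population of the simulation design plus one set of respondents (census case)."""
    a = np.sqrt(3)
    z = rng.uniform(-a, a, N)                      # mean 0, variance 1
    u = rng.uniform(-a, a, N)
    y = 10 + 2 * z + rng.normal(size=N)
    x = {(a1, a2): a1 * z + a2 * u + np.sqrt(1 - a1**2 - a2**2) * rng.normal(size=N)
         for a1 in ALPHA1 for a2 in ALPHA2}
    p = 1 / (1 + np.exp(-(1.5 * z + u)))           # logit(p_k) = 1.5 z_k + u_k
    r = rng.random(N) < p                          # independent Bernoulli(p_k)
    return z, y, x, r

def t_C(z, y, x, r):
    """Instrument vector calibration estimator with z_k = (1, z_k)' and x_k = (1, x_k)'."""
    N, n_r = len(y), int(r.sum())
    Zr = np.column_stack([np.ones(n_r), z[r]])
    Xr = np.column_stack([np.ones(n_r), x[r]])
    t_x = np.array([N, x.sum()])
    return float(t_x @ np.linalg.solve(Zr.T @ Xr, Zr.T @ y[r]))

rng = np.random.default_rng(2013)
z, y, x, r = one_replicate(rng)
estimates = {key: t_C(z, y, xk, r) for key, xk in x.items()}
print(f"t_y = {y.sum():.0f}, response rate = {r.mean():.2f}")
```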





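For each simulation, we computed the instrument vector calibration estimators,
denoted $\hat t_C(\alpha_1, \alpha_2)$, where α1 ∈ {0.2, 0.3, 0.5, 0.7} and
α2 ∈ {0, 0.1, 0.3, 0.5}:

       $$\hat t_C(\alpha_1, \alpha_2) = \begin{pmatrix} N \\ t_x^{(\alpha_1,\alpha_2)} \end{pmatrix}^{\top} \Big(\sum_{k \in s_r} \mathbf{z}_k \mathbf{x}_k^{(\alpha_1,\alpha_2)\top}\Big)^{-1} \sum_{k \in s_r} \mathbf{z}_k y_k.$$

We computed:
     the Monte Carlo percent relative bias, $RB_{MC}\big(\hat t_C(\alpha_1, \alpha_2)\big)$;
     the Monte Carlo percent coefficient of variation (CV), $CV_{MC}\big(\hat t_C(\alpha_1, \alpha_2)\big)$.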








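     Monte Carlo relative bias

          $$RB_{MC}\big(\hat t_C(\alpha_1, \alpha_2)\big) = E_{MC}\left(\frac{\hat t_C(\alpha_1, \alpha_2) - t_y}{t_y}\right) \times 100.$$

     Monte Carlo CV

          $$CV_{MC}\big(\hat t_C(\alpha_1, \alpha_2)\big) = \frac{\sqrt{V_{MC}\big(\hat t_C(\alpha_1, \alpha_2) - t_y\big)}}{E_{MC}(t_y)} \times 100.$$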
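Given the K estimates and the K population totals (one per replicate), the two Monte Carlo measures reduce to a few lines. This sketch takes the CV to be the Monte Carlo standard deviation of the error $\hat t_C - t_y$ divided by the Monte Carlo mean of $t_y$, times 100 (the usual coefficient-of-variation convention), and assumes a divisor K for $V_{MC}$; the function names and the toy numbers are hypothetical.

```python
import numpy as np

def rb_mc(estimates, totals):
    """Monte Carlo percent relative bias: E_MC[(t_hat - t_y) / t_y] * 100."""
    estimates, totals = np.asarray(estimates, float), np.asarray(totals, float)
    return float(np.mean((estimates - totals) / totals) * 100)

def cv_mc(estimates, totals):
    """Monte Carlo percent CV: sqrt(V_MC(t_hat - t_y)) / E_MC(t_y) * 100.
    V_MC uses divisor K (np.var default, ddof=0); this is an assumption."""
    estimates, totals = np.asarray(estimates, float), np.asarray(totals, float)
    return float(np.sqrt(np.var(estimates - totals)) / np.mean(totals) * 100)

# Toy usage with K = 4 hypothetical replicates
t_hat = [10_100.0, 9_900.0, 10_050.0, 9_950.0]
t_y = [10_000.0] * 4
print(rb_mc(t_hat, t_y), cv_mc(t_hat, t_y))   # RB is 0 (up to rounding), CV is about 0.79
```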






Monte Carlo percent relative bias $RB_{MC}(\hat t_C)$ (in %), with the Monte Carlo CV $CV_{MC}(\hat t_C)$ (in %) in parentheses


                  α1    α2 = 0    α2 = 0.1   α2 = 0.3   α2 = 0.5
                  0.7    0.02     −0.9       −2.8       −4.9
                        (0.9)     (0.9)      (1.0)      (1.1)
                  0.5   −0.1      −1.3       −4.1       −7.2
                        (1.4)     (1.5)      (1.7)      (2.1)
                  0.3   −0.2      −2.4       −7.5      −14.0
                        (2.6)     (3.0)      (4.1)      (5.9)
                  0.2   −0.6      −4.5      −13.8      −27.4
                        (4.6)    (15.6)     (61.9)     (65.6)






Conclusion
     Instrument vector calibration is a good technique to adjust for
     nonresponse under certain conditions, such as
              $\mathrm{cov}(X_k, R_k \mid Z_k) = 0$,
              or at least a large α1;
     otherwise, one can get bias and variance amplification.

Figure: Z is linked to Y through β1, to X through α1 (large) and to R through γ1.


Thank you for your attention.

 
accurate ABC Oliver Ratmann
accurate ABC Oliver Ratmannaccurate ABC Oliver Ratmann
accurate ABC Oliver Ratmann
 
Integration of biological annotations using hierarchical modeling
Integration of biological annotations using hierarchical modelingIntegration of biological annotations using hierarchical modeling
Integration of biological annotations using hierarchical modeling
 
Macrocanonical models for texture synthesis
Macrocanonical models for texture synthesisMacrocanonical models for texture synthesis
Macrocanonical models for texture synthesis
 
Gtti 10032021
Gtti 10032021Gtti 10032021
Gtti 10032021
 
11.final paper -0047www.iiste.org call-for_paper-58
11.final paper -0047www.iiste.org call-for_paper-5811.final paper -0047www.iiste.org call-for_paper-58
11.final paper -0047www.iiste.org call-for_paper-58
 
better together? statistical learning in models made of modules
better together? statistical learning in models made of modulesbetter together? statistical learning in models made of modules
better together? statistical learning in models made of modules
 

Ähnlich wie Lesage

Problem_Session_Notes
Problem_Session_NotesProblem_Session_Notes
Problem_Session_Notes
Lu Mao
 

Ähnlich wie Lesage (20)

Problem_Session_Notes
Problem_Session_NotesProblem_Session_Notes
Problem_Session_Notes
 
International Journal of Engineering Inventions (IJEI),
International Journal of Engineering Inventions (IJEI), International Journal of Engineering Inventions (IJEI),
International Journal of Engineering Inventions (IJEI),
 
lec1.ppt
lec1.pptlec1.ppt
lec1.ppt
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
 
2.1 Union, intersection and complement
2.1 Union, intersection and complement2.1 Union, intersection and complement
2.1 Union, intersection and complement
 
QMC: Operator Splitting Workshop, Compactness Estimates for Nonlinear PDEs - ...
QMC: Operator Splitting Workshop, Compactness Estimates for Nonlinear PDEs - ...QMC: Operator Splitting Workshop, Compactness Estimates for Nonlinear PDEs - ...
QMC: Operator Splitting Workshop, Compactness Estimates for Nonlinear PDEs - ...
 
Multivariate Distributions, an overview
Multivariate Distributions, an overviewMultivariate Distributions, an overview
Multivariate Distributions, an overview
 
1979 Optimal diffusions in a random environment
1979 Optimal diffusions in a random environment1979 Optimal diffusions in a random environment
1979 Optimal diffusions in a random environment
 
Nokton theory-en
Nokton theory-enNokton theory-en
Nokton theory-en
 
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
 
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
 
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
 
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
 
lec24.ppt
lec24.pptlec24.ppt
lec24.ppt
 
The Gaussian Hardy-Littlewood Maximal Function
The Gaussian Hardy-Littlewood Maximal FunctionThe Gaussian Hardy-Littlewood Maximal Function
The Gaussian Hardy-Littlewood Maximal Function
 
Markov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing themMarkov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing them
 
1 hofstad
1 hofstad1 hofstad
1 hofstad
 
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
 
Dynamic response of structures with uncertain properties
Dynamic response of structures with uncertain propertiesDynamic response of structures with uncertain properties
Dynamic response of structures with uncertain properties
 
International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)
 

Mehr von eric_gautier (8)

Rousseau
RousseauRousseau
Rousseau
 
Rouviere
RouviereRouviere
Rouviere
 
Favre
FavreFavre
Favre
 
Grelaud
GrelaudGrelaud
Grelaud
 
Ryder
RyderRyder
Ryder
 
Bertail
BertailBertail
Bertail
 
Davezies
DaveziesDavezies
Davezies
 
Collier
CollierCollier
Collier
 

Lesage

  • 1. On the problem of bias amplification of the instrumental calibration estimator with missing survey data. Éric LESAGE, Laboratoire de statistique d’enquête, CREST-ENSAI. Joint work with David HAZIZA (Université de Montréal and CREST-ENSAI). 31 January 2013, Séminaire de Statistique ENSAE-ENSAI, CREST, Paris.
  • 2. Outline: 1 Introduction; 2 Underlying models; 3 Bias amplification of the instrumental calibration estimators; 4 Simulation study.
  • 3. Introduction. Context: Nonresponse is a major problem in surveys. In the presence of nonresponse, the usual complete-data estimators may be biased when respondents and nonrespondents differ with respect to the survey variables. A weighting approach that has recently received a lot of attention is the so-called single-step approach, which uses calibration; see Deville (1998, 2002), Sautory (2003), Särndal and Lundström (2005), Kott (2006, 2009, 2012), among others. Issue: We examine the properties of instrument vector calibration estimators, where the instrumental variables (related to the response propensity) are available for the responding units only. More specifically, we illustrate the problem of bias amplification.
  • 4. Introduction. Consider a finite population U of size N. The objective is to estimate the population total $t_y = \sum_{k\in U} y_k$ of a variable of interest y (e.g., income). A sample s of size n is selected from U according to a given sampling design p(s). A complete-data estimator of $t_y$ is the expansion estimator $\hat t_\pi = \sum_{k\in s} d_k y_k$, where $d_k = 1/\pi_k$ denotes the design weight attached to unit k and $\pi_k = P(k \in s)$ denotes its first-order inclusion probability. In the presence of unit nonresponse, only a subset $s_r$ of s is observed, which makes it impossible to compute $\hat t_\pi$.
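The expansion estimator is a weighted sum over the sample. The Python/NumPy sketch below is illustrative only: the function name and population values are made up, and Poisson sampling is an assumed design used solely to check design-unbiasedness by simulation (the slides leave p(s) generic).

```python
import numpy as np

def expansion_estimator(y_s, pi_s):
    """Expansion estimator: sum over the sample of d_k * y_k with d_k = 1 / pi_k."""
    y_s = np.asarray(y_s, dtype=float)
    d_s = 1.0 / np.asarray(pi_s, dtype=float)   # design weights d_k
    return float(np.sum(d_s * y_s))

# Toy sample of three units: 10/0.5 + 20/0.25 + 30/0.5 = 20 + 80 + 60
print(expansion_estimator([10, 20, 30], [0.5, 0.25, 0.5]))  # 160.0

# Monte Carlo check of design-unbiasedness under Poisson sampling (an assumed design)
rng = np.random.default_rng(0)
y_U = np.array([3.0, 7.0, 1.0, 12.0, 5.0])
pi_U = np.array([0.2, 0.5, 0.3, 0.9, 0.6])
K = 50_000
in_sample = rng.random((K, y_U.size)) < pi_U          # row i: unit k is in sample i w.p. pi_k
estimates = (in_sample * (y_U / pi_U)).sum(axis=1)    # expansion estimate for each sample
print(estimates.mean(), y_U.sum())                    # the mean is close to t_y = 28.0
```

The vectorised Monte Carlo loop computes the same weighted sum as `expansion_estimator`, one row (sample) at a time.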
  • 5. Introduction. To define a nonresponse-adjusted estimator of $t_y$, we assume that a vector of auxiliary variables x is available for $k \in s_r$ and that the vector of population totals $t_x = \sum_{k\in U} x_k$ is known. In practice, the x-vector is often defined by survey managers, who wish to ensure consistency between survey-weighted estimates and known population totals for some important variables (e.g., age and sex). In addition, we assume that a vector of instrumental variables z, of the same dimension as x, is available for $k \in s_r$. The z-vector needs only to be available for the respondents. The instrumental variables are believed to be associated with the propensity of units to respond to the survey. Let $R_k$ be a response indicator attached to unit k such that $R_k = 1$ if unit k is a respondent and $R_k = 0$ otherwise.
  • 6. Introduction. Instrumental calibration estimator $\hat t_C$. We consider an instrumental calibration estimator (Deville 1998, 2002) of the form $\hat t_C = \sum_{k\in s} w_k R_k y_k$, where $w_k = d_k F(\lambda_r^\top z_k)$ and $F(\cdot)$ is a monotonic, twice-differentiable function. The factor $F(\lambda_r^\top z_k)$ is a weighting adjustment that is essentially an estimate of the inverse of the response probability of unit k. The weights $w_k$ are constructed so that the calibration constraints $\sum_{k\in s_r} w_k x_k = \sum_{k\in U} x_k$ are satisfied.
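For a general F(.), the calibration equations have no closed form, so $\lambda_r$ must be found numerically. A minimal sketch, assuming Newton's method (one common choice, not necessarily the authors' algorithm), the census case $d_k = 1$, and hypothetical data; the function name `instrumental_calibration_weights` is ours. It is run with F(u) = exp(u) and with the linear F(u) = 1 + u.

```python
import numpy as np

def instrumental_calibration_weights(d, x, z, t_x, F, dF, tol=1e-10, max_iter=50):
    """Newton solver for lambda_r in sum_{k in s_r} d_k F(lambda' z_k) x_k = t_x.

    d: (n_r,) design weights of respondents; x, z: (n_r, p) calibration and
    instrument vectors of respondents; t_x: (p,) known population totals of x;
    F, dF: vectorised adjustment function and its derivative.
    """
    lam = np.zeros(z.shape[1])
    scale = 1.0 + np.max(np.abs(t_x))
    for _ in range(max_iter):
        eta = z @ lam
        g = x.T @ (d * F(eta)) - t_x                  # calibration residuals
        if np.max(np.abs(g)) < tol * scale:
            return d * F(eta), lam
        J = x.T @ ((d * dF(eta))[:, None] * z)        # sum_k d_k F'(eta_k) x_k z_k'
        lam = lam - np.linalg.solve(J, g)
    raise RuntimeError("Newton iterations did not converge")

# Hypothetical census data (d_k = 1): x_k = (1, x_k), z_k = (1, z_k)
rng = np.random.default_rng(1)
N = 2000
z_all = rng.normal(size=N)
x_all = 0.6 * z_all + 0.8 * rng.normal(size=N)
resp = rng.random(N) < 1.0 / (1.0 + np.exp(-(0.3 + 0.8 * z_all)))
X = np.column_stack([np.ones(N), x_all])[resp]
Z = np.column_stack([np.ones(N), z_all])[resp]
t_x = np.array([N, x_all.sum()])
d = np.ones(resp.sum())

w_exp, _ = instrumental_calibration_weights(d, X, Z, t_x, np.exp, np.exp)   # F(u) = exp(u)
w_lin, _ = instrumental_calibration_weights(d, X, Z, t_x, lambda u: 1.0 + u,
                                            lambda u: np.ones_like(u))      # F(u) = 1 + u
print(X.T @ w_exp, X.T @ w_lin, t_x)   # both weight systems reproduce t_x
```

Note that the Jacobian is $\sum_k d_k F'(\cdot)\, x_k z_k^\top$, which is not symmetric when z differs from x; for linear F, Newton converges in a single step.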
  • 7. Introduction. Remarks: Linear weighting is a special case for which the weights are given by $w_k = d_k(1 + \lambda_r^\top z_k)$. When x is used in the calibration instead of z, we obtain a usual calibration estimator, with $w_k = d_k F(\lambda_r^\top x_k)$.
  • 8. Introduction. Error decomposition. The total error of $\hat t_C$ can be expressed as $\hat t_C - t_y = \underbrace{(\hat t_\pi - t_y)}_{\text{sampling error}} + \underbrace{(\hat t_C - \hat t_\pi)}_{\text{nonresponse error}}$. Since the sampling error does not depend on nonresponse, we focus on the nonresponse error in the sequel. Without loss of generality, we consider the case of a census, s = U, so that the sampling error $\hat t_\pi - t_y$ is equal to zero.
  • 9. Introduction. First approach: good specification of the y model. Regardless of the choice of F(.), the instrument vector calibration estimator $\hat t_C$ perfectly estimates $t_y$ if the variable of interest y is perfectly explained by the x-vector, i.e., $y_k = x_k^\top \beta$ for some vector β. Hence, we expect $\hat t_C$ to exhibit a small bias if the y-variable and the x-vector are linearly related and the relationship is strong. However, in multipurpose surveys, the number of variables of interest is typically large (possibly a few hundred), so it is unrealistic to presume that the x-vector is linearly related to all y-variables; some estimates could therefore suffer from bias.
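A quick numerical check of this first approach: whenever the weights satisfy the calibration constraints and $y_k = x_k^\top\beta$ exactly, $\sum_{k\in s_r} w_k y_k = \beta^\top \sum_{k\in s_r} w_k x_k = \beta^\top t_x = t_y$. The sketch assumes linear weighting, a census ($d_k = 1$), made-up data, and a response mechanism that even depends on y; every value is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 1000
x_pop = np.column_stack([np.ones(N), rng.normal(50.0, 10.0, N)])          # x_k = (1, x_k)
z_pop = np.column_stack([np.ones(N),
                         0.07 * (x_pop[:, 1] - 50.0) + rng.normal(0.0, 0.7, N)])  # z_k
beta = np.array([3.0, 0.5])
y_pop = x_pop @ beta                                                      # y_k = x_k' beta exactly

# Response depends on y itself (nonignorable); the identity does not care
resp = rng.random(N) < 1.0 / (1.0 + np.exp(-(y_pop - y_pop.mean()) / 5.0))
X, Z, y = x_pop[resp], z_pop[resp], y_pop[resp]
t_x = x_pop.sum(axis=0)

lam = np.linalg.solve(X.T @ Z, t_x - X.sum(axis=0))   # linear weighting, d_k = 1
w = 1.0 + Z @ lam                                     # satisfies sum_{s_r} w_k x_k = t_x
print(w @ y, y_pop.sum())                             # identical up to rounding error
```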
  • 10. Introduction. Second approach: estimation of the response propensity. For linear weighting, Särndal and Lundström (2005, Chapter 9) showed that $\hat t_C$ is asymptotically unbiased for $t_y$, for every y-variable, provided that the response probability $p_k$ of unit k satisfies $p_k = (1 + \lambda^\top z_k)^{-1}$ for all $k \in U$, (1) for a vector of unknown constants λ; see also Kott and Liao (2012) for a discussion of nonlinear weighting. However, in practice, it is not clear how to validate the form of the relationship in (1), since the z-vector is available for the respondents only.
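An illustrative simulation of condition (1), under assumptions of our own: z uniform on [0, 1], a hypothetical λ = (0.2, 0.8) so that $p_k$ lies in [0.5, 0.83], a calibration variable x correlated with z, and a y-variable that is deliberately nonlinear in x. With linear weighting in the census case, $\lambda_r$ should land near λ and the estimated total near $t_y$.

```python
import numpy as np

rng = np.random.default_rng(11)
N = 500_000
z = rng.uniform(0.0, 1.0, N)
x = z + rng.normal(0.0, 0.5, N)                 # calibration variable, correlated with z
y = np.exp(2.0 * z) + rng.normal(0.0, 1.0, N)   # y is deliberately NOT linear in x
z_vec = np.column_stack([np.ones(N), z])
x_vec = np.column_stack([np.ones(N), x])

lam_true = np.array([0.2, 0.8])                 # hypothetical lambda in condition (1)
p = 1.0 / (1.0 + z_vec @ lam_true)              # p_k = (1 + lambda' z_k)^(-1)
r = rng.random(N) < p

X, Z = x_vec[r], z_vec[r]
lam_r = np.linalg.solve(X.T @ Z, x_vec.sum(axis=0) - X.sum(axis=0))
t_C = (1.0 + Z @ lam_r) @ y[r]                  # linear instrumental calibration, d_k = 1
print(lam_r, (t_C - y.sum()) / y.sum())         # lam_r near lam_true, relative error near 0
```

Because the weights $1 + \lambda_r^\top z_k$ then estimate $1/p_k$, the estimator works for any y, not only for y-variables linearly related to x.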
  • 11. Introduction. The purpose of this presentation is to examine the so-called problem of bias amplification in the context of instrument vector calibration. In epidemiological studies, it has been found that including instrumental variables in the set of conditioning variables can increase unmeasured-confounding bias; see Bhattacharya and Vogt (2007), Wooldridge (2009), Pearl (2010) and Myers et al. (2011). We argue that the same is true for instrument vector calibration. Some preliminary studies in this direction can be found in Lesage (2012) and Osier (2012).
  • 12. Underlying models. Superpopulation model. Let $(y_k, z_k)^\top$ be a realisation of the random vector $(Y_k, Z_k)^\top$, $k \in U$. Without loss of generality, we assume that $E(Z_k) = 0$ and $V(Z_k) = 1$. Further, we assume that the relationship between Y and Z can be modeled by $Y_k = \beta_0 + \beta_1 Z_k + \varepsilon_k^y$ with $E(\varepsilon_k^y \mid Z_k) = 0$. This model is often called a prediction model or outcome regression model.
  • 13. Underlying models. Nonresponse model. We also assume the following nonresponse model: $R_k = \gamma_0 + \gamma_1 Z_k + \varepsilon_k^R$ with $E(\varepsilon_k^R \mid Z_k) = 0$. We assume that y is not a direct explanatory variable of nonresponse: $\operatorname{cov}(Y_k, R_k \mid Z_k) = 0$. Remarks: The nonresponse model states that the response indicators $R_k$ are linearly related to Z. Although this relationship may seem awkward, it will be useful for studying the problem of bias amplification. A more realistic nonresponse model, namely the logistic model, is considered in the empirical study.
  • 14. Underlying models. [Figure: Graph of the variables y, z and R, with Z linked to Y (β1) and to R (γ1).]
  • 15. Bias amplification of the instrumental calibration estimators. Naive estimator. We consider the naive estimator $\hat t_{naive} = N \times \frac{\sum_{k\in U} y_k R_k}{\sum_{k\in U} R_k}$. We have $\frac{\hat t_{naive}}{N} - \frac{t_y}{N} = \frac{\operatorname{cov}(Y_k, R_k)}{\gamma_0} + o_P(1) = \frac{\beta_1\gamma_1}{\gamma_0} + o_P(1)$. Example: with $\gamma_0 = 0.5$, $\gamma_1 = \sqrt{3}/10$, $\beta_0 = 10$ and $\beta_1 = 2$, the asymptotic relative bias is $\frac{\gamma_1}{\gamma_0} \times \frac{\beta_1}{\beta_0} = 6.9\%$.
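The 6.9% figure can be checked directly and by simulation. The sketch assumes Z uniform on [-√3, √3] (mean 0, variance 1) and response indicators drawn as Bernoulli($\gamma_0 + \gamma_1 Z_k$), which satisfies the linear nonresponse model with valid probabilities between 0.2 and 0.8; these distributional choices are ours, not the slides'.

```python
import numpy as np

beta0, beta1 = 10.0, 2.0
gamma0, gamma1 = 0.5, np.sqrt(3) / 10

# Asymptotic relative bias of the naive estimator: (gamma1 / gamma0) * (beta1 / beta0)
print(round(100 * gamma1 / gamma0 * beta1 / beta0, 1))   # 6.9

# Simulation under assumed distributions
rng = np.random.default_rng(2013)
N = 1_000_000
z = rng.uniform(-np.sqrt(3), np.sqrt(3), N)              # E(Z) = 0, V(Z) = 1
y = beta0 + beta1 * z + rng.normal(0.0, 1.0, N)          # outcome model
r = rng.random(N) < gamma0 + gamma1 * z                  # R_k ~ Bernoulli(gamma0 + gamma1 z_k)
t_naive = N * y[r].sum() / r.sum()
print((t_naive - y.sum()) / y.sum())                     # close to 0.069
```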
  • 16. Bias amplification of the instrumental calibration estimators. Instrument vector calibration estimators. We suppose that a proxy variable of z, denoted x, is available. Definition: in the nonresponse context, a proxy variable of z is a variable x such that (1) x is an auxiliary variable whose population total $t_x$ is known; (2) $\operatorname{cor}(X_k, Z_k) \neq 0$; (3) $\operatorname{cov}(X_k, Z_k \mid R_k = 1) \neq 0$. We assume that the relationship between X and Z can be modeled by $X_k = \alpha_0 + \alpha_1 Z_k + \varepsilon_k^x$, with $E(\varepsilon_k^x \mid Z_k) = 0$ and $V(\varepsilon_k^x) = \sigma_x^2 = 1 - \alpha_1^2$. Remarks: $V(X_k) = 1$ and $\operatorname{cor}(X_k, Z_k) = \alpha_1$.
  • 17. Bias amplification of the instrumental calibration estimators. Instrument vector calibration estimators. We also assume that x is not a direct explanatory variable of nonresponse: $\operatorname{cov}(X_k, R_k \mid Z_k) = 0$.
  • 18. Bias amplification of the instrumental calibration estimators. [Figure: Graph of the variables y, z, x and R, with Z linked to Y (β1) and to R (γ1).]
  • 19. Bias amplification of the instrumental calibration estimators. [Figure: Graph of the variables y, z, x and R, with Z linked to Y (β1), to X (α1) and to R (γ1).]
  • 20. Bias amplification of the instrumental calibration estimators. Instrument vector calibration estimators. We consider the instrument vector calibration estimator with linear weighting $\hat t_C = \mathbf t_x^\top \left(\sum_{k\in s_r} \mathbf z_k \mathbf x_k^\top\right)^{-1} \sum_{k\in s_r} \mathbf z_k y_k$, where $\mathbf x_k = (1, x_k)^\top$, $\mathbf z_k = (1, z_k)^\top$ and $\mathbf t_x = (N, t_x)^\top$. Since $\operatorname{cov}(X_k, R_k \mid Z_k) = 0$, we have $\frac{\hat t_C}{N} - \frac{t_y}{N} = o_P(1)$.
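In the census case, this closed form coincides with linear weighting $w_k = 1 + \lambda_r^\top \mathbf z_k$, because $\mathbf z_k$ contains an intercept. The sketch below checks this numerically and illustrates the consistency claim when x does not depend on any extra variable; the data-generating choices (Z uniform on [-√3, √3], Bernoulli($0.5 + (\sqrt{3}/10) Z_k$) response, $\alpha_1 = 0.5$, $\alpha_0 = 0$) and the function names are hypothetical.

```python
import numpy as np

def iv_calibration_total(y_r, x_r, z_r, N, t_x):
    """t_C = (N, t_x) (sum_{s_r} z_k x_k')^{-1} sum_{s_r} z_k y_k, x_k = (1, x_k), z_k = (1, z_k)."""
    X = np.column_stack([np.ones(len(y_r)), x_r])
    Z = np.column_stack([np.ones(len(y_r)), z_r])
    B = np.linalg.solve(Z.T @ X, Z.T @ y_r)       # instrumental-variable coefficients
    return np.array([N, t_x]) @ B

def linear_weights(x_r, z_r, N, t_x):
    """Weights w_k = 1 + lambda_r' z_k calibrated so that sum_{s_r} w_k x_k = (N, t_x)."""
    X = np.column_stack([np.ones(len(x_r)), x_r])
    Z = np.column_stack([np.ones(len(x_r)), z_r])
    lam = np.linalg.solve(X.T @ Z, np.array([N, t_x]) - X.sum(axis=0))
    return 1.0 + Z @ lam

rng = np.random.default_rng(3)
N, alpha1 = 400_000, 0.5
z = rng.uniform(-np.sqrt(3), np.sqrt(3), N)
x = alpha1 * z + np.sqrt(1 - alpha1**2) * rng.normal(size=N)   # proxy of z, alpha0 = 0
y = 10.0 + 2.0 * z + rng.normal(size=N)
r = rng.random(N) < 0.5 + np.sqrt(3) / 10 * z                   # linear response model in z only

t_C = iv_calibration_total(y[r], x[r], z[r], N, x.sum())
w = linear_weights(x[r], z[r], N, x.sum())
print(t_C, w @ y[r], y.sum())   # first two agree; both close to the true total
```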
  • 21. Bias amplification of the instrumental calibration estimators. What if...? It is not trivial to verify whether $\operatorname{cov}(X_k, R_k \mid Z_k) = 0$, since the variable z is available only for the respondents. What if $\operatorname{cov}(X_k, R_k \mid Z_k) \neq 0$?
  • 22. Bias amplification of the instrumental calibration estimators. [Figure: Graph of the variables y, z, x, u and R, with Z linked to Y (β1), to X (α1) and to R (γ1), and an additional variable U.]
  • 23. Bias amplification of the instrumental calibration estimators. [Figure: The same graph, with U linked to X (α2) and to R (γ2).] Then $\operatorname{cov}(R_k, X_k \mid Z_k) = \alpha_2\gamma_2$.
  • 24. Bias amplification of the instrumental calibration estimators. We now assume that there exists an unobserved variable u, independent of z and y, that is an explanatory variable in the nonresponse model. Without loss of generality, we assume that $E(U_k \mid Z_k, Y_k) = 0$ and $V(U_k \mid Z_k, Y_k) = 1$. The nonresponse model is rewritten as $R_k = \gamma_0 + \gamma_1 Z_k + \gamma_2 U_k + \varepsilon_k^R$ with $E(\varepsilon_k^R \mid Z_k, U_k) = 0$. We still assume that $\operatorname{cov}(Y_k, R_k \mid Z_k) = 0$.
  • 25. Bias amplification of the instrumental calibration estimators. Moreover, we assume that the variable x is linked to the variable u: $X_k = \alpha_0 + \alpha_1 Z_k + \alpha_2 U_k + \varepsilon_k^x$, with $E(\varepsilon_k^x \mid Z_k, U_k) = 0$, $V(\varepsilon_k^x) = \sigma_x^2 = 1 - \alpha_1^2 - \alpha_2^2$ and $E(\varepsilon_k^R \varepsilon_k^x \mid Z_k, U_k) = 0$. Then we have $\operatorname{cov}(R_k, X_k \mid Z_k) = \alpha_2\gamma_2$.
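A simulation check of $\operatorname{cov}(R_k, X_k \mid Z_k) = \alpha_2\gamma_2$ under hypothetical parameter values ($\alpha_0 = 0$, $\alpha_1 = \alpha_2 = 0.5$, $\gamma_0 = 0.5$, $\gamma_1 = \sqrt{3}/10$, $\gamma_2 = 0.1$, Z and U uniform on [-√3, √3]). Because both conditional means are linear in Z here, the conditional covariance can be estimated as the covariance of least-squares residuals after partialling out Z.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1_000_000
gamma0, gamma1, gamma2 = 0.5, np.sqrt(3) / 10, 0.1      # hypothetical values
alpha1, alpha2 = 0.5, 0.5
a = np.sqrt(3)
z = rng.uniform(-a, a, N)                               # E(Z) = 0, V(Z) = 1
u = rng.uniform(-a, a, N)                               # unobserved, independent of z
r = (rng.random(N) < gamma0 + gamma1 * z + gamma2 * u).astype(float)  # probabilities in [0.03, 0.97]
x = alpha1 * z + alpha2 * u + np.sqrt(1 - alpha1**2 - alpha2**2) * rng.normal(size=N)

# Partial out Z by least squares, then average the product of the residuals
Zd = np.column_stack([np.ones(N), z])
res_r = r - Zd @ np.linalg.lstsq(Zd, r, rcond=None)[0]
res_x = x - Zd @ np.linalg.lstsq(Zd, x, rcond=None)[0]
print(np.mean(res_r * res_x), alpha2 * gamma2)          # both close to 0.05
```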
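  • 26. Bias amplification of the instrumental calibration estimators. We have:
    $$\frac{\hat t_C}{N} - \frac{t_y}{N} = -\frac{\beta_1}{\alpha_1}\,\frac{1}{E(R_k)}\,\frac{E(Z_k X_k \mid R_k = 1)}{\operatorname{cov}(Z_k, X_k \mid R_k = 1)}\,\operatorname{cov}(R_k, X_k \mid Z_k) + o_P(1) = -\frac{\beta_1\alpha_2\gamma_2}{\gamma_0\alpha_1}\,\frac{\alpha_1 + \alpha_0\frac{\gamma_1}{\gamma_0}}{\alpha_1 - \frac{\gamma_1}{\gamma_0}\left(\alpha_1\frac{\gamma_1}{\gamma_0} + \alpha_2\frac{\gamma_2}{\gamma_0}\right)} + o_P(1).$$
    If $\alpha_2\gamma_2 \neq 0$, the instrument vector calibration estimator is “biased”, and the “bias” is amplified when $\alpha_1$ is small (i.e., when x is a weak proxy).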
  • 27. Bias amplification of the instrumental calibration estimators. Bias amplification for a weak proxy with the instrument vector calibration estimator (asymptotic relative bias, in %):

        α1 \ α2     0     0.1     0.3     0.7
        0.7         0    -0.8    -2.3    -5.8
        0.3         0    -1.8    -5.8   -15.5
        0.1         0    -5.8   -21.7  -101

    [Figure: Graph of the variables y, z, x, u and R, with Z linked to Y (β1), X (α1) and R (γ1), and U linked to X (α2) and R (γ2).]
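The slide-26 formula can be evaluated directly. The sketch uses $\gamma_0 = 0.5$, $\gamma_1 = \sqrt{3}/10$, $\beta_0 = 10$ and $\beta_1 = 2$ from slide 15, takes $\alpha_0 = 0$, and expresses the bias relative to $t_y/N \to \beta_0$, in %. The value of $\gamma_2$ is not given on the slides: $\gamma_2 = \sqrt{3}/15$ is our assumption, chosen because with it the formula reproduces the table above to the displayed precision.

```python
import numpy as np

beta0, beta1 = 10.0, 2.0               # outcome model (slide 15)
gamma0, gamma1 = 0.5, np.sqrt(3) / 10  # nonresponse model (slide 15)
gamma2 = np.sqrt(3) / 15               # ASSUMPTION: not stated on the slides

def iv_relative_bias(alpha1, alpha2, alpha0=0.0):
    """Asymptotic bias of the instrument vector calibration estimator (slide 26),
    divided by t_y / N -> beta0 and expressed in %."""
    e_zx = alpha1 + alpha0 * gamma1 / gamma0                       # E(Z X | R = 1)
    cov_zx = alpha1 - gamma1 / gamma0 * (alpha1 * gamma1 / gamma0
                                         + alpha2 * gamma2 / gamma0)  # cov(Z, X | R = 1)
    bias = -(beta1 / alpha1) * (1 / gamma0) * (e_zx / cov_zx) * (alpha2 * gamma2)
    return 100 * bias / beta0 + 0.0    # "+ 0.0" turns -0.0 into 0.0 for display

for a1 in (0.7, 0.3, 0.1):
    print(a1, [round(iv_relative_bias(a1, a2), 1) for a2 in (0.0, 0.1, 0.3, 0.7)])
```

With $\alpha_0 = 0$ the result depends on $\alpha_1$ and $\alpha_2$ only through $\alpha_2/\alpha_1$, which is why the table repeats -5.8 along its diagonal.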
  • 28. Bias amplification of the instrumental calibration estimators. Usual calibration estimators. We have seen that instrument vector calibration can lead to estimators with large biases. Would a simple calibration protect against such a risk of bias amplification? $\hat t_C = \mathbf t_x^\top \left(\sum_{k\in s_r} \mathbf x_k \mathbf x_k^\top\right)^{-1} \sum_{k\in s_r} \mathbf x_k y_k$. (2)
  • 29. Bias amplification of the instrumental calibration estimators. The simple calibration estimator is asymptotically biased:
    $$\frac{\hat t_C - t_y}{t_y} = \frac{\beta_1}{\gamma_0\beta_0}\,\frac{\gamma_1\sigma_x^2 - \alpha_2(\alpha_1\gamma_2 - \alpha_2\gamma_1) + B}{1 - \left(\alpha_1\frac{\gamma_1}{\gamma_0} + \alpha_2\frac{\gamma_2}{\gamma_0}\right)^2} + o_P(1),$$
    where $B = \alpha_0\left[\alpha_2\gamma_2(\alpha_1\gamma_1 + \alpha_2\gamma_2) - \gamma_1\left(1 - (\alpha_1^2 + \alpha_2^2)\right)\right]$ is a null term when $\alpha_0 = 0$. Nevertheless, this estimator offers protection against bias amplification.
  • 30. Bias amplification of the instrumental calibration estimators. No bias amplification for a weak proxy with the simple calibration estimator. The usual calibration estimator has a bias similar to that of the naive estimator, and this bias is not amplified as the correlation $\alpha_1$ between x and z decreases.

    Table: Asymptotic relative bias (in %) of the simple calibration estimator for different values of the parameters α1 and α2

        α1     α2 = 0   α2 = 0.1   α2 = 0.3   α2 = 0.7
        0.7    3.8      3.5        2.8        1.5
        0.3    6.4      6.3        6.1        5.7
        0.1    6.9      6.8        6.8        6.8
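The relative-bias formula of the simple calibration estimator can be evaluated the same way. The sketch uses the slide-15 parameters ($\gamma_0 = 0.5$, $\gamma_1 = \sqrt{3}/10$, $\beta_0 = 10$, $\beta_1 = 2$), takes $\alpha_0 = 0$ so that B vanishes, and assumes $\gamma_2 = \sqrt{3}/15$, a value that is not stated on the slides; with these values the sketch reproduces the table to one decimal.

```python
import numpy as np

beta0, beta1 = 10.0, 2.0
gamma0, gamma1 = 0.5, np.sqrt(3) / 10
gamma2 = np.sqrt(3) / 15               # ASSUMPTION: not stated on the slides

def simple_relative_bias(alpha1, alpha2):
    """Asymptotic relative bias (%) of the simple calibration estimator, alpha0 = 0 (so B = 0)."""
    sigma2_x = 1 - alpha1**2 - alpha2**2
    num = gamma1 * sigma2_x - alpha2 * (alpha1 * gamma2 - alpha2 * gamma1)
    den = 1 - (alpha1 * gamma1 / gamma0 + alpha2 * gamma2 / gamma0) ** 2
    return 100 * beta1 / (gamma0 * beta0) * num / den

for a1 in (0.7, 0.3, 0.1):
    print(a1, [round(simple_relative_bias(a1, a2), 1) for a2 in (0.0, 0.1, 0.3, 0.7)])
```

As $\alpha_1$ and $\alpha_2$ shrink, the value tends to the naive-estimator bias $100\,\gamma_1\beta_1/(\gamma_0\beta_0) \approx 6.9\%$ instead of blowing up.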
  • 31. Simulation study. We generated a population U of size N = 1 000 consisting of a variable of interest Y, several proxy variables denoted $X^{(\alpha_1,\alpha_2)}$ with $\alpha_1 \in \{0.2, 0.3, 0.5, 0.7\}$ and $\alpha_2 \in \{0, 0.1, 0.3, 0.5\}$, an instrumental variable Z and an unobserved variable U. First, the variables Z and U were generated from a uniform distribution on $[-\sqrt{3}, \sqrt{3}]$, which gives mean 0 and variance 1. Then, given the z-values, the y-values were generated according to the linear regression model $Y_k = 10 + 2 z_k + \varepsilon_k^y$, where $\varepsilon_k^y$ is normally distributed with mean 0 and variance 1. The resulting coefficient of determination was 79.2%.
  • 32. Simulation study. Finally, the values of the proxy variables $x^{(\alpha_1,\alpha_2)}$ were generated according to the linear regression models $X_k^{(\alpha_1,\alpha_2)} = \alpha_1 z_k + \alpha_2 u_k + \sigma(\alpha_1,\alpha_2)\,\varepsilon_k^{(\alpha_1,\alpha_2)}$, where $\sigma^2(\alpha_1,\alpha_2) = 1 - \alpha_1^2 - \alpha_2^2$ and $\varepsilon_k^{(\alpha_1,\alpha_2)}$ is normally distributed with mean 0 and variance 1. To focus on the nonresponse error, we considered the census case, i.e., n = N = 1 000. Each unit was assigned a response probability through $\operatorname{logit}(p_k) = 1.5 z_k + u_k$. The response indicators $R_k$, $k \in U$, were then generated independently from a Bernoulli distribution with parameter $p_k$. This whole process was repeated K = 10 000 times, leading to K = 10 000 sets of respondents.
  • 33. Simulation study. For each simulation, we computed the instrument vector calibration estimators $\hat t_C(\alpha_1,\alpha_2)$ for $\alpha_1 \in \{0.2, 0.3, 0.5, 0.7\}$ and $\alpha_2 \in \{0, 0.1, 0.3, 0.5\}$: $\hat t_C(\alpha_1,\alpha_2) = \left(N,\ t_x^{(\alpha_1,\alpha_2)}\right) \left(\sum_{k\in s_r} \mathbf z_k \mathbf x_k^{(\alpha_1,\alpha_2)\top}\right)^{-1} \sum_{k\in s_r} \mathbf z_k y_k$. We computed the Monte Carlo percent relative bias, $RB_{MC}(\hat t_C(\alpha_1,\alpha_2))$, and the Monte Carlo percent coefficient of variation (CV), $CV_{MC}(\hat t_C(\alpha_1,\alpha_2))$.
  • 34. Simulation study. Monte Carlo relative bias: $RB_{MC}(\hat t_C(\alpha_1,\alpha_2)) = E_{MC}\left[\frac{\hat t_C(\alpha_1,\alpha_2) - t_y}{t_y}\right] \times 100$. Monte Carlo CV: $CV_{MC}(\hat t_C(\alpha_1,\alpha_2)) = \frac{\sqrt{V_{MC}\left(\hat t_C(\alpha_1,\alpha_2) - t_y\right)}}{E_{MC}(t_y)} \times 100$.
  • 35. Simulation study. Monte Carlo relative bias $RB_{MC}(\hat t_C)$ in %, with the Monte Carlo coefficient of variation $CV_{MC}(\hat t_C)$ in % in parentheses:

        α1     α2 = 0        α2 = 0.1      α2 = 0.3       α2 = 0.5
        0.7     0.02 (0.9)   −0.9 (0.9)    −2.8 (1.0)     −4.9 (1.1)
        0.5    −0.1 (1.4)    −1.3 (1.5)    −4.1 (1.7)     −7.2 (2.1)
        0.3    −0.2 (2.6)    −2.4 (3.0)    −7.5 (4.1)    −14.0 (5.9)
        0.2    −0.6 (4.6)    −4.5 (15.6)  −13.8 (61.9)   −27.4 (65.6)
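A scaled-down sketch of the simulation on slides 31 to 34, in Python with NumPy. It follows the stated data-generating process and the RB and CV definitions, but uses K = 200 repetitions instead of 10 000 so that it runs quickly, a seed of our choosing, and only $\alpha_1 = 0.7$; the function names are hypothetical. Its output is therefore not expected to match the table digit for digit.

```python
import numpy as np

def simulate_population(rng, N, alpha1, alpha2):
    """One population and response set following the process of slides 31-32."""
    a = np.sqrt(3.0)
    z = rng.uniform(-a, a, N)                          # instrument, mean 0, variance 1
    u = rng.uniform(-a, a, N)                          # unobserved variable
    y = 10.0 + 2.0 * z + rng.normal(0.0, 1.0, N)
    sigma = np.sqrt(1.0 - alpha1**2 - alpha2**2)
    x = alpha1 * z + alpha2 * u + sigma * rng.normal(0.0, 1.0, N)
    p = 1.0 / (1.0 + np.exp(-(1.5 * z + u)))           # logit(p_k) = 1.5 z_k + u_k
    r = rng.random(N) < p                              # Bernoulli(p_k) response indicators
    return y, x, z, r

def iv_total_census(y, x, z, r):
    """Census case: t_C = (N, t_x) (sum_{s_r} z_k x_k')^{-1} sum_{s_r} z_k y_k."""
    n_r = int(r.sum())
    X = np.column_stack([np.ones(n_r), x[r]])
    Z = np.column_stack([np.ones(n_r), z[r]])
    B = np.linalg.solve(Z.T @ X, Z.T @ y[r])
    return np.array([len(y), x.sum()]) @ B

def monte_carlo(alpha1, alpha2, K=200, N=1000, seed=2013):
    """Monte Carlo percent relative bias and CV, as defined on slide 34."""
    rng = np.random.default_rng(seed)
    est, tot = np.empty(K), np.empty(K)
    for i in range(K):
        y, x, z, r = simulate_population(rng, N, alpha1, alpha2)
        est[i] = iv_total_census(y, x, z, r)
        tot[i] = y.sum()
    rb = 100 * np.mean((est - tot) / tot)
    cv = 100 * np.sqrt(np.var(est - tot)) / np.mean(tot)
    return rb, cv

for a2 in (0.0, 0.5):
    print(a2, monte_carlo(0.7, a2))
```

Since the whole population is regenerated in each repetition, $t_y$ varies across repetitions, which is why the CV is computed from $\hat t_C - t_y$ and divided by $E_{MC}(t_y)$.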
  • 36. Simulation study. Conclusion: Instrument vector calibration is a good technique to adjust for nonresponse under certain conditions, such as $\operatorname{cov}(X_k, R_k \mid Z_k) = 0$ or at least a large $\alpha_1$; otherwise, one can get bias and variance amplification. [Figure: Graph of the variables y, z, x and R, with Z linked to Y (β1), to X (α1 large) and to R (γ1).]
  • 37. Simulation study. Thank you for your attention.