Stata Blogger?

Are you a Stata blogger? Join the aggregator!

Endogenous Binary Regressors


* Often times we are interested in estimating the effect of a binary endogenous regressor on a binary outcome variable.

* It is not obvious how to simulate data that will fit the criteria specifications that we desire.

* First let’s think about the standard IV setup.

* y = b0 + xb1 + wb2 + u
* w = g0 + zg1 + v

* With u and v distributed normally and independently of x, w, and z.

* We know this setup is generally not correct if either y is binary or w is binary .

* When y is binary we might choose to use a MLE probit estimator with the following assumptions:

* P(y=1) = normal(b0 + xb1 + wb2)
* P(w=1) = normal(g0 + zg1)

* However, it is not easy to generate data in this form.

* Instead we will generate data introducing endogeneity by use of an unobserved addative error.

clear
set obs 1000

gen z = rnormal()
  label var z “Exogenous instrument”

gen v = rnormal()
  label var v “Unobserved error”

gen wp = normal(-.5 + z*1.25 + 1*v)
  label var wp “Endogenous variable”

gen w = rbinomial(1, wp)

* The above equation should not be expected to estimate a coefficient of 1.25 on the z variable.
* This is because of the addative error term v which contributes to the implicit error of the normal CDF to create an error with a total variance of 1 + 1 = 2
* Thus, when the probit estimator is run it automatically scales the equation to be unit.
* We can discover consistent estimator by rescaling the coefficient on z to the true standard deviation.
di -.5/(2^.5)  ” for the constant”
di 1.25/(2^.5) ” for the coefficient on z”

gen x1 = rnormal()
  label var x1 “Exogenous variable”

probit w z
  * Pretty close estimates to what we expect.

gen yp = normal(1 + w*1.5 + x1 – (2^.5)*v)
  * Now we are including the v term in the generation of y in order to introduce an endogenous correlation between w and y.
  * We need to adjust our estimated coefficients.
  di “Variance of the unobserved =” 1^2 + 2^.5^2
  di “Standard deviation =” (1^2 + 2^.5^2)^.5
  di “Constant coefficient=” 1/(1^2 + 2^.5^2)^.5
  di “x coefficient=” 1/(1^2 + 2^.5^2)^.5
  local w_est =1.5/(1^2 + 2^.5^2)^.5
  di “w coefficient=” `w_est’

gen y = rbinomial(1, yp)

* First let’s see what happens if we neglect to make any effort at controlling for the endogeneity.
probit y w x1

test w= `w_est’

* Our estimate of the coefficient on w is way too small

* In order to estimate our relationship in a consistent manner we will use the biprobit command.
* This command effectively estimates two separate probit regressions with the allowance that the unobserved outcomes be correlated with the parameter /athrho.
* In this case, the unobserved component from the w estimation equation is the endogenous component from the y estimation equation.
* Since, w only is entering y linearly it is sufficient that the unobserved portion of w correlated with y is in a sense controlled for through the joint probit regression.
biprobit (y = w x1) (w=z x1)
test w= `w_est’

* We can test the “endogeneity” of w by testing the significance of /athrho.  Which appears in this case to be quite significant.

* Not bad, we fail to reject there being any difference between our estimate of the coefficient on w and the true.

* This is working well though with 1,000 observations.  It seemed to be extremely ineffective with 100 observations however.

* What do we do if there are multiple endogenous binary variables?

* Let’s generate similar data:
clear
set obs 1000

gen z1 = rnormal()
gen z2 = rnormal()

gen v1 = rnormal()
gen v2 = rnormal()

gen x1 = rnormal()

gen wp1 = normal(-.5 + z1*.5 – z2*.2 + .5^.5*v1)
gen w1 = rbinomial(1, wp1)
  label var w1 “Endogenous variable 1″

gen wp2 = normal(.75 – z1*.5 + z2*.2 + .5^.5*v2)
gen w2 = rbinomial(1, wp2)
  label var w2 “Endogenous variable 2″

gen yp = normal(.1 + w1*.7 + w2*1 + .5*x1 – .5^.5*v1 + .5^.5*v2)
gen y = rbinomial(1, yp)

* Once again we must adjust our expectation of the coefficients
local var_unob = 1+.5^.5^2+.5^.5^2
di “Variance of unobservables =” `var_unob’
  local est_b0 = 1/(`var_unob’)^.5
di “Constant coefficient =” `est_b0′
  local est_w1 = 1/(`var_unob’)^.5
di “w1 coefficient coefficient =” `est_w1′
  local est_w2 = .7/(`var_unob’)^.5
di “w2 coefficient coefficient =” `est_w2′
  local est_x1 = .5/(`var_unob’)^.5
di “x1 coefficient coefficient =” `est_x1′

probit y w1 w2 x1
test [y]w1=`est_w1′
test [y]w2=`est_w2′
test [y]_cons=`est_b0′
test [y]x1=`est_x1′

* The majority of estimates are not working well.

* Let’s try doing the joint MLE

* Previously the biprobit was sufficient.  However, biprobit only allow for a two-way probit.

* Fortunately, there is a user written command that uses simulation to approximate a multivariate probit.

* install mvprobit if not yet installed.
* ssc install mvprobit

* The syntax of mvprobit is very similar to that of biprobit

mvprobit (y = w1 w2 x1) (w1=z1 z2 x1) (w2=z1 z2 x1)
test [y]w1=`est_w1′
test [y]w2=`est_w2′
test [y]_cons=`est_b0′
test [y]x1=`est_x1′

* Unfortunately the estimates are still too far away from the true.

* However, they are closer.

* Let’s see how the fitted probability line compares with the true.
predict yp_hat
replace yp_hat = normal(yp_hat)

reg yp_hat yp, nocon
predict yp_hat1_hat

two   (scatter yp yp_hat) (line yp yp_hat1_hat if yp_hat1_hat      yscale( range(0 1 ) ) xlabel(0 1) legend(off) title(Predicted probability against true)

1>

* Overall, not a bad fit.

* This might not be sufficient for many applications.

* What happens when one of the endogenous variables is continuous?

Continue reading Endogenous Binary Regressors

Sticky Probit – clustered bootstrapped standard errors

do file

* There exist numerous estimators which can be used for a variety of special circumstances.

* If these estimators have been developed by an econometrician, then the econometrician has probably done the hard work of proving consistency (or unbiasedness) and estimated an asymptotically valid standard error estimator under well defined conditions.

* However, sometimes we want to design our own estimator and before doing the hard work of figuring out its theoretical properties, we would first like to see how well it works with simulated data.

* This post will go through one example of how this process may work.

* First let us imagine that I have a new estimator which is a combination of a linear probability model and a probit model.  I will call this estimator a sticky probit.  It is defined as prob(Y(t)=1)=ZA + (1- 1(k x n) A)* NormalCDF(XB) where Z is a K by N matrix of binary explanatory variables.  X is a L by N matrix of explanatory variables as well.  This formulation will become clear in a moment.

* In the event that Z only has one variable then prob(Y(t)=1)=ZA + (1-A)* NormalCDF(XB).  I was thinking that an interesting case would be when Z=Y(t-1).  Thus it would allow for an estimation easily interpretable time dependent outcomes.  For instance, imagine in the “real” world that probability of renting or owning a home was 90% dependent entirely upon what your previous period’s decision was.

* Let’s see this in action.

* p = zA + (1-zA)*normal(xB)

* First let’s program an MLE estimator to solve our “sticky probit”

set more off

cap program drop stickyprobit
program define stickyprobit
  * Tell Stata that this code is written in version 11
  version 11
  args ln_likehood xB zA
  qui replace `ln_likehood’ = ln(`zA’ + (1-`zA’[1])*normal(`xB’)) if $ML_y==1
  qui replace `ln_likehood’ = ln(1-(`zA’ + (1-`zA’[1])*normal(`xB’))) if $ML_y==0
  * This only works if we are only estimating a single coefficient in zA.
  * If anybody knows how to generalize the above statement to allow for any number of parameters in zA to be estimated, please post it was a comment.

end

* Let’s generate some sample data

clear

* We have 500 panel level observations
set obs 500

gen id=_n

* We have some random continuous explanatory variables
gen x1 = rnormal()
gen x2 = rnormal()

* Let’s first just think about A being a single constant.  So A=.5 is telling us that 50% of the decision to buy or rent this period will be dependent upon the decision the previous period.
gen A = .5

* The decision to rent at t=0 we will say is unobserved but that it enters the probability of renting this period in the form of the binomail draw.
gen p=A*rbinomial(1,.5) + (1-A)*normal(.5*x1 – .5*x2)

* Now we simulate the decision to rent at time period t=1
gen rent=rbinomial(1,p)

* We expand the data to make 150 time periods
expand 50
bysort id: gen t=_n

* Now we generate time varying explanatory variables
replace x1 = rnormal()  if t>1
replace x2 = rnormal()  if t>1

* This part of the data generating process is slightly complicated because in each period the decision to rent or buy is dependent upon the previous period’s decision.
forv i=2/50 {
  qui replace p=A*rent[_n-1] + (1-A)*normal(.5*x1 – .5*x2) if t==`i’
  * The actual decision to rent or not is a binomial draw with the probability of 1 equal to p
  qui replace rent=rbinomial(1,p) if t==`i’
}

**** Done with simulation

* Now let’s estimate

gen rent_lag = rent[_n-1]  if t>1

probit rent rent_lag x1 x2
* This is the benchmark

ml model lf stickyprobit (probit: rent=x1 x2, noconstant) (zA:rent_lag, noconstant)
ml maximize, iterate(10)

* I set iterate to 10 because the ml model has difficulty converging on the solution.
* I think this is because as A gets close to .5 then there is an increasing frequency in which infeasible values are evaluated (those are Ln(p
* Because we know there there is serial correlation of the errors then we cannot trust that standard errors from the maximum likelihood estimator.

* Thus we need to bootstrap clustering at the observation level.

* In order to do this we will need to write a short program

cap program drop bsstickyprobit
program define bsstickyprobit

  ml model lf stickyprobit (probit: rent=x1 x2, noconstant) (zA:rent_lag, noconstant)
  ml maximize, iterate(10)
  * I will set iterate to 10 to speed up the boostrapped standard errors.
end

bs, cluster(id): bsstickyprobit

/* (running bsstickyprobit on estimation sample)

Bootstrap replications (50)
—-+— 1 —+— 2 —+— 3 —+— 4 —+— 5
…………………………………………..    50

Bootstrap results                               Number of obs      =     24500
                                                Replications       =        50
                                                Wald chi2(2)       =     11.92
Log likelihood = -12748.652                     Prob > chi2        =    0.0026

                                    (Replications based on 500 clusters in id)
——————————————————————————
             |   Observed   Bootstrap                         Normal-based
        rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
————-+—————————————————————-
probit       |
          x1 |   .4972777    .145441     3.42   0.001     .2122186    .7823369
          x2 |  -.5287849   .1586004    -3.33   0.001    -.8396359   -.2179339
————-+—————————————————————-
zA           |
    rent_lag |    .497552   .1232098     4.04   0.000     .2560652    .7390388
——————————————————————————
Warning: convergence not achieved
*/

* Finally, let’s compare our boot strapped errors with simulated errors.

cap program drop simstickyprobit
program define simstickyprobit, rclass

  clear
  set obs 150
  gen id=_n
  gen x1 = rnormal()
  gen x2 = rnormal()
  gen A = .5
  gen p=A*rbinomial(1,.5) + (1-A)*normal(.5*x1 – .5*x2)
  gen rent=rbinomial(1,p)
  expand 150
  bysort id: gen t=_n
  replace x1 = rnormal()  if t>1
  replace x2 = rnormal()  if t>1
  forv i=2/150 {
    qui replace p=A*rent[_n-1] + (1-A)*normal(.5*x1 – .5*x2) if t==`i’
    * The actual decision to rent or not is a binomial draw with the probability of 1 equal to p
    qui replace rent=rbinomial(1,p) if t==`i’
  }
  **** Done with simulation
  gen rent_lag = rent[_n-1]  if t>1

  ml model lf stickyprobit (probit: rent=x1 x2, noconstant) (zA:rent_lag, noconstant)
  ml maximize, iterate(20)

  return scalar b1=[probit]_b[x1]
  return scalar b2=[probit]_b[x2]
  return scalar A=[zA]_b[rent_lag]

end

simstickyprobit
return list
* The list of returned values look good.

simulate A=r(A) b1=r(b1) b2=r(b2), rep(50): simstickyprobit

sum

/*  Variable |       Obs        Mean    Std. Dev.       Min        Max
————-+——————————————————–
           A |        50    .3727054    .1289269     .23698   .5096082
          b1 |        50    .3494328    .1601192   .1524611   .5577423
          b2 |        50    -.348098    .1563565  -.5363456   -.145098
*/

* We can see the standard deviation estimates from the clustered bootstrap are right on.
0>

* It is important to note that the estimator is strongly biased.

Continue reading Sticky Probit – clustered bootstrapped standard errors