Stata Blogger?

Are you a Stata blogger? Join the aggregator!

Quandl Package – 5,000,000 free datasets at the tip of your fingers!

# Yes, you read that correctly and no Quandl (http://www.quandl.com/) did not pay me anything.# Quandl is a new database management tool which seeks to become the place to find datasets.  They boast of having over 5×10^6 data sets available t…

Continue reading Quandl Package – 5,000,000 free datasets at the tip of your fingers!

The effect of non-convergence on MLE estimates

* Maximum likelihood proceedures have become widely used to solve a variety of econometric problems.

* Unfortunately there is no guarantee that these proceedures will yeild a single solution which satisfies the convergence criteria of the maximizing function.

* This might occur for reasons difficult to detect such as localy flat spots or discontinuous areas.

* Maximization proceedures are usually evaluated based on their 1. efficiency (speed of convergence) or 2. on their robustness at detecting optimal values.

* The problem is that sometimes in simulation we need to limit the time a MLE proceedure takes in attempting to find a solution.

* What effect does that limitation result in and what do we do with estimates that result from non-convergence?
* 1. Keep them or 2. throw them out

* This simulation will explore both options

* In this simulation we will fall back on the widely used estimator which is equivalent when the standard errors are not structurally estimated to the OLS estimator.

* This is the “normal” regression estimator.  IE the MLE maximization that allows for linearly modeled heteroskedasticity.
cap program drop myNormalReg
program define myNormalReg
  args lnlk xb sigma2
  qui replace `lnlk’ = -ln(sqrt(`sigma2′*2*_pi)) – ($ML_y-`xb’)^2/(2*`sigma2′)
end

* First let’s generate a sample data set
clear
set obs 300

* I am going to try to make the problem hard to solve by including both addative and multiplicative error.
gen u = (runiform()-.5)
  * I made this error small because actually when the error is small it is harder to estimate the variance of the error.
  * It takes a little work with simulations to generate data which does not converge.

gen v1 = rnormal()
gen x1 = runiform()-.5

gen v2 = rnormal()
gen x2 = runiform()-.5

gen y = 3 + (1+v1)*x1 + (1+v2)*x2 + u

reg y x1 x2

ml model lf myNormalReg (reg: y=x1 x2) (sigma2:)
ml maximize

* This is the more efficient model because it is explicitly modeling the error.
gen x1_2 = x1^2
gen x2_2 = x2^2

ml model lf myNormalReg (reg: y=x1 x2) (sigma2: x1_2 x2_2)
ml maximize

* It seems that typically this model frequently converges.

* Let’s see if we can’t dilute the maximization:
gen x1x2 = x1*x2

gen x1abs = abs(x1)
gen x2abs = abs(x2)

ml model lf myNormalReg (reg: y=x1 x2 x1_2 x2_2 x1x2 x1abs x2abs) (sigma2: x1 x2 x1_2 x2_2 x1x2 x1abs x2abs)
ml maximize, iterate(100)

* It looks to me about half the time I run this code it does not converge within a 100 iterations.

* Now lets specify our functions we will use to test differences in results depending upon our method of dealing with convergence.

* This program is just a condensation of the above code.
cap program drop sim_converge
program define sim_converge

  clear
  set obs 300

  gen u = (runiform()-.5)

  gen v1 = rnormal()
  gen x1 = runiform()-.5

  gen v2 = rnormal()
  gen x2 = runiform()-.5
  gen y = 3 + (1+v1)*x1 + (1+v2)*x2 + u
  gen x1_2 = x1^2
  gen x2_2 = x2^2

  ml model lf myNormalReg (reg: y=x1 x2) (sigma2: x1_2 x2_2)
  ml maximize

  gen x1x2 = x1*x2

  gen x1abs = abs(x1)
  gen x2abs = abs(x2)

  ml model lf myNormalReg (reg: y=x1 x2 x1_2 x2_2 x1x2 x1abs x2abs) (sigma2: x1 x2 x1_2 x2_2 x1x2 x1abs x2abs)
  ml maximize, iterate(`1′)
  * The only difference is that iterate is specified by the user.
end

* Leaving the first argument blank will not specify a maximum convergence iteration.
sim_converge
sim_converge 50

* Let’s first define what we would like to save from the MLE.
* Yes, I am going to use a forbidden global :)
gl savelist ic=e(ic)
  * e(ic) is the macro in which the number of iterations used is saved.

 foreach i in reg sigma2 {
   foreach v in x1 x2 x1_2 x2_2 x1x2 x1abs x2abs {
   gl savelist $savelist `i’`v’=[`i']_b[`v']
   }
 }

 * Let’s see what our savelist looks like:
 di “${savelist}”

 * looking pretty good.

simulate ${savelist} , rep(100) seed(32): sim_converge 50
tab ic
* In my simulation 51 times the MLE did not converge by the 50th iteration.

* Let’s see if there are systematic differences between estimates.
sum if ic==50
sum if ic  * We can see that if the estimator did converge then it is much more precise (smaller sd) than in the cases when it did not converge.
 
  * The mean estimates of regx1 and regx2 and sigmax1_2 and sigmax2_2 are much closer to 1 which is the true parameter values.
 
* Let’s try it again setting convergence at a higher bar:
simulate ${savelist} , rep(100) seed(32): sim_converge 250
tab ic
sum if ic==250
sum if ic
* Raising the max iteration does not lead to any of the observations converging.

* This is problematic because we want to know if there is a systematic difference in the draws for the estimates which converged and those that did not.

* By the results so far we might be tempted just to include the results of the iterations that did converge.

* First off let’s see if the estimates that converged quickly are better or worse than those that converged more slowly.

recode ic (1/14=0) (15/49=1), gen(grp)

bysort grp: sum regx1 regx2
anova regx1 grp if grp* It seems there is no detectable difference in the means for those observations that converged more quickly than 15 iterations than those that converged more slowly.

* This implies, assuming the results are generalizable that truncating the simulation to only the results that converge might produce unbiased estimates.

* We should run the simulation again with more repetitions in order to confirm this.

simulate ${savelist} , rep(500) seed(32): sim_converge 50
  tab ic
 
/* tab ic

      e(ic) |      Freq.     Percent        Cum.
————+———————————–
         10 |          1        0.20        0.20
         11 |         18        3.63        3.83
         12 |         24        4.84        8.67
         13 |         48        9.68       18.35
         14 |         57       11.49       29.84
         15 |         50       10.08       39.92
         16 |         24        4.84       44.76
         17 |         10        2.02       46.77
         18 |         18        3.63       50.40
         19 |         10        2.02       52.42
         20 |          4        0.81       53.23
         21 |          6        1.21       54.44
         22 |          1        0.20       54.64
         23 |          3        0.60       55.24
         24 |          4        0.81       56.05
         25 |          2        0.40       56.45
         26 |          3        0.60       57.06
         29 |          1        0.20       57.26
         32 |          1        0.20       57.46
         33 |          1        0.20       57.66
         50 |        210       42.34      100.00
————+———————————–
      Total |        496      100.00
*/

  sum if ic==50
  sum if ic
  recode ic (1/14=0) (15/49=1), gen(grp)

  bysort grp: sum regx1 regx2

/*
-> grp = 0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
————-+——————————————————–
       regx1 |       148    .9918629    .1076053    .732878   1.296374
       regx2 |       148    .9807507     .112081    .684716   1.447506

—————————————————————————————–
-> grp = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
————-+——————————————————–
       regx1 |       138    .9967074    .1183514   .7492483   1.318624
       regx2 |       138    .9913069    .1161812   .6711526   1.407439

—————————————————————————————–
-> grp = 50

    Variable |       Obs        Mean    Std. Dev.       Min        Max
————-+——————————————————–
       regx1 |       210     .814391    .3268973   .0518116   1.590542
       regx2 |       210    .8106436    .3278075   .0707162   1.775239

*/
  anova regx1 grp if grp
/*
                         Number of obs =     286     R-squared     =  0.0005
                           Root MSE      = .112917     Adj R-squared = -0.0031

                  Source |  Partial SS    df       MS           F     Prob > F
              ———–+—————————————————-
                   Model |  .001675983     1  .001675983       0.13     0.7172
                         |
                     grp |  .001675983     1  .001675983       0.13     0.7172
                         |
                Residual |  3.62106459   284  .012750227  
              ———–+—————————————————-
                   Total |  3.62274057   285   .01271137  
*/

* We can see that even when the sample size is larger (about 140 per iteration group) there is no discernable difference between those draws that converge before the first 15 iterations and those that converge after.

* If we assume that for those draws that do not converge within 250 draws are sampling from the same probability of convergence distribution then this evidence suggests that rate of convergence is independent of the actual estimates and therefore it might be safe to exclude observations in which convergence did not occur.

* Now finally what we might interested in is seeing how our estimates change (for the values in which convergence is achieved) as we include more and more estimates by way of increasing our threshold of max iterations.

* How do we do this?

* Well, this might take a little while but we basically loop through the simulation saving the results it terms of means and standard deviations from each run.

forv i = 15(5)35 {
  simulate ${savelist} , rep(100) seed(32): sim_converge `i’
  sum regx1 if ic<`i’
    global mean_x1_`i’=r(mean)
    global var_x1_`i’=r(sd)^2
  sum regx2 if ic<`i’
    global mean_x2_`i’=r(mean)
    global var_x2_`i’=r(sd)^2
}
* By the way this is a highly redundant and inefficient method.

clear
set obs 5
gen mean_x1 = .
  label var mean_x1 “Mean of x1 estimates”
gen mean_x2 = .
  label var mean_x2 “Mean of x2 estimates”
gen var_x1 = .
  label var var_x1 “Variance of x1 estimates”
gen var_x2 = .
  label var var_x2 “Variance of x2 estimates”

gen i = .
  label var i “Max # iterations”

* Save the results as variables
forv i = 1(1)5 {
  local ii = 10+`i’*5
  replace mean_x1 = ${mean_x1_`ii’} if _n==`i’
  replace mean_x2 = ${mean_x2_`ii’} if _n==`i’
  replace var_x1  = ${var_x1_`ii’}  if _n==`i’
  replace var_x2  = ${var_x2_`ii’}  if _n==`i’
  replace i = `ii’ if _n==`i’
}

two (connected mean_x1 i, msize(large) lwidth(thick)) ///
    (connected mean_x2 i, msize(large) lwidth(thick)), name(means, replace)
two (connected var_x1  i, msize(large) lwidth(thick)) ///
    (connected var_x2  i, msize(large) lwidth(thick)), name(vars,  replace)

graph combine means vars, col(1) title(“Estimates are insensitive to speed of convergence”)

* The take away seems to be that it is safe (at least in this simulation) to exclude from your analysis simulations that did not converge in the specified iteration count.

* This simulation also suggests that it is not ideal to include in your results MLE estimates from iterations in which no convergence was achieved.
50>50>250>250>50>

Continue reading The effect of non-convergence on MLE estimates

Capacity for Love and Prejudice – Stata Simulation

* This is my simple hypothesis (really my own personal prejudice):* the more prejudice someone allows themselves to be the less capacity they have for love.* In this simulation I will attempt to generate a graph which convey’s this idea with a pink equ…

Continue reading Capacity for Love and Prejudice – Stata Simulation

ML and initial values

* This post explores the setting of initial values with the ML command.* There are several reasons you may be interested in doing this.* 1. You are having difficulty with your data converging and you think that*        you…

Continue reading ML and initial values

asim command

* After using the msim command from a previous post, I decided I did not like having many different variables for each simulation run.* So I have recoded an alternative command called asim* This command allow subsequent simulate commands to b…

Continue reading asim command

My not so Brief Stata Formatting Guide

* I write this as a short guide though I do not always stick to it. This post was inspired by a thoughtful discussion on the linkedin Stata-Users group.* The number one rule is: Always, always comment your code!* See my reasons:* http://www.econometric…

Continue reading My not so Brief Stata Formatting Guide

MSimulate Command

CLT is a powerful property* Stata has a wonderfully effective simulate function that allows users to easily simulate data and analysis in a very rapid fashion.* The only drawback is that when you run it, it will replace the data in memory with the simu…

Continue reading MSimulate Command

Endogenous Binary Regressors


* Often times we are interested in estimating the effect of a binary endogenous regressor on a binary outcome variable.

* It is not obvious how to simulate data that will fit the criteria specifications that we desire.

* First let’s think about the standard IV setup.

* y = b0 + xb1 + wb2 + u
* w = g0 + zg1 + v

* With u and v distributed normally and independently of x, w, and z.

* We know this setup is generally not correct if either y is binary or w is binary .

* When y is binary we might choose to use a MLE probit estimator with the following assumptions:

* P(y=1) = normal(b0 + xb1 + wb2)
* P(w=1) = normal(g0 + zg1)

* However, it is not easy to generate data in this form.

* Instead we will generate data introducing endogeneity by use of an unobserved addative error.

clear
set obs 1000

gen z = rnormal()
  label var z “Exogenous instrument”

gen v = rnormal()
  label var v “Unobserved error”

gen wp = normal(-.5 + z*1.25 + 1*v)
  label var wp “Endogenous variable”

gen w = rbinomial(1, wp)

* The above equation should not be expected to estimate a coefficient of 1.25 on the z variable.
* This is because of the addative error term v which contributes to the implicit error of the normal CDF to create an error with a total variance of 1 + 1 = 2
* Thus, when the probit estimator is run it automatically scales the equation to be unit.
* We can discover consistent estimator by rescaling the coefficient on z to the true standard deviation.
di -.5/(2^.5)  ” for the constant”
di 1.25/(2^.5) ” for the coefficient on z”

gen x1 = rnormal()
  label var x1 “Exogenous variable”

probit w z
  * Pretty close estimates to what we expect.

gen yp = normal(1 + w*1.5 + x1 – (2^.5)*v)
  * Now we are including the v term in the generation of y in order to introduce an endogenous correlation between w and y.
  * We need to adjust our estimated coefficients.
  di “Variance of the unobserved =” 1^2 + 2^.5^2
  di “Standard deviation =” (1^2 + 2^.5^2)^.5
  di “Constant coefficient=” 1/(1^2 + 2^.5^2)^.5
  di “x coefficient=” 1/(1^2 + 2^.5^2)^.5
  local w_est =1.5/(1^2 + 2^.5^2)^.5
  di “w coefficient=” `w_est’

gen y = rbinomial(1, yp)

* First let’s see what happens if we neglect to make any effort at controlling for the endogeneity.
probit y w x1

test w= `w_est’

* Our estimate of the coefficient on w is way too small

* In order to estimate our relationship in a consistent manner we will use the biprobit command.
* This command effectively estimates two separate probit regressions with the allowance that the unobserved outcomes be correlated with the parameter /athrho.
* In this case, the unobserved component from the w estimation equation is the endogenous component from the y estimation equation.
* Since, w only is entering y linearly it is sufficient that the unobserved portion of w correlated with y is in a sense controlled for through the joint probit regression.
biprobit (y = w x1) (w=z x1)
test w= `w_est’

* We can test the “endogeneity” of w by testing the significance of /athrho.  Which appears in this case to be quite significant.

* Not bad, we fail to reject there being any difference between our estimate of the coefficient on w and the true.

* This is working well though with 1,000 observations.  It seemed to be extremely ineffective with 100 observations however.

* What do we do if there are multiple endogenous binary variables?

* Let’s generate similar data:
clear
set obs 1000

gen z1 = rnormal()
gen z2 = rnormal()

gen v1 = rnormal()
gen v2 = rnormal()

gen x1 = rnormal()

gen wp1 = normal(-.5 + z1*.5 – z2*.2 + .5^.5*v1)
gen w1 = rbinomial(1, wp1)
  label var w1 “Endogenous variable 1″

gen wp2 = normal(.75 – z1*.5 + z2*.2 + .5^.5*v2)
gen w2 = rbinomial(1, wp2)
  label var w2 “Endogenous variable 2″

gen yp = normal(.1 + w1*.7 + w2*1 + .5*x1 – .5^.5*v1 + .5^.5*v2)
gen y = rbinomial(1, yp)

* Once again we must adjust our expectation of the coefficients
local var_unob = 1+.5^.5^2+.5^.5^2
di “Variance of unobservables =” `var_unob’
  local est_b0 = 1/(`var_unob’)^.5
di “Constant coefficient =” `est_b0′
  local est_w1 = 1/(`var_unob’)^.5
di “w1 coefficient coefficient =” `est_w1′
  local est_w2 = .7/(`var_unob’)^.5
di “w2 coefficient coefficient =” `est_w2′
  local est_x1 = .5/(`var_unob’)^.5
di “x1 coefficient coefficient =” `est_x1′

probit y w1 w2 x1
test [y]w1=`est_w1′
test [y]w2=`est_w2′
test [y]_cons=`est_b0′
test [y]x1=`est_x1′

* The majority of estimates are not working well.

* Let’s try doing the joint MLE

* Previously the biprobit was sufficient.  However, biprobit only allow for a two-way probit.

* Fortunately, there is a user written command that uses simulation to approximate a multivariate probit.

* install mvprobit if not yet installed.
* ssc install mvprobit

* The syntax of mvprobit is very similar to that of biprobit

mvprobit (y = w1 w2 x1) (w1=z1 z2 x1) (w2=z1 z2 x1)
test [y]w1=`est_w1′
test [y]w2=`est_w2′
test [y]_cons=`est_b0′
test [y]x1=`est_x1′

* Unfortunately the estimates are still too far away from the true.

* However, they are closer.

* Let’s see how the fitted probability line compares with the true.
predict yp_hat
replace yp_hat = normal(yp_hat)

reg yp_hat yp, nocon
predict yp_hat1_hat

two   (scatter yp yp_hat) (line yp yp_hat1_hat if yp_hat1_hat      yscale( range(0 1 ) ) xlabel(0 1) legend(off) title(Predicted probability against true)

1>

* Overall, not a bad fit.

* This might not be sufficient for many applications.

* What happens when one of the endogenous variables is continuous?

Continue reading Endogenous Binary Regressors

New Project – Stata Blog Aggregator

I have decided to start a Stata blog aggregator.  I believe I am the first.  You can find my preliminary efforts at http://www.stata-bloggers.com/The purpose of a blog aggregator is to provide a system by which users are connected with v…

Continue reading New Project – Stata Blog Aggregator

Interpreting the Control Function Coefficient

* Is the control function coefficient a measure of the direction and size of the bias caused by endogeneity?* Imagine the endogenous variable w being composed of three components: 1 endogenous portion, 2 exogenous portion correlated with z, 3 exogenous…

Continue reading Interpreting the Control Function Coefficient