<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Stata-Bloggers</title>
	<atom:link href="http://stata-bloggers.com/feed" rel="self" type="application/rss+xml" />
	<link>http://stata-bloggers.com</link>
	<description>Stata examples and tutorials contributed by bloggers</description>
	<lastBuildDate>Mon, 20 May 2013 07:46:47 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>R, Stata and matching additional learning costs</title>
		<link>http://srqm.tumblr.com/post/50731828812</link>
		<comments>http://srqm.tumblr.com/post/50731828812#comments</comments>
		<pubDate>Sat, 18 May 2013 15:02:00 +0000</pubDate>
		<dc:creator>SRQM</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Stata]]></category>
		<category><![CDATA[teaching]]></category>

		<guid isPermaLink="false">http://srqm.tumblr.com/post/50731828812</guid>
		<description><![CDATA[<p>Francis Smart recently pointed to an <a href="http://www.econometricsbysimulation.com/2013/05/the-power-of-evaluate-parse-paste.html">important difference</a> between R and Stata from a teaching perspective, which has to do with the additional learning costs of vectorization in R over the single-dataset orientation of Stata.</p> <p>Stata makes it easy to manipulate names, or more specifically, variable names, as in a dataset with three variables for social expenditure called <code>party1 party2 party3</code>. This naming pattern is common in many preprocessed empirical datasets.</p> <pre><code>    // example
    mvdecode party*, mv(999)
</code></pre> <p>Furthermore, Stata works like an accountant&#8217;s book, so all variables belong to the same data object, which never needs to be referenced by name once it is loaded. This naturally rules out a lot of possibilities, a limitation compensated in part by macros and scalars.</p> <pre><code>    // example
    loc regressors "age sex"
</code></pre> <p>Macros in particular combine with loops such as the <code>forval</code> and <code>foreach</code> commands to allow more complex data processing. At that level of use, the software is flexible enough for most applied data cleaning.</p> <pre><code>    // example
    forval i = 1/3 {
      replace socx`i' = socx`i' / 10^6
    }
</code></pre> <p>To access matrix notation, the Stata user needs to move to Mata syntax, while R lets the user manipulate objects through vectorization from the start. Thinking in these terms is more demanding as there are more possibilities for errors, starting with calls to undeclared objects.</p> <p>I teach both <a href="http://f.briatte.org/teaching/ida/">R</a> and <a href="http://f.briatte.org/teaching/quanti/">Stata</a>. My experience with social science students is that the additional learning costs of R syntax need to be matched with other benefits to become valuable to them. To me, these benefits lie primarily in the more diverse array of data that R gives access to.</p> <p>Continue reading <a href="http://srqm.tumblr.com/post/50731828812">R, Stata and matching additional learning costs</a></p>]]></description>
				<content:encoded><![CDATA[<p>Francis Smart recently pointed to an <a href="http://www.econometricsbysimulation.com/2013/05/the-power-of-evaluate-parse-paste.html">important difference</a> between R and Stata from a teaching perspective, which has to do with the additional learning costs of vectorization in R over the single-dataset orientation of Stata.</p>

<p>Stata makes it easy to manipulate names, or more specifically, variable names, as in a dataset with three variables for social expenditure called <code>party1 party2 party3</code>. This naming pattern is common in many preprocessed empirical datasets.</p>

<pre><code>    // example
    mvdecode party*, mv(999)
</code></pre>

<p>Furthermore, Stata works like an accountant&#8217;s book, so all variables belong to the same data object, which never needs to be referenced by name once it is loaded. This naturally rules out a lot of possibilities, a limitation compensated in part by macros and scalars.</p>

<pre><code>    // example
    loc regressors "age sex"
</code></pre>

<p>Macros in particular combine with loops such as the <code>forval</code> and <code>foreach</code> commands to allow more complex data processing. At that level of use, the software is flexible enough for most applied data cleaning.</p>

<pre><code>    // example
    forval i = 1/3 {
      replace socx`i' = socx`i' / 10^6
    }
</code></pre>
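
<p>As a minimal sketch of how macros and loops fit together, the same rescaling can be written with <code>foreach</code> over a local macro that holds the variable names. The macro name <code>vars</code> is made up for this illustration, and the <code>socx1</code> to <code>socx3</code> variables are assumed to exist as in the example above.</p>

<pre><code>    // sketch (hypothetical macro name; assumes socx1-socx3 exist)
    loc vars "socx1 socx2 socx3"
    foreach v of local vars {
      replace `v' = `v' / 10^6
    }
</code></pre>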

<p>To access matrix notation, the Stata user needs to move to Mata syntax, while R lets the user manipulate objects through vectorization from the start. Thinking in these terms is more demanding as there are more possibilities for errors, starting with calls to undeclared objects.</p>

<p>I teach both <a href="http://f.briatte.org/teaching/ida/">R</a> and <a href="http://f.briatte.org/teaching/quanti/">Stata</a>. My experience with social science students is that the additional learning costs of R syntax need to be matched with other benefits to become valuable to them. To me, these benefits lie primarily in the more diverse array of data that R gives access to.</p>]]></content:encoded>
			<wfw:commentRss>http://stata-bloggers.com/about/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="" length="" type="" />
		</item>
		<item>
		<title>Quandl Package &#8211; 5,000,000 free datasets at the tip of your fingers!</title>
		<link>http://www.econometricsbysimulation.com/2013/05/quandl-package-5000000-free-datasets-at.html</link>
		<comments>http://www.econometricsbysimulation.com/2013/05/quandl-package-5000000-free-datasets-at.html#comments</comments>
		<pubDate>Sun, 05 May 2013 06:37:00 +0000</pubDate>
		<dc:creator>Francis Smart</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[Data Analysis]]></category>
		<category><![CDATA[econometrics]]></category>
		<category><![CDATA[quandl]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[r packages]]></category>
		<category><![CDATA[Stata]]></category>

		<guid isPermaLink="false">http://stata-bloggers.com/?guid=21b696282037420f12bdda57ca23ebca</guid>
		<description><![CDATA[# Yes, you read that correctly and no Quandl (http://www.quandl.com/)&#160;did not pay me anything.# Quandl is a new database management tool which seeks to become the place to find datasets. &#160;They boast of having over 5x10^6 data sets available t... <p>Continue reading <a href="http://www.econometricsbysimulation.com/2013/05/quandl-package-5000000-free-datasets-at.html">Quandl Package &#8211; 5,000,000 free datasets at the tip of your fingers!</a></p>]]></description>
				<content:encoded><![CDATA[# Yes, you read that correctly, and no, Quandl (<a href="http://www.quandl.com/">http://www.quandl.com/</a>)&nbsp;did not pay me anything.<br /><br /><a href="https://secure.gravatar.com/avatar/1ba0629c59af6282ab210395660f0754?s=420&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-org-420.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="https://secure.gravatar.com/avatar/1ba0629c59af6282ab210395660f0754?s=420&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-org-420.png" width="200" /></a># Quandl is a new database management tool which seeks to become the place to find datasets. &nbsp;They boast of having over 5x10^6 data sets available, though after examining them, I have decided that they are not entirely what everybody might think of as data sets. &nbsp;That is, each unique indicator is considered an independent data set. &nbsp;This helps them to seem to have a ginormous quantity of data sets.<br /><br /># That said, they are not wrong in calling each indicator its own data set, since much of their data, like financial data or government data, is collected by disjoint teams. &nbsp;The scope of their&nbsp;ambition&nbsp;is fantastic, yet it is doable and frankly someone needed to do it.<br /><br /># Currently, data seekers can access the <a href="http://www.icpsr.umich.edu/">Inter-University Consortium for Political and Social Research (ICPSR)</a>. &nbsp;This great resource is composed mostly of cross section and panel data sets, which are great for much analysis, but ICPSR restricts access to data to member universities. &nbsp;In addition, much of the data that Quandl is indexing would not show up in the ICPSR database. 
&nbsp;In addition, Quandl is integrating an automated structure that will be self-updating.<br /><br /># For an example of how Quandl is a good step ahead of the game, take a look at this search query:<br /><br /><a href="http://www.quandl.com/search/lansing,%20michigan">http://www.quandl.com/search/lansing,%20michigan</a><br /><br /># In this search, I looked for Lansing, Michigan, where I live, and got back data running from the last decade or earlier up to today, from sources such as the Federal Reserve and the US Energy Information Administration.<br /><br /><a href="http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies?q=Lansing%2C+Michigan&amp;permit%5B0%5D=AVAILABLE">http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies?q=Lansing%2C+Michigan&amp;permit%5B0%5D=AVAILABLE</a><br /><br /># In contrast, when querying ICPSR, I found a few databases listed, but they were historical databases that generally dated back between 30 and 70 years. &nbsp;That said, both sources could provide valuable information depending upon what I am interested in modeling.<br /><br /># Quandl is very clever for a number of reasons. 
&nbsp;One of these reasons is that they have simultaneously released 8 software libraries that can be used from a number of statistical packages such as R, Stata, and Excel.<br /><br /># In order to demonstrate the use of Quandl I will grab a few data sets from the Lansing query drawn from the Federal Reserve.<br /><br />install.packages("Quandl")<br />library(Quandl)<br /><br /># Employment numbers (thousands of people) for Lansing, Michigan<br />NonFarm = Quandl("FRED/LANS626NAN")<br />CivLaborForce = Quandl("FRED/LANS626LFN")<br />PerCapitaIncome = Quandl("FRED/LANS626PCPI")<br /><br /># Now let's combine the data so that we can relate data values.<br />Labor = merge(NonFarm, CivLaborForce, by="Date")<br />Combined = merge(Labor, PerCapitaIncome, by="Date")<br />colnames(Combined) = c("Date", "NonFarm", "CivLaborForce", "PerCapitaIncome")<br />&nbsp; # Notice that though our data had many more data points, the default option of merge only keeps data that exists in both data sets. &nbsp;In this case, it is per capita income that has the fewest data points.<br /><br /># Let's see if we can predict income as a function of employment:<br />summary(lm(PerCapitaIncome~NonFarm+CivLaborForce, data=Combined))<br /><br /># Our&nbsp;naive&nbsp;prediction as a result of this is that as the Civilian Labor Force increases, wages rise. &nbsp;This is of course a&nbsp;naive&nbsp;example, ignoring completely issues of causation and endogeneity, not to mention probable random walks and other challenging features of this kind of data.<br /><br /># The overall takeaway, though, should be "cool", I think. &nbsp;Maybe this data bank does not currently provide information on many issues of interest to those looking for data. &nbsp;But it does make things easier and self-updating, which are great features.]]></content:encoded>
			<wfw:commentRss>http://www.econometricsbysimulation.com/feeds/6571470605642240724/comments/default</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="" length="" type="" />
		</item>
		<item>
		<title>Crack Limited Dependent Variable’s Regression Using Stata</title>
		<link>http://anaharb.com/blog/limited-dependent-variable/</link>
		<comments>http://anaharb.com/blog/limited-dependent-variable/#comments</comments>
		<pubDate>Wed, 01 May 2013 15:32:32 +0000</pubDate>
		<dc:creator>Kung Hiu</dc:creator>
				<category><![CDATA[Censored Regression]]></category>
		<category><![CDATA[econometrics]]></category>
		<category><![CDATA[Heckman]]></category>
		<category><![CDATA[Incidental Trucation Regression]]></category>
		<category><![CDATA[Instrumental Variable Regression]]></category>
		<category><![CDATA[sample selection]]></category>
		<category><![CDATA[Stata]]></category>
		<category><![CDATA[Tobit]]></category>
		<category><![CDATA[Treatment Effects Model]]></category>
		<category><![CDATA[Truncated Regression]]></category>

		<guid isPermaLink="false">http://anaharb.com/blog/?p=132</guid>
		<description><![CDATA[Newcomers often find it difficult to analyze data that are not randomly sampled, or that are truncated or censored. In real-world data, however, these problems come up all the time. This post digs a little deeper into this area by presenting the types of limited dependent variable models and the relevant Stata [...] <p>Continue reading <a href="http://anaharb.com/blog/limited-dependent-variable/">Crack Limited Dependent Variable’s Regression Using Stata</a></p>]]></description>
				<content:encoded><![CDATA[<p>Newcomers often find it difficult to analyze data that are not randomly sampled, or that are truncated or censored. In real-world data, however, these problems come up all the time.</p>
<p>This post digs a little deeper into this area by presenting the types of limited dependent variable models and the relevant Stata commands. The methods are:</p>
<blockquote><p>Truncated Regression</p>
<p>Censored Regression (Tobit model)</p>
<p>Incidental Truncation Regression (sample selection model, Heckman model)</p>
<p>Treatment Effects Model</p>
<p>Instrumental Variable Regression</p></blockquote>
<p>1, Truncated Regression</p>
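<p>Truncated regression deals with truncated data. What, then, is truncated data? Data are truncated when observations whose outcome falls beyond a threshold are missing from the sample altogether. It looks like the graphs below:</p>
<p>Left truncation (ignore the numbers in the middle and on the axes):</p>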
<p><a href="http://anaharb.com/blog/wp-content/uploads/2013/05/20130501210935.jpg"><img class="alignnone size-medium wp-image-133" alt="20130501210935" src="http://anaharb.com/blog/wp-content/uploads/2013/05/20130501210935-300x272.jpg" width="300" height="272" /></a></p>
<p>&nbsp;</p>
<p>Right truncation:</p>
<p><a href="http://anaharb.com/blog/wp-content/uploads/2013/05/20130501210923.jpg"><img class="alignnone size-medium wp-image-134" alt="20130501210923" src="http://anaharb.com/blog/wp-content/uploads/2013/05/20130501210923-300x276.jpg" width="300" height="276" /></a></p>
<p>&nbsp;</p>
<p>Two-sided truncation:</p>
<p><a href="http://anaharb.com/blog/wp-content/uploads/2013/05/20130501210858.jpg"><img class="alignnone size-medium wp-image-135" alt="20130501210858" src="http://anaharb.com/blog/wp-content/uploads/2013/05/20130501210858-300x279.jpg" width="300" height="279" /></a></p>
<p>&nbsp;</p>
<p>If your data are distributed like this, whatever the specific situation behind it, you can use the Stata command &#8220;truncreg&#8221; to fit the model.</p>
<blockquote><p>truncreg y x1 x2 x3, ll(#)  (lower limit, Left truncation)</p>
<p>truncreg y x1 x2 x3, ul(#)  (upper limit, Right truncation)</p>
<p>truncreg y x1 x2 x3, ll(#) ul(#)  (lower and upper limits, two-sided truncation)</p>
<p># is the value at which the truncation happens.</p>
<p>Choose # carefully, because it affects the regression results. Observations beyond the # threshold are left out of the estimation, so how many of them there are does not affect the results.</p></blockquote>
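<p>As a minimal sketch of left truncation, assuming Stata&#8217;s bundled auto dataset and a purely illustrative cutoff of 15 miles per gallon (neither comes from the original example):</p>
<blockquote><p>sysuse auto, clear<br />
truncreg mpg weight displacement, ll(15)   /*cars with mpg at or below 15 are left out of the estimation*/</p></blockquote>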
<p>2, Censored Regression (Tobit model)</p>
<p>Censored regression deals with censored data. What, then, is censored data? If the data are censored at an upper limit of 5000, any value that would be larger than 5000 is recorded as 5000 in the dataset.</p>
<p>For this kind of data, we use the &#8220;tobit&#8221; command.</p>
<blockquote><p>tobit y x1 x2 x3, ll(#)  (lower limit, Left censored)</p>
<p>tobit y x1 x2 x3, ul(#)  (upper limit, Right censored)</p>
<p>tobit y x1 x2 x3, ll(#) ul(#)  (lower and upper limits, two-sided censoring)</p>
<p>Choose # carefully, because it affects the regression results. Observations at the # threshold stay in the sample, so their number does affect the results.</p></blockquote>
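<p>As a minimal sketch of left censoring, assuming Stata&#8217;s bundled auto dataset: mpg is first censored by hand at an illustrative limit of 17, then modeled with tobit. The variable wgt is created only for this illustration.</p>
<blockquote><p>sysuse auto, clear<br />
generate wgt = weight/1000   /*weight in thousands of pounds*/<br />
replace mpg = 17 if mpg &lt;= 17   /*pretend values below 17 were recorded as 17*/<br />
tobit mpg wgt, ll(17)</p></blockquote>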
<p>3, Incidental Truncation Regression (sample selection model, Heckman model)</p>
<p>Incidental truncation, or sample selection, means that the sample is biased in some way because the outcome is observed only for a selected part of the population. The kinds of bias are covered in detail in epidemiology textbooks. To correct the bias and estimate the effect of the factors on the whole population, we use the &#8220;heckman&#8221; command.</p>
<blockquote><p>heckman y x1 x2 x3, select(z1 z2)    /*MLE method; the selection indicator is whether y is observed (non-missing)*/</p>
<p>heckman y x1 x2 x3, select(z1 z2) twostep    /*two-step method; the selection indicator is whether y is observed (non-missing)*/</p>
<p>heckman y x1 x2 x3, select(w=z1 z2)    /*MLE method; the dependent variable of the selection equation is w. Variable w won&#8217;t show up in the main regression equation (the second step, if twostep), even if you add it on purpose.*/</p>
<p>Remember: any unobserved y should be set to missing.</p></blockquote>
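<p>As a minimal sketch, assuming the womenwk example dataset can be downloaded with webuse (its variable names are not part of this post): wage is missing for women who do not work, which is exactly the &#8220;unobserved y set to missing&#8221; situation.</p>
<blockquote><p>webuse womenwk, clear<br />
heckman wage educ age, select(married children educ age)   /*MLE*/<br />
heckman wage educ age, select(married children educ age) twostep   /*two-step*/</p></blockquote>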
<p>4, Treatment Effects Model</p>
<p>The treatment effects model is derived from the Heckman model and is mainly used for program evaluation.</p>
<p>It has two advantages over the Heckman model mentioned above: 1, in the Heckman model the dependent variable is observed for only part of the sample, whereas here it is observed for everyone; 2, in the Heckman model the intervention variable, which is the potential dependent variable of the selection equation, cannot enter the main regression equation, whereas here it can.</p>
<p>The command for this model is &#8220;treatreg&#8221;:</p>
<blockquote><p>treatreg y x1 x2 x3 w, treat(w=z1 z2) [twostep]   /*w is usually the intervention variable, which defines the treatment group and control group*/</p>
<p>The rho parameter in the output is important: the hypothesis test on it at the bottom of the output (H0: rho = 0) tells us whether this model is appropriate. (rho should be non-zero.)</p></blockquote>
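<p>As a minimal sketch, assuming the union3 example dataset can be downloaded with webuse (its variable names are not part of this post): union membership plays the role of the treatment w. From Stata 13 on, the same model is fitted with etregress.</p>
<blockquote><p>webuse union3, clear<br />
treatreg wage age grade smsa black tenure, treat(union = south black tenure)</p></blockquote>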
<p>5, Instrumental Variable Regression</p>
<p>Instrumental variable regression addresses the problem of an explanatory variable that is correlated with the error term, for instance because some explanatory factors are left out of the equation. The method uses an instrumental variable to solve this problem. A good IV is hard to find because it must meet the criterion &#8220;the instrument should be correlated with endogenous explanatory variables, but cannot be correlated with the error term&#8221;.</p>
<p>Here is the command:</p>
<blockquote><p>ivregress 2sls/liml/gmm y x1 x2 x3 (x4=iv1 iv2 iv3)  /*pick one estimator; x4 is the variable that is correlated with the error term*/</p>
<p>ivprobit y x1 x2 x3 (x4=iv1 iv2 iv3), [twostep]   /*using MLE or twostep method*/</p></blockquote>
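<p>As a minimal sketch, assuming the hsng2 example dataset can be downloaded with webuse (its variable names are not part of this post): housing value is treated as endogenous and instrumented by family income and region, and two postestimation commands check the instruments.</p>
<blockquote><p>webuse hsng2, clear<br />
ivregress 2sls rent pcturban (hsngval = faminc i.region)<br />
estat firststage   /*are the instruments correlated with hsngval?*/<br />
estat overid   /*test of overidentifying restrictions*/</p></blockquote>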
<p>Also, because an appropriate instrument is hard to find, it is considered acceptable to use the treatment effects model instead of the IV method.</p>
<p>Those are the five methods I wanted to introduce. I may revise this post later if I find any mistakes. If you find a mistake or have a suggestion, please kindly let me know, thank you!</p>
]]></content:encoded>
			<wfw:commentRss>http://anaharb.com/blog/limited-dependent-variable/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="" length="" type="" />
		</item>
		<item>
		<title>Propensity Score Match</title>
		<link>http://anaharb.com/blog/propensity-score-match/</link>
		<comments>http://anaharb.com/blog/propensity-score-match/#comments</comments>
		<pubDate>Mon, 29 Apr 2013 12:17:50 +0000</pubDate>
		<dc:creator>Kung Hiu</dc:creator>
				<category><![CDATA[econometrics]]></category>
		<category><![CDATA[Propensity Score Match]]></category>
		<category><![CDATA[PSM]]></category>
		<category><![CDATA[Stata]]></category>

		<guid isPermaLink="false">http://anaharb.com/blog/?p=109</guid>
		<description><![CDATA[This topic is a bit hard. I have been reading a book on it for several weeks (a little at a time), and I know the theory behind it is quite complex&#8230; However, we can solve this complex problem with just a few commands in Stata, which is such powerful software, ha&#8230; First, you need Stata [...] <p>Continue reading <a href="http://anaharb.com/blog/propensity-score-match/">Propensity Score Match</a></p>]]></description>
				<content:encoded><![CDATA[<p>This topic is a bit hard. I have been reading a book on it for several weeks (a little at a time), and I know the theory behind it is quite complex&#8230; However, we can solve this complex problem with just a few commands in Stata, which is such powerful software, ha&#8230;</p>
<p>First, you need Stata and the add-on programs. To get them:</p>
<blockquote><p>help pscore</p>
<p>install (one of the packages listed, usually the newest one)</p></blockquote>
<p>Then, after you load the dataset, you can begin to calculate the PSM.</p>
<p>The example from &#8220;Microeconometrics: Methods and Applications&#8221; uses the global command to define global macros. If you don&#8217;t like that, you can skip the globals and type the actual numbers or variable names wherever there is a $.</p>
<blockquote><p>global breps 200      /*store 200 in the global macro breps: the number of bootstrap replications*/</p>
<p>global vars name1 name2 &#8230;namen  /*saves retyping the variable names*/</p></blockquote>
<p>Next come the two core steps of PSM: estimating the propensity score (pscore) and running the matching estimators.</p>
<p>Pscore:</p>
<blockquote><p>pscore TREAT $vars, pscore(myscore) comsup blockid(myblock) numblo(5) level(0.005) logit</p>
<p>or</p>
<p>pscore TREAT $vars, pscore(myscore) blockid(myblock) numblo(5) level(0.005) logit</p></blockquote>
<p>TREAT is the variable that indicates whether an observation is treated or a control. Regressing it on the covariates rests on the idea that observations with the same covariates should have the same probability of being treated. In other words, given the covariates, treatment should be independent.</p>
<p>The options pscore(), blockid(), numblo(), level() and logit respectively name the variable that stores the propensity score, name the block indicator, set the number of blocks, set the significance level used in the balancing tests, and request a logit model (the default is probit).</p>
<p>The comsup option, which is the only difference between the two commands, restricts the analysis to the region of common support, that is, the range of propensity scores that the treated and control groups share.</p>
<p>PSM regression:</p>
<blockquote><p>set seed 10101<br />
attnd RE78 TREAT $vars, comsup boot reps($breps) dots logit</p>
<p>set seed 10101<br />
attr RE78 TREAT $vars, comsup boot reps($breps) dots logit radius(0.001)</p>
<p>set seed 10101<br />
atts RE78 TREAT, pscore(myscore) blockid(myblock) comsup boot reps($breps) dots</p>
<p>set seed 10101<br />
attk RE78 TREAT $vars, comsup boot reps($breps) dots logit</p></blockquote>
<p>The four groups of commands all estimate the treatment effect by PSM, but with different methods: nearest neighbor matching, radius matching with radius = 0.001, stratification matching, and kernel matching, respectively.</p>
<p>boot reps($breps) requests bootstrapped standard errors with $breps replications. dots prints a dot on screen for each replication, if you like to watch the progress. Note that the different methods use different numbers of observations.</p>
<p>&nbsp;</p>
<p>OK, those are all the commands you need for a simple, short PSM analysis. Enjoy your research!</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://anaharb.com/blog/propensity-score-match/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="" length="" type="" />
		</item>
		<item>
		<title>Unveil the truth of DID</title>
		<link>http://anaharb.com/blog/unveil-the-truth-of-did/</link>
		<comments>http://anaharb.com/blog/unveil-the-truth-of-did/#comments</comments>
		<pubDate>Sun, 28 Apr 2013 15:52:51 +0000</pubDate>
		<dc:creator>Kung Hiu</dc:creator>
				<category><![CDATA[DID]]></category>
		<category><![CDATA[econometrics]]></category>
		<category><![CDATA[Stata]]></category>

		<guid isPermaLink="false">http://anaharb.com/blog/?p=98</guid>
		<description><![CDATA[I have to say that DID, which stands for difference-in-differences, one of the most commonly used econometric methods in program evaluation, is such an easy method. I have been deceived by it for such a long time&#8230; Here is the command for applying DID in Stata: regress EARNS Tdyear2 TREAT dyear2 This is an OLS regression with some dummy [...] <p>Continue reading <a href="http://anaharb.com/blog/unveil-the-truth-of-did/">Unveil the truth of DID</a></p>]]></description>
				<content:encoded><![CDATA[<p>I have to say that DID, which stands for difference-in-differences, one of the most commonly used econometric methods in program evaluation, is such an easy method.</p>
<p>I have been deceived by it for such a long time&#8230;</p>
<p>Here is the command for applying DID in Stata:</p>
<blockquote><p>regress EARNS Tdyear2 TREAT dyear2</p></blockquote>
<p>This is an OLS regression with some dummy variables. In the command, EARNS stands for the outcome you want to measure, and TREAT and dyear2 are the treatment and time dummies, respectively. The key variable, Tdyear2, is the product of TREAT and dyear2; it therefore separates the treatment group in the second year from the other three groups of observations.</p>
<p>So the coefficient on Tdyear2 is the treatment effect.</p>
<p>If you use the command &#8220;regress EARNS Tdyear2 TREAT dyear2, robust&#8221;, you get heteroskedasticity-robust standard errors.</p>
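<p>As a minimal sketch, assuming TREAT and dyear2 are 0/1 dummies already in the dataset, the interaction can be built by hand or left to factor-variable notation (available from Stata 11). In the second form, the coefficient on 1.TREAT#1.dyear2 is the same treatment effect.</p>
<blockquote><p>generate Tdyear2 = TREAT * dyear2   /*1 only for treated observations in year 2*/<br />
regress EARNS Tdyear2 TREAT dyear2, robust<br />
regress EARNS i.TREAT##i.dyear2, vce(robust)   /*same model, interaction built by Stata*/</p></blockquote>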
<p>Things to remember:</p>
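<blockquote><p>Basic assumptions of DID:<br />
1, Common trend assumption: the time effect is assumed to be the same across groups.<br />
2, The composition of the two groups is assumed to be stable before and after the change.</p></blockquote>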
]]></content:encoded>
			<wfw:commentRss>http://anaharb.com/blog/unveil-the-truth-of-did/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="" length="" type="" />
		</item>
		<item>
		<title>&#8220;By using Excel, which was never designed for scientific research, they institutionalized mouse&#8230;&#8221;</title>
		<link>http://srqm.tumblr.com/post/48503543881</link>
		<comments>http://srqm.tumblr.com/post/48503543881#comments</comments>
		<pubDate>Sun, 21 Apr 2013 05:49:30 +0000</pubDate>
		<dc:creator>SRQM</dc:creator>
				<category><![CDATA[economics]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Stata]]></category>

		<guid isPermaLink="false">http://srqm.tumblr.com/post/48503543881</guid>
		<description><![CDATA[&#8220;By using Excel, which was never designed for scientific research, they institutionalized mouse clicks and other untraceable actions into a scientific workflow, which must be avoided since it makes explaining to others (and to oneself) how to replicate the findings next to impossible and too easily introduces inadvertent mistakes.&#8221;<br /><br /> - <em><p>Period. The replication was carried out with R, and additional analysis (easily found online) was done with Stata.</p> <p>Victoria Stodden at <a href="http://themonkeycage.org/2013/04/19/what-the-reinhart-rogoff-debacle-really-shows-verifying-empirical-results-needs-to-be-routine/">What the Reinhart &#38; Rogoff Debacle Really Shows: Verifying Empirical Results Needs to be Routine &#8212; The Monkey Cage</a></p></em> <p>Continue reading <a href="http://srqm.tumblr.com/post/48503543881">&#8220;By using Excel, which was never designed for scientific research, they institutionalized mouse&#8230;&#8221;</a></p>]]></description>
				<content:encoded><![CDATA[“By using Excel, which was never designed for scientific research, they institutionalized mouse clicks and other untraceable actions into a scientific workflow, which must be avoided since it makes explaining to others (and to oneself) how to replicate the findings next to impossible and too easily introduces inadvertent mistakes.”<br/><br/> - <em><p>Period. The replication was carried out with R, and additional analysis (easily found online) was done with Stata.</p>

<p>Victoria Stodden at <a href="http://themonkeycage.org/2013/04/19/what-the-reinhart-rogoff-debacle-really-shows-verifying-empirical-results-needs-to-be-routine/">What the Reinhart & Rogoff Debacle Really Shows: Verifying Empirical Results Needs to be Routine — The Monkey Cage</a></p></em>]]></content:encoded>
			<wfw:commentRss>http://stata-bloggers.com/about/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="" length="" type="" />
		</item>
		<item>
		<title>The effect of non-convergence on MLE estimates</title>
		<link>http://www.econometricsbysimulation.com/2013/04/the-effect-of-non-convergence-on-mle.html</link>
		<comments>http://www.econometricsbysimulation.com/2013/04/the-effect-of-non-convergence-on-mle.html#comments</comments>
		<pubDate>Thu, 11 Apr 2013 01:46:00 +0000</pubDate>
		<dc:creator>Francis Smart</dc:creator>
				<category><![CDATA[convergence]]></category>
		<category><![CDATA[maximization algorithm]]></category>
		<category><![CDATA[maximizations]]></category>
		<category><![CDATA[Maximum Likelihood]]></category>
		<category><![CDATA[Stata]]></category>
		<category><![CDATA[testing assumption failures]]></category>

		<guid isPermaLink="false">http://stata-bloggers.com/?guid=d5b73ba595db8233c16d93e7ee57b557</guid>
		<description><![CDATA[<br /><a href="http://3.bp.blogspot.com/-wQqAhVgSMto/UWYVgqnvGXI/AAAAAAAAFlg/RmiG_DnSunI/s1600/2013-04-10-nonconvergence.png"><img border="0" height="145" src="http://3.bp.blogspot.com/-wQqAhVgSMto/UWYVgqnvGXI/AAAAAAAAFlg/RmiG_DnSunI/s200/2013-04-10-nonconvergence.png" width="200"></a>* Maximum likelihood procedures have become widely used to solve a variety of econometric problems.<br /><br />* Unfortunately there is no guarantee that these procedures will yield a single solution which satisfies the convergence criteria of the maximizing function.<br /><br />* This might occur for reasons difficult to detect, such as locally flat spots or discontinuous areas.<br /><br />* Maximization procedures are usually evaluated based on 1. their efficiency (speed of convergence) or 2. their robustness at detecting optimal values.<br /><br />* The problem is that sometimes in simulation we need to limit the time an MLE procedure takes in attempting to find a solution.<br /><br />* What effect does that limitation have, and what do we do with estimates that result from non-convergence?<br />* 1. Keep them or 2. throw them out<br /><br />* This simulation will explore both options<br /><br />* In this simulation we will fall back on the widely used estimator which, when the standard errors are not structurally estimated, is equivalent to the OLS estimator.<br /><br />* This is the "normal" regression estimator. 
&#160;I.e., the MLE maximization that allows for linearly modeled heteroskedasticity.<br />cap program drop myNormalReg<br />program define myNormalReg<br />&#160; args lnlk xb sigma2<br />&#160; qui replace `lnlk' = -ln(sqrt(`sigma2'*2*_pi)) - ($ML_y-`xb')^2/(2*`sigma2')<br />end<br /><br />* First let's generate a sample data set<br />clear<br />set obs 300<br /><br />* I am going to try to make the problem hard to solve by including both additive and multiplicative error.<br />gen u = (runiform()-.5)<br />&#160; * I made this error small because actually when the error is small it is harder to estimate the variance of the error.<br />&#160; * It takes a little work with simulations to generate data which does not converge.<br /><br />gen v1 = rnormal()<br />gen x1 = runiform()-.5<br /><br />gen v2 = rnormal()<br />gen x2 = runiform()-.5<br /><br />gen y = 3 + (1+v1)*x1 + (1+v2)*x2 + u<br /><br />reg y x1 x2<br /><br />ml model lf myNormalReg (reg: y=x1 x2) (sigma2:)<br />ml maximize<br /><br />* This is the more efficient model because it is explicitly modeling the error.<br />gen x1_2 = x1^2<br />gen x2_2 = x2^2<br /><br />ml model lf myNormalReg (reg: y=x1 x2) (sigma2: x1_2 x2_2)<br />ml maximize<br /><br />* It seems that this model typically converges.<br /><br />* Let's see if we can't dilute the maximization:<br />gen x1x2 = x1*x2<br /><br />gen x1abs = abs(x1)<br />gen x2abs = abs(x2)<br /><br />ml model lf myNormalReg (reg: y=x1 x2 x1_2 x2_2 x1x2 x1abs x2abs) (sigma2: x1 x2 x1_2 x2_2 x1x2 x1abs x2abs)<br />ml maximize, iterate(100)<br /><br />* It looks to me that about half the time I run this code it does not converge within 100 iterations.<br /><br />* Now let's specify the functions we will use to test differences in results depending upon our method of dealing with convergence.<br /><br />* This program is just a condensation of the above code.<br />cap program drop sim_converge<br />program define sim_converge<br /><br />&#160; clear<br 
/>&#160; set obs 300<br /><br />&#160; gen u = (runiform()-.5)<br /><br />&#160; gen v1 = rnormal()<br />&#160; gen x1 = runiform()-.5<br /><br />&#160; gen v2 = rnormal()<br />&#160; gen x2 = runiform()-.5<br />&#160; gen y = 3 + (1+v1)*x1 + (1+v2)*x2 + u<br />&#160; gen x1_2 = x1^2<br />&#160; gen x2_2 = x2^2<br /><br />&#160; ml model lf myNormalReg (reg: y=x1 x2) (sigma2: x1_2 x2_2)<br />&#160; ml maximize<br /><br />&#160; gen x1x2 = x1*x2<br /><br />&#160; gen x1abs = abs(x1)<br />&#160; gen x2abs = abs(x2)<br /><br />&#160; ml model lf myNormalReg (reg: y=x1 x2 x1_2 x2_2 x1x2 x1abs x2abs) (sigma2: x1 x2 x1_2 x2_2 x1x2 x1abs x2abs)<br />&#160; ml maximize, iterate(`1')<br />&#160; * The only difference is that iterate is specified by the user.<br />end<br /><br />* Leaving the first argument blank will not specify a maximum convergence iteration.<br />sim_converge<br />sim_converge 50<br /><br />* Let's first define what we would like to save from the MLE.<br />* Yes, I am going to use a forbidden global :)<br />gl savelist ic=e(ic)<br />&#160; * e(ic) is the scalar in which the number of iterations used is saved.<br /><br />&#160;foreach i in reg sigma2 {<br />&#160; &#160;foreach v in x1 x2 x1_2 x2_2 x1x2 x1abs x2abs {<br />&#160; &#160;gl savelist $savelist `i'`v'=[`i']_b[`v']<br />&#160; &#160;}<br />&#160;}<br /><br />&#160;* Let's see what our savelist looks like:<br />&#160;di "${savelist}"<br /><br />&#160;* looking pretty good.<br /><br />simulate ${savelist} , rep(100) seed(32): sim_converge 50<br />tab ic<br />* In my simulation 51 times the MLE did not converge by the 50th iteration.<br /><br />* Let's see if there are systematic differences between estimates.<br />sum if ic==50<br />sum if ic&#60;50<br />&#160; * We can see that if the estimator did converge then it is much more precise (smaller sd) than in the cases when it did not converge.<br />&#160;<br />&#160; * The mean estimates of regx1 and regx2 and sigma2x1_2 and sigma2x2_2 are much closer to 1 which 
is the true parameter value.<br />&#160;<br />* Let's try it again setting convergence at a higher bar:<br />simulate ${savelist} , rep(100) seed(32): sim_converge 250<br />tab ic<br />sum if ic==250<br />sum if ic&#60;250<br />* Raising the maximum number of iterations does not lead to any more of the draws converging.<br /><br />* This is problematic because we want to know if there is a systematic difference in the draws for the estimates which converged and those that did not.<br /><br />* By the results so far we might be tempted just to include the results of the iterations that did converge.<br /><br />* First off let's see if the estimates that converged quickly are better or worse than those that converged more slowly.<br /><br />recode ic (1/14=0) (15/49=1), gen(grp)<br /><br />bysort grp: sum regx1 regx2<br />anova regx1 grp if grp&#60;250<br />* It seems there is no detectable difference in the means between those observations that converged in fewer than 15 iterations and those that converged more slowly.<br /><br />* This implies, assuming the results are generalizable, that truncating the simulation to only the results that converge might produce unbiased estimates.<br /><br />* We should run the simulation again with more repetitions in order to confirm this.<br /><br />simulate ${savelist} , rep(500) seed(32): sim_converge 50<br />&#160; tab ic<br />&#160;<br />/* tab ic<br /><br />&#160; &#160; &#160; e(ic) &#124; &#160; &#160; &#160;Freq. 
&#160; &#160; Percent &#160; &#160; &#160; &#160;Cum.<br />------------+-----------------------------------<br />&#160; &#160; &#160; &#160; &#160;10 &#124; &#160; &#160; &#160; &#160; &#160;1 &#160; &#160; &#160; &#160;0.20 &#160; &#160; &#160; &#160;0.20<br />&#160; &#160; &#160; &#160; &#160;11 &#124; &#160; &#160; &#160; &#160; 18 &#160; &#160; &#160; &#160;3.63 &#160; &#160; &#160; &#160;3.83<br />&#160; &#160; &#160; &#160; &#160;12 &#124; &#160; &#160; &#160; &#160; 24 &#160; &#160; &#160; &#160;4.84 &#160; &#160; &#160; &#160;8.67<br />&#160; &#160; &#160; &#160; &#160;13 &#124; &#160; &#160; &#160; &#160; 48 &#160; &#160; &#160; &#160;9.68 &#160; &#160; &#160; 18.35<br />&#160; &#160; &#160; &#160; &#160;14 &#124; &#160; &#160; &#160; &#160; 57 &#160; &#160; &#160; 11.49 &#160; &#160; &#160; 29.84<br />&#160; &#160; &#160; &#160; &#160;15 &#124; &#160; &#160; &#160; &#160; 50 &#160; &#160; &#160; 10.08 &#160; &#160; &#160; 39.92<br />&#160; &#160; &#160; &#160; &#160;16 &#124; &#160; &#160; &#160; &#160; 24 &#160; &#160; &#160; &#160;4.84 &#160; &#160; &#160; 44.76<br />&#160; &#160; &#160; &#160; &#160;17 &#124; &#160; &#160; &#160; &#160; 10 &#160; &#160; &#160; &#160;2.02 &#160; &#160; &#160; 46.77<br />&#160; &#160; &#160; &#160; &#160;18 &#124; &#160; &#160; &#160; &#160; 18 &#160; &#160; &#160; &#160;3.63 &#160; &#160; &#160; 50.40<br />&#160; &#160; &#160; &#160; &#160;19 &#124; &#160; &#160; &#160; &#160; 10 &#160; &#160; &#160; &#160;2.02 &#160; &#160; &#160; 52.42<br />&#160; &#160; &#160; &#160; &#160;20 &#124; &#160; &#160; &#160; &#160; &#160;4 &#160; &#160; &#160; &#160;0.81 &#160; &#160; &#160; 53.23<br />&#160; &#160; &#160; &#160; &#160;21 &#124; &#160; &#160; &#160; &#160; &#160;6 &#160; &#160; &#160; &#160;1.21 &#160; &#160; &#160; 54.44<br />&#160; &#160; &#160; &#160; &#160;22 &#124; &#160; &#160; &#160; &#160; &#160;1 &#160; &#160; &#160; &#160;0.20 &#160; &#160; &#160; 54.64<br />&#160; &#160; &#160; &#160; &#160;23 &#124; &#160; 
&#160; &#160; &#160; &#160;3 &#160; &#160; &#160; &#160;0.60 &#160; &#160; &#160; 55.24<br />&#160; &#160; &#160; &#160; &#160;24 &#124; &#160; &#160; &#160; &#160; &#160;4 &#160; &#160; &#160; &#160;0.81 &#160; &#160; &#160; 56.05<br />&#160; &#160; &#160; &#160; &#160;25 &#124; &#160; &#160; &#160; &#160; &#160;2 &#160; &#160; &#160; &#160;0.40 &#160; &#160; &#160; 56.45<br />&#160; &#160; &#160; &#160; &#160;26 &#124; &#160; &#160; &#160; &#160; &#160;3 &#160; &#160; &#160; &#160;0.60 &#160; &#160; &#160; 57.06<br />&#160; &#160; &#160; &#160; &#160;29 &#124; &#160; &#160; &#160; &#160; &#160;1 &#160; &#160; &#160; &#160;0.20 &#160; &#160; &#160; 57.26<br />&#160; &#160; &#160; &#160; &#160;32 &#124; &#160; &#160; &#160; &#160; &#160;1 &#160; &#160; &#160; &#160;0.20 &#160; &#160; &#160; 57.46<br />&#160; &#160; &#160; &#160; &#160;33 &#124; &#160; &#160; &#160; &#160; &#160;1 &#160; &#160; &#160; &#160;0.20 &#160; &#160; &#160; 57.66<br />&#160; &#160; &#160; &#160; &#160;50 &#124; &#160; &#160; &#160; &#160;210 &#160; &#160; &#160; 42.34 &#160; &#160; &#160;100.00<br />------------+-----------------------------------<br />&#160; &#160; &#160; Total &#124; &#160; &#160; &#160; &#160;496 &#160; &#160; &#160;100.00<br />*/<br /><br />&#160; sum if ic==50<br />&#160; sum if ic&#60;50<br />&#160; recode ic (1/14=0) (15/49=1), gen(grp)<br /><br />&#160; bysort grp: sum regx1 regx2<br /><br />/*<br />-&#62; grp = 0<br /><br />&#160; &#160; Variable &#124; &#160; &#160; &#160; Obs &#160; &#160; &#160; &#160;Mean &#160; &#160;Std. Dev. 
&#160; &#160; &#160; Min &#160; &#160; &#160; &#160;Max<br />-------------+--------------------------------------------------------<br />&#160; &#160; &#160; &#160;regx1 &#124; &#160; &#160; &#160; 148 &#160; &#160;.9918629 &#160; &#160;.1076053 &#160; &#160;.732878 &#160; 1.296374<br />&#160; &#160; &#160; &#160;regx2 &#124; &#160; &#160; &#160; 148 &#160; &#160;.9807507 &#160; &#160; .112081 &#160; &#160;.684716 &#160; 1.447506<br /><br />-----------------------------------------------------------------------------------------<br />-&#62; grp = 1<br /><br />&#160; &#160; Variable &#124; &#160; &#160; &#160; Obs &#160; &#160; &#160; &#160;Mean &#160; &#160;Std. Dev. &#160; &#160; &#160; Min &#160; &#160; &#160; &#160;Max<br />-------------+--------------------------------------------------------<br />&#160; &#160; &#160; &#160;regx1 &#124; &#160; &#160; &#160; 138 &#160; &#160;.9967074 &#160; &#160;.1183514 &#160; .7492483 &#160; 1.318624<br />&#160; &#160; &#160; &#160;regx2 &#124; &#160; &#160; &#160; 138 &#160; &#160;.9913069 &#160; &#160;.1161812 &#160; .6711526 &#160; 1.407439<br /><br />-----------------------------------------------------------------------------------------<br />-&#62; grp = 50<br /><br />&#160; &#160; Variable &#124; &#160; &#160; &#160; Obs &#160; &#160; &#160; &#160;Mean &#160; &#160;Std. Dev. 
&#160; &#160; &#160; Min &#160; &#160; &#160; &#160;Max<br />-------------+--------------------------------------------------------<br />&#160; &#160; &#160; &#160;regx1 &#124; &#160; &#160; &#160; 210 &#160; &#160; .814391 &#160; &#160;.3268973 &#160; .0518116 &#160; 1.590542<br />&#160; &#160; &#160; &#160;regx2 &#124; &#160; &#160; &#160; 210 &#160; &#160;.8106436 &#160; &#160;.3278075 &#160; .0707162 &#160; 1.775239<br /><br />*/<br />&#160; anova regx1 grp if grp&#60;50<br />/*<br />&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;Number of obs = &#160; &#160; 286 &#160; &#160; R-squared &#160; &#160; = &#160;0.0005<br />&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;Root MSE &#160; &#160; &#160;= .112917 &#160; &#160; Adj R-squared = -0.0031<br /><br />&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; Source &#124; &#160;Partial SS &#160; &#160;df &#160; &#160; &#160; MS &#160; &#160; &#160; &#160; &#160; F &#160; &#160; Prob &#62; F<br />&#160; &#160; &#160; &#160; &#160; &#160; &#160; -----------+----------------------------------------------------<br />&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;Model &#124; &#160;.001675983 &#160; &#160; 1 &#160;.001675983 &#160; &#160; &#160; 0.13 &#160; &#160; 0.7172<br />&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;&#124;<br />&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;grp &#124; &#160;.001675983 &#160; &#160; 1 &#160;.001675983 &#160; &#160; &#160; 0.13 &#160; &#160; 0.7172<br />&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;&#124;<br />&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; Residual &#124; &#160;3.62106459 &#160; 284 &#160;.012750227 &#160;<br />&#160; &#160; &#160; &#160; &#160; &#160; &#160; -----------+----------------------------------------------------<br 
/>&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;Total &#124; &#160;3.62274057 &#160; 285 &#160; .01271137 &#160;<br />*/<br /><br />* We can see that even when the sample size is larger (about 140 per iteration group) there is no discernible difference between those draws that converge in fewer than 15 iterations and those that converge later.<br /><br />* If we assume that those draws that do not converge within 250 iterations are sampled from the same probability-of-convergence distribution, then this evidence suggests that the rate of convergence is independent of the actual estimates, and therefore it might be safe to exclude observations in which convergence did not occur.<br /><br /><br />* Now finally what we might be interested in is seeing how our estimates change (for the values in which convergence is achieved) as we include more and more estimates by way of increasing our threshold of max iterations.<br /><br />* How do we do this?<br /><br />* Well, this might take a little while but we basically loop through the simulation, saving the results in terms of means and variances from each run.<br /><br />forv i = 15(5)35 {<br />&#160; simulate ${savelist} , rep(100) seed(32): sim_converge `i'<br />&#160; sum regx1 if ic&#60;`i'<br />&#160; &#160; global mean_x1_`i'=r(mean)<br />&#160; &#160; global var_x1_`i'=r(sd)^2<br />&#160; sum regx2 if ic&#60;`i'<br />&#160; &#160; global mean_x2_`i'=r(mean)<br />&#160; &#160; global var_x2_`i'=r(sd)^2<br />}<br />* By the way this is a highly redundant and inefficient method.<br /><br />clear<br />set obs 5<br />gen mean_x1 = .<br />&#160; label var mean_x1 "Mean of x1 estimates"<br />gen mean_x2 = .<br />&#160; label var mean_x2 "Mean of x2 estimates"<br />gen var_x1 = .<br />&#160; label var var_x1 "Variance of x1 estimates"<br />gen var_x2 = .<br />&#160; label var var_x2 "Variance of x2 estimates"<br /><br />gen i = .<br />&#160; label var i "Max # iterations"<br /><br />* Save the 
results as variables<br />forv i = 1(1)5 {<br />&#160; local ii = 10+`i'*5<br />&#160; replace mean_x1 = ${mean_x1_`ii'} if _n==`i'<br />&#160; replace mean_x2 = ${mean_x2_`ii'} if _n==`i'<br />&#160; replace var_x1 &#160;= ${var_x1_`ii'} &#160;if _n==`i'<br />&#160; replace var_x2 &#160;= ${var_x2_`ii'} &#160;if _n==`i'<br />&#160; replace i = `ii' if _n==`i'<br />}<br /><br />two (connected mean_x1 i, msize(large) lwidth(thick)) ///<br />&#160; &#160; (connected mean_x2 i, msize(large) lwidth(thick)), name(means, replace)<br />two (connected var_x1 &#160;i, msize(large) lwidth(thick)) ///<br />&#160; &#160; (connected var_x2 &#160;i, msize(large) lwidth(thick)), name(vars, &#160;replace)<br /><br />graph combine means vars, col(1) title("Estimates are insensitive to speed of convergence")<br /><br /><div><a href="http://4.bp.blogspot.com/-wQqAhVgSMto/UWYVgqnvGXI/AAAAAAAAFlc/UNQsiOdWXGY/s1600/2013-04-10-nonconvergence.png"><img border="0" height="290" src="http://4.bp.blogspot.com/-wQqAhVgSMto/UWYVgqnvGXI/AAAAAAAAFlc/UNQsiOdWXGY/s400/2013-04-10-nonconvergence.png" width="400"></a></div><br /><br />* The takeaway seems to be that it is safe (at least in this simulation) to exclude from your analysis simulations that did not converge within the specified iteration count.<br /><br />* This simulation also suggests that it is not ideal to include in your results MLE estimates from iterations in which no convergence was achieved.<br /><p>Continue reading <a href="http://www.econometricsbysimulation.com/2013/04/the-effect-of-non-convergence-on-mle.html">The effect of non-convergence on MLE estimates</a></p>]]></description>
				<content:encoded><![CDATA[<br /><a href="http://3.bp.blogspot.com/-wQqAhVgSMto/UWYVgqnvGXI/AAAAAAAAFlg/RmiG_DnSunI/s1600/2013-04-10-nonconvergence.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="145" src="http://3.bp.blogspot.com/-wQqAhVgSMto/UWYVgqnvGXI/AAAAAAAAFlg/RmiG_DnSunI/s200/2013-04-10-nonconvergence.png" width="200" /></a>* Maximum likelihood procedures have become widely used to solve a variety of econometric problems.<br /><br />* Unfortunately there is no guarantee that these procedures will yield a single solution which satisfies the convergence criteria of the maximizing function.<br /><br />* This might occur for reasons difficult to detect such as locally flat spots or discontinuous areas.<br /><br />* Maximization procedures are usually evaluated based on 1. their efficiency (speed of convergence) or 2. their robustness at detecting optimal values.<br /><br />* The problem is that sometimes in simulation we need to limit the time an MLE procedure takes in attempting to find a solution.<br /><br />* What effect does that limitation have, and what do we do with estimates that result from non-convergence?<br />* 1. Keep them or 2. throw them out<br /><br />* This simulation will explore both options<br /><br />* In this simulation we will fall back on the widely used estimator which, when the standard errors are not structurally estimated, is equivalent to the OLS estimator.<br /><br />* This is the "normal" regression estimator. 
&nbsp;That is, the MLE maximization that allows for linearly modeled heteroskedasticity.<br />cap program drop myNormalReg<br />program define myNormalReg<br />&nbsp; args lnlk xb sigma2<br />&nbsp; qui replace `lnlk' = -ln(sqrt(`sigma2'*2*_pi)) - ($ML_y-`xb')^2/(2*`sigma2')<br />end<br /><br />* First let's generate a sample data set<br />clear<br />set obs 300<br /><br />* I am going to try to make the problem hard to solve by including both additive and multiplicative error.<br />gen u = (runiform()-.5)<br />&nbsp; * I made this error small because actually when the error is small it is harder to estimate the variance of the error.<br />&nbsp; * It takes a little work with simulations to generate data for which the estimator does not converge.<br /><br />gen v1 = rnormal()<br />gen x1 = runiform()-.5<br /><br />gen v2 = rnormal()<br />gen x2 = runiform()-.5<br /><br />gen y = 3 + (1+v1)*x1 + (1+v2)*x2 + u<br /><br />reg y x1 x2<br /><br />ml model lf myNormalReg (reg: y=x1 x2) (sigma2:)<br />ml maximize<br /><br />* This is the more efficient model because it is explicitly modeling the error.<br />gen x1_2 = x1^2<br />gen x2_2 = x2^2<br /><br />ml model lf myNormalReg (reg: y=x1 x2) (sigma2: x1_2 x2_2)<br />ml maximize<br /><br />* It seems that this model typically converges.<br /><br />* Let's see if we can't dilute the maximization:<br />gen x1x2 = x1*x2<br /><br />gen x1abs = abs(x1)<br />gen x2abs = abs(x2)<br /><br />ml model lf myNormalReg (reg: y=x1 x2 x1_2 x2_2 x1x2 x1abs x2abs) (sigma2: x1 x2 x1_2 x2_2 x1x2 x1abs x2abs)<br />ml maximize, iterate(100)<br /><br />* It looks to me like about half the time I run this code it does not converge within 100 iterations.<br /><br />* Now let's specify the functions we will use to test differences in results depending upon our method of dealing with convergence.<br /><br />* This program is just a condensation of the above code.<br />cap program drop sim_converge<br />program define sim_converge<br /><br />&nbsp; clear<br 
/>&nbsp; set obs 300<br /><br />&nbsp; gen u = (runiform()-.5)<br /><br />&nbsp; gen v1 = rnormal()<br />&nbsp; gen x1 = runiform()-.5<br /><br />&nbsp; gen v2 = rnormal()<br />&nbsp; gen x2 = runiform()-.5<br />&nbsp; gen y = 3 + (1+v1)*x1 + (1+v2)*x2 + u<br />&nbsp; gen x1_2 = x1^2<br />&nbsp; gen x2_2 = x2^2<br /><br />&nbsp; ml model lf myNormalReg (reg: y=x1 x2) (sigma2: x1_2 x2_2)<br />&nbsp; ml maximize<br /><br />&nbsp; gen x1x2 = x1*x2<br /><br />&nbsp; gen x1abs = abs(x1)<br />&nbsp; gen x2abs = abs(x2)<br /><br />&nbsp; ml model lf myNormalReg (reg: y=x1 x2 x1_2 x2_2 x1x2 x1abs x2abs) (sigma2: x1 x2 x1_2 x2_2 x1x2 x1abs x2abs)<br />&nbsp; ml maximize, iterate(`1')<br />&nbsp; * The only difference is that iterate is specified by the user.<br />end<br /><br />* Leaving the first argument blank will not specify a maximum convergence iteration.<br />sim_converge<br />sim_converge 50<br /><br />* Let's first define what we would like to save from the MLE.<br />* Yes, I am going to use a forbidden global :)<br />gl savelist ic=e(ic)<br />&nbsp; * e(ic) is the scalar in which the number of iterations used is saved.<br /><br />&nbsp;foreach i in reg sigma2 {<br />&nbsp; &nbsp;foreach v in x1 x2 x1_2 x2_2 x1x2 x1abs x2abs {<br />&nbsp; &nbsp;gl savelist $savelist `i'`v'=[`i']_b[`v']<br />&nbsp; &nbsp;}<br />&nbsp;}<br /><br />&nbsp;* Let's see what our savelist looks like:<br />&nbsp;di "${savelist}"<br /><br />&nbsp;* looking pretty good.<br /><br />simulate ${savelist} , rep(100) seed(32): sim_converge 50<br />tab ic<br />* In my simulation 51 times the MLE did not converge by the 50th iteration.<br /><br />* Let's see if there are systematic differences between estimates.<br />sum if ic==50<br />sum if ic&lt;50<br />&nbsp; * We can see that if the estimator did converge then it is much more precise (smaller sd) than in the cases when it did not converge.<br />&nbsp;<br />&nbsp; * The mean estimates of regx1 and regx2 and sigma2x1_2 and sigma2x2_2 are much closer to 
1 which is the true parameter value.<br />&nbsp;<br />* Let's try it again setting convergence at a higher bar:<br />simulate ${savelist} , rep(100) seed(32): sim_converge 250<br />tab ic<br />sum if ic==250<br />sum if ic&lt;250<br />* Raising the maximum number of iterations does not lead to any more of the draws converging.<br /><br />* This is problematic because we want to know if there is a systematic difference in the draws for the estimates which converged and those that did not.<br /><br />* By the results so far we might be tempted just to include the results of the iterations that did converge.<br /><br />* First off let's see if the estimates that converged quickly are better or worse than those that converged more slowly.<br /><br />recode ic (1/14=0) (15/49=1), gen(grp)<br /><br />bysort grp: sum regx1 regx2<br />anova regx1 grp if grp&lt;250<br />* It seems there is no detectable difference in the means between those observations that converged in fewer than 15 iterations and those that converged more slowly.<br /><br />* This implies, assuming the results are generalizable, that truncating the simulation to only the results that converge might produce unbiased estimates.<br /><br />* We should run the simulation again with more repetitions in order to confirm this.<br /><br />simulate ${savelist} , rep(500) seed(32): sim_converge 50<br />&nbsp; tab ic<br />&nbsp;<br />/* tab ic<br /><br />&nbsp; &nbsp; &nbsp; e(ic) | &nbsp; &nbsp; &nbsp;Freq. 
&nbsp; &nbsp; Percent &nbsp; &nbsp; &nbsp; &nbsp;Cum.<br />------------+-----------------------------------<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;10 | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp; &nbsp;0.20 &nbsp; &nbsp; &nbsp; &nbsp;0.20<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;11 | &nbsp; &nbsp; &nbsp; &nbsp; 18 &nbsp; &nbsp; &nbsp; &nbsp;3.63 &nbsp; &nbsp; &nbsp; &nbsp;3.83<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;12 | &nbsp; &nbsp; &nbsp; &nbsp; 24 &nbsp; &nbsp; &nbsp; &nbsp;4.84 &nbsp; &nbsp; &nbsp; &nbsp;8.67<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;13 | &nbsp; &nbsp; &nbsp; &nbsp; 48 &nbsp; &nbsp; &nbsp; &nbsp;9.68 &nbsp; &nbsp; &nbsp; 18.35<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;14 | &nbsp; &nbsp; &nbsp; &nbsp; 57 &nbsp; &nbsp; &nbsp; 11.49 &nbsp; &nbsp; &nbsp; 29.84<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;15 | &nbsp; &nbsp; &nbsp; &nbsp; 50 &nbsp; &nbsp; &nbsp; 10.08 &nbsp; &nbsp; &nbsp; 39.92<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;16 | &nbsp; &nbsp; &nbsp; &nbsp; 24 &nbsp; &nbsp; &nbsp; &nbsp;4.84 &nbsp; &nbsp; &nbsp; 44.76<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;17 | &nbsp; &nbsp; &nbsp; &nbsp; 10 &nbsp; &nbsp; &nbsp; &nbsp;2.02 &nbsp; &nbsp; &nbsp; 46.77<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;18 | &nbsp; &nbsp; &nbsp; &nbsp; 18 &nbsp; &nbsp; &nbsp; &nbsp;3.63 &nbsp; &nbsp; &nbsp; 50.40<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;19 | &nbsp; &nbsp; &nbsp; &nbsp; 10 &nbsp; &nbsp; &nbsp; &nbsp;2.02 &nbsp; &nbsp; &nbsp; 52.42<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;20 | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;4 &nbsp; &nbsp; &nbsp; &nbsp;0.81 &nbsp; &nbsp; &nbsp; 53.23<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;21 | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;6 &nbsp; &nbsp; &nbsp; &nbsp;1.21 &nbsp; &nbsp; &nbsp; 54.44<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;22 | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp; &nbsp;0.20 &nbsp; &nbsp; &nbsp; 54.64<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;23 | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3 &nbsp; &nbsp; &nbsp; &nbsp;0.60 &nbsp; 
&nbsp; &nbsp; 55.24<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;24 | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;4 &nbsp; &nbsp; &nbsp; &nbsp;0.81 &nbsp; &nbsp; &nbsp; 56.05<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;25 | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2 &nbsp; &nbsp; &nbsp; &nbsp;0.40 &nbsp; &nbsp; &nbsp; 56.45<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;26 | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3 &nbsp; &nbsp; &nbsp; &nbsp;0.60 &nbsp; &nbsp; &nbsp; 57.06<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;29 | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp; &nbsp;0.20 &nbsp; &nbsp; &nbsp; 57.26<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;32 | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp; &nbsp;0.20 &nbsp; &nbsp; &nbsp; 57.46<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;33 | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp; &nbsp;0.20 &nbsp; &nbsp; &nbsp; 57.66<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;50 | &nbsp; &nbsp; &nbsp; &nbsp;210 &nbsp; &nbsp; &nbsp; 42.34 &nbsp; &nbsp; &nbsp;100.00<br />------------+-----------------------------------<br />&nbsp; &nbsp; &nbsp; Total | &nbsp; &nbsp; &nbsp; &nbsp;496 &nbsp; &nbsp; &nbsp;100.00<br />*/<br /><br />&nbsp; sum if ic==50<br />&nbsp; sum if ic&lt;50<br />&nbsp; recode ic (1/14=0) (15/49=1), gen(grp)<br /><br />&nbsp; bysort grp: sum regx1 regx2<br /><br />/*<br />-&gt; grp = 0<br /><br />&nbsp; &nbsp; Variable | &nbsp; &nbsp; &nbsp; Obs &nbsp; &nbsp; &nbsp; &nbsp;Mean &nbsp; &nbsp;Std. Dev. 
&nbsp; &nbsp; &nbsp; Min &nbsp; &nbsp; &nbsp; &nbsp;Max<br />-------------+--------------------------------------------------------<br />&nbsp; &nbsp; &nbsp; &nbsp;regx1 | &nbsp; &nbsp; &nbsp; 148 &nbsp; &nbsp;.9918629 &nbsp; &nbsp;.1076053 &nbsp; &nbsp;.732878 &nbsp; 1.296374<br />&nbsp; &nbsp; &nbsp; &nbsp;regx2 | &nbsp; &nbsp; &nbsp; 148 &nbsp; &nbsp;.9807507 &nbsp; &nbsp; .112081 &nbsp; &nbsp;.684716 &nbsp; 1.447506<br /><br />-----------------------------------------------------------------------------------------<br />-&gt; grp = 1<br /><br />&nbsp; &nbsp; Variable | &nbsp; &nbsp; &nbsp; Obs &nbsp; &nbsp; &nbsp; &nbsp;Mean &nbsp; &nbsp;Std. Dev. &nbsp; &nbsp; &nbsp; Min &nbsp; &nbsp; &nbsp; &nbsp;Max<br />-------------+--------------------------------------------------------<br />&nbsp; &nbsp; &nbsp; &nbsp;regx1 | &nbsp; &nbsp; &nbsp; 138 &nbsp; &nbsp;.9967074 &nbsp; &nbsp;.1183514 &nbsp; .7492483 &nbsp; 1.318624<br />&nbsp; &nbsp; &nbsp; &nbsp;regx2 | &nbsp; &nbsp; &nbsp; 138 &nbsp; &nbsp;.9913069 &nbsp; &nbsp;.1161812 &nbsp; .6711526 &nbsp; 1.407439<br /><br />-----------------------------------------------------------------------------------------<br />-&gt; grp = 50<br /><br />&nbsp; &nbsp; Variable | &nbsp; &nbsp; &nbsp; Obs &nbsp; &nbsp; &nbsp; &nbsp;Mean &nbsp; &nbsp;Std. Dev. 
&nbsp; &nbsp; &nbsp; Min &nbsp; &nbsp; &nbsp; &nbsp;Max<br />-------------+--------------------------------------------------------<br />&nbsp; &nbsp; &nbsp; &nbsp;regx1 | &nbsp; &nbsp; &nbsp; 210 &nbsp; &nbsp; .814391 &nbsp; &nbsp;.3268973 &nbsp; .0518116 &nbsp; 1.590542<br />&nbsp; &nbsp; &nbsp; &nbsp;regx2 | &nbsp; &nbsp; &nbsp; 210 &nbsp; &nbsp;.8106436 &nbsp; &nbsp;.3278075 &nbsp; .0707162 &nbsp; 1.775239<br /><br />*/<br />&nbsp; anova regx1 grp if grp&lt;50<br />/*<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Number of obs = &nbsp; &nbsp; 286 &nbsp; &nbsp; R-squared &nbsp; &nbsp; = &nbsp;0.0005<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Root MSE &nbsp; &nbsp; &nbsp;= .112917 &nbsp; &nbsp; Adj R-squared = -0.0031<br /><br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Source | &nbsp;Partial SS &nbsp; &nbsp;df &nbsp; &nbsp; &nbsp; MS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; F &nbsp; &nbsp; Prob &gt; F<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -----------+----------------------------------------------------<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Model | &nbsp;.001675983 &nbsp; &nbsp; 1 &nbsp;.001675983 &nbsp; &nbsp; &nbsp; 0.13 &nbsp; &nbsp; 0.7172<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;grp | &nbsp;.001675983 &nbsp; &nbsp; 1 &nbsp;.001675983 &nbsp; &nbsp; &nbsp; 0.13 &nbsp; &nbsp; 0.7172<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Residual | &nbsp;3.62106459 &nbsp; 284 &nbsp;.012750227 &nbsp;<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -----------+----------------------------------------------------<br />&nbsp; &nbsp; &nbsp; &nbsp; 
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Total | &nbsp;3.62274057 &nbsp; 285 &nbsp; .01271137 &nbsp;<br />*/<br /><br />* We can see that even when the sample size is larger (about 140 per iteration group) there is no discernible difference between those draws that converge in fewer than 15 iterations and those that converge later.<br /><br />* If we assume that those draws that do not converge within 250 iterations are sampled from the same probability-of-convergence distribution, then this evidence suggests that the rate of convergence is independent of the actual estimates, and therefore it might be safe to exclude observations in which convergence did not occur.<br /><br /><br />* Now finally what we might be interested in is seeing how our estimates change (for the values in which convergence is achieved) as we include more and more estimates by way of increasing our threshold of max iterations.<br /><br />* How do we do this?<br /><br />* Well, this might take a little while but we basically loop through the simulation, saving the results in terms of means and variances from each run.<br /><br />forv i = 15(5)35 {<br />&nbsp; simulate ${savelist} , rep(100) seed(32): sim_converge `i'<br />&nbsp; sum regx1 if ic&lt;`i'<br />&nbsp; &nbsp; global mean_x1_`i'=r(mean)<br />&nbsp; &nbsp; global var_x1_`i'=r(sd)^2<br />&nbsp; sum regx2 if ic&lt;`i'<br />&nbsp; &nbsp; global mean_x2_`i'=r(mean)<br />&nbsp; &nbsp; global var_x2_`i'=r(sd)^2<br />}<br />* By the way this is a highly redundant and inefficient method.<br /><br />clear<br />set obs 5<br />gen mean_x1 = .<br />&nbsp; label var mean_x1 "Mean of x1 estimates"<br />gen mean_x2 = .<br />&nbsp; label var mean_x2 "Mean of x2 estimates"<br />gen var_x1 = .<br />&nbsp; label var var_x1 "Variance of x1 estimates"<br />gen var_x2 = .<br />&nbsp; label var var_x2 "Variance of x2 estimates"<br /><br />gen i = .<br />&nbsp; label var i "Max # iterations"<br /><br />* Save the results as variables<br />forv i = 1(1)5 
{<br />&nbsp; local ii = 10+`i'*5<br />&nbsp; replace mean_x1 = ${mean_x1_`ii'} if _n==`i'<br />&nbsp; replace mean_x2 = ${mean_x2_`ii'} if _n==`i'<br />&nbsp; replace var_x1 &nbsp;= ${var_x1_`ii'} &nbsp;if _n==`i'<br />&nbsp; replace var_x2 &nbsp;= ${var_x2_`ii'} &nbsp;if _n==`i'<br />&nbsp; replace i = `ii' if _n==`i'<br />}<br /><br />two (connected mean_x1 i, msize(large) lwidth(thick)) ///<br />&nbsp; &nbsp; (connected mean_x2 i, msize(large) lwidth(thick)), name(means, replace)<br />two (connected var_x1 &nbsp;i, msize(large) lwidth(thick)) ///<br />&nbsp; &nbsp; (connected var_x2 &nbsp;i, msize(large) lwidth(thick)), name(vars, &nbsp;replace)<br /><br />graph combine means vars, col(1) title("Estimates are insensitive to speed of convergence")<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-wQqAhVgSMto/UWYVgqnvGXI/AAAAAAAAFlc/UNQsiOdWXGY/s1600/2013-04-10-nonconvergence.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="290" src="http://4.bp.blogspot.com/-wQqAhVgSMto/UWYVgqnvGXI/AAAAAAAAFlc/UNQsiOdWXGY/s400/2013-04-10-nonconvergence.png" width="400" /></a></div><br /><br />* The takeaway seems to be that it is safe (at least in this simulation) to exclude from your analysis simulations that did not converge within the specified iteration count.<br /><br />* This simulation also suggests that it is not ideal to include in your results MLE estimates from iterations in which no convergence was achieved.<br />]]></content:encoded>
			<wfw:commentRss>http://www.econometricsbysimulation.com/feeds/6892368173635635687/comments/default</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="" length="" type="" />
		</item>
		<item>
		<title>From my student files.</title>
		<link>http://srqm.tumblr.com/post/46901772207</link>
		<comments>http://srqm.tumblr.com/post/46901772207#comments</comments>
		<pubDate>Tue, 02 Apr 2013 01:56:00 +0000</pubDate>
		<dc:creator>SRQM</dc:creator>
				<category><![CDATA[Stata]]></category>

		<guid isPermaLink="false">http://srqm.tumblr.com/post/46901772207</guid>
		<description><![CDATA[ From my student files. <p>Continue reading <a href="http://srqm.tumblr.com/post/46901772207">From my student files.</a></p>]]></description>
				<content:encoded><![CDATA[<img src="http://25.media.tumblr.com/b631142db729f6267b209a72fcdcf8ee/tumblr_mkluq0rWpH1qktnb8o1_500.jpg"/><br/> <br/><img src="http://24.media.tumblr.com/118124ef79a2e9ee2114da76613dc70c/tumblr_mkluq0rWpH1qktnb8o2_500.jpg"/><br/> <br/><img src="http://25.media.tumblr.com/4936724a06110d617532f831a23ee1fd/tumblr_mkluq0rWpH1qktnb8o3_500.jpg"/><br/> <br/><img src="http://24.media.tumblr.com/3f931e07a3a5311c50e4dcec194496e2/tumblr_mkluq0rWpH1qktnb8o4_500.jpg"/><br/> <br/><img src="http://25.media.tumblr.com/cc80b2b5109ea5033f244b28ea731baa/tumblr_mkluq0rWpH1qktnb8o5_500.png"/><br/> <br/><img src="http://25.media.tumblr.com/39c91ca49168bdb0161a5443ece2a41c/tumblr_mkluq0rWpH1qktnb8o6_r1_500.png"/><br/> <br/><p>From my student files.</p>]]></content:encoded>
			<wfw:commentRss>http://stata-bloggers.com/about/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="" length="" type="" />
		</item>
		<item>
		<title>A shorter lookfor command, in five lines of code</title>
		<link>http://srqm.tumblr.com/post/46731180459</link>
		<comments>http://srqm.tumblr.com/post/46731180459#comments</comments>
		<pubDate>Sun, 31 Mar 2013 04:00:00 +0000</pubDate>
		<dc:creator>SRQM</dc:creator>
				<category><![CDATA[Stata]]></category>

		<guid isPermaLink="false">http://srqm.tumblr.com/post/46731180459</guid>
		<description><![CDATA[One thing that I like about Stata is the possibility to write quick wrappers for commands that get things done. The code below is an example that I wrote to search for variables in fewer keystrokes than lookfor (which cannot be abbreviated). I also wan... <p>Continue reading <a href="http://srqm.tumblr.com/post/46731180459">A shorter lookfor command, in five lines of code</a></p>]]></description>
				<content:encoded><![CDATA[<p>One thing that I like about Stata is the possibility to write quick wrappers for commands that get things done. The code below is an example that I wrote to search for variables in fewer keystrokes than <code>lookfor</code> (which cannot be abbreviated). I also wanted to get the number of observations at the same time.</p>

<pre><code>    cap pr drop find
    program find, rclass
        qui lookfor `*'
        if "`r(varlist)'" != "" codebook `r(varlist)', c
    end
</code></pre>

<p>This returns the output of the <code>codebook</code> command with the <code>compact</code> option instead of the standard <code>lookfor</code> output, which is based on the <code>describe</code> command. The variable labels are less readable, and long variable names still get abbreviated (such a strange idea). Yet it works, and returns the <em>N</em>.</p>
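
<p>A minimal usage sketch, assuming the <code>find</code> program above has already been defined and run from a do-file. The <code>auto</code> dataset that ships with Stata and the search terms are illustrative choices, not part of the original code:</p>

<pre><code>    // illustration only: auto.dta is an assumed example dataset
    sysuse auto, clear
    find mileage    // lookfor matches the label of mpg, "Mileage (mpg)"
    find weight     // matches the variable name weight
    find zzz        // no match: r(varlist) is empty, so nothing is displayed
</code></pre>

<p>The last call shows why the <code>if</code> condition is there. Without that check, an unsuccessful search would call <code>codebook</code> with no variable list, which describes every variable in the dataset.</p>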

<p>I tried to call the program <code>look</code>, but that is still taken by a deprecated command, <code>lookup</code>, which has been superseded by <code>search</code>. As for the single letter <code>l</code>, it is taken by the <code>list</code> command, so that will not work either. The same holds for <code>q</code>, and <code>?</code> is sadly not a valid program name.</p>]]></content:encoded>
			<wfw:commentRss>http://stata-bloggers.com/about/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="" length="" type="" />
		</item>
		<item>
		<title>Capacity for Love and Prejudice &#8211; Stata Simulation</title>
		<link>http://www.econometricsbysimulation.com/2013/03/capacity-for-love-and-prejudice-stata.html</link>
		<comments>http://www.econometricsbysimulation.com/2013/03/capacity-for-love-and-prejudice-stata.html#comments</comments>
		<pubDate>Fri, 29 Mar 2013 14:00:00 +0000</pubDate>
		<dc:creator>Francis Smart</dc:creator>
				<category><![CDATA[equality]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[graphs]]></category>
		<category><![CDATA[human rights]]></category>
		<category><![CDATA[Stata]]></category>
		<category><![CDATA[text plotted on graphs]]></category>
		<category><![CDATA[twoway graphs]]></category>

		<guid isPermaLink="false">http://stata-bloggers.com/?guid=648e5af171965b5d3522c0d0fe1ebd2a</guid>
		<description><![CDATA[* This is my simple hypothesis (really my own personal prejudice): * the more prejudiced someone allows themselves to be, the less capacity they have for love. * In this simulation I will attempt to generate a graph which conveys this idea with a pink equ... <p>Continue reading <a href="http://www.econometricsbysimulation.com/2013/03/capacity-for-love-and-prejudice-stata.html">Capacity for Love and Prejudice &#8211; Stata Simulation</a></p>]]></description>
				<content:encoded><![CDATA[<br /><a href="http://2.bp.blogspot.com/-7TpsoAWqNpM/UVUDLgk5uII/AAAAAAAAFeU/c7AngOFnHYw/s1600/2013-03-29-Love_vs_prejudice.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="145" src="http://2.bp.blogspot.com/-7TpsoAWqNpM/UVUDLgk5uII/AAAAAAAAFeU/c7AngOFnHYw/s200/2013-03-29-Love_vs_prejudice.png" width="200" /></a>* This is my simple hypothesis (really my own personal prejudice):<br />* the more prejudiced someone allows themselves to be, the less capacity they have for love.<br /><br />* In this simulation I will attempt to generate a graph which conveys this idea with a pink equals sign superimposed on it :)<br /><br />* Sorry for the blatant politicizing.<br /><br />* However, I think this is a significant human rights battle being waged in the US.<br /><br />* It is not, though, the last human rights battle that needs to be waged, nor perhaps the most dire.<br /><br />* Other important human rights issues are criminal sentencing, hunger and poverty among children, and continued widespread racial discrimination.<br /><br />* However, this is the battle of today and should be addressed.<br /><br />* In this very simple simulation I will attempt to generate a graphic which is a variant of the pink equals sign.<br />clear<br />set obs 400<br /><br />gen prejudice = (runiform()-.5)*2<br />gen unobserved = rnormal()<br />gen love = unobserved - 3*prejudice<br /><br />twoway (lpolyci love prejudice, fcolor("242 191 241") blcolor("255 128 192") degree(5) /* <br />&nbsp; This first line draws a local polynomial smooth of degree 5 through the data, with its confidence band<br />&nbsp; &nbsp; &nbsp; &nbsp; */ , text(0 0 "=", size(full) color("255 128 255") ) ) /*<br />&nbsp; This will place an equals sign at the coordinates 0,0 and make it "full" size, which is large<br />&nbsp; &nbsp; but not as large as I would prefer it to be.<br />&nbsp; &nbsp; &nbsp; &nbsp;*/ (scatter love prejudice, 
mcolor("red") ), /*<br />&nbsp; This draws my generated data points on the graph<br /><span class="Apple-tab-span" style="white-space: pre;"> </span> &nbsp; */ ytitle(capacity for love, size(large)) ytitle(, color("255 128 255")) ylabel(none) /*<br />&nbsp; This sets the title of the y axis and the color to make it.<br />&nbsp; &nbsp; The labeling of the y axis is naturally suppressed by the twoway option since<br />&nbsp; &nbsp; each twoway graph can have its own x and y coordinates<br /><span class="Apple-tab-span" style="white-space: pre;"> </span> &nbsp; */ xtitle(capacity for &nbsp;prejudice, size(large )) xtitle(, color("255 128 255")) xlabel(none) /*<br /><span class="Apple-tab-span" style="white-space: pre;"> </span> &nbsp; */ legend(off) &nbsp;graphregion(fcolor("168 0 0"))<br />&nbsp; * I turn the legend off and change the background color to crimson.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-7TpsoAWqNpM/UVUDLgk5uII/AAAAAAAAFeY/EvgHAwtgTRM/s1600/2013-03-29-Love_vs_prejudice.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="290" src="http://3.bp.blogspot.com/-7TpsoAWqNpM/UVUDLgk5uII/AAAAAAAAFeY/EvgHAwtgTRM/s400/2013-03-29-Love_vs_prejudice.png" width="400" /></a></div><br />]]></content:encoded>
			<wfw:commentRss>http://www.econometricsbysimulation.com/feeds/2321485213255442144/comments/default</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="" length="" type="" />
		</item>
	</channel>
</rss>
