R, Stata and matching additional learning costs

Francis Smart recently pointed to an important difference between R and Stata from a teaching perspective, which has to do with the additional learning costs of vectorization in R over the single-dataset orientation of Stata.

Stata makes it easy to manipulate variable names, as in a dataset with three variables for social expenditure called party1, party2 and party3. Numbered names of this kind are common in preprocessed empirical datasets.

 // example
 mvdecode party*, mv(999)

Furthermore, Stata works like an accountant’s book: all variables belong to the same data object, which never needs to be referenced by name once it is loaded. This naturally rules out many possibilities, a limitation that macros and scalars compensate for in part.

 // example
 loc regressors "age sex"

Macros in particular combine with loops, such as the forval and foreach commands, to allow more complex data processing. At that level of use, the software is flexible enough for most applied data cleaning.

 // example
 forval i = 1/3 {
     replace socx`i' = socx`i' / 10^6
 }

To access matrix notation, the Stata user needs to move to Mata syntax, while R lets the user manipulate objects through vectorization from the start. Thinking in these terms is more demanding, as there are more possibilities for errors, starting with calls to undeclared objects.
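As a rough sketch of that difference, here is how the socx rescaling loop might look in vectorized R. The data frame `df` and its values are hypothetical stand-ins for a loaded dataset; only the socx1 to socx3 names come from the Stata example.

```r
# Hypothetical data frame standing in for a loaded dataset; the column
# names mirror the Stata loop, the values are made up for illustration.
df <- data.frame(
  socx1 = c(1e6, 2e6),
  socx2 = c(3e6, 4e6),
  socx3 = c(5e6, 6e6)
)

# Select the columns by name pattern, then rescale all of them at once:
# no explicit loop is needed.
socx_cols <- grep("^socx[0-9]+$", names(df), value = TRUE)
df[socx_cols] <- df[socx_cols] / 10^6

# A call to an undeclared object: socx1 exists only inside df, so using
# the bare name fails. R has no implicit "current dataset" to fall back on.
result <- tryCatch(socx1 / 10^6, error = function(e) conditionMessage(e))
```

The last line is the kind of error students tend to hit first: the column has to be addressed through its data frame, for instance as `df$socx1`.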

I teach both R and Stata. My experience with social science students is that the additional learning costs of R syntax need to be matched with other benefits to become valuable to them. To me, these benefits lie primarily in the more diverse array of data that R gives access to.


Quandl Package – 5,000,000 free datasets at the tip of your fingers!

Yes, you read that correctly, and no, Quandl (http://www.quandl.com/) did not pay me anything. Quandl is a new database management tool which seeks to become the place to find datasets. They boast of having over 5×10^6 data sets available t…


“By using Excel, which was never designed for scientific research, they institutionalized mouse…”

“By using Excel, which was never designed for scientific research, they institutionalized mouse clicks and other untraceable actions into a scientific workflow, which must be avoided since it makes explaining to others (and to oneself) how to replicate the findings next to impossible and too easily introduces inadvertent mistakes.”

Period. The replication was carried out with R, and additional analysis (easily found online) was done with Stata.

Victoria Stodden at What the Reinhart & Rogoff Debacle Really Shows: Verifying Empirical Results Needs to be Routine — The Monkey Cage


Benchmarks

I went googling for some examples of quadratic programming done in Mata, and stumbled across a fairly recent Statalist discussion. The original question is here and the official response, typically prompt, is here. I tested Patrick Roland’s code on my own machine (2011 MacBook Pro Core i5) but with Octave instead of MATLAB, and with [...]


Importnew.ado (requires R)

| Gabriel | After hearing from two friends in a single day who are still on Stata 10 that they were having trouble opening Stata 12 .dta files, I rewrote my importspss.ado script to translate Stata files into an older format, by default Stata 9. I’ve tested this with Stata 12 and in theory it […]


Misc Links

| Gabriel | Useful detailed overview of Lion. The user interface stuff doesn’t interest me nearly as much as the tight integration of version control and “resume.” Also, worth checking if your apps are compatible. (Stata and LyX are supposed to work fine. TextMate is supposed to run OK with some minor bugs. No word on […]
