William Gould on Stata’s blog (previously mentioned here) has two great posts (here and here) on the intuition behind matrices and regression coefficients. The section on near-singular matrices is characteristically nice:
Singular matrices are an extreme case of nearly singular matrices, which are the bane of my existence here at StataCorp. Here is what it means for a matrix to be nearly singular: [see figure]
Nearly singular matrices result in spaces that are heavily but not fully compressed. In nearly singular matrices, the mapping from x to y is still one-to-one, but x’s that are far away from each other can end up having nearly equal y values. Nearly singular matrices cause finite-precision computers difficulty. Calculating y = Ax is easy enough, but to calculate the reverse transform x = A⁻¹y means taking small differences and blowing them back up, which can be a numeric disaster in the making.
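To make the quoted point concrete, here is a minimal sketch in plain Python (not Stata, and not taken from Gould's posts). The 2x2 matrix, the value of `eps`, and the helper names `matvec` and `solve2` are all assumptions chosen for illustration. It shows two far-apart x's landing on nearly equal y's, and a tiny nudge to y producing a very different recovered x.

```python
def matvec(A, x):
    """Compute y = A x for a 2x2 matrix A and a length-2 vector x."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]


def solve2(A, y):
    """Solve A x = y for a 2x2 matrix using Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * y[0] - A[0][1] * y[1]) / det,
            (A[0][0] * y[1] - A[1][0] * y[0]) / det]


# Hypothetical nearly singular matrix: the two rows are almost identical.
eps = 1e-10
A = [[1.0, 1.0],
     [1.0, 1.0 + eps]]

# Two x's that are far apart...
x1 = [1.0, 0.0]
x2 = [0.0, 1.0]

# ...map to y's that differ only by about eps in one coordinate.
y1 = matvec(A, x1)
y2 = matvec(A, x2)
gap_in_y = max(abs(a - b) for a, b in zip(y1, y2))

# Going backwards: nudge y1 by 1e-8 and solve for x again.
y_nudged = [y1[0], y1[1] + 1e-8]
x_recovered = solve2(A, y_nudged)

print("gap between y1 and y2:", gap_in_y)
print("x recovered from nudged y:", x_recovered)
```

With these assumed numbers, a change of 1e-8 in y moves the recovered x from (1, 0) to roughly (-99, 100), which is the "taking small differences and blowing them back up" problem in miniature.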
Both posts are great and I recommend them for anyone struggling with the intuition behind what exactly you’re doing when you type in reg y x.
As an added bonus, earlier this week I stumbled across Kenneth Simons’s excellent PDF cheat sheet of Stata commands for intermediate / advanced econometrics, here. I was trying to figure out a way to do something cute with distributed lag models and post-estimation tests, but the sheet covers everything from the simple but important (e.g., the difference between gen old = age >= 18 and gen old = age >= 18 if age<.) to the arcane but potentially important (e.g., nonlinear hypothesis testing). If you’re in applied work and use Stata, I highly recommend flipping through it. I’ve already found several useful techniques I wasn’t even aware existed.
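The difference between the two gen commands comes down to how Stata orders missing values: a missing value compares as larger than any number, so age >= 18 is true when age is missing. The following is only a rough Python analogy, not Stata code. It assumes `math.inf` as a stand-in for Stata's `.` and `None` as a stand-in for a missing result, and the `ages` list is made up.

```python
import math

# Assumption: model Stata's missing value "." as +infinity, since Stata
# sorts missing above every nonmissing number.
MISSING = math.inf

ages = [10, 18, 45, MISSING]  # hypothetical data; the last age is missing

# Analogue of: gen old = age >= 18
# The missing age passes the comparison and is flagged as old.
old_naive = [int(a >= 18) for a in ages]

# Analogue of: gen old = age >= 18 if age<.
# Rows with a missing age are skipped, so the result stays missing (None).
old_safe = [int(a >= 18) if a < MISSING else None for a in ages]

print(old_naive)
print(old_safe)
```

In this sketch the first version counts the person with unknown age as old, while the second leaves that row missing.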