I woke up and suddenly understood Bayesian statistics…
March 2, 2008 – 1:44 pm

It sounds crazy, but that’s pretty much what happened. I woke up too early (about 6) and as I lay in bed, my mind drifted to a Bayesian statistics example that I had read on Friday afternoon. I had read all of its parts and understood the sentences and concepts presented (for about the 10th time; I’ve been curious about this for years), but it wasn’t until this morning that I felt like I got it. I’ve always known it has something to do with prior information and that it’s a computationally intensive technique, but this morning I was finally able to answer the question: why do people bother doing something this abstruse?
For me, a couple of things made it click. First, the example, which was in Introduction to Applied Bayesian Statistics and Estimation for Social Scientists (pp. 47–49; kudos to author Scott Lynch). It had to do with a young woman who is pregnant and takes a pregnancy test that comes out positive. But the test has known error rates for false positives and false negatives. OK, great, so we do some math and find out the chances that she is pregnant. But here’s the kicker! Let’s say we also know that she only had sex once, and we know the likelihood of pregnancy from a single encounter (which is relatively low). That is relevant prior information, and Bayesian methods have a way of incorporating it, which is, like, pretty cool. When we incorporate that information, it turns out that our expectation of pregnancy should be a lot lower.
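The mechanics of that example are just Bayes’ theorem. Here’s a small sketch of the calculation; the sensitivity, false-positive rate, and prior numbers below are made up for illustration, not taken from Lynch’s book:

```python
# P(pregnant | positive test) via Bayes' theorem.
# All rates here are hypothetical, chosen only to show the effect of the prior.

def posterior(prior, sensitivity, false_positive_rate):
    """Posterior probability of pregnancy given a positive test."""
    # Total probability of seeing a positive test under either hypothesis.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Flat prior: before the test, assume a 50/50 chance of pregnancy.
flat = posterior(prior=0.5, sensitivity=0.90, false_positive_rate=0.05)

# Informative prior: pregnancy from a single encounter is relatively unlikely.
informed = posterior(prior=0.15, sensitivity=0.90, false_positive_rate=0.05)

print(round(flat, 3))      # → 0.947
print(round(informed, 3))  # → 0.761
```

Same test, same error rates, but the informative prior pulls the posterior down substantially — which is exactly the point of the example.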
Another important part of the puzzle, which helped me put this simple pregnancy example in perspective, is something I found out while learning a lot of other multivariate statistics recently: much of the hoopla about Bayesian techniques is not necessarily about the inclusion of prior information, but about the computational techniques used to estimate model parameters — computationally intense sampling techniques like Markov chain Monte Carlo methods. (Apparently a lot of models don’t even have much prior information.) And to put that in perspective, model parameters are the core of statistics: every measure of central tendency, estimate of uncertainty, and model coefficient is a model parameter, and estimating those is pretty much what statistics is.
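To make "computationally intense sampling" concrete, here is a minimal Metropolis sampler for one parameter (the mean of normal data with known variance and a flat prior). This is only an illustrative sketch of the MCMC idea, not anything from Lynch’s book:

```python
import math
import random

def log_likelihood(mu, data):
    """Log-likelihood of a normal mean with variance fixed at 1 (constants dropped)."""
    return -0.5 * sum((x - mu) ** 2 for x in data)

def metropolis(data, n_iter=20000, step=0.5, seed=0):
    """Draw posterior samples of the mean using a random-walk Metropolis sampler."""
    rng = random.Random(seed)
    mu = 0.0
    samples = []
    for _ in range(n_iter):
        proposal = mu + rng.gauss(0, step)
        # Flat prior, so the posterior ratio is just the likelihood ratio.
        if math.log(rng.random()) < log_likelihood(proposal, data) - log_likelihood(mu, data):
            mu = proposal
        samples.append(mu)
    return samples[n_iter // 2:]  # discard the first half as burn-in

# Simulated data with true mean 3.0.
rng_data = random.Random(1)
data = [rng_data.gauss(3.0, 1.0) for _ in range(50)]

draws = metropolis(data)
print(sum(draws) / len(draws))  # posterior mean, close to the sample mean
```

Instead of solving for the estimate directly, the sampler wanders through parameter values, accepting moves in proportion to how well they fit the data; the pile of accepted draws approximates the posterior distribution, and its average is the estimate.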
So, perhaps I’m just slow and dense, or perhaps the concepts and philosophy behind Bayesian statistics are actually challenging and subtle! I still have no idea how to implement this stuff or most of the details, but I feel like when I read about it now, I will understand why people bother. Either way, it’s about time, and I’m looking forward to exploring more.
2 Responses to “I woke up and suddenly understood Bayesian statistics…”
How did I find this blog? Google!
I didn’t know you had such an awesome Web site.
I know you are insane…ly brilliant. It’s SO cool that you now understand baye-stats. Statistics itself is too complex for me. I barely understand average, variation, min, max, and stuff like that. I literally flunked basic stats in undergrad. Anyway, I’m learning the hard way that to be able to do good data analysis, it is imperative that I know my stats. I wish model development was simple like back in the days when Boyle, Charles, and Gay-Lussac only needed to know how to draw a straight line with their dataset. I won’t say I could do that because honestly I don’t 100% understand linear regressions!
I’ll bookmark this blog and hopefully drop some scientific comments instead of this rant style comment. That stats model fit vs prediction post is…good stuff. I need to think more before I say more. I already sound like a drunk typing. (I’m really not drunk, though.)
By Brian Seok on Mar 7, 2008
Brian, thanks for your comment, drunken or not! And don’t worry: I am correspondingly intimidated when I see all of your rad snow physics data. After we do the snow survey later this month, perhaps I will write a post on snow biogeochemistry and you can go buck wild. Please do visit again!
By Anthony on Mar 10, 2008