Yesterday, I realized that the basic idea of Bayesian updating doesn’t require any fancy probability theory, or even any numbers. Many people have told me about this, including but not limited to @davidad, @mromang08, Elena Di Lavore, Nathaniel Virgo, @Adele_Lopez, but it took until yesterday for it really to sink in.

It is extremely simple. Suppose that I have a set W, elements of which are “world states.” For instance, I might choose W = \{\mathrm{morning}, \mathrm{noonish}, \mathrm{teatime}, \mathrm{sleeptime}\} (I’m trying to get into the British spirit to prepare for going to Oxford). My knowledge about the world is expressed as a subset K \subseteq W. For instance, if I know that it’s daytime but don’t have any sort of clock and just woke up from an unknown amount of sleeping in, then K = \{\mathrm{morning}, \mathrm{noonish}, \mathrm{teatime}\}. This is my “prior.” Now, suppose I make some new observation, like “people around me are eating little tea cookies.” Let’s say for the sake of argument that this is possible in the morning, or at teatime, but impossible at noonish (I know I’m going to get some objections to this, but bear with me). Then my state of knowledge after that observation is \{\mathrm{morning}, \mathrm{teatime}\}.
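In code, this update is literally set intersection. Here is a minimal sketch in Python, using the example states above (the variable names are mine, not anything canonical):

```python
# Possibilistic Bayesian updating: knowledge is a set of world states,
# and an observation is the set of states consistent with it.
W = {"morning", "noonish", "teatime", "sleeptime"}

# Prior knowledge: "it's daytime" (a subset of W).
prior = {"morning", "noonish", "teatime"}

# Observation: tea cookies are being eaten, which (by assumption)
# is only possible in these states.
cookies_possible = {"morning", "teatime"}

# Updating is just intersecting the prior with the observation.
posterior = prior & cookies_possible

print(sorted(posterior))  # ['morning', 'teatime']
```

That one `&` is the whole update rule.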

More generally, “updating my state of knowledge about the world” just means eliminating hypotheses that are inconsistent with observation. This is really not a fancy operation; this is maybe the most basic description of the process of “reasoning about the world” that there is. I have heard this called “possibilistic Bayesian updating,” but it probably goes by many other names.

Now, an orthodox Bayesian might object in the following way. “Why don’t you just represent your knowledge by the uniform distribution on K, as maximum entropy would imply?” Well, *even* in the finite case, you still have to worry about your base measure. For instance, one might argue that the probability distribution (1/3, 1/3, 1/3, 0) is wrong, because we should weight times of day in proportion to how long they are. But this opens the rabbit hole of “have you really incorporated all the information you know into your prior?” For instance, I might *actually* have more evidence, like I might know how sleepy I feel. At some point you have to sit down and do some math, and I feel more confident about math when it makes fewer assumptions.

On the other hand, an orthodox Bayesian might object in another way. “There is a small chance that people actually eat tea cookies around noon!” If this is true, then in the possibilistic setting I can’t update my belief at all because I can’t completely rule out any possibility! Whereas in the probabilistic setting I can update to thinking “it’s still possible that it’s noonish, but less likely.”
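The probabilistic version of the update looks like this, with made-up likelihoods for how plausible cookies are at each time of day:

```python
# Probabilistic Bayesian update: cookies are merely *unlikely* at
# noonish, not impossible, so noonish survives with lower probability.
prior = {"morning": 1 / 3, "noonish": 1 / 3, "teatime": 1 / 3}

# Hypothetical likelihoods P(cookies | state) -- illustrative numbers only.
likelihood = {"morning": 0.3, "noonish": 0.02, "teatime": 0.6}

# Bayes' rule: multiply prior by likelihood, then renormalize.
unnorm = {s: prior[s] * likelihood[s] for s in prior}
Z = sum(unnorm.values())
posterior = {s: p / Z for s, p in unnorm.items()}

print(posterior)  # noonish is now small but strictly positive
```

Note that the possibilistic update is the special case where every likelihood is either "possible" or "impossible": setting the noonish likelihood to exactly 0 eliminates it entirely.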

Can we get the best of both worlds? Can we both have the ability to say “eh, I have no idea what the probabilities here are, but I know that these things are possible and these things aren’t possible” and *also* the ability to say “this evidence doesn’t *rule out* noon, but it makes it less likely”? Yes, it’s called infra-Bayesianism! But a summary of that would no longer be suitable for a post titled “Lowbrow Bayesian Updates,” because I would consider it to be “Highbrow Bayesian Updates”!