Lowbrow Bayesian Updates

Yesterday, I realized that the basic idea of Bayesian updating doesn’t require any fancy probability theory, or even any numbers. Many people have told me about this, including but not limited to @davidad, @mromang08, Elena Di Lavore, Nathaniel Virgo, and @Adele_Lopez, but it took until yesterday for the idea to really sink in.

It is extremely simple. Suppose that I have a set W, elements of which are “world states.” For instance, I might choose W = \{\mathrm{morning}, \mathrm{noonish}, \mathrm{teatime}, \mathrm{sleeptime}\} (I’m trying to get into the British spirit to prepare for going to Oxford). My knowledge about the world is expressed as a subset K \subseteq W. For instance, if I know that it’s daytime but don’t have any sort of clock and just woke up from an unknown amount of sleeping in, then K = \{\mathrm{morning}, \mathrm{noonish}, \mathrm{teatime}\}. This is my “prior.” Now, suppose I make some new observation, like “people around me are eating little tea cookies.” Let’s say for the sake of argument that this is possible in the morning or at teatime, but impossible at noonish (I know I’m going to get some objections to this, but bear with me). Then my state of knowledge after that observation is \{\mathrm{morning}, \mathrm{teatime}\}: the intersection of my prior K with the set of states consistent with the observation.
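
Concretely, as a quick sketch in Python (just transcribing the example above), the update is literally set intersection:

```python
# Possibilistic updating: knowledge is a set of world states,
# and an observation is the set of states consistent with it.
W = {"morning", "noonish", "teatime", "sleeptime"}

# Prior: I know it's daytime, but not which part.
K = {"morning", "noonish", "teatime"}

# Observation: people are eating tea cookies, which (by assumption)
# only happens in the morning or at teatime.
E = {"morning", "teatime"}

# Updating is just intersection: keep the states consistent with both.
posterior = K & E
print(posterior)  # {'morning', 'teatime'}
```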

More generally, “updating my state of knowledge about the world” just means eliminating hypotheses that are inconsistent with observation. This is really not a fancy operation; this is maybe the most basic description of the process of “reasoning about the world” that there is. I have heard this called “possibilistic Bayesian updating,” but it probably goes by many other names.

Now, an orthodox Bayesian might object in the following way. “Why don’t you just represent your knowledge by the uniform distribution on K, as maximum entropy would imply?” Well, even in the finite case, you still have to worry about your base measure. For instance, one might argue that the probability distribution (1/3, 1/3, 1/3, 0) is wrong, because we should weight times of day in proportion to how long they are. But this opens the rabbit hole of “have you really incorporated all the information you know into your prior?” For instance, I might actually have more evidence, like knowing how sleepy I feel. At some point you have to sit down and do some math, and I feel more confident about math when it makes fewer assumptions.
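
To see the base-measure worry in symbols, here is a toy computation; the hour counts are completely made up for illustration:

```python
# The max-entropy prior depends on a base measure. Two choices:
states = ["morning", "noonish", "teatime", "sleeptime"]
K = {"morning", "noonish", "teatime"}

# 1. Count each daytime state equally.
uniform = {s: (1 / len(K) if s in K else 0.0) for s in states}

# 2. Weight by how long each period lasts (the hours are invented --
#    this is exactly the contested choice of base measure).
hours = {"morning": 6, "noonish": 2, "teatime": 4, "sleeptime": 12}
total = sum(h for s, h in hours.items() if s in K)
weighted = {s: (hours[s] / total if s in K else 0.0) for s in states}

print(uniform)   # morning/noonish/teatime each 1/3, sleeptime 0
print(weighted)  # morning 0.5, noonish ~0.167, teatime ~0.333, sleeptime 0
```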

On the other hand, an orthodox Bayesian might object in another way. “There is a small chance that people actually eat tea cookies around noon!” If this is true, then in the possibilistic setting I can’t update my belief at all because I can’t completely rule out any possibility! Whereas in the probabilistic setting I can update to thinking “it’s still possible that it’s noonish, but less likely.”
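
Here is what that probabilistic update looks like as a quick sketch; the likelihood numbers (including the small chance of noonish cookies) are invented for illustration:

```python
# Probabilistic updating: multiply prior by likelihood, renormalize.
prior = {"morning": 1/3, "noonish": 1/3, "teatime": 1/3, "sleeptime": 0.0}

# P(people are eating tea cookies | time of day) -- invented numbers,
# with a small but nonzero chance of noonish cookies.
likelihood = {"morning": 0.3, "noonish": 0.05, "teatime": 0.6, "sleeptime": 0.0}

unnorm = {s: prior[s] * likelihood[s] for s in prior}
Z = sum(unnorm.values())
posterior = {s: p / Z for s, p in unnorm.items()}

print(posterior)
# Noonish survives, but with much less weight than morning or teatime:
# {'morning': ~0.32, 'noonish': ~0.05, 'teatime': ~0.63, 'sleeptime': 0.0}
```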

Can we get the best of both worlds? Can we have both the ability to say “eh, I have no idea what the probabilities here are, but I know that these things are possible and these things aren’t possible” and the ability to say “this evidence doesn’t rule out noon, but it makes it less likely”? Yes, it’s called infra-Bayesianism! But a summary of this would no longer be suitable for a post titled “Lowbrow Bayesian Updates,” because I would consider it to be “Highbrow Bayesian Updates”!
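
I won’t try to summarize infra-Bayesianism here, but a cartoon of the “best of both worlds” flavor is a credal set: carry around a set of probability distributions and condition each one. This is only a toy (infra-Bayesianism proper has more structure), and all the numbers are invented:

```python
# Toy "best of both worlds": beliefs are a *set* of distributions
# (a credal set), and we condition each member. This is a cartoon of
# the imprecise-probability idea, not infra-Bayesianism proper.

def condition(prior, likelihood):
    unnorm = {s: prior[s] * likelihood[s] for s in prior}
    Z = sum(unnorm.values())
    return {s: p / Z for s, p in unnorm.items()}

# "No idea about the probabilities, but sleeptime is impossible":
# a few representative distributions supported on the daytime states.
credal_set = [
    {"morning": 1/3, "noonish": 1/3, "teatime": 1/3, "sleeptime": 0.0},
    {"morning": 0.6, "noonish": 0.2, "teatime": 0.2, "sleeptime": 0.0},
    {"morning": 0.1, "noonish": 0.8, "teatime": 0.1, "sleeptime": 0.0},
]

likelihood = {"morning": 0.3, "noonish": 0.05, "teatime": 0.6, "sleeptime": 0.0}
posteriors = [condition(p, likelihood) for p in credal_set]

# Noonish is still possible (positive in some posterior), but its
# maximum probability across the set has dropped.
print(max(p["noonish"] for p in credal_set))   # 0.8
print(max(p["noonish"] for p in posteriors))   # ~0.31
```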


There are attempts to coarse-grain probabilities using different semirings, such as Giorgolo and Asudeh’s “One Semiring to Rule Them All.” The authors define a semiring on the set {I(mpossible), U(nlikely), P(ossible), L(ikely), C(ertain)}, claiming that its use matches everyday reasoning.


It seems like this is the semiring associated to the meet/join for the linear order on {I,U,P,L,C}, so it doesn’t seem like the specific choice of five elements is particularly canonical… But I only read the first bit of the paper so perhaps they mention this. I do like the idea of using a general semiring for probability-like things though.
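
If I’m reading it right, that semiring looks like the following as a sketch. I’m assuming addition is join/max and multiplication is meet/min on the chain I < U < P < L < C; the paper may set it up differently:

```python
# The (conjectured) meet/join semiring on the chain I < U < P < L < C:
# "addition" (combining alternatives) is max, and "multiplication"
# (combining joint requirements) is min.
ORDER = ["I", "U", "P", "L", "C"]
RANK = {v: i for i, v in enumerate(ORDER)}

def plus(a, b):   # join: an alternative is as plausible as its best case
    return a if RANK[a] >= RANK[b] else b

def times(a, b):  # meet: a conjunction is only as plausible as its weakest part
    return a if RANK[a] <= RANK[b] else b

# Additive and multiplicative units are I and C respectively.
assert plus("I", "L") == "L" and times("C", "L") == "L"

# Unlike [0,1], conjunction doesn't strictly decrease plausibility here:
print(times("L", "L"))  # 'L' -- whereas 0.7 * 0.7 < 0.7
```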


Sure, the choice is non-canonical – a little more fine-grained than your effective choice of {I, P, C}.

Maybe something interesting in their choice is that there isn’t a surjective semiring homomorphism from [0, 1] to {I,U,P,L,C}. Presumably this relates to their proposed solution to the conjunction fallacy.

Has anyone written an account of Bayesian reasoning, or other aspects of probability theory, using a general commutative semiring? For example, did Giorgolo and Asudeh do this before choosing “one semiring to rule them all”? It seems a lot better to work generally and then specialize than to pick one particular 5-element semiring and do a lot of work using just that (a minimal sketch of what I mean is below).

This comment may seem like the pot calling the kettle black, given that I’ve written a 53-page paper on 2-rigs that’s almost entirely about one particular 2-rig. But at least it was the free 2-rig on one generator!
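
For concreteness, here is a minimal sketch of the kind of generality I mean: updating parametrized by an arbitrary commutative semiring product, so that the Boolean semiring recovers the possibilistic updating from the original post and the probability semiring recovers ordinary Bayes up to normalization. The update function is my own illustration, not anything from the paper:

```python
# Updating parametrized by a commutative semiring multiplication.
# Beliefs and likelihoods are semiring-valued functions on world states.

def update(prior, likelihood, times):
    """Pointwise semiring product of prior and likelihood (no normalization)."""
    return {s: times(prior[s], likelihood[s]) for s in prior}

# Boolean semiring (or, and): recovers possibilistic updating.
bool_prior = {"morning": True, "noonish": True, "teatime": True, "sleeptime": False}
bool_obs = {"morning": True, "noonish": False, "teatime": True, "sleeptime": False}
print(update(bool_prior, bool_obs, lambda a, b: a and b))
# {'morning': True, 'noonish': False, 'teatime': True, 'sleeptime': False}

# Probability semiring (+, *): recovers Bayes, up to renormalizing.
prob_prior = {"morning": 1/3, "noonish": 1/3, "teatime": 1/3, "sleeptime": 0.0}
prob_obs = {"morning": 0.3, "noonish": 0.05, "teatime": 0.6, "sleeptime": 0.0}
print(update(prob_prior, prob_obs, lambda a, b: a * b))
```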


It seems like it might be begging the question to refer to the Giorgolo-Asudeh work as being “about probability theory” at all, since there are simple axioms (de Finetti is the name, I think?) forcing your beliefs to accord with ordinary probability theory over [0,1] if you’re going to follow them. But I do like thinking about these non-probabilistic reasoning situations as still about some kind of generalized Bayesian updating…

I just think that “normal reasoning” doesn’t need to worry about “ruling out” noon; saying something is impossible in an everyday way just doesn’t mean the same thing as saying it has “probability 0” in a more formal Bayesian framework.

Maybe it’s more like: the list of possibilities is what I’m actually currently willing to consider. This should be (i.e. actually is) defeasible, like in the non-monotonic logics @kris-brown has been thinking about lately, so finding out that something you had been ruling out is actually (though surprisingly) possible again is totally quotidian, not cause for panic.
