Compositional imprecise probability

At the end of Lowbrow Bayesian Updates, I hinted that the truly enlightened way to reason about the world is to use both possibility and probability, an approach that goes by the name of “infrabayesianism” or “imprecise probability.”

A good way to get a handle on what this means is the following:

  • In the probabilistic setting, a prior is a probability distribution over states of the world, which measures how likely each state is
  • In the possibilistic setting, a prior is a subset of states of the world, which measures which states are possible
  • In the “imprecise probability/infrabayesian” setting, a prior is a subset of probability distributions over the states of the world, which measures the possibilities for the probabilities! We call these “infrapriors.”

Imprecise probability lets you describe belief states in a more nuanced way than probability alone. For instance, imagine that it is 1903, and the Wright Brothers haven’t yet demonstrated that sustained heavier-than-air flight is possible for humans. Then your “infraprior” for the question “will there be heavier-than-air flight by the end of the year?” is a set of probabilities that might include 0%, which would indicate that you believe it is possible that heavier-than-air flight needs technology that won’t be invented for a century, so no amount of luck could produce heavier-than-air flight this year. But you might also include 10%, which indicates that heavier-than-air flight is possible but unlikely.
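To make the three kinds of prior concrete, here is a minimal Python sketch for the 1903 question. The encoding is an assumption made for illustration: a distribution is a dict from states to exact `Fraction` probabilities, a possibilistic prior is a frozenset of states, and an infraprior is given by finitely many distributions (the 0% and 10% beliefs) and understood as their convex hull. The helper names are hypothetical, and the 5% in `prob_prior` is an arbitrary illustrative number, not one from the post.

```python
from fractions import Fraction

# Probabilistic prior: one distribution over states (state -> probability).
# The 5% figure is an arbitrary illustrative choice.
prob_prior = {"flight": Fraction(1, 20), "no_flight": Fraction(19, 20)}

# Possibilistic prior: just the set of states considered possible.
poss_prior = frozenset({"flight", "no_flight"})

# Infraprior: generated by finitely many distributions (the 0% and 10% beliefs);
# the infraprior itself is taken to be their convex hull.
infra_prior = [
    {"flight": Fraction(0), "no_flight": Fraction(1)},
    {"flight": Fraction(1, 10), "no_flight": Fraction(9, 10)},
]


def event_probability(p, event):
    """Probability that distribution p assigns to a set of states."""
    return sum((w for s, w in p.items() if s in event), Fraction(0))


def probability_range(generators, event):
    """Smallest and largest probability of `event` over the convex hull of
    `generators`. The probability of a fixed event is linear in the
    distribution, so its min and max over the hull occur at the generators."""
    values = [event_probability(p, event) for p in generators]
    return min(values), max(values)


print(event_probability(prob_prior, {"flight"}))   # 1/20
print("flight" in poss_prior)                      # True
print(probability_range(infra_prior, {"flight"}))  # (Fraction(0, 1), Fraction(1, 10))
```

The probabilistic prior returns a single number, the possibilistic prior only a yes/no, and the infraprior a whole range of probabilities.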

In other words, you can say “it is possible that human heavier-than-air flight is impossible with the technology of the next century.” This is a bit of a strange sentence, but one that you could imagine someone in 1903 saying (in fact, the New York Times said something even dumber; see the Snopes.com article “Did Wright Brothers Fly Same Year NYT Said Flying Machines Could Take 10M Years To Develop?”).

I should note that, technically speaking, infrapriors are convex sets of probability distributions: if you consider p_1 to be possible and p_2 to be possible, then for any \lambda \in [0,1] you also have to consider \lambda p_1 + (1-\lambda)p_2 to be possible. In the 1903 example, this means that if you include 0% and 10%, you must also include every probability in between.
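The convexity condition is easy to check by hand. This sketch assumes the same dict-of-`Fraction`s encoding as before (an illustrative choice, not anything from the post) and a hypothetical helper `mix` that forms \lambda p_1 + (1-\lambda)p_2. With the 0% and 10% beliefs from the 1903 example and \lambda = 1/2, convexity forces the 5% distribution into the infraprior.

```python
from fractions import Fraction


def mix(p1, p2, lam):
    """Return lam * p1 + (1 - lam) * p2 on the union of the two supports."""
    if not 0 <= lam <= 1:
        raise ValueError("lambda must lie in [0, 1]")
    states = set(p1) | set(p2)
    return {s: lam * p1.get(s, 0) + (1 - lam) * p2.get(s, 0) for s in states}


p1 = {"flight": Fraction(0), "no_flight": Fraction(1)}          # the 0% belief
p2 = {"flight": Fraction(1, 10), "no_flight": Fraction(9, 10)}  # the 10% belief

half = mix(p1, p2, Fraction(1, 2))
print(half["flight"])      # 1/20, i.e. 5%
print(sum(half.values()))  # 1: the mixture is still a distribution
```

Using exact fractions rather than floats keeps the check free of rounding noise.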

If you think it’s tricky to wrap your head around the intuitive meaning of infrapriors, then it gets even worse when you try to do rigorous math with them, and especially when you try to do category theory!

We can study probability theory and possibility theory using Markov categories, because the Kleisli categories for the probability monad and the non-empty powerset monad both form Markov categories.
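For readers who have not met Kleisli categories, here is a minimal Python sketch of just the composition in each of the two Kleisli categories, on finite sets; it does not implement the rest of the Markov-category structure (copying and discarding). A morphism X \to Y is a function from X either to distributions on Y (a dict of probabilities) or to non-empty sets of elements of Y (a frozenset). The weather example and the names `kleisli_prob` and `kleisli_poss` are hypothetical.

```python
from fractions import Fraction


def kleisli_prob(f, g):
    """Compose f: X -> Dist(Y) with g: Y -> Dist(Z) by summing over y
    (the Chapman-Kolmogorov formula)."""
    def h(x):
        out = {}
        for y, py in f(x).items():
            for z, pz in g(y).items():
                out[z] = out.get(z, 0) + py * pz
        return out
    return h


def kleisli_poss(f, g):
    """Compose f: X -> P+(Y) with g: Y -> P+(Z): z is possible if some possible
    y makes it possible. Non-empty sets in give a non-empty set out."""
    def h(x):
        return frozenset(z for y in f(x) for z in g(y))
    return h


# Probabilistic: the weather, then whether you get wet given the weather.
def weather(_):
    return {"rain": Fraction(1, 4), "dry": Fraction(3, 4)}


def wet(w):
    if w == "rain":
        return {"wet": Fraction(9, 10), "not_wet": Fraction(1, 10)}
    return {"not_wet": Fraction(1)}


# Possibilistic: the same story, keeping only which outcomes are possible.
def weather_poss(_):
    return frozenset({"rain", "dry"})


def wet_poss(w):
    return frozenset({"wet", "not_wet"}) if w == "rain" else frozenset({"not_wet"})


print(kleisli_prob(weather, wet)("today"))
# {'wet': Fraction(9, 40), 'not_wet': Fraction(31, 40)}
print(sorted(kleisli_poss(weather_poss, wet_poss)("today")))  # ['not_wet', 'wet']
```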

However, somewhat surprisingly, when we combine possibility and probability we no longer get a Markov category! This is essentially because, given an infraprior B_X \subset \mathcal{P}(X) and an infraprior B_Y \subset \mathcal{P}(Y), there isn’t a canonical way of making an infraprior B_X \otimes B_Y \subset \mathcal{P}(X \times Y). Or rather, there are several somewhat canonical ways, but none of them satisfy all of the technical conditions needed for a Markov category.

For instance, one way would be to say that B_X \otimes B_Y is (the convex hull of) the set of all probability distributions of the form p_X \times p_Y, with p_X \in B_X and p_Y \in B_Y. Another way would be to say that B_X \otimes B_Y is the set of all probability distributions p on X \times Y whose marginal on X is in B_X and whose marginal on Y is in B_Y. The first amounts to “assuming the independence” of X and Y, while the second does not assume independence. Both of these are useful, and there are still other constructions, some of them asymmetric, that are also useful.
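The difference between the two constructions already shows up for two yes/no variables. In this sketch (the encoding and function names are assumptions for illustration), a closed convex set of distributions on {True, False} is just a closed interval of allowed values for P(True), so B_X and B_Y are given as intervals. A perfectly correlated joint distribution has both marginals in range, so it passes the marginals-only test, but it is not of the form p_X \times p_Y. It is not even in the convex hull of such products here: it gives (True, False) probability 0, while every product with both marginals in [0.4, 0.6] gives it at least 0.16.

```python
from fractions import Fraction
from itertools import product

BOOLS = (True, False)


def marginal(p, axis):
    """Marginal of a joint distribution on pairs (x, y): axis 0 is X, axis 1 is Y."""
    m = {b: Fraction(0) for b in BOOLS}
    for pair, w in p.items():
        m[pair[axis]] += w
    return m


def in_interval(q, interval):
    lo, hi = interval
    return lo <= q <= hi


def in_marginal_tensor(p, BX, BY):
    """Marginals-only construction: just the two marginals are constrained."""
    return (in_interval(marginal(p, 0)[True], BX)
            and in_interval(marginal(p, 1)[True], BY))


def is_product(p):
    """True if p equals the product of its own marginals."""
    mx, my = marginal(p, 0), marginal(p, 1)
    return all(p.get((x, y), 0) == mx[x] * my[y] for x, y in product(BOOLS, repeat=2))


def in_product_form(p, BX, BY):
    """Independence construction (before any convex hull):
    p = p_X x p_Y with p_X in B_X and p_Y in B_Y."""
    return is_product(p) and in_marginal_tensor(p, BX, BY)


BX = BY = (Fraction(2, 5), Fraction(3, 5))  # P(True) anywhere in [0.4, 0.6]
correlated = {(True, True): Fraction(1, 2), (False, False): Fraction(1, 2)}
independent = {pair: Fraction(1, 4) for pair in product(BOOLS, repeat=2)}

print(in_marginal_tensor(correlated, BX, BY))  # True
print(in_product_form(correlated, BX, BY))     # False
print(in_product_form(independent, BX, BY))    # True
```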

This has been a bit of a thorny problem for a while, but recently Jack Liell-Cock and Sam Staton have made serious progress in a paper called Compositional imprecise probability, by taking a slightly different approach to imprecise probability than “convex sets of distributions.” The key idea is to name each of the nondeterministic choices that you make, i.e. to explicitly write out each unknown factor where you feel you have no information to decide either way. Then if A is the set of names for choices, your “infraprior” is a function 2^A \to \mathcal{P}(X). That is, for each assignment of true/false to all of your choices, you get a probability distribution on X.
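Here is a toy Python sketch of the named-choices idea as summarized above; it is not the construction from the paper, whose details differ. A named infraprior is a pair of a set A of choice names and a function from assignments in 2^A to distributions on X. The hypothetical `tensor` combines two of them by taking the union of their names and, for each joint assignment, the product distribution, which models the assumption that once every choice is fixed, the remaining randomness is independent. The model `flight_given` and the names `tech_ready`/`tech_ready_2` are invented for illustration, e.g. two separate flight attempts that may or may not depend on the same unknown.

```python
from fractions import Fraction
from itertools import product


def assignments(names):
    """Enumerate 2^A: every map from the choice names to {False, True}."""
    names = sorted(names)
    for bits in product([False, True], repeat=len(names)):
        yield dict(zip(names, bits))


def image(named):
    """The list of distributions a named infraprior can produce."""
    names, f = named
    return [f(a) for a in assignments(names)]


def tensor(n1, n2):
    """Union of names; product distribution for each joint assignment."""
    names1, f1 = n1
    names2, f2 = n2

    def f(a):
        p = f1({k: a[k] for k in names1})
        q = f2({k: a[k] for k in names2})
        return {(x, y): p[x] * q[y] for x in p for y in q}

    return names1 | names2, f


def flight_given(ready):
    """Hypothetical 1903 model: 10% chance of flight if the technology is ready, else 0%."""
    if ready:
        return {True: Fraction(1, 10), False: Fraction(9, 10)}
    return {True: Fraction(0), False: Fraction(1)}


# Both attempts depend on the same named choice "tech_ready"...
flight = ({"tech_ready"}, lambda a: flight_given(a["tech_ready"]))
# ...versus an attempt whose choice has a fresh name.
flight_fresh = ({"tech_ready_2"}, lambda a: flight_given(a["tech_ready_2"]))

print(len(image(tensor(flight, flight))))        # 2: both copies make the same choice
print(len(image(tensor(flight, flight_fresh))))  # 4: the two choices vary separately
```

The sketch illustrates one consequence of naming: whether two unknowns are the same unknown is recorded by whether they share a name, instead of being decided by which tensor product you pick.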

In my opinion, this is quite natural from a practical point of view, because when you are reasoning out your beliefs, you should record all of the unknowns explicitly.

But not only is this practically natural, it turns out to produce better theory! At this point, though, I think the paper does a better job of explaining how this works technically than I could here, so I encourage you to check it out.
