Conditional Probability

For a while, I’ve been fascinated by the way Jaynes does probability theory in Probability Theory: The Logic of Science.

Specifically, Jaynes is very careful to always work with conditional probability. Rather than positing some big measure space \Omega with a single distribution on it from which all conditional distributions are derived, he works with conditional distributions natively, which gives everything a more local feel. This post is the result of some thoughts I’ve had recently on how to formalize this approach to probability theory.

Frames and Valuations

A frame \mathcal{O} is a poset with small coproducts (arbitrary joins \bigvee) and finite limits (finite meets \wedge) which satisfies the infinite distributive law

x \wedge \bigvee_i y_i \leq \bigvee_i (x \wedge y_i)

where i ranges over some small set. (The reverse inequality holds automatically, so this is really an equality.) This generalizes topology to the case where the “open subsets” are not necessarily subsets of a fixed set \Omega.
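As a concrete, hypothetical example, the open sets of a finite topological space form a frame, with meet given by intersection and join by union. The Python sketch below encodes one such frame on the points {0, 1, 2}; the particular opens and the helper names `meet`, `join`, `leq`, and `is_frame` are illustrative choices, not anything from the post. Because a finite frame has only finite joins, it is enough to check for \bot and \top, closure under binary meets and joins, and the binary case of the distributive law.

```python
from itertools import product

# Hypothetical frame: the open sets of a topology on the points {0, 1, 2}.
OPENS = [frozenset(s) for s in ([], [0], [0, 1], [0, 2], [0, 1, 2])]


def meet(a, b):
    return a & b  # finite meet = intersection


def join(a, b):
    return a | b  # binary join = union


def leq(a, b):
    return a <= b  # the order is inclusion


def is_frame(opens):
    """Check that `opens` is a finite frame under inclusion.

    With finitely many opens every join is finite, so it suffices to have
    bottom, top, closure under binary meets and joins, and the binary law
    x ∧ (y ∨ z) ≤ (x ∧ y) ∨ (x ∧ z).
    """
    elems = set(opens)
    top = frozenset().union(*opens)
    if frozenset() not in elems or top not in elems:
        return False
    for x, y in product(opens, repeat=2):
        if meet(x, y) not in elems or join(x, y) not in elems:
            return False
    return all(
        leq(meet(x, join(y, z)), join(meet(x, y), meet(x, z)))
        for x, y, z in product(opens, repeat=3)
    )


print(is_frame(OPENS))  # True
```

For instance, `is_frame([frozenset([0]), frozenset([1])])` is `False`: that family has neither a bottom element nor the union {0, 1}.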

A valuation on a frame is a function \nu \colon \mathcal{O} \to [0,\infty) from the frame to the non-negative reals that is

  1. Monotonic: \nu(A) \leq \nu(B) when A \leq B
  2. Strict: \nu(\bot) = 0
  3. Modular: \nu(A) + \nu(B) = \nu(A \vee B) + \nu(A \wedge B) (this is called the “sum rule” in Jaynes)

Valuations on frames generalize probability distributions on measurable spaces.

For an elegant perspective on the modular property, see David Spivak’s MathOverflow post.
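One simple way to obtain a valuation on a frame of open sets is to put non-negative weights on the points and let \nu(U) be the total weight of U. The sketch below does this for a hypothetical three-point space and checks the three conditions directly; the weights and the name `is_valuation` are assumptions made for illustration.

```python
from itertools import product

# Hypothetical frame: open sets of a topology on the points {0, 1, 2}.
OPENS = [frozenset(s) for s in ([], [0], [0, 1], [0, 2], [0, 1, 2])]

# Hypothetical non-negative point weights.
WEIGHTS = {0: 0.5, 1: 0.3, 2: 0.2}


def nu(u):
    """ν(U): the total weight of the points in the open set U."""
    return sum(WEIGHTS[x] for x in u)


def is_valuation(v, opens, tol=1e-12):
    """Check strictness, monotonicity, and modularity of v on `opens`."""
    if abs(v(frozenset())) > tol:
        return False  # strict: ν(⊥) = 0
    for a, b in product(opens, repeat=2):
        if a <= b and v(a) > v(b) + tol:
            return False  # monotonic
        if abs(v(a) + v(b) - v(a | b) - v(a & b)) > tol:
            return False  # modular: ν(A) + ν(B) = ν(A ∨ B) + ν(A ∧ B)
    return True


print(is_valuation(nu, OPENS))                     # True
print(is_valuation(lambda u: len(u) ** 2, OPENS))  # False: not modular
```

The second call fails on the incomparable opens {0, 1} and {0, 2}: 4 + 4 differs from 9 + 1.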

Counterfactuals and Conditioning

In this blog post, I propose an extension of valuations that I believe handles counterfactuals better. Given a valuation \nu, we might define conditional probability via

p(A|B) = \frac{\nu(A \wedge B)}{\nu(B)}

However, the value of \nu(B) may be 0, in which case this is undefined. If we want to represent Bayesian priors via probability distributions, this is unsatisfactory, because I have a strong prior that if there is an elephant in my house, I’m probably going to have to clean up poop later, even though I assign a probability of 0 to there being an elephant in my house.
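The problem is easy to see in code. In this hypothetical model the events are all subsets of three possible worlds, and the prior gives both elephant worlds weight 0, so the ratio formula has nothing to say about what happens given an elephant. The world names and the `naive_conditional` helper are illustrative only.

```python
# Hypothetical possible worlds with prior weights; every subset is an event.
PRIOR = {"no_elephant": 1.0, "elephant_and_poop": 0.0, "elephant_no_poop": 0.0}

TOP = frozenset(PRIOR)
ELEPHANT = frozenset({"elephant_and_poop", "elephant_no_poop"})
POOP = frozenset({"elephant_and_poop"})


def nu(event):
    """Valuation ν(A): total prior weight of the worlds in A."""
    return sum(PRIOR[w] for w in event)


def naive_conditional(a, b):
    """p(A|B) = ν(A ∧ B) / ν(B); returns None when ν(B) = 0 (undefined)."""
    denom = nu(b)
    if denom == 0:
        return None
    return nu(a & b) / denom


print(naive_conditional(ELEPHANT, TOP))   # 0.0
print(naive_conditional(POOP, ELEPHANT))  # None
```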

Therefore, I propose the following definition.

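A conditioned distribution on \mathcal{O} is an enrichment p(-|-) of \mathcal{O} in the discrete monoidal category ([0,1], \cdot), such that p(-|C) is a valuation on the slice category \mathcal{O}/C for every C \neq \bot. (The case C = \bot has to be excluded: the enrichment forces p(\bot|\bot) = 1, whereas strictness would require p(\bot|\bot) = 0.)

That is, p(A|C) \in [0,1] is defined for all A \leq C, with p(C|C) = 1, p(-|C) a valuation on \mathcal{O}/C whenever C \neq \bot, and, for all A \leq B \leq C,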

p(A|C) = p(A|B)p(B|C)

Note that by “enriching a category” here I mean something slightly different from creating an enriched category: enriching a category \mathcal{C} in a monoidal category \mathcal{V} means assigning an object of \mathcal{V} to every morphism in \mathcal{C}. Taking \mathcal{C} to be an indiscrete category (i.e. the complete graph) recovers the notion of enriched category. I’m not sure where to find a reference for this; I learned this sense of enriching from Matteo Capucci.

Conditioned distributions allow us to work with p(A|C) even when p(C|\top) = 0, where \top is the terminal object. In fact, the definition never mentions \top, so it would make sense even on a structure with no terminal object (every frame has one, being closed under finite limits), in which case objects have no “global” probability, only local, relative probability.
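Here is one way to build such a conditioned distribution, sketched in Python under stated assumptions. The construction is not from the post: it borrows the idea of a lexicographic sequence of weightings, where p(A|C) uses the first weighting that gives C positive mass. For A \leq B \leq C, either B gets positive mass from the same weighting as C (and the ratios telescope), or that weighting gives B, hence A, mass 0, so both sides of the chain rule vanish. The worlds, the weights in `LAYERS`, and the helper names are hypothetical, and the check skips C = \bot, since p(\bot|\bot) = 1 cannot also satisfy strictness.

```python
from itertools import combinations

# Hypothetical worlds; the frame is the full powerset (a discrete space).
WORLDS = ["no_elephant", "elephant_and_poop", "elephant_no_poop"]

# Hypothetical lexicographic layers of point weights: layer 0 is the prior,
# layer 1 is only consulted for events that layer 0 gives zero mass.
LAYERS = [
    {"no_elephant": 1.0, "elephant_and_poop": 0.0, "elephant_no_poop": 0.0},
    {"no_elephant": 0.0, "elephant_and_poop": 0.9, "elephant_no_poop": 0.1},
]


def p(a, c):
    """p(A|C) for A <= C, C nonempty: ratio in the first layer giving C mass."""
    for layer in LAYERS:
        mass_c = sum(layer[w] for w in c)
        if mass_c > 0:
            return sum(layer[w] for w in a) / mass_c
    raise ValueError("no layer gives C positive mass")


def events():
    return [frozenset(t) for r in range(len(WORLDS) + 1)
            for t in combinations(WORLDS, r)]


def check_axioms(tol=1e-12):
    """Check the conditioned-distribution axioms for every C != bottom."""
    evs = events()
    for c in [e for e in evs if e]:
        if abs(p(c, c) - 1) > tol or p(frozenset(), c) != 0:
            return False  # normalization p(C|C) = 1, strictness p(⊥|C) = 0
        below = [a for a in evs if a <= c]
        for a in below:
            for b in below:
                pa, pb = p(a, c), p(b, c)
                if not -tol <= pa <= 1 + tol:
                    return False  # values lie in [0, 1]
                if a <= b and pa > pb + tol:
                    return False  # monotonic
                if abs(pa + pb - p(a | b, c) - p(a & b, c)) > tol:
                    return False  # modular (sum rule) on the slice below C
                if a <= b and b and abs(pa - p(a, b) * pb) > tol:
                    return False  # chain rule p(A|C) = p(A|B) p(B|C)
    return True


TOP = frozenset(WORLDS)
ELEPHANT = frozenset({"elephant_and_poop", "elephant_no_poop"})
POOP = frozenset({"elephant_and_poop"})

print(p(ELEPHANT, TOP))   # 0.0: the prior rules out an elephant...
print(p(POOP, ELEPHANT))  # 0.9: ...but conditioning on one is still defined
print(check_axioms())     # True
```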

This differs slightly from how Jaynes treats probability theory. First of all, we have only defined p(A|C) when A \leq C. We can extend this to the case A \not\leq C by defining

p(A | C) := p(A \wedge C | C)
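A quick numerical sanity check of this extension and of the product rule, assuming a full-support valuation on the powerset of three points so that p(A|C) = \nu(A)/\nu(C) for A \leq C is a conditioned distribution. The weights and helper names are hypothetical; `p_base` refuses A \not\leq C, and `p` is the extension p(A|C) := p(A \wedge C|C).

```python
from itertools import combinations, product

# Hypothetical full-support weights on three points; events are all subsets.
WEIGHTS = {0: 0.5, 1: 0.3, 2: 0.2}
EVENTS = [frozenset(t) for r in range(len(WEIGHTS) + 1)
          for t in combinations(WEIGHTS, r)]


def nu(u):
    return sum(WEIGHTS[x] for x in u)


def p_base(a, c):
    """p(A|C) = ν(A)/ν(C), defined only for A ≤ C with C ≠ ⊥."""
    if not (a <= c and c):
        raise ValueError("p(A|C) is only defined for A <= C, C nonempty")
    return nu(a) / nu(c)


def p(a, c):
    """Extension to arbitrary A: p(A|C) := p(A ∧ C | C)."""
    return p_base(a & c, c)


def product_rule_holds(tol=1e-12):
    """Check p(A ∧ B | C) = p(A | B ∧ C) p(B | C) whenever B ∧ C ≠ ⊥."""
    for a, b, c in product(EVENTS, repeat=3):
        if b & c:
            if abs(p(a & b, c) - p(a, b & c) * p(b, c)) > tol:
                return False
    return True


print(product_rule_holds())  # True
```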

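With this definition, we can prove the classic product rule from Jaynes

p(A \wedge B | C) = p(A|B \wedge C) p(B|C)

as follows, where the middle step is the chain rule applied to A \wedge B \wedge C \leq B \wedge C \leq C and the outer steps unfold the extended definition:

p(A \wedge B | C) := p(A \wedge B \wedge C | C)
= p(A \wedge B \wedge C | B \wedge C) p(B \wedge C|C)
=: p(A | B \wedge C) p(B|C)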

Thus my “conditioned distribution” satisfies both the product and the sum rules.
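Only the product rule is derived explicitly above. The sum rule for arbitrary A and B (not just A, B \leq C) follows from modularity of p(-|C) on \mathcal{O}/C, applied to A \wedge C and B \wedge C, together with the distributive law (A \wedge C) \vee (B \wedge C) = (A \vee B) \wedge C:

```latex
\begin{aligned}
p(A|C) + p(B|C)
  &:= p(A \wedge C|C) + p(B \wedge C|C) \\
  &= p\big((A \wedge C) \vee (B \wedge C)\,\big|\,C\big) + p(A \wedge B \wedge C|C) \\
  &= p\big((A \vee B) \wedge C\,\big|\,C\big) + p(A \wedge B \wedge C|C) \\
  &=: p(A \vee B|C) + p(A \wedge B|C)
\end{aligned}
```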

Possible Extensions

I believe that we can extend the notion of a conditioned distribution to an enrichment on an arbitrary category with finite limits, small coproducts, and some analogue of the infinite distributive law. We then require p(-|C) to be a valuation on \mathrm{Sub}(C) for each C.

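We can consider the space \mathrm{P}(\mathcal{O}) of conditioned distributions on a frame \mathcal{O}. Interestingly enough, this doesn’t seem to be a convex space, at least not in the obvious way: the pointwise mixture of two conditioned distributions p and p' need not satisfy the chain rule, because I believe that in general, for A \leq B \leq C,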

\lambda p(A|C) + (1-\lambda)p'(A|C) \neq (\lambda p(A|B) + (1-\lambda)p'(A|B))(\lambda p(B|C) + (1-\lambda)p'(B|C))
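Expanding both sides shows exactly how the pointwise mixture q := \lambda p + (1-\lambda)p' fails. Abbreviate a = p(A|B), b = p(B|C), a' = p'(A|B), b' = p'(B|C) (names introduced here only for brevity), and use the chain rule for p and p', i.e. p(A|C) = ab and p'(A|C) = a'b':

```latex
\begin{aligned}
q(A|C) - q(A|B)\,q(B|C)
  &= \lambda ab + (1-\lambda)a'b' - \big(\lambda a + (1-\lambda)a'\big)\big(\lambda b + (1-\lambda)b'\big) \\
  &= \lambda(1-\lambda)\,(ab + a'b' - ab' - a'b) \\
  &= \lambda(1-\lambda)\,(a - a')(b - b')
\end{aligned}
```

So for a given A \leq B \leq C the mixture satisfies the chain rule only when \lambda \in \{0, 1\}, p(A|B) = p'(A|B), or p(B|C) = p'(B|C).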

I would like to understand how to extend this picture to infradistributions, because I want to talk about, for instance, the set of conditioned distributions where p(-|C) is some fixed distribution, but p(C|-) is allowed to vary. This models the situation where we know probabilities given the presumption that our model is correct, but have no idea whether or not our model is correct. Of course, we might have some probability that our model is correct, given some other assumptions, but at some point there are assumptions that we make that we have Knightian uncertainty about. In one formalism for infradistributions, we require that the set of distributions be a convex set; I’m not sure what the right analogous condition for conditioned distributions is.

But in any case, I believe that we may at least consider \mathrm{P} as a functor \mathsf{Frm}^{\mathrm{op}} \to \mathsf{Set}, where if f \colon \mathcal{O} \to \mathcal{O}' is a frame homomorphism and p a conditioned distribution on \mathcal{O}', we can define f^\ast(p)(A|C) = p(f(A)|f(C)). Then we are in presheaf land, and we can think about different categorical constructions, like the category of elements, etc.
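A minimal sketch of f^\ast, assuming both frames are the open sets of finite discrete spaces and f is the preimage map of a function g \colon X \to Y (preimages preserve unions and finite intersections, so f is a frame homomorphism \mathcal{O}(Y) \to \mathcal{O}(X)). In this case f^\ast(p) is just the pushforward of p along g. The spaces, weights, and helper names are hypothetical. We take g surjective, so f(C) \neq \bot whenever C \neq \bot; otherwise f^\ast(p)(-|C) would be p(-|\bot), which puts p(\bot|\bot) = 1 where strictness requires 0.

```python
from itertools import combinations, product

# Hypothetical surjection g: X -> Y between finite discrete spaces.
G = {0: "a", 1: "a", 2: "b", 3: "b"}
X_WEIGHTS = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}  # full support on X
Y = sorted(set(G.values()))


def subsets(points):
    pts = list(points)
    return [frozenset(t) for r in range(len(pts) + 1)
            for t in combinations(pts, r)]


def f(u):
    """Frame homomorphism O(Y) -> O(X): preimage under g."""
    return frozenset(x for x in G if G[x] in u)


def nu_x(u):
    return sum(X_WEIGHTS[x] for x in u)


def p(a, c):
    """A conditioned distribution on O(X): ν(A ∧ C)/ν(C) for C ≠ ⊥."""
    return nu_x(a & c) / nu_x(c)


def pullback(a, c):
    """f*(p)(A|C) = p(f(A)|f(C)), a conditioned distribution on O(Y)."""
    return p(f(a), f(c))


def is_frame_hom():
    ys = subsets(Y)
    return (f(frozenset()) == frozenset() and f(frozenset(Y)) == frozenset(G)
            and all(f(a | b) == f(a) | f(b) and f(a & b) == f(a) & f(b)
                    for a, b in product(ys, repeat=2)))


def check_pullback(tol=1e-12):
    """Check the conditioned-distribution axioms for f*(p) on every C ≠ ⊥."""
    ys = subsets(Y)
    for c in [e for e in ys if e]:
        if abs(pullback(c, c) - 1) > tol or pullback(frozenset(), c) != 0:
            return False
        below = [a for a in ys if a <= c]
        for a, b in product(below, repeat=2):
            pa, pb = pullback(a, c), pullback(b, c)
            if a <= b and pa > pb + tol:
                return False  # monotonic
            if abs(pa + pb - pullback(a | b, c) - pullback(a & b, c)) > tol:
                return False  # modular
            if a <= b and b and abs(pa - pullback(a, b) * pb) > tol:
                return False  # chain rule
    return True


print(is_frame_hom())    # True
print(check_pullback())  # True
print(round(pullback(frozenset({"a"}), frozenset(Y)), 6))  # 0.3
```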

I’m looking forward to seeing other people’s thoughts on this, and seeing where this can go!


I like this a lot, and have had similar thoughts. For example, in control theory you often want the speed of the car (or whatever) to lie in a measure-zero subset, e.g. exactly 60 miles per hour, and you still need to condition on that.

But I wouldn’t use your/Matteo’s terminology of “enriching a category” in V, as though it’s some other but nebulously related notion. Indeed, your category is enriched in the coproduct completion of V; let’s notate that as ∑V. A hom-object in a ∑V-enriched category consists of a set for which each element is assigned an object in V.

Finally, I wonder if the “extension” notion you’re looking for is that of a geometric category.