Continuing on from what we worked on yesterday, today we tried to extend the picture to incorporate probabilities.

We were lucky enough to have Vanessa Kosoy working with us today (and Kris Brown and Jade joined as well), and Vanessa told us about how there is a nicer structure for coarse graining in decision theory when you work with *infradistributions* instead of *distributions*.

An infradistribution on a measurable space X is a closed, convex subset U \subset \Delta X, where \Delta X is the space of distributions on X. We denote the set of infradistributions on X by \square X.

There is an obvious poset structure on \square X given by subset inclusion. However, this doesn't capture all of the structure that I think is important in \square X. Specifically, if V = \{p\} and U = \{q\}, and p \neq q, then the poset structure on \square X doesn't tell you anything about V and U even if p is very close to q.

To rectify this, we want to introduce a "fuzzification" of the subset inclusion. Specifically, to two infradistributions V and U, I want to assign an extended positive real number d(V, U) \in [0,\infty] which measures "the failure of U to contain V". If this number is 0, then it should be the case that V \subset U, and if this number is \infty, then some probability distribution in V is totally incompatible with every probability distribution in U. For example, it could assign positive probability to some event that every probability distribution in U assigns 0 probability to.

My proposal for such a function d is

d(V, U) = \sup_{p \in V} \inf_{q \in U} D(p || q)

where D(p || q) is the Kullback-Leibler divergence of p and q. This function satisfies the two heuristics that I described above.

For the first one, if d(V, U) = 0, then for every p \in V,

\inf_{q \in U} D(p || q) = 0.

Assuming \Delta X is compact (as it is when X is finite), the closed subset U is compact, and D(p || \cdot) is lower semicontinuous, so this infimum is achieved at some q' \in U with D(p || q') = 0. But then p = q'. Thus, all p \in V have p \in U, so V \subset U.

For the second one, D(p || q) = \infty if p is not absolutely continuous with respect to q (it's not quite an only if in general, but for finite X it is). Moreover, again by compactness, if d(V, U) = \infty, then there exists some p \in V such that D(p || q) = \infty for all q \in U, which we can interpret as "p is incompatible with every distribution in U".
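To make the two heuristics concrete on a finite X, here is a minimal Python sketch. It assumes d(V, U) = \sup_{p \in V} \inf_{q \in U} D(p || q), represents V by a finite list of generating distributions, and restricts U to a segment between two distributions q0 and q1. That restriction, and the names `kl`, `mix`, `dist_to_segment`, and `d`, are my own illustrative choices. It relies on two standard facts: D(p || q) is jointly convex, so the inner infimum along a segment can be found by ternary search, and p \mapsto \inf_{q \in U} D(p || q) is convex, so the supremum over the convex hull of the generators is attained at a generator.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats on a finite set."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue              # convention: 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf       # p is not absolutely continuous w.r.t. q
        total += pi * math.log(pi / qi)
    return max(total, 0.0)        # clamp tiny negative rounding error

def mix(q0, q1, lam):
    """The point lam * q0 + (1 - lam) * q1 on the segment [q1, q0]."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(q0, q1)]

def dist_to_segment(p, q0, q1, iters=200):
    """inf of D(p || q) over q in the segment, via ternary search on lam."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if kl(p, mix(q0, q1, m1)) <= kl(p, mix(q0, q1, m2)):
            hi = m2
        else:
            lo = m1
    lam = (lo + hi) / 2.0
    return min(kl(p, q0), kl(p, q1), kl(p, mix(q0, q1, lam)))

def d(V, U):
    """sup over the generators of V of the divergence to the segment U."""
    q0, q1 = U
    return max(dist_to_segment(p, q0, q1) for p in V)

# U is the segment between two distributions supported on the first two points.
U = ([0.5, 0.5, 0.0], [0.1, 0.9, 0.0])

print(d([[0.3, 0.7, 0.0]], U))  # approximately 0: this p is the midpoint of U
print(d([[0.3, 0.6, 0.1]], U))  # inf: p charges a point no q in U charges
print(d([[0.8, 0.2, 0.0]], U))  # finite and positive: p is close to U but outside it
```

The first call illustrates the "d = 0 means containment" heuristic and the second the "d = \infty means incompatibility" heuristic; the third is the fuzzy middle ground that plain subset inclusion cannot see.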

I now claim that with d, \square X is a Lawvere metric space, which is a category enriched in the monoidal category ([0,\infty], \geq, +, 0). In other words, it satisfies the following:

- Reflexivity. For all U \in \square X, d(U, U) = 0.

- Transitivity. For all U, V, W \in \square X, d(U, W) \leq d(U, V) + d(V, W).

Reflexivity is easy to show; the challenge is transitivity. But this actually fails! This is because Kullback-Leibler divergence famously does not obey the triangle inequality, so if we just consider singleton sets we can show a failure of transitivity.
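Here is a quick numeric check of the singleton case, as a sketch. It assumes that on singletons d(\{p\}, \{q\}) = D(p || q), so transitivity would require D(p || r) \leq D(p || q) + D(q || r). The three distributions on a two-point set are ones I picked for illustration, and `kl` is my own helper name.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats on a finite set."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue              # convention: 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf       # p is not absolutely continuous w.r.t. q
        total += pi * math.log(pi / qi)
    return total

# Three distributions on a two-point set, moving steadily away from uniform.
p = [0.5, 0.5]
q = [0.25, 0.75]
r = [0.1, 0.9]

direct = kl(p, r)             # d({p}, {r})
via_q = kl(p, q) + kl(q, r)   # d({p}, {q}) + d({q}, {r})

print(direct, via_q)
print(direct > via_q)  # True: going through q is strictly "cheaper"
```

Here the direct divergence is ln(5/3), roughly 0.51, while the two-step sum is roughly 0.24, so the would-be triangle inequality fails by a wide margin.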

So it seems like category theory doesn't have something to say about this; there isn't a nice structure on our d(V,U). However, there is some categorical structure on Kullback-Leibler divergence! The trick is that compositionality in the Kullback-Leibler divergence shows up when you go from just considering distributions to considering *kernels*. Fortunately, we can do the same thing for infradistributions; there is a perfectly good Kleisli category for the infradistribution monad \square!

So can we extend our d to a *divergence* on the Kleisli category of the infradistribution monad? That will have to be the subject of another post.

Another approach here would be to choose something that is actually a metric on the space of distributions, and then extend this to infradistributions in a similar way; I hope that this actually *does* create a fuzzy poset. I hope that both of these will be fruitful directions, and hope to update you, dear reader, on new results soon!