Probability spaces form an algebra for the lottery monad

Any weighted sum of probability spaces is again a probability space. This idea is captured using the notion of polynomial monad algebras.

There is a polynomial monad \mathtt{lott} of lotteries: a position is a finite number N of “lottery tickets” together with a probability distribution p:\Delta_N on them, and a direction there is a choice of lottery ticket. That is,
\mathtt{lott}\coloneqq\sum_{N:\mathbb{N}}\Delta_N\mathcal{y}^N.
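For concreteness, here is a finite sketch in Python of a \mathtt{lott}-position and its directions (the name `draw_ticket` is mine; a position is just a list of weights summing to 1, and a direction is a ticket index):

```python
import random

# A lottery position: a finite list of N ticket weights summing to 1,
# i.e. an element of Delta_N. A direction over it is a ticket index,
# which we can sample with probability p[i].
def draw_ticket(p, rng=random):
    """Sample a direction (ticket index) from a lottery position p."""
    assert abs(sum(p) - 1.0) < 1e-9, "weights must form a distribution"
    r = rng.random()
    acc = 0.0
    for i, w in enumerate(p):
        acc += w
        if r < acc:
            return i
    return len(p) - 1  # guard against floating-point round-off
```

So \mathtt{lott} applied to a set Y would pair such a position with a function from ticket indices to Y.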

It turns out that the polynomial \mathtt{prob} of probability spaces is an algebra for this monad, i.e. there is a Cartesian map
\sigma\colon\mathtt{lott}\lhd\mathtt{prob}\to\mathtt{prob}
satisfying the usual laws. Here, I’m defining
\mathtt{prob}\coloneqq\sum_{\{(S,\mu)\,\mid\,S\text{ a measurable space},\ \mu\text{ a probability measure on }S\}}\mathcal{y}^S
to have a position given by a measurable space equipped with a normalized measure on it—a probability space—and directions there are given by points of the space.

So what is the map \sigma saying? It just says that given a lottery where each ticket points to a probability space, we can get a new probability space. Namely: take the disjoint union and assign a measure given by weighted sum. The cartesianness of \sigma is just saying that a point of this disjoint union can be identified with a choice of ticket and a point in the corresponding space.
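In the finite discrete case, the map \sigma can be sketched in a few lines of Python (the name `mix` is mine, and spaces are modeled as dicts from points to masses):

```python
# sigma on finite discrete spaces: a lottery whose tickets point at
# finite probability spaces yields their disjoint union carrying the
# weighted-sum measure. Cartesianness: a point of the result is exactly
# a pair (ticket i, point of space i).
def mix(weights, spaces):
    """weights: list of probabilities; spaces: list of dicts point -> mass."""
    assert abs(sum(weights) - 1.0) < 1e-9
    out = {}
    for i, (w, mu) in enumerate(zip(weights, spaces)):
        for x, m in mu.items():
            out[(i, x)] = w * m  # tag points by ticket to keep the union disjoint
    return out

coin = {"H": 0.5, "T": 0.5}
die = {k: 1 / 6 for k in range(1, 7)}
mixed = mix([0.3, 0.7], [coin, die])
# the total mass of `mixed` is again 1 (up to float round-off)
```

The unit and multiplication laws amount to the familiar facts that a one-ticket lottery changes nothing and that a lottery of lotteries flattens to a single lottery with multiplied weights.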


This is highly related to the formulation of probability within quasi-Borel spaces.

For @dspivak, I’m going to explain quasi-Borel spaces in compact categorical language. Basically, there is a site of standard Borel spaces (a standard Borel space is a measurable space whose \sigma-algebra is generated by the opens of a complete, separable metric space), where the morphisms are measurable maps, and it is equipped with the extensive coverage. Then a quasi-Borel space is a concrete sheaf on this site.

In these terms, we can think of \mathtt{prob} as the polynomial in \mathsf{QBS} given by

\mathtt{prob} = \sum_{\text{$S$ standard Borel}} \sum_{\mu \in \Delta S} y^S

where by Yoneda, for a quasi-Borel space X considered as a sheaf, X^S = X(S).

The probability monad for quasi-Borel spaces that is traditionally considered is a quotient of \mathtt{prob} by the relation (S, \mu \in \Delta S, f \colon S \to X) \sim (S', \mu' \in \Delta S', f' \colon S' \to X) if they both induce the same expectations on all observables \alpha \colon X \to \mathbb{R}, i.e. \int_S f;\alpha \;\mathrm{d} \mu = \int_{S'} f' ; \alpha \;\mathrm{d} \mu'. We might call this “extensional equivalence”.
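On finite spaces, extensional equivalence can be checked concretely: since observables are determined pointwise, two triples induce the same expectations iff they have the same pushforward law on X. A sketch (the name `pushforward` is mine):

```python
# Extensional equivalence, finite sketch: (S, mu, f) ~ (S', mu', f')
# iff they give the same expectation for every observable alpha: X -> R.
# For finite X this holds iff the pushforward measures on X agree.
def pushforward(mu, f):
    """mu: dict s -> mass on S; f: dict s -> x. Returns the law of f on X."""
    out = {}
    for s, m in mu.items():
        x = f[s]
        out[x] = out.get(x, 0.0) + m
    return out

# Two different sample spaces presenting the same fair coin on X = {0, 1}:
mu1, f1 = {"a": 0.5, "b": 0.5}, {"a": 0, "b": 1}
mu2, f2 = {"p": 0.25, "q": 0.25, "r": 0.5}, {"p": 0, "q": 0, "r": 1}
# pushforward(mu1, f1) == pushforward(mu2, f2) == {0: 0.5, 1: 0.5}
```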

We could also quotient by the relation generated by (S, \mu, f) \sim (S', \mu', f') if there exists \phi \colon S \to S' such that \mu' = \phi_\ast \mu and f = \phi ; f'. This would be something more similar to “intensional equivalence”, and could be expressed more directly as a colimit.
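A finite sketch of checking a single generating instance of this relation (the name `witnesses` and the concrete example are mine):

```python
# Intensional equivalence, finite sketch: (S, mu, f) ~ (S', mu', f')
# when some phi: S -> S' carries mu to mu' and satisfies f = phi;f'.
def witnesses(phi, mu, f, mu_p, f_p):
    """Check that phi: S -> S' pushes mu forward to mu' and that f = phi;f'."""
    pushed = {}
    for s, m in mu.items():
        pushed[phi[s]] = pushed.get(phi[s], 0.0) + m
    return pushed == mu_p and all(f[s] == f_p[phi[s]] for s in mu)

# Collapsing two tickets of the three-point presentation of a fair coin:
phi = {"p": "a", "q": "a", "r": "b"}
ok = witnesses(phi,
               {"p": 0.25, "q": 0.25, "r": 0.5}, {"p": 0, "q": 0, "r": 1},
               {"a": 0.5, "b": 0.5}, {"a": 0, "b": 1})
```

Note that any such \phi preserves pushforward laws, so intensional equivalence implies extensional equivalence; the converse is the direction that seems plausible but less obvious.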

I’m pretty sure that we’d get the same equivalence relation in the end, but I’m not completely positive!

Now, interestingly enough, I’m not sure if \mathtt{prob} without quotienting is a monad! It seems like one would have to build standard Borel spaces that are in some sense “too large”. But I think that we could still build a category similar in spirit to the “Kleisli category” of \mathtt{prob} via something like the \mathsf{Para} construction. Namely, construct a category where the objects are quasi-Borel spaces, and a morphism from X to Y consists of a standard Borel space S, a measure \mu on it, and a map X \times S \to Y. This of course is actually a double category, where the 2-cells are “reparameterizations” of the sample space.
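The composition in that Para-style category is easy to sketch on finite sets: sample spaces multiply, and the deterministic maps compose through the shared input. A minimal illustration (the class name `StochMap` and the noisy-bit example are mine):

```python
# "Para-style" sketch on finite sets: a morphism X -> Y is a sample
# space S with a distribution mu and a deterministic map k: X x S -> Y.
# Composition takes the product of sample spaces with the product measure.
class StochMap:
    def __init__(self, mu, k):
        self.mu = mu   # dict s -> probability
        self.k = k     # function (x, s) -> y

    def then(self, other):
        """Compose X -> Y -> Z: the new sample space is the product S x T."""
        mu = {(s, t): p * q
              for s, p in self.mu.items()
              for t, q in other.mu.items()}
        k = lambda x, st: other.k(self.k(x, st[0]), st[1])
        return StochMap(mu, k)

    def law(self, x):
        """Pushforward distribution on Y for a fixed input x."""
        out = {}
        for s, p in self.mu.items():
            y = self.k(x, s)
            out[y] = out.get(y, 0.0) + p
        return out

# A bit flipped with probability 0.1; composing it with itself
# flips with probability 2 * 0.9 * 0.1 = 0.18.
flip = StochMap({0: 0.9, 1: 0.1}, lambda x, s: x ^ s)
law = flip.then(flip).law(0)
```

The 2-cells (reparameterizations of the sample space) would then be exactly the maps \phi from the intensional equivalence above.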

We could perhaps go one step further and do a “dependent Para” construction as well, where a morphism from X to Y is a span X \leftarrow S \rightarrow Y of quasi-Borel spaces such that the fibers of the left-hand map are all standard Borel spaces, together with a choice of measure on each fiber of the left-hand map. The technicalities here could sink or float this, however; I’m not confident that this would actually work. @mattecapu needs to weigh in here.

The span thing you (@owenlynch) mentioned at the end there is one of the things I was sketching to you on the way to lunch one day at ACT this year, when I was talking about various ‘better’ ways to do copy-composition. I’m writing another post about this just now :slight_smile:

The thing you sketched in your penultimate paragraph is quite a common way of talking about stochastic maps in scientific contexts; I described a version of it in §7.1 of my thesis. It’s also closely related to the notion of “randomness pushback” (see Def 11.19 of Fritz’s ‘synthetic approach’) but I don’t know a general story of when that’s possible.