Effectful Trace Semantics via Effectful Streams

I’m really enjoying this paper by Filippo Bonchi, Elena Di Lavore and @mromang08.

Special highlights:

  • Separating out value manipulation, computation, and side effects into three categories on the same collection of objects, which are cartesian, monoidal, and premonoidal respectively. The example to keep in mind here is: if you have a (possibly non-commutative) monad T on a cartesian category C, then you have (C, Kl_{Z(T)}(C), Kl_{T}(C)). The intuition is that computation might be nondeterministic, so it doesn’t commute with copy-delete (monoidal, not cartesian), and side effects must happen in a specific order, so parallel composition is ill-defined (premonoidal, not monoidal). They call such a triple an “effectful category”.

    Incidentally, it might seem evil to say that there are “identity on objects” functors on categories, but this can be done in a non-evil way by saying that an identity on objects functor C \to D is a monad in \mathbf{Prof} with carrier C. I know that Mario knows this, but probably this is pitched to an audience who doesn’t care about evilness but would be put off by monads in \mathbf{Prof}.

    Also, they relate this to call-by-push-value. I wonder if @mromang08 could comment on the precise relationship, because if I remember correctly in call-by-push-value there is an adjunction rather than just an inclusion of categories. Honestly, just having an inclusion seems more desirable to me in any case, because I don’t want to reflect the computation back into my values, I want my values to stay values! But I’d like to know why one would choose to do one way or another.

  • Showing how Haskell’s do-notation can be used in an arbitrary effectful category!! Do notation: not just for monads anymore!

  • Streams! I used to think that streams were not very interesting: “they are just like lists but infinite, what’s the big deal?” But I kept seeing libraries using streams, like fs2, a stream library in Scala that powers, believe it or not, Disney Plus. And everyone’s favorite Swiss Army knife, bash pipelines, is secretly streams! So I have been convinced that streams are actually a big deal. I do wonder if streams are something like a “non-dependent” version of the cofree comonad, and whether generalizing from streams to cofree comonads would bring some interesting benefits to the various stream libraries. However, I really like that the approach to streams taken by Bonchi et al. is very generic over the base category, while cofree comonads are pretty Set-based.

  • Causal processes, which are maps \bigotimes_i X_i \to \bigotimes_i Y_i that are determined by a family of maps f_n \colon X_0 \otimes \cdots \otimes X_n \to Y_0 \otimes \cdots \otimes Y_n that satisfy a certain compatibility condition. This is cool to me, because it seems like a categorification of Markov processes being adapted to a filtration. @mromang08 when are you going to do this for continuously-indexed streams?? Many people want this.

  • Semantics for Mealy machines as effectful streams. This relates to some stuff that I was thinking about with David Jaz and others in Oxford a couple weeks ago: Random trajectories of Markov kernels. Figuring out how to blend the streams/partial Markov stuff coming from @mromang08 and Elena with @david.jaz systems theory is long overdue.
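
The “streams as a non-dependent cofree comonad” hunch above can be made concrete in a few lines of Haskell; this is just my own sketch, not code from the paper or from any of the stream libraries mentioned:

```haskell
-- A plain stream is codata: one value now, the rest of the stream later.
data Stream a = Cons a (Stream a)

-- The cofree comonad over a functor f branches by f at each step; taking
-- f = Identity recovers Stream (up to the Identity wrapper), which is the
-- sense in which Stream is a "non-dependent" special case.
newtype Identity a = Identity a
data Cofree f a = a :< f (Cofree f a)

nats :: Stream Integer
nats = go 0 where go n = Cons n (go (n + 1))

takeS :: Int -> Stream a -> [a]
takeS n _ | n <= 0 = []
takeS n (Cons x xs) = x : takeS (n - 1) xs
```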

4 Likes

I’ve actually been thinking about something very similar to this question of how to bring monoidal streams into the @david.jaz CST framework.

Using some assumptions on the category, a monoidal stream can be represented in the form of a sequence of objects S_n and morphisms X_n \otimes S_n \to Y_n \otimes S_{n+1} (as well as an initial I \to S_0). A stream of this type (say in \mathsf{Stoch}) is very close to a dynamical system which is “time-dependent” - where the state space is \coprod_n S_n, the output is \coprod_n Y_n, and the input is X_n (dependent on the output!).
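
A minimal Haskell sketch of this unrolled presentation, specialized to the cartesian case with constant interfaces (the names MStream, unroll, runningSum are illustrative, not from the paper):

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Specializing X_n = x, Y_n = y, S_n = s throughout: an initial state
-- together with a step map x × s -> y × s, with the state hidden
-- existentially.
data MStream x y = forall s. MStream s (x -> s -> (y, s))

-- Running the stream on a list of inputs exhibits it as a
-- (time-independent) Mealy-machine-like dynamical system.
unroll :: MStream x y -> [x] -> [y]
unroll (MStream s0 step) = go s0
  where
    go _ [] = []
    go s (x : xs) = let (y, s') = step x s in y : go s' xs

-- Example: a running-sum stream.
runningSum :: MStream Int Int
runningSum = MStream 0 (\x s -> (x + s, x + s))
```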

There are a few problems with this - a minor one is that monoidal streams are really Mealy machine-like, whereas the CST machines really want to be Moore machines. This can be hacked around in the usual way and isn’t a big problem.

A more awkward thing is that, in a monoidal stream, the output and the next state may carry a “correlation”, whereas in the CST framework, the readout and update functions are two separate things. (This can be remedied to a certain extent by including the thing to read out in the state, but this makes the whole thing even more hacky.)

I was thinking about this in the context of making a CST-type category of causality-style DAG models - here there is the further complication that you may have a nonlinear causal graph, so there isn’t necessarily a unique choice of “next input/output step”.

(I’ve also written a note about trajectories for stochastic models, here: On Random Trajectories in Categorical Systems Theory)

4 Likes

Your note on trajectories for stochastic models is interesting: it’s a different approach than the one David Jaz and I were taking, but it seems to also work! The way we were thinking was to explicitly represent a random function as a parametric map X \times \Omega \to Y, where \Omega is the “seed space” in quasi-Borel spaces.

Another obvious marriage between CST-type systems and Strathclydism is to work out optic-based systems in CST. I wonder if optics could help with the correlation between the output and the next state, because we could represent it as S_n \to M and M \to S_{n+1}, M \to Y_n, but we “coend” over M.

And I like thinking about causal graphs. One of the main constructions for CST is a “tangent bundle”-like thing: I wonder if the tangent bundle functor could be “parametrized” so that you have a choice of where the next step goes?

1 Like

I guess ultimately a map \Omega \to A should be much the same as a distribution on A (and a map \Omega \times A \to B is then a distribution on [A,B]) - the difference being whether, when you have several of them, you want to be able to speak about the correlations between them or not. It’s not totally clear to me what the right notion is there (whether, for example, when composing random trajectories X \rightrightarrows Y \rightrightarrows Z you want to make “the same” random selection in each case, or choose them independently).

(Working with a map \Omega \times A \to B also generalizes differently, since it uses the stochastic structure inside the category, rather than a probability monad on the category of enrichment, which may be \mathsf{Set}.)

I think that ultimately we need a theory of probability which talks about random elements and distributions as two distinct notions, because the distinction between the two comes up over and over again.

For instance, when we go to the continuous case, a continuous random process can be described by its generator, which is a linear map on observables, or as a function X_t \colon \Omega \to \mathbb{R} that satisfies certain properties, or as a collection of kernels p(-|-) \colon B\mathbb{R} \to \mathbf{Stoch}. The Hille-Yosida theorem tells us how to go between the generator perspective and the p(-|-) perspective, but then we also need to go between the p(-|-) perspective and the X_t perspective. I think that there’s a lot of confusion around stochastic processes which comes from confusing these three perspectives, and a categorical take has to not just “pick one.”

1 Like

Agreed! I’ve thought there might be a double category where one direction has stochastic maps in some sense, and the other direction has parametrized maps, with 2-cells being reparametrizations that are in some sense compatible with the kernels on the edges.

1 Like

I like this idea: if the stochastic maps are the horizontal direction, then going from a parametrized map to a stochastic map could be something like a conjoint operation. You should write this up on the forest.

1 Like

Thanks for sharing the paper, Owen; I enjoyed it, too, and your subsequent discussion with Eigil, which covers topics that I often find myself thinking about. Thus, a couple of quick thoughts:

  • I often feel like there should be a result along the lines of “the relationship between Markov processes [dynamical Markov kernels] and random dynamical systems [dynamical random variables] is akin to randomness pushback”. I’d be interested in discussing this more with people.
  • On “blending streams stuff with systems theory”, I like to think of computation as “dynamics with semantics”. Thus I think part of this blending might involve a more structured approach to operational semantics (akin to the passage from the algebra describing a program to the coalgebra describing its execution).
2 Likes

I’ve put some thoughts along these lines up here: The double category of mixed probabilities (I don’t like the title).

Why do readout and update have to be separate?

Say your interfaces are pairs of measurable spaces {A \choose B}, and then proclaim the systems over it to be probabilistic Mealy machines, i.e. X \times B \xrightarrow{\delta} \Delta(X \times A) .

These things reindex contravariantly along e.g. deterministic lenses {f^\sharp \choose f}:{A \choose B} \leftrightarrows {A' \choose B'} by defining

X \times B \xrightarrow{\langle X \times f, B \rangle} X \times B' \times B \xrightarrow{\langle \delta, \eta_B \rangle} \Delta(X \times A') \times \Delta(B) \to \Delta(X \times A' \times B) \xrightarrow{\Delta(X \times f^\sharp)} \Delta(X \times A)
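
A toy Haskell sketch of this composite, with \Delta modelled by a finite-support distribution functor (only fmap is needed for the reindexing itself); Dist, Lens, Mealy, and reindex are my names, not from any library:

```haskell
-- Toy finite distribution functor standing in for Δ.
newtype Dist a = Dist { runDist :: [(a, Double)] }

instance Functor Dist where
  fmap f (Dist xs) = Dist [ (f a, p) | (a, p) <- xs ]

-- A deterministic lens (f#, f) : (A, B) <-> (A', B'):
-- f forwards on the B component, f# backwards with access to B.
data Lens a b a' b' = Lens
  { fwd :: b -> b'        -- f  : B -> B'
  , bwd :: b -> a' -> a   -- f# : B × A' -> A
  }

-- A probabilistic Mealy machine X × B -> Δ(X × A).
type Mealy x a b = (x, b) -> Dist (x, a)

-- Contravariant reindexing: push the input forwards along f, run the
-- machine, then pull the emitted action back along f# (keeping the
-- original b in scope, as the copy of B does in the composite above).
reindex :: Lens a b a' b' -> Mealy x a' b' -> Mealy x a b
reindex l delta (x, b) =
  fmap (\(x', a') -> (x', bwd l b a')) (delta (x, fwd l b))
```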

I’m sure one can either do optic-y stuff or just add an effect in the backwards direction without problems.

Using explicit randomness indexed by \Omega is even easier, and makes defining the morphisms much more tractable. Ultimately my opinion is that we have to enrich to meaningfully do probabilistic systems theory.

Wait, I think you’re reindexing in the wrong direction? A lens X \to Y is supposed to wire a system with interface X into one with interface Y, but you’re going in the other direction somehow?

Generally speaking the problem with rewiring Mealy machines along lenses is that there’s a counit lens \binom{A}{A} \to I which wires the output directly into the input, and this doesn’t make sense in the Mealy case.

But I think you’re right that you can have Moore machines where readout and update are combined in an optic-y way, elements of \int^M Kl(\Delta)(X, A \otimes M) \times Kl(\Delta)(M \otimes B, X) (in other words, optics \binom{X}{X} \to \binom{B}{A}). And of course these do reindex along deterministic lenses (by simply including them in optics) - it’s not quite as obvious that there’s a profunctor for charts, but I think it should work.
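
A hypothetical Haskell rendering of that coend description: readout and update share an existential residual r (standing in for the coended M), so the emitted output and the next state can be correlated through r. The Kleisli category of \Delta is abstracted to an arbitrary monad k; OMoore and stepO are my names.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- The existential r plays the role of the coended M: it is produced by
-- the readout and consumed by the update, and is not observable outside.
data OMoore k x a b = forall r. OMoore
  (x -> k (a, r))     -- readout: X -> Δ(A ⊗ M)
  ((r, b) -> k x)     -- update:  M ⊗ B -> Δ(X)

-- One step: emit an output and pass the residual on to the update.
stepO :: Monad k => OMoore k x a b -> x -> b -> k (a, x)
stepO (OMoore readout update) x b = do
  (a, r) <- readout x
  x' <- update (r, b)
  pure (a, x')
```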

1 Like

Yeah that’s not a mistake. It’s a not-so-well-known fact that Moore machines and Mealy machines are both reindexed by lenses/optics but with different variance. This explains why they seem to want to ‘pair’.

Ah, I see! I’ve never seen this before, but it makes sense.

1 Like

I think CST explains this very neatly: a Mealy machine is a lens TS \otimes A \to I (where S is a space of states and A = \binom{A}{B} is an interface) - this actually looks really nice.

(I wrote a short note about this here: Moore and Mealy machines in CST)

But to deal with monoidal streams in general, we do want the set of possible inputs to depend on the current state, since when you’re inside S_i, you want to see something from X_i. This account of Mealy machines doesn’t give you that dependency.

1 Like

Thank you for the detailed highlight, Owen (@owenlynch); very appreciated (and thank you, everyone, for the comments). I answer because I have a login here at LocalCharts, but it is important to say again that this is joint work with Filippo Bonchi and Elena Di Lavore. Elena has slides online for this. Also, some years ago, we published a different take on the same idea with Giovanni de Felice.

it might seem evil to say that there are “identity on objects” functors on categories […] I know that Mario knows this, probably this is pitched to an audience who doesn’t care about evilness but would be put off by monads in Prof.

Agree. Whenever I can, I do talk about promonads directly, but it depends on the audience. Category theorists mostly know about this story anyway.

I wonder if @mromang08 could comment on the precise relationship, because if I remember correctly in call-by-push-value there is an adjunction rather than just an inclusion of categories.

As far as I understand, CBPV has much more structure: the adjunction allows you to “thunk” and “produce” values into computations and computations into values. In a monad, any X × A → TB is also X → (A → TB) and then X → T(A → TB): any computation has a “code” that is a value. Relatedly, CBPV is higher-order and has lambda expressions (the base category is closed). These assumptions are too strong for arbitrary promonads. As you say,

do-notation can be used in an arbitrary effectful category!! Do notation: not just for monads anymore!

To be a bit pedantic, Haskell programmers know very well that do-notation is not just for monads but for strong promonads over Set (Hask?): these are Hughes’ arrows, or Freyd categories. The problem is that they do not work as well as one would like in non-cartesian cases.
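
GHC already ships a version of this as proc notation; a quick illustration (my own example, not code from the paper):

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow

-- proc notation is GHC's "do-notation for arrows": it works for any
-- instance of Arrow (Hughes' arrows, i.e. strong promonads over Hask),
-- not just the Kleisli arrows of a monad.
addBoth :: Arrow a => a Int Int -> a Int Int -> a Int Int
addBoth f g = proc x -> do
  y <- f -< x
  z <- g -< x
  returnA -< y + z
```

Instantiating at the plain function arrow (->), `addBoth (+ 1) (* 2)` sends 3 to (3 + 1) + (3 * 2) = 10.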

Instead, it felt interesting to formalize a type theoretical presentation and show that it can be used as a sound and complete calculus. This idea of a triple of categories comes from Jeffrey’s paper on graphical programming languages; we try to make a fully-abstract calculus out of that.

Causal processes […] mromang08 when are you going to do this for continuously-indexed streams?? Many people want this.

I should quote this “Many people want this” on research proposals. : ) Yes, of course, I would be very happy to get time to focus specifically on this.

Figuring out how to blend the streams/partial Markov stuff coming from @mromang08 and Elena with @david.jaz systems theory is long overdue.

I know David’s systems theory, but not enough to say anything intelligent. Eigil’s notes sound good to me. It would be very encouraging to see that this works.

Again, thanks everyone for the discussion. See you around!

2 Likes

Something’s up in your notes: the definitions don’t quite line up!

The duality you notice is indeed quite cool, and something that Nathaniel Virgo, Dylan Braithwaite, and I are investigating in generality.

1 Like