Open classical mechanical systems via lenses

In my thesis I formalize composition of port-Hamiltonian systems in a relational way. That is, the dynamics of a port-Hamiltonian system are related to the input of the system in such a way that a given pattern of input could have no corresponding system behavior, or many.

Talking to @dspivak today I realized that there is another way of doing open classical mechanical systems which more directly divides things into input/output.

Unfortunately, after this week I won’t have time to work on this so much, so I want to do a quick write up of the story so far, so that I can either refer back to it in the farther future or so that someone else can be inspired.

The story begins with a version of Org that works for continuous-time dynamical systems.

I would be very surprised if @mattecapu hasn’t already described something like this, and @mattecapu if you are reading this please point me to a good reference if you have already come up with this.

Essentially, we start with the symmetric monoidal double category of lenses/charts between vector bundles on smooth spaces, apply the Para construction, and then look at the subcategory where the parameterizations are arenas of the form \begin{pmatrix} TX \\ X \end{pmatrix}.

Expanded a little bit, this is the double category where objects are pairs \begin{pmatrix} E \\ B \end{pmatrix} of a smooth space B (whatever your favorite definition of smooth space is, maybe just manifold) and a vector bundle E over it, vertical morphisms are charts, and a horizontal morphism from \begin{pmatrix} E \\ B \end{pmatrix} to \begin{pmatrix} E' \\ B' \end{pmatrix} is a space X along with a lens

\begin{pmatrix} E \\ B \end{pmatrix} \otimes \begin{pmatrix} TX \\ X \end{pmatrix} \begin{matrix} \leftarrow \\ \rightarrow \end{matrix} \begin{pmatrix} E' \\ B' \end{pmatrix}

These horizontal morphisms describe parameterized maps where the parameters smoothly change in response to information flowing backwards through the system. Call this double category \mathsf{COrg} (continuous Org).

Today with @dspivak, I came up with a way of constructing horizontal morphisms in \mathsf{COrg} which correspond to open classical mechanical systems.

I claim that given spaces A, X, B, a function f \colon A \times X \to B, a Poisson structure on X, and a function H \colon A \times X \to \mathbb{R}, I can construct a lens

\begin{pmatrix} T^\ast A \\ A \end{pmatrix} \otimes \begin{pmatrix} TX \\ X \end{pmatrix} \begin{matrix} \leftarrow \\ \rightarrow \end{matrix} \begin{pmatrix} T^\ast B \\ B \end{pmatrix}

The bottom map is simply f.

Now, for the top map, recall that a Poisson structure is given by a linear map J_x \colon T^\ast_x X \to T_x X that smoothly depends on x \in X, such that when viewed as a matrix J_x is antisymmetric (along with some other conditions).

Given a \in A, x \in X, and \phi \in T^\ast_{f(a,x)} B, we need to produce \psi \in T^\ast_a A and \dot{x} \in T_x X.

Recall that f^\ast_{a,x}(\phi), dH(a,x) \in T^\ast_{(a,x)} (A \times X) \cong T^\ast_a A \times T^\ast_x X, where f^\ast_{a,x}(\phi) is the pullback of \phi along f, and dH(a,x) is the differential of H at (a,x). Then let

\psi = \pi_1(dH(a,x) + f^\ast_{a,x}(\phi))
\dot{x} = J_{x} \pi_2(dH(a,x) + f^\ast_{a,x}(\phi))
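To make the top map concrete, here is a minimal Python sketch for one hypothetical choice of data: A = \mathbb{R} (an anchor point), X = \mathbb{R}^2 with coordinates (r, p) and the standard J, B = \mathbb{R}, f(a,(r,p)) = r, and H(a,r,p) = p^2/(2m) + k(r-a)^2/2. None of these choices come from the construction itself; they are only there so that there are numbers to plug in, and the differentials are approximated by central differences rather than computed symbolically.

```python
# Sketch of the backward (top) map of the lens, for illustrative data only.
# Assumed (not from the post): A = R, X = R^2 with coordinates (r, p), B = R.

M_MASS, K_SPRING = 1.0, 2.0  # hypothetical mass and spring constant

def f(a, x):
    """Bottom map f : A x X -> B; here it just exposes the position r."""
    r, p = x
    return r

def H(a, x):
    """Hamiltonian H : A x X -> R; a spring attached to an anchor at a."""
    r, p = x
    return p * p / (2 * M_MASS) + 0.5 * K_SPRING * (r - a) ** 2

def J(x):
    """Standard Poisson structure on R^2 as an antisymmetric matrix."""
    return [[0.0, 1.0], [-1.0, 0.0]]

def differential(g, a, x, eps=1e-6):
    """Central-difference differential of g at (a, x), as [d/da, d/dr, d/dp]."""
    point = [a, x[0], x[1]]
    out = []
    for i in range(3):
        hi, lo = list(point), list(point)
        hi[i] += eps
        lo[i] -= eps
        out.append((g(hi[0], (hi[1], hi[2])) - g(lo[0], (lo[1], lo[2]))) / (2 * eps))
    return out

def backward(a, x, phi):
    """Given a, x and a covector phi at f(a, x), return (psi, xdot)."""
    dH = differential(H, a, x)
    df = differential(f, a, x)           # B is 1-dimensional, so df is one row
    pullback = [phi * c for c in df]     # f^*(phi) = phi . Df
    total = [h + q for h, q in zip(dH, pullback)]
    psi = total[0]                       # pi_1: the T^*_a A component
    cov_x = total[1:]                    # pi_2: the T^*_x X component
    Jx = J(x)
    xdot = tuple(sum(Jx[i][j] * cov_x[j] for j in range(2)) for i in range(2))
    return psi, xdot

psi, xdot = backward(a=0.0, x=(1.0, 3.0), phi=0.5)
```

With these choices the sketch gives \psi \approx -2 and \dot{x} \approx (3, -2.5), that is, \dot{r} = p/m and \dot{p} = -(k(r-a) + \phi), so the incoming covector \phi shows up as an extra force on the momentum.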

Now, I claim that this construction is in some way “natural”. What would it mean for this to be natural?

Construct a bicategory \mathsf{OpenCM} where the objects are spaces and a morphism from A to B consists of a Poisson manifold X, a function f \colon A \times X \to B, and a function H \colon A \times X \to \mathbb{R} called the Hamiltonian. Composition of (f \colon A \times X \to B, H_X \colon A \times X \to \mathbb{R}) and (g \colon B \times Y \to C, H_Y \colon B \times Y \to \mathbb{R}) is given by putting the natural Poisson structure on X \times Y, doing the normal Para construction to get a map (f ; g) \colon A \times X \times Y \to C, and then defining H_{X,Y} \colon A \times X \times Y \to \mathbb{R} by

H_{X,Y}(a,x,y) = H_X(a,x) + H_Y(f(a,x), y)
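The composition rule can be sketched in a few lines of Python. This is only an illustration of the formula for (f ; g) and H_{X,Y}: the Poisson structure on X \times Y is left implicit, and the example maps at the bottom are hypothetical one-dimensional stand-ins.

```python
# Sketch of composition in OpenCM; Poisson structures are left implicit.

def compose(f, H_X, g, H_Y):
    """Compose (f, H_X) : A -> B with (g, H_Y) : B -> C.

    The parameter space of the composite is X x Y, passed as a pair (x, y).
    """
    def fg(a, xy):
        x, y = xy
        return g(f(a, x), y)

    def H_XY(a, xy):
        x, y = xy
        return H_X(a, x) + H_Y(f(a, x), y)

    return fg, H_XY

# Hypothetical one-dimensional data, chosen only to exercise the formula.
f = lambda a, x: a + x
H_X = lambda a, x: a * x
g = lambda b, y: b * y
H_Y = lambda b, y: b + y * y

fg, H_XY = compose(f, H_X, g, H_Y)
value = fg(1.0, (2.0, 3.0))      # g(f(1, 2), 3) = (1 + 2) * 3
energy = H_XY(1.0, (2.0, 3.0))   # 1 * 2 + ((1 + 2) + 3 * 3)
```

Note that H_Y is evaluated at f(a,x), not at a: the second system only ever sees the first through its output.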

The 2-cells in this bicategory should be symplectomorphisms X \to X' that preserve the Hamiltonian.

Then I hope that there is a symmetric monoidal bifunctor from \mathsf{OpenCM} to the horizontal bicategory of \mathsf{COrg} that corresponds to the construction that I did above. The reason that this is just a bifunctor and not a double functor is that sending A to \begin{pmatrix} T^\ast A \\ A \end{pmatrix} is not functorial when you want a chart from \begin{pmatrix} T^\ast A \\ A \end{pmatrix} to \begin{pmatrix} T^\ast B \\ B \end{pmatrix}, because we pull back cotangent vectors instead of pushing them forward. Perhaps with some galaxy-brained @mattecapu stuff we can do a triple category of parameterized lenses, normal lenses, and charts, but I’m not there yet.

Future work includes:

You mean that B is a space and E is a vector bundle over it, right?

Yes, good catch, thanks David.

Hi Owen,
nice post!
You’re indeed right in thinking I’ve been thinking along the same lines, though in a sense you went further than I have. I spent a long time getting the foundations right, but I have yet to sit down and try to spell out this example in detail. So you’re giving me a good opportunity to get my lazy ass on it again.

Here’s a way to make your nice story a terrible mess of category theory.

The structure of bundles you work with is similar to what is described by DJM in this paper, though maybe it was known before? (@dspivak probably knows). It is also one of the main examples in David Jaz’s book on categorical systems theory (CST), where he calls it the theory of differential systems (in the doctrine of open dynamical systems, aka ‘generalized Moore machines’) in Section 3.5.2.
The only difference is that you consider vector bundles as bundles, whereas he considers all submersions. It’s not very relevant for this post, but I like to spread the amazing fact that vector bundles over a manifold X are actions of (\R, \cdot) (= the trivial line bundle over X) in the category of submersions over X (see here and the paper cited therein).

To recall, such a theory of systems is obtained from
(1) an indexed category of bundles, in this case {\rm Subm} : {\bf Smooth}^{\rm op} \to {\bf Cat},
(2) a ‘section’ thereof, i.e. an assignment T: \bf Smooth \to \int {\rm Subm} such that T;\pi_{{\rm Subm}} = 1, which in this case is given by X \mapsto TX \overset{\pi_X}\to X the tangent bundle of X.
Clearly you can ‘restrict’ both these pieces of data to vector bundles only, thus you get a theory {\bf LinDiff} of linear differential systems.

This consists of a double category called {\bf\mathbb Arena_{LinDiff}} (given by lenses and charts associated to {\rm Subm}) and a doubly indexed category over it which, on objects, maps bundles {E \choose B} to the category of ‘open differential systems with interface {E \choose B}’, whose objects are lenses {TX \choose X} \leftrightarrows {E \choose B} (notice the special form we require of the left boundary), and whose maps are given by smooth maps \varphi:X \to Y that commute with the dynamics (notice {T\varphi \choose \varphi} is a chart). You can find a better description of these maps in the aforementioned section of the CST book.

Now the first double category you describe is a certain restriction of the triple category you get by hitting the ‘simple self-action’ of {\bf\mathbb Arena_{LinDiff}} with \bf\mathbb Para:

\bf \mathfrak{C}Org := \mathbb Para \left({\bf\mathbb Arena_{LinDiff}} \overset{\sf fst}\leftarrow S({\bf\mathbb Arena_{LinDiff}}) \overset{\times}\to {\bf\mathbb Arena_{LinDiff}}\right)

Recall from my ACT23 talk that the simple self-action is a tiny modification of the self-action of {\bf\mathbb Arena_{LinDiff}} as a monoidal object. The difference lies in the extra maps the simple fibration has in its fibers compared to the trivial fibration {\bf\mathbb Arena_{LinDiff}} \times {\bf\mathbb Arena_{LinDiff}} \overset{\pi_1}\to {\bf\mathbb Arena_{LinDiff}}. You need the former rather than the latter to get the right kind of maps between parametric lenses.

Anyway, what you call \sf COrg is the horizontal double category in the triple category \bf\mathfrak{C}Org. In my slides, I pictured it like this:
So this double category is given by ‘the bases’ of the cubes that make up \bf\mathfrak{C}Org. (It is this straightforward because \sf COrg is morally distinct from \bf Org: the latter has both kinds of 1-cells being dynamical, whereas the former has one dimension being ‘static’, i.e. given by maps (charts) which compare bundles instead of doing something.)

There is a small caveat here: you restrict your parametric maps to have a specific form of top boundary (i.e. the bundle of parameters), whereas in {\bf\mathfrak{C}Org}_h any bundle is admissible. One can of course restrict to loose maps and squares of the desired form, but my way of taking care of this is more conceptual and follows the way DJM does it: distinguish ‘systems’ and ‘processes’. These unconstrained parametric lenses are ‘processes’, and ‘systems’ attach to their boundaries. I’ll discuss this later on in this reply.

Now on to the interesting part!

First, let me say that I don’t see (because I’m ignorant, not because I doubt you!) why the kind of parametric lenses you consider are open classical mechanical systems, that is, I don’t see the role of the parameter here. Perhaps you can explain it to me!
I’ll just note that these kinds of parametric lenses have precisely the type of gradient-based learners, where A and B are the inputs and outputs of the model and X are the weights (I note this in my paper from last year). So for now I’ll borrow intuition from there.

Let’s have a look at \sf OpenCM. Let me note this is again a \bf \mathbb Para-construction, whose data is quite interesting (and reminiscent of Toby’s last paper, where he considers a similar but dual construction).
Here we have \bf Smooth involved in a fibred action

\bf Smooth \overset{\it p}\leftarrow Ham \overset{\times}\to Smooth


  1. \bf Ham is the category whose objects are triples (A,X,H:A \times X \to \R) with A smooth manifold, X Poisson manifold, H smooth function, and whose morphisms are triples (f,\varphi, =) : (A,X,H) \to (B,Y,K) such that f is smooth, \varphi is symplectic (I’ll be tempted here to have \varphi depend on A too, like in a simple fibration) and H = (f \times \varphi)K,
  2. p projects onto the first component (and you can easily see this admits cartesian lifts),
  3. \times sends (A,X,H) to A \times X, and likewise for maps.

It’s easy to convince oneself this does indeed define a fibred action, with unit (1,1, 0) and multiplication

(A,X,H) \otimes (A \times X, Y, K) = (A, X \times Y, (\pi_{A \times X}H) + K)

Running \bf \mathbb Para on this data, we get a double category \bf\mathbb OpenCM where

  1. objects are smooth manifolds,
  2. tight cells are smooth maps,
  3. loose cells A \to B are given by a choice of Poisson manifold X, a choice of Hamiltonian H:A \times X \to \R, and a choice of smooth map f:A \times X \to B (i.e., a choice of parameter (A,X,H) : \bf Ham_A and a map f: \times(A,X,H) \to B, as prescribed by the recipe), which compose as you say (and that’s the composition rule prescribed by the recipe again!),
  4. squares are pairs (\varphi, =) where \varphi is a symplectomorphism between the parameters of the loose cells that commutes with the Hamiltonians:

You’ll notice that your \sf OpenCM is the loose bicategory associated to this double category.

Now you raise a good question, which is: can we go from the latter double category \bf\mathbb OpenCM to the first (perhaps triple) category \bf\mathfrak{C}Org?

So here is where I got stuck last time I thought about it (because of the same problem you mention: T^* needs to hit lenses, not charts), but that was a long time ago so let me try to see if I can get unstuck with the help of your ideas.

Since now I have to go and sit on it for a while, I’ll post the rest as a separate post/reply.

Thank you Matteo for unfolding my construction into real category theory!!

I have more to say, but I want to say something quickly about classical mechanics and the connection to gradient descent.

Gradient descent works with a Riemannian manifold; we turn a covector into a vector that “points” in the same direction. So we can think about the Riemannian structure as a linear map M(x) from the cotangent space T^*_x X to the tangent space T_x X, and the equation for gradient descent is

\dot{x} = M(x) d S

(if S is your objective function).

Classical mechanics works with a Poisson manifold, which is just another way of turning a covector into a vector!! Specifically, we have a linear map J(x) and Hamilton’s equation is

\dot{x} = J(x) d H

(if H is your Hamiltonian).

The difference between the two is that M(x) is symmetric non-negative definite, and J(x) is antisymmetric.

For instance, if X = \mathbb{R}^2, with coordinates r,p of position and momentum, then we can let

J(r,p) = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}

and Hamilton’s equations are then

\dot{r} = \frac{\partial H}{\partial p}
\dot{p} = -\frac{\partial H}{\partial r}

If we let H(r,p) = \frac{1}{2m} p^2 + \frac{1}{2} k r^2, then we get the equations of motion for a ball of mass m on a spring with spring constant k.
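For this Hamiltonian the equations read \dot{r} = p/m and \dot{p} = -kr. As a quick numerical sanity check, here is a small Python sketch that integrates them. The integrator (symplectic Euler), the constants, and the step size are my own illustrative choices and not part of the discussion above.

```python
# Symplectic Euler for H(r, p) = p^2/(2m) + k r^2/2.
# The integrator and the constants are illustrative choices, not from the post.

def hamiltonian(r, p, m, k):
    return p * p / (2 * m) + 0.5 * k * r * r

def step(r, p, m, k, dt):
    """One symplectic Euler step of rdot = dH/dp, pdot = -dH/dr."""
    p = p - dt * k * r        # pdot = -dH/dr = -k r
    r = r + dt * p / m        # rdot =  dH/dp = p / m (uses the updated p)
    return r, p

m, k, dt = 1.0, 1.0, 1e-3
r, p = 1.0, 0.0
h0 = hamiltonian(r, p, m, k)
for _ in range(6283):         # roughly one period, 2*pi*sqrt(m/k)
    r, p = step(r, p, m, k, dt)
h1 = hamiltonian(r, p, m, k)
```

After roughly one period the ball should be back near r = 1, and h1 should stay close to h0, which is the discrete shadow of energy conservation for Hamiltonian flows.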

So the story for open gradient descent systems and open classical mechanical systems is exactly the same, you just swap the Riemannian structure for a Poisson structure!
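The one place where the swap shows up is in what the flow does to its own potential. A short check, assuming H and S have no explicit time dependence and using only the (anti)symmetry of the two structures:

```latex
% Hamiltonian flow: \dot{x} = J(x)\, dH with J(x) antisymmetric
\frac{d}{dt} H(x(t)) = \langle dH, \dot{x} \rangle
                     = \langle dH, J(x)\, dH \rangle = 0

% Gradient flow: \dot{x} = M(x)\, dS with M(x) symmetric non-negative definite
\frac{d}{dt} S(x(t)) = \langle dS, \dot{x} \rangle
                     = \langle dS, M(x)\, dS \rangle \ge 0
```

So the antisymmetric structure conserves its potential, while the symmetric one moves it monotonically.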

As a bonus, you can also consider the GENERIC equation, which is an equation from non-equilibrium thermodynamics,

\dot{x} = J(x) d H + M(x) d S

which involves both an antisymmetric matrix and a symmetric non-negative definite matrix, and an energy H and an entropy S, and I think that you can do open GENERIC systems in a similar way!

So very exciting stuff here.

Oh yeah that part was clear :slight_smile: let me point out that in differential geometry the linear map T^*X \to TX is one of the two ‘musical’ (iso)morphisms induced by the metric (the other being just the adjoint/inverse to this one), and it’s usually denoted by (-)^\sharp. By extension that’s used for any 2-form, hence I usually write the gradient flow equation as \dot x = dH^\sharp (indeed \operatorname{grad} H := dH^\sharp).

So what’s confusing me right now is the role of the covectors over the input and output spaces of the open system. That is, why isn’t {T^*X \choose X} \leftrightarrows {A \choose B} + a Poisson structure on X enough to model an open mechanical system?

I can answer the question if I replace ‘open mechanical system’ with ‘gradient-based learner’: the learner needs feedback (in the form of an infinitesimal loss function, i.e. a covector) on how well it did on the output, and then that feedback is divided between inputs and parameters, so T^*A is there mainly for compositional reasons. But I have no intuition on why this should apply to mechanical systems too :thinking:

Also two other points of confusion for me:

  1. if the T^*B input is where we get our ‘loss’ from, why consider systems which come with their own Hamiltonian on the parameter space? Wouldn’t a Poisson structure be enough?
  2. what is a GENERIC system?

Finally, I’d like to find a good excuse to justify the assumption of equipping also the cotangents of A and B with an anchor map… so either making them Poisson or Riemannian. This would fix the functoriality problems on charts. Or maybe I should just accept T^* really has just a bicategory as a domain.

Ah OK, I think I have a good answer for all of this!

In the setup of:

\begin{pmatrix} T^* A \\ A \end{pmatrix} \otimes \begin{pmatrix} T X \\ X \end{pmatrix} \leftrightarrows \begin{pmatrix} T^* B \\ B \end{pmatrix}

each thing plays a different role.

The input A is the part of the system that is fixed by something else. It’s coupled to X via the Hamiltonian, but we don’t know how it evolves; some other system handles that; we just pass back the force on it, which is in T^*A.

The output B sets the state of some other system, and receives force from that system that it passes back to this system, in T^*B.

The point of having both a Hamiltonian and the output is that the force on X comes both from something external and also from the internal dynamics. A closed system, i.e. a map from 1 to 1 should still have dynamics.

GENERIC systems are a formulation of non-equilibrium thermodynamics: I highly recommend Beyond Equilibrium Thermodynamics, but the TL;DR is that they are an extension of Hamiltonian systems to have both an energy E and entropy S potential. Then the differential of energy is turned into a vector via a Poisson structure, and the differential of entropy is turned into a vector via basically a degenerate Riemannian structure (symmetric non-negative definite, not necessarily full rank), and then those are added together to make the dynamics in the equation

\dot{x} = J(x) dE + M(x) dS

where J(x) is antisymmetric, and M(x) is symmetric nonnegative definite.
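Here is a toy instance in Python, as a sketch only. It assumes the usual GENERIC degeneracy conditions J\,dS = 0 and M\,dE = 0, which the equation above does not spell out, and the system itself (a damped spring that dumps its lost energy into a heat bath at fixed temperature) is a hypothetical example with made-up constants, not one taken from this thread.

```python
# Toy GENERIC system (illustrative, not taken from the thread):
# a damped spring whose lost energy is stored in a heat bath.
# State x = (r, p, e): position, momentum, bath energy.

MASS, K, GAMMA, TEMP = 1.0, 2.0, 0.3, 1.5  # hypothetical constants

def dE(x):
    """Differential of E(r, p, e) = p^2/(2m) + k r^2/2 + e."""
    r, p, e = x
    return [K * r, p / MASS, 1.0]

def dS(x):
    """Differential of S(r, p, e) = e / T (bath at fixed temperature T)."""
    return [0.0, 0.0, 1.0 / TEMP]

def J(x):
    """Antisymmetric Poisson matrix: canonical on (r, p), zero on e."""
    return [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

def M(x):
    """Symmetric non-negative definite friction matrix gamma*T * v v^T."""
    r, p, e = x
    v = [0.0, 1.0, -p / MASS]   # chosen so that v . dE = 0
    return [[GAMMA * TEMP * vi * vj for vj in v] for vi in v]

def matvec(mat, vec):
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in mat]

def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))

def generic_rhs(x):
    """xdot = J(x) dE + M(x) dS."""
    return [a + b for a, b in zip(matvec(J(x), dE(x)), matvec(M(x), dS(x)))]

x = (1.0, 0.5, 0.0)
xdot = generic_rhs(x)
energy_rate = dot(dE(x), xdot)    # should vanish: energy is conserved
entropy_rate = dot(dS(x), xdot)   # should be >= 0: entropy is produced
```

Unpacking xdot gives \dot{r} = p/m, \dot{p} = -kr - \gamma p/m and \dot{e} = \gamma p^2/m^2: the friction term removes energy from the spring and the bath absorbs exactly that amount, so total energy is conserved while entropy grows.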

Ah great, so that’s basically the same intuition I have from gradient-based learning. Makes sense!

Uhm this seems like a juicy insight… but I don’t get it yet :thinking:

I’m used to thinking about the Hamiltonian as arising from closing the system with a costate {T^*B \choose B} \leftrightarrows {1 \choose 1}, which amounts to a 1-form over B, e.g. dh for h:B \to \R. More to the point, costates of the latter kind are given by {T^*h \choose h}: {T^*B \choose B} \leftrightarrows {\R \choose \R} ‘closed off’ by the ‘walking differential’, {\R \choose \R} \leftrightarrows {1 \choose 1}, which corresponds to the constant 1-form dt over \R.

Once you close off a system like that, the dynamics is given by the same equations you wrote but without the Hamiltonian term, which instead appears as \phi. Hence in my head the ‘system’ doesn’t have its own Hamiltonian but instead gets its Hamiltonian covectors from the outside. These BYOH (Bring Your Own Hamiltonian) systems instead remind me of predictive coding… but I don’t understand the latter very well.

Thanks for the pointer! Do you have a more precise reference inside that book? I’d like to understand why a GENERIC system is structured as such.

Yeah, the first chapter!

This is exactly right, and you can construct my systems by having the Hamiltonian be an output and then “closing off” that output. It’s just convenient in \mathsf{OpenCM} to have internal Hamiltonians, but you could do the same construction without Hamiltonians, and then add in Hamiltonians by considering the Kleisli category of (-) \times \mathbb{R} that comes from the additive monoid structure on \mathbb{R}; this approach was suggested by @dspivak.
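The Kleisli composition mentioned here is easy to sketch in Python. This is only the bare (-) \times \mathbb{R} part, with no parameters and no Poisson structure, and the maps f and g are hypothetical placeholders; combining it with Para is what would recover the rule H_{X,Y}(a,x,y) = H_X(a,x) + H_Y(f(a,x), y).

```python
# Sketch: Kleisli composition for the writer monad (-) x R, with R under addition.
# A Kleisli map A -> B is a function a |-> (b, h), with h a real number.

def kleisli_compose(f, g):
    """Compose f : A -> B x R with g : B -> C x R by adding the R-components."""
    def composite(a):
        b, h_f = f(a)
        c, h_g = g(b)
        return c, h_f + h_g
    return composite

def unit(a):
    """Kleisli identity: attach the additive unit 0."""
    return a, 0.0

# Hypothetical maps, each returning an output together with an energy.
f = lambda a: (a + 1.0, a * a)
g = lambda b: (2.0 * b, b)

c, h = kleisli_compose(f, g)(3.0)   # f(3) = (4, 9), g(4) = (8, 4)
```

The second energy is evaluated at the output of the first map, and the two are summed, which is the same shape as the composite Hamiltonian.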

Incidentally, I think you should be closing off with \begin{pmatrix} \mathbb{R} \\ 1 \end{pmatrix}, right? This is the monoidal unit for the tensor product coming from tensor product of vector bundles. Do you think about the two monoidal products on vector bundles, one coming from product of vector bundles, and the other coming from tensor product? What role do these play in your theory?

That’s what I don’t understand, what role do these play? They don’t look totally unreasonable but I lack intuition about them.

Perhaps an example of a ‘naturally occurring’ BYOH system would help me.

(As I write, I’m actually getting some intuition… a BYOH system has its own dynamics, and it’s perfectly fine on its own, except it is open so it can exchange energy with other systems through its interface… still, an example would do wonders)

A fun fact about BYOH processes is that a BYOH costate {T^*B \choose B} \leftrightarrows {1 \choose 1} is the data of a Hamiltonian K:B \times Y \to \R and a (not necessarily exact) 1-form \kappa. If we insist that the backward pass in these lenses is linear, then \kappa = 0 necessarily. So it seems closing off a process in this setting would amount to specifying an ‘external’ Hamiltonian. Cool!

So the tensor product is nasty. If we consider (\bf VecBun, \otimes) instead of (\bf VecBun, \oplus) then the ‘diegetic structure’ given by T^* becomes ludic, which is terminology I just made up to say that the bifunctor \bf \mathbb Para(Ham) \to \mathbb Para(\mathbb Arena_{LinDiff}) (I don’t know if this even typechecks but you see what I mean) induced by T^* is only lax functorial. I call it ‘ludic’ since this laxity is what makes game theory an interesting subject (for games, that amounts to the Nashator). When you use (\bf VecBun, \oplus), such laxity disappears and you get back the tame world of backprop.

Now the truth is that’s all I know, and I haven’t yet explored the relationship between \oplus and \otimes, or whether you can use one for parameters and the other for other things.

The GENERIC equation is very much like predictive coding, and indeed some of the people in that community often try to make an analogy with non-equilibrium dynamical systems. Understanding what’s going on there is bumping around on my to-do list, so I’m very glad to see this discussion!

It feels wrong to me to not have any linear structure, but perhaps the right thing to do is to consider affine maps of bundles. Then we can close off with \begin{pmatrix} 1 \\ 1 \end{pmatrix}. Does this seem reasonable? Have you worked with this before?

Then I don’t need to care about \otimes. This has resolved a long-standing confusion I’ve had about lenses of vector bundles; this makes me happy.

Excellent! Yeah, I’ve been thinking a lot about GENERIC recently, and I’m planning to write something up on the connection between GENERIC and exergy soon.

I don’t quite get the characterization you give, Matteo, about vector bundles as (\mathbb{R},\cdot)-actions. The MO post you referenced—though really cool and fascinating, so thanks for the reference!—also assumes that each fiber is isomorphic to a vector space; you need that right? But somehow I still don’t see how it works: an open subset U\subseteq X is a submersion, and every fiber is a vector space, but it is not locally trivial. Ah, I see by looking back at the MO post, they assume every fiber has the same dimension. I guess my feedback is that I feel like your statement is interesting but needs more caveats :slight_smile:

The way this whole post started is that I was telling Owen and others that I wanted to develop a compositional physics, and Owen wanted to press on the initial idea I had, which was pretty inchoate and physics-wise-uninformed in a few places. The idea was to have a colored operad where the objects are interfaces and the morphisms are arrangements that change in time. The interfaces are polynomials (possibly in \mathbf{Mfd} or something) whose positions would be “configurations” and for which a direction is a “way to change configurations”. Imagine A as the space of embeddings S^1\to \mathbb{R}^3 of a circle, modulo the action of Euclidean transformations on \mathbb{R}^3; let’s refer to an a:A as a configuration of the circle. Then a cotangent vector d:T^*_a(A) measures changes to that configuration. With Owen’s insight and physics intuition, we got a lot further on this idea, as you can see by his post.

The way I’m now seeing it, a morphism in the operad is a map
\Phi\colon {T^*A_1 \choose A_1}\otimes\cdots\otimes{T^*A_k \choose A_k}\otimes {TX\choose X}\to{T^*B\choose B},
equipped with two more things: a map T^*X\to TX and a current point x:X. Here, the whole map \Phi could be called the arrangement policy, and an element x:X is the current arrangement. Given this policy, a length-r path [0,r]\to A_1\times\cdots\times A_k together with a covector field on B determines a length-r trajectory in X and all the while passes back covectors to the A_i. Thus the arrangement is changing in time, and this whole thing is compositional.

This setup turns out to be very close to what I’ve been working on with Samantha Jarvis throughout the summer, so it’s nice that they’re coinciding. The main difference with her was that we didn’t require continuity, i.e. we used a map T^*X\to X rather than T^*X\to TX.


Thanks for taking this apart David! I didn’t realize that fact because I unconsciously believed all submersions had all fibers of the same dimension. Instead, that’s only true for locally trivial fibrations (aka fiber bundles)! Fortunately, Ehresmann’s theorem tells us all proper submersions are fiber bundles, which is most bundles I’d say, but still, good catch!

This tension in continuity is also very relevant to ML, where gradient descent has the former signature (hence being discontinuous), while gradient flow has the latter. I envisioned one would get the former from the latter using some discretization scheme at solving-time, like Euler, so I prefer working continuously.
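As a minimal sketch of that discretization step in Python: explicit Euler applied to a gradient flow is just the usual gradient descent update. The loss, the step size, and the Euclidean metric (so M is the identity) are hypothetical choices here, and the sign is the one for minimizing a loss.

```python
# Gradient descent as explicit Euler on the gradient flow xdot = -grad L(x).
# The loss and step size are illustrative; the metric is Euclidean.

def grad_loss(x):
    """Gradient of the hypothetical loss L(x) = (x - 3)^2 / 2."""
    return x - 3.0

def euler_step(x, step):
    """One explicit Euler step of the flow xdot = -grad L(x)."""
    return x - step * grad_loss(x)

x = 0.0
for _ in range(200):
    x = euler_step(x, step=0.1)
```

Here the step size plays the role of the learning rate, and x ends up very close to the minimizer at 3.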

I’d say I understand the rest of your post formally but not intuitively, since I don’t completely understand what an ‘arrangement’ is supposed to be/represent.

The discretization of a gradient flow is exactly how I write down predictive coding in my thesis (sketched e.g. in Remark 6.3.19 and used in Corollary 7.3.11). I’ve been investigating the link with gradient descenders a bit more lately; I’ll have more to say about this soon!
