Approximate Morphisms

At FRA3, Vanessa Kosoy and Owen Lynch and I discussed a setting in which to talk about “approximate morphisms” between objects in categories — and how bad they are at being actual morphisms.

There are lots of situations where a morphism is a function between sets which needs to satisfy some property — some equation or inequality. In many of these cases, we can also quantify how much an arbitrary function fails to satisfy this property.

For example if X and Y are metric spaces and f : X \to Y is a continuous function, then the quantity

\sup_{x_1,\,x_2} d_Y(f(x_1), f(x_2)) - d_X(x_1, x_2)

(where the subtraction symbol r - s denotes the truncated subtraction which is 0 whenever s is greater than r) is a measure of how far off the function f is from being a short map (a.k.a. a 1-Lipschitz map). If this quantity is 0, then the map is short.

We can also quantify how far off a square of metric spaces is from commuting: if X : X_0 \to X_1 and Y : Y_0 \to Y_1 are short maps, and f_0 : X_0 \to Y_0 and f_1 : X_1 \to Y_1 are also short maps, then the quantity

\sup_{x \in X_0} d_{Y_1}(Y(f_0(x)), f_1(X(x)))

measures how far the square

is from actually commuting. If the quantity is 0, then the square commutes.

These measures of failure of functions to be morphisms can be used to quantify the success of approximate decompositions of systems. Suppose that T_W : W \to DW is a Markov model (where D is some sort of probability distribution monad, and W is a set of “world-states”). Suppose that we know that a world state can be described as a pair of an “agent-state” and an “environment-state”: i : W \hookrightarrow E \times A. Given Markov models T_A : A \to DA and T_E : E \to DE, we can ask how well this decomposed model of the world approximates the original world-picture T_W. One way to do this would be to appeal to the quantity

\sup_{w \in W} D_{KL}(i_{\ast}T_W(w) \,||\, (T_E \otimes T_A)(iw))

which measures the failure of the commutativity of the square

in the sense of Paolo Perrone’s “divergences on Markov categories”. If this square commutes, then the agent and the environment are idependent from each other.

To summarize the above, we find ourselves with some category of “bad maps”, each of which is equipped with a quantity that measures how bad it really is. The “good maps” are those maps for which the quantity vanishes — a good map has no badness. To button up our terminology a bit more, we could call these bad maps “approximate morphisms”, and the good maps “actual morphisms”. We can call the badness quantity the “degree of approximation”. A simple formalization of this situation is to consider our category of approximate maps as enriched in sets weighted in the non-negative real numbers.

Locality of approximation

However, in each case above we actually started with a more primitive quantity which defined locally over the domain, and then performed an “aggregation” — in the above cases, supremum — to turn it into a single quantity. The dependence of the degree of approximation on where we are in the domain is important. For example, when describing the failure of a system to decompose into a composite of systems, we may expect this decomposition to be generally false but still usefully true in a neighborhood of a given starting state. This fine grained analysis of degrees of approximation is not handled by simply enriching our category in weighted sets.

Vanessa, Owen, and I came up with the following way of organizing the local character of these degrees of approximation. We will define a 2-category \mathbf{Error}, and a lax functor from our category \mathcal{A} of approximate maps to this 2-category \mathbf{Error} will equip the appromate maps with a degree of approximation which we will call their “error”.

Since we are aiming to maintain the local dependence of degrees of approximation of an approximate map f : X \to Y over where we are in the domain X, we will need to associate to each domain a notion of locality, together with a system of locally variable quantities which we will use to measure our error. For this reason, we will assign to each object X \in \mathcal{A} a rigged topos (\mathcal{X},\, \mathcal{O}_X); that is, a topos \mathcal{X} and a rig-object (or semiring object) \mathcal{O}_X \in \mathbf{Sh}(X) (here we are using the Anel-Joyal terminology of “topo-logie”, where the category — often called the “topos” — is understood to be the category of sheaves on the actual space which is the topos). Often, the object X itself will be a topological space and \mathbf{Sh}(X) will simply be sheaves on that space, and \mathcal{O}_X will be the sheaf of upper-semicontinuous non-negative-real valued functions on X.

For this reason, we take the objects of our 2-category \mathbf{Error} to be rigged toposes (\mathcal{X}, \mathcal{O}_{\mathcal{X}}).

Next we assign to every approximate map f : X \to Y in \mathcal{A} a morphism (f : \mathcal{X} \to \mathcal{Y}, \mathcal{O}_f : f^\ast \mathcal{O}_Y \to \mathcal{O}_X) of rigged toposes, together with a global error function e_f : \ast \to \mathcal{O}_X defined on the domain. In the case that X is a topological space and \mathcal{O}_X is the sheaf of upper-real functions on X, then e_f is an upper-semicontinuous function defined on all of X. We think of e_f(x) as the error in the function f at point x.

For this reason, we take the maps of our 2-category \mathbf{Error} to be the morphisms (f : \mathcal{X} \to \mathcal{Y}, \mathcal{O}_f : f^\ast \mathcal{O}_{\mathcal{Y}} \to \mathcal{O}_{\mathcal{X}}) of rigged toposes, together with a global element p_f : \ast \to \mathcal{O}_{\mathcal{X}}.

Finally, to every pair of composable arrows f : X \to Y and g : Y \to Z in \mathcal{A}, we assign the following relation between the data (f, \mathcal{O}_f, e_f), (g, \mathcal{O}_g, e_g), and (gf, \mathcal{O}_{gf}, e_{gf}): namely, we have an inequality e_{gf} \leq f^{\ast}e_g + e_f, where the inequality is defined in the usual way in semiring theory (that is, a \leq b if there is a c such that a + c = b). In the case that the domains are topological spaces, this inequality is global. Informally, no matter how bad two approximate maps f and g are, their composite gf can’t be worse.

In order to secure this last condition as part of a lax functor \mathcal{A} \to \mathbb{Error}, we will define the composition in \mathbb{Error} to be the composite of morphisms of rigged toposes, with p_{gf} := f^{\ast}p_g + p_f. We will define a 2-cell to be a natural transformation \alpha : f^{\ast} \Rightarrow g^{\ast} for which \mathcal{O}_g \circ \alpha_{\mathcal{O}_Y} = \mathcal{O}_f together with an inequality p_g \leq p_f.

If our spaces are Hausdorff, then every such natural transformation \alpha is an equality.

Finally, we can turn the examples we began this post with into lax functors into \mathbf{Error} as follows:

  • In the first case, our category of approximate maps \mathcal{A} is the category of metric spaces and continuous maps. We assign to each metric space X the topos X \times X, and e_f(x_1, x_2) := d_Y(f(x_1), f(x_2)) - d_X(x_1, x_2).

  • In the second case, our category of approximate maps is the category of non-commuting squares of short maps of metric spaces, and we assign to each short map X : X_0 \to X_1 the topos X_0, and e_f(x) := d_{Y_1}(Y(f_0(x)), f_1(X(x)))

  • In the third case, our category of approximate maps is the category of Markov processes with arbitrary functions on the state spaces. In this case, we send a Markov process T_X : X \to DX to the discrete topos on the set X, and e_f(x) := D_{KL}(T_{Y}(f(x)) \,||\, f_{\ast}T_{X}(x)). By being a bit more careful, we should be able comfortably accomodate the measurability of X in a slightly more constraing topos.

This is only the beginning of trying to understand approximate decompositions of systems.

5 Likes

Nice post! Is there any way to relate this to Puca et al.'s Obstructions to Compositionality. E.g., they can use their framework to measure failure to be a (good) isomorphism. Perhaps one might consider local failure to be iso.

I’m wondering if we can extend this to look at generalized pointwise metrics or divergences. I.e., consider a category enriched in the bicategory with the same objects as \mathbf{Error}, but a morphism from (\mathcal{X}, \mathcal{O}_{\mathcal{X}}) to (\mathcal{Y}, \mathcal{O}_{\mathcal{Y}}) is a geometric morphism as before, along with a \mathcal{O}_\mathcal{X}-metric space A, where a \mathcal{O}_{\mathcal{X}}-metric space is a category enriched in \mathcal{O}_{\mathcal{X}}, internal to \mathrm{Sh}(\mathcal{X}). Composition of (f,A) and (g,B) is given by (g \circ f, A \times f^\ast(B)).

Let’s call this bicategory \mathbf{Metric}. A \mathbf{Metric}-enriched category is a set of objects, along with for each object a topos with a semirig object. Then for each pair of objects, you have a metric space internal to the codomain of morphisms…

It doesn’t actually seem to quite work out. Well, this is something to think about.