Taming argmax

mattecapu · February 1, 2024, 11:41am

The following has been informed by numerous conversations, among which some with David Spivak, Nima Mothamed, David Jaz Myers, and Nathaniel Virgo.

Argmax is my nemesis. It keeps popping up everywhere in my work and yet escapes a structuralist treatment. It has very bad formal properties and thus it’s hard to justify (which means saying argmax is ‘just’ a blah in a blah). That makes me mad!

For now, let’s work in \bf Cat. Consider a functor f:X \to V therein (typicially, f:X \to \R where X is discrete). An object x^* \in X is in the argmax iff fx^* is “terminal in the image of f”, i.e. iff for all x \in X, there exists a (perhaps unique) fx \to fx^*.

If one tries to read this as a universal property, they are going to get all dizzy. The universal quantification ranges over objects of X but the invoked comparison maps are in V. This is weird.

One can frame \argmax as follows: consider the map f:X \to \R as a map of sets, and use it to pullback the order structure on \R onto X. This makes X look like the ‘level sets’ of f since x \leq x' \iff fx \leq fx', and thus in particular x \cong x' \iff fx = fx'. Then \argmax f is the isomorphism class of terminal objects in X.

‘Pulling back’ the order structure can be seen in various way. One is as a literal cartesian lift for the forgetful functor U:\bf Ord \to Set from orders to sets, which turns out to be a fibration. Indeed, the level sets construction is literally the functor \mathsf{lvl} : {\bf Set}/ U \to \bf Ord defining a cleavage for this fibration. Moreover {\bf Set}/U has a neat interpretation as ‘sets with utility’, and thus \mathsf{lvl} is a case of very natural and concrete construction (sending a utility function to its induced preference relation) having a decent universal property (being a right adjoint, being a cartesian lift functor, etc.).

That’s already quite cool! Let’s make this cooler by summoning equipments.

The operation of taking level sets can also be seen as follows: given a functor f:X \to V, one factors it as X \to \mathsf{lvl}f \to V where the first is a bijective-on-objects functor and the second is a fully-faithful functor (think: bijective-on-morphisms functor). Formally, this factorization comes from the a vertical-cartesian factorization system on \bf Cat induced by its fibration of objects \rm Ob: \bf Cat \to Set, which in turn is just an extension to categories of the fibration U above.

One can see this factorization in yet another way, by approaching categories as monads in spans. Recall that functors are tight maps in that equipment, thus in particular squares

where X_0,V_0 are the set of objects of X and V respectively, X_1 and V_1 the spans of morphisms, and f_0 and f_1 are similarly the action of f on objects and maps.

Now by being an equipment, we know this square has to factor along a restriction, thus yielding:

And this is basically the bo/ff factorization of f we had above. In fact it’s exactly the same since the 2-cell I labeled \text{cart} is the cartesian lift of the ‘category structure’ on V along the map of objects f_0.

But we can do something a bit different now. Let’s switch to think about this in the equipment \bf \mathbb Cat = \mathbb Mod(\mathbb Set) of categories, functors, profunctors and transformations. There, we can restrict the identity profunctor on V = (V_0, V_1) along the entire functor f, yielding:

Now V(f,f) is extra structure on the category X: it’s a promonad on it, thus the data of new morphisms on X, namely the ones it would have ‘in V’. This seems useful to talk about properties of elements of X with respect to morphisms between their f-images!

Besides, here’s a crucial insight: \argmax f should really be thought as a predicate on X. For instance, consider things like \mathrm{softmax} which assign a ‘maximality score’ to all things in a set: this smells a lot like literally argmax but seen as a ‘generalized predicate’ X \to [0,1]. And even without resorting to generalization, in a constructive setting one ends up encoding \argmax f as the X-indexed type which assign to each x the type of proofs of its argmaximality, in a classic proposition-as-types move.

So here’s a conjecture (I’m sure duality is gonna bite me): take the following lift in \bf Prof, the bicategory (you could probably swap if for the equipment above but I just don’t know how would that work):

By duality, one can compute this as a Kan extension, specifically (I’m stealing this idea from the nLab):

\argmax f = \mathsf{Rift}_1V(x^*) = \int_{x:X} (1(x) \to V(fx,fx^*)) = \int_x V(fx, fx^*)

where we are considering profunctors 1 \nrightarrow X directly as copresheaves over X.

It’s not hard to convince oneself that the above formula corresponds to defining:

x^* \in \argmax f \iff \forall x\in X,\ fx \leq fx^*.

Clearly, had we setup the above in the 2-\bf \mathbb Prof (aka \bf \mathbb Mod(\mathbb Rel)), this is exactly what we would have found!

Moreover, in this fashion, we can also describe constrained optimization problems. Say you have P:X \to \bf 2 \hookrightarrow Set describing the predicate which constrains the optimization. This is again a copresheaf 1 \nrightarrow X (in fact a subsingleton), and lifting V(f,f) against it you get:

(\argmax_P f )(x^*)= \int_x (P(x) \to V(fx,fx^*)).

Notice we can also talk about whether an object x^* \in X ‘is in \argmax f’: this categorifies to asking how much it is in \argmax f, and is answered by checking membership categorically, i.e. by mapping from X(x^*,-) in {\bf Psh}(X), which by Yoneda simply returns the value of \argmax_P f at x^*.
When working in \bf \mathbb Mod(\mathbb Rel), the only possible answers are given by the only possible degrees of ‘being’ in \argmax f: either not at all (\bot) or completely (\top), whereas in other settings one will get different kinds of answers (e.g. in \bf \mathbb Prof, one gets either \varnothing or a set of witnesses that “fx \leq fx^*”, actually the hom-set fx\to fx^*).

Assuming this stuff is right, I can finally say: argmax is defeated! …or is it?

Let’s check what happens when we compute \argmax f in the equipment of [0,1]-enriched categories, where ([0, 1], \leq, {\cdot}, \multimap) is the base of enrichment, with a \multimap b =e ^{b\ \dot- a} = \min(e^b/e^a, 1).
Given f:X \to [0,1], we have

(\argmax f)(x^*) = \int_x 1(x) \multimap (fx \multimap fx^*) = \dfrac{fx^*}{\int^x fx}

Despite the suggestive notation, \int^x fx is actually the maximum of f. If f=e^{-u}, where e^{-(-)} is base change [0,\infty] \to [0,1], one would expect \argmax f to be \mathrm{softmax}\ u, but the normalization is somehow wrong.

So it appears there is a room for another chapter in the battle against \argmax!

evanwashington · February 1, 2024, 9:26pm

Here’s how I’ve been thinking about \argmax as it’s popped up for me (in modeling some problems in social choice theory). The framing is a bit different, so I wonder if anything here strikes you as familiar?

We have a category \mathsf{Util} of utility functions which are just sets X together with a map f: X \to V where V is an ordered set, and arrows (X, f) \to (Y, g) are, of course, just maps h: Y \to X such that g = f h (it’s a Grothendieck construction).

There’s also a category \mathsf{ChSet} of choice-sets, which are just sets X together with maps to 2 (so, predicates).

\argmax is a functor \mathsf{Util} \to \mathsf{ChSet} which is constructed (modularly) with the following ingredients:

A “quantifier” \max: \mathsf{Util} \to V which just chooses the maximum value in V attained by f: X \to V (note that it essentially has quantifier type (X \to V) \to V for V-valued predicates).
A map \widehat{(-)}: V \to (V \to 2) where (\hat{v})(w) = 1 iff v \leq w in V.

Then I define \argmax(X, f) := (X, \widehat{\max(X, f)} \circ f).

mattecapu · February 8, 2024, 5:25pm

This is similar to the excellent work of Escardò (and others) on selection and quantifier monads. That works but does it single out a universal property for argmax? You could have used any other quantifier rather than \max. Indeed Escardò shows how any quantifier q induces an \arg q selection function using an operation analogue to your \hat{(-)}. That’s nice but I’m asking a different, if related, question here.

Topic		Replies	Views
Cofree Tambara modules Research	0	149	February 2, 2024
Semantics of GATs Learning	4	170	August 27, 2023
Powers of polynomial monads General	0	79	February 14, 2024
Categories are monads in spans Learning	2	4932	November 28, 2023
Convolution, abstractly Research	5	365	April 3, 2024

Taming argmax

Related Topics