Induction is induction

David Corfield made a very interesting observation: Peirce’s three types of logical reasoning, deduction, induction and abduction, correspond to three very elementary operations in category theory: composition, extension and lifting.

Let’s see what this means.

Deduction. I observe A \to B and B \to C. Then, by the transitivity of implication, I can conclude A \to C.
Induction. I observe A \to B, and also A \to C. I conclude B \to C.
Abduction. I observe A \to C and B \to C, I conclude A \to B.
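Spelled out as problems about composites (my paraphrase of the correspondence, not the post’s notation), the three rules look like this:

```latex
\begin{aligned}
\text{deduction (composition):} &\quad f : A \to B,\; g : B \to C
  \;\vdash\; g \circ f : A \to C\\
\text{induction (extension):}   &\quad f : A \to B,\; h : A \to C
  \;\vdash\; e : B \to C \text{ such that } e \circ f \cong h\\
\text{abduction (lifting):}     &\quad g : B \to C,\; h : A \to C
  \;\vdash\; l : A \to B \text{ such that } g \circ l \cong h
\end{aligned}
```

Only deduction produces its conclusion canonically; the other two pose problems whose solutions, if any, need not be unique.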

Clearly induction and abduction are not valid reasoning rules! But they are rules reasoners have to use if they want to make new knowledge out of the data they have. For instance, we use induction all the time, both ‘unconsciously’ when we learn facts from the world (‘I said the word mama, mama smiled, therefore the word mama makes mama smile’) and very consciously in science (‘We collected these data samples, and we infer a functional relationship of this kind’).

In fact you can see how the data for induction is literally the data of an indexed family of pairs (a_i, b_i)_{i \in I}, and the result of induction is to build a map f:A \to B such that f(a_i) = b_i for each i \in I: it’s an interpolation problem!

But, again, this isn’t a valid logical rule: such an f might not exist, since e.g. there might be a_i = a_j with b_i \neq b_j (so no function can interpolate the points); and even when the points can be interpolated, there are many different functions that do so!
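To make the two failure modes concrete, here is a toy sketch in Python (the encoding is mine, not from the post): we try to build a function-as-dictionary interpolating a family of pairs.

```python
# Induction as interpolation: from observed pairs (a_i, b_i), try to
# build a function f with f(a_i) = b_i for each i.

def induce(pairs):
    """Return a dict-as-function interpolating the pairs, or None when
    two observations conflict (a_i = a_j with b_i != b_j)."""
    f = {}
    for a, b in pairs:
        if a in f and f[a] != b:
            return None  # no function can interpolate these points
        f[a] = b
    return f

# Consistent observations: an interpolant exists, but it is far from
# unique, since nothing constrains f outside the observed points.
print(induce([(0, 0), (1, 1), (2, 4)]))  # {0: 0, 1: 1, 2: 4}

# Conflicting observations: no interpolant at all.
print(induce([(0, 0), (0, 1)]))          # None
```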

So to give some logical credence to induction, we have to find a way to at least solve the second problem, and thus make induction the most conservative conclusion we can draw having observed that I \to A and I \to B. In other words, we have to solve the extension problem in a universal way.

This is the job of Kan extensions!

This means, first of all, moving from the unspecified 1-category I’ve been working in so far to an unspecified 2-category. Then a (right) Kan extension looks as follows:

The dashed arrow and the filling 2-cell are terminal at their job: every other such pair factors uniquely through them:

So this is the sense in which {\rm ran}_a b is the ‘least general solution’ to this extension problem: every other solution factors through it. The right Kan extension only contains what’s justified to believe about the implication A \to B.

There is also a nice formula for computing {\rm ran}_a b in reasonable cases, if it exists:

{\rm ran}_a b(a) = \int_{i:I} \int_{p:A(a,a_i)} b_i

An interpretation of this formula is that the value of the interpolation of (a_i,b_i)_i at a given point a is the limit, over i, of all the values b_i at those a_i that a is related to. In other words, we ‘fill the gaps’ in the data by taking limits.
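In the simplest 2-categorical setting, where categories are preorders and we enrich in truth values, ends become conjunctions and the inner power becomes implication, so the formula can be computed directly. A toy sketch (the encoding and data are mine):

```python
# Right Kan extension in the Bool-enriched setting: ran_a b evaluated at
# x is the conjunction, over i, of (x <= a_i) => b_i, i.e. the 'and' of
# all observed values b_i at points a_i that x is related to.

def ran(points, leq, x):
    """Right Kan extension of b along a, evaluated at x.
    `points` is the indexed family [(a_i, b_i)] with b_i a Bool;
    `leq` is the order relation of the preorder A."""
    return all(b for a, b in points if leq(x, a))

# A = natural numbers with the usual order; we observe b at a few points.
points = [(3, True), (5, True), (7, False)]
leq = lambda x, y: x <= y

print(ran(points, leq, 6))  # False: only 7 lies above 6, and b_7 = False
print(ran(points, leq, 8))  # True: no observed point above 8, vacuously
print(ran(points, leq, 2))  # False: 3 and 5 check out, but 7 does not
```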

Of course this is far from what actual interpolation looks like; getting there requires spending a bit more time thinking about the right generalizations of all these concepts to a ‘quantitative’ setting.

Still, we can test the proposed definition of ‘induction’ on something else: Peano induction! Is it a special case of induction? I claim it is.

What is mathematical induction? We are given a predicate \varphi : \N_0 \to 2, where \N_0 is the set of natural numbers, which we know satisfies \varphi(k) \to \varphi(k+1) for every k, as well as \varphi(b) for some b \in \N. We conclude that \forall n \in \N,\ (b \leq n) \to \varphi(n).

So let’s work in \bf Pos, the 2-category of posets. We have a map i:\N_0 \to \N embedding the set of naturals in the poset of naturals. We have a predicate on \N_0. We form its right Kan extension:

Such a Kan extension has the form

{\rm ran}_i \varphi(k) = \forall {n \in \N},\ (k \leq n) \to \varphi(n)

which reads as ‘{\rm ran}_i \varphi(k) is true when \varphi is always true from k onwards’.
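Since a program cannot quantify over all of \N, here is only a finite probe of this reading, on a truncation of the naturals (the bound and the example predicate are mine):

```python
# ran_i phi at k says: phi holds from k onwards. We check this reading
# on a finite truncation of the naturals.

N = 50  # truncation bound (a real check would need all of N)

def ran_phi(phi, k):
    """ran_i phi at k, computed up to the truncation bound."""
    return all(phi(n) for n in range(k, N))

phi = lambda n: n >= 5  # an example predicate closed under the step

print(ran_phi(phi, 7))  # True: phi holds from 7 onwards
print(ran_phi(phi, 3))  # False: phi fails at 3 and 4
```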

How is this useful for induction? Well, when \varphi satisfies the induction property, \varphi:\N_0 \to 2 actually lifts to a monotone map \N \to 2, since \varphi(k) \to \varphi(k+1) is a monotonicity property.
Then, by the universal property of the Kan extension, there is a (necessarily unique in this context) map into it:

This map corresponds to the implication

\forall k \in \N,\ \varphi(k) \to \forall {n_0 \in \N_0},\ (k \leq n_0) \to \varphi(n_0)

which is equivalent to

\forall k \in \N, \forall n \in \N, \varphi(k) \land (k \leq n) \to \varphi(n).
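The equivalence of the last two statements is just currying, but we can also brute-force it over a finite chain as a sanity check (a probe, not a proof; the encoding is mine):

```python
# Check that the curried and uncurried statements agree for every
# predicate phi on a small finite chain {0, ..., N-1}.
from itertools import product

N = 5
R = range(N)

def stmt1(phi):
    # forall k, phi(k) -> (forall n, (k <= n) -> phi(n))
    return all((not phi[k]) or all(phi[n] for n in R if k <= n) for k in R)

def stmt2(phi):
    # forall k, n: phi(k) and (k <= n) -> phi(n)
    return all((not (phi[k] and k <= n)) or phi[n] for k in R for n in R)

for bits in product([False, True], repeat=N):
    assert stmt1(list(bits)) == stmt2(list(bits))

print("agree on all", 2 ** N, "predicates over a", N, "element chain")
```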

Then given a base case \varphi(b)=\top, we can conclude

\forall n \in \N, (b \leq n) \to \varphi(n).

So induction is a form of… induction after all!


A coda for my post. What I really proved above is that for any poset W, right extension along the inclusion i:W_0 \to W gives you well-founded induction. Indeed, for general posets we have

{\rm ran}_i\varphi(v) = \forall w \in W, (v \leq w) \to \varphi(w)

and ‘functoriality’ of \varphi is

v \leq w \implies \varphi(v) \to \varphi(w)

Thus the universal map \varphi \to {\rm ran}_i\varphi says that

\forall v \in W\!,\ \varphi(v) \to (\forall w \in W, (v \leq w) \to \varphi(w))

which is easily seen to be equivalent to

\forall w \in W\!,\ (\forall v\in W\!, (v \leq w) \to \varphi(v)) \to \varphi(w).

Of course this extends to categories as well, where you’d get some kind of ‘proof-relevant’ induction. One can give the same definition anywhere you can talk about right Kan extensions and the inclusion W_0 \to W.


Great stuff! There should be loads of things to do with the basic idea.

David Jaz Myers told us of an approach to the nearest neighbor algorithm which sees it as a form of extension problem and then uses enriched profunctors.

[image: Screenshot 2024-02-27 07.25.32]

It does seem to be a form of induction.
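For intuition, here is a toy version of nearest neighbour as an extension problem (my sketch in plain Python, not the enriched-profunctor construction): labelled sample points in a metric space extend to a total function by sending each point to the label of its nearest sample.

```python
# Nearest-neighbour extension: the partial data [(a_i, b_i)] on a metric
# space extends to a total function by nearest-sample lookup.

def nn_extend(samples, dist):
    """Extend `samples` = [(a_i, b_i)] to a function on the whole space."""
    def f(x):
        return min(samples, key=lambda ab: dist(x, ab[0]))[1]
    return f

# Points on the real line, labelled by sign.
samples = [(-2.0, "neg"), (-1.0, "neg"), (1.0, "pos"), (3.0, "pos")]
f = nn_extend(samples, lambda x, y: abs(x - y))

print(f(-1.2))  # neg
print(f(2.5))   # pos
print(f(1.0))   # pos: the extension agrees with the data on the samples
```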

Regarding abduction, sometimes we have the arrow B \to C already in place, but sometimes there is a range of candidates B_i \to C and we’re to select one. Sometimes we have a number of arrows f_i: A \to B [corrected] and we’re selecting one. Sometimes abduction is also used for the process of coming up with such a B in the first place.

If we take things in a Kan extension/lift direction, it may be interesting to reflect on the asymmetry (why the former is so much more common). Todd Trimble offers an interesting explanation here.


Uh thanks for the pointer, I forgot about NN!

Re the rest, I think there’s a lot of mileage to be gotten out of working in proarrow equipments of bimodules (e.g. Rel, Prof) for these things, which probably addresses this other generalized form you bring up.

I wonder if there’s something to make of Lawvere and Schanuel’s

[image: Screenshot 2024-03-01 111640]

from their Conceptual Mathematics. Recall that these are:

[image: Screenshot 2024-03-01 111806]

Perhaps we might say sampling and classifying. Then in the case of induction we’re often taking a sample A of a domain B, so that if we know the sample’s C-values we can extend to a full function f: B \to C.

An extreme case is an isomorphism, g: A \cong B, where we derive f via g^{-1}. Otherwise we worry about the sample being too small, or non-representative. There is no forced way to extend (cf. Goodman’s ‘grue’ problem). Some spatial structure, as with the nearest neighbor algorithm, may justify a choice of extension.

In the case of classifying, on the basis of a classification of A, via its map to C, and of B, via its map to C, we would like to classify A by B, via f: A \to B. An extreme case would have g: B \cong C, when we could use g^{-1}. Otherwise we worry about the B classification being too coarse-grained.

Now what would correspond, in this classifying case, to the use of spatial structure? What would the dual to NNA be like?

For A and B with given metrics, I guess there might be some algorithm working by choosing a function in the slice over C that minimizes the Lipschitz constant.