I was recently browsing through some of the source code for narya, and I noticed a pattern that Shulman was using in order to do type-level programming in a non-dependently typed language. If you want to see the code that I read to figure this out, you can look at n.ml.
Part 1 of this pattern is to encode unary numbers via type constructors. This works like
type zero = Dummy_z
type 'a suc = Dummy_s
One can then represent the number 4 as the type (((zero suc) suc) suc) suc
(note that in ocaml, type constructors are applied on the right).
Part 2 is to use these type-level numbers to index GADTs. GADT stands for “generalized algebraic data type”, and you should really read an introduction to these elsewhere if you haven’t seen them before.
We can, for instance, use these to give a fixed-length vector type.
type (_, _) vec =
| Empty : (zero, 'a) vec
| Cons : 'a -> ('n, 'a) vec -> ('n suc, 'a) vec
This is interesting, because the kind signature for vec
is type x type -> type
. There’s nothing that says that the first argument has to be a type-level natural number. However, if we look at the constructors for vec
, we can see that in fact it’s impossible to construct an element of vec ('b, 'a)
if 'b
is not of the form (...(zero suc)...) suc
.
So this lets us actually use type-level unary numbers to meaningfully constrain values. But that’s only half of the picture; in order to well-type operations like concat
, we need to perform operations on type-level unary numbers! We can’t do this the naive way, which would be to write a function type x type -> type
, because, for one, there are plenty of elements of type
which don’t represent unary natural numbers.
So Shulman uses a trick: instead of encoding addition as a function, he encodes it as a relation n + m = l
. This can be done in the following way.
type (_, _, _) plus =
| Zero : ('n, zero, 'n) plus
| Succ : ('n, 'm, 'l) plus -> ('n, 'm suc, 'l suc) plus
There are then some other tricks to transform plus
into a function and so on, but I want to just focus for now on the philosophy of this above trick. Essentially, the idea is that the existence of a term for some GADT can prove that some relationship holds between types, and this relationship can be inductively defined. If you are familiar with prolog, this should look very familiar! You can think of a GADT as defining a prolog-style relation on type
, where each constructor gives a different :-
rule. So in a prolog-type syntax, we’d have
:- plus(N, zero, N)
plus(N, M, L) :- plus(N, suc(M), suc(L))
Now, prolog would then allow us to query things like plus(2, 3, ?)
, but there’s no analogous capability in ocaml; Shulman has to build this in manually. But that makes me think; could one give a minimal extension of Standard ML in which GADTs really did work like prolog relations, e.g. where you could actually query them?
And I have some inkling that the answer is: yes, it’s called the Haskell typeclass system. But that’s just a guess based on hearing that typeclass resolution is similar in some way to prolog. Anyways, that’s all for now!