Typed References

I’ve been thinking a lot recently about pointers and memory management, and one thing in particular I’ve been thinking about is “user-managed allocators”. I.e., instead of having one global heap allocator, users can request a chunk of memory and then parcel it out. This is one of the big features of Zig that makes idiomatic Zig code look different from code in another language; you are often explicitly passing around allocators.

But is hard to do in a well-typed way. In most languages, you can request a big typed array, i.e. Vector<T> and then hand out indices to that array. This is sometimes convenient, because it allows you to completely bypass the borrow checker/garbage collector/whatever; your pointers are now just integers, and can be moved around at will. But on the other hand, it’s inconvenient because… your pointers are now just integers. Of course, you can take a pointer to an element of the array… but then you have to worry about taking multiple mutable pointers that could overlap.

The more salient point, however, is that when you request memory you don’t necessarily know ahead of time what the types you want to use that memory for are. I.e., you just get a big chunk of bytes, and then when you allocate stuff, you cast those bytes into your relevant types. Casting is a classic example of circumventing the type system.

Finally, in a garbage-collected language, references are “safe”. I.e., if I have a reference to something of type T and I dereference it, I know I’m getting something of type T. Moreover, nobody else should have modified that in the mean time. But integer indices to an array have no such guarantees.

So typically writing a custom allocator is something left to “unsafe, janky code”; you can only do it if you can somehow get around your type system.

What would a way of fixing this look like?

One possibility is something like the ST monad in Haskell. The ST monad allows you to allocate data and get mutable typed references to it, i.e. STRef a t. The a there is a lifetime parameter, and the t is the type of the reference.The type system guarantees that you can’t return any of those typed references outside of the “scope”.

The ST monad is nice, because you can allocate arbitrary data using it, and you get typed, guaranteed access to that data.

It would be interesting to see something like the ST monad where not only could you not leak any mutable references, you couldn’t leak any data that was allocated. So everything allocated in ST code could use, for instance, a bump allocator, and then after the ST code finishes, all of the memory allocated could be released at once.

You could also have something like ST monad where you used a hashcons as the allocator instead of normal memory allocation. So your references would be hashes into the hashcons, instead of just pointers.

I’m not really sure where to go with this, but I wanted to write up my thoughts in case anyone else had any ideas about this.

The programming language koka seems to have an interesting approach to garbage collection; I want to look further into exactly how this works.