Combinatorial data structures via finite existential types

Idea

InterTypes plans to have three basic types: sum types, product types, and acsets/array systems. Of these three, sum and product seem more fundamental; acsets/array systems seem comparitively adhoc.

This morning, I had an idea for how to break array systems into a composition of simpler features. Essentially, it came from the remembering that an array of type A could be expressed by the existential type

\sum_{X \in \mathsf{Fin}} A^X

So why not just separate the two components of this? This is done in programming languages like Scala and OCaml for normal types, e.g. you can have a trait in Scala that looks like:

trait Monoid {
  type T
    
  val identity: T

  def mul(x1: T, x2: T): T
}

Similarly, you can have a module in OCaml which looks like

module type MONOID = sig
  type t

  val mempty : t

  val mappend: t -> t -> t
end

Of course, in a language with dependent types this is just a dependent record, but as OCaml and Scala show, it’s possible to do this without full dependent types.
In F-ing modules, Rossberg shows how a ML-style module system can be encoded via existential types, and Scala’s DOT calculus gives a similar picture.

In fact, this sort of “existential type” is how the type system of many object-oriented languages like Java work under the hood. An instance of a class is a specific, but unknown type (because it could be a subclass), along with a vtable which provides all of the methods that are defined in the class. This encoding is shown explicitly in the Haskell wiki page on existential types.

The point is that this is quite well-understood and implementable from a type-theoretic standpoint.

So what if we take this theory, and make a restriction. Specifically, to the underlying type theory we add an additional judgment form \Gamma \vdash T \: \mathbf{ftype} (where \mathbf{ftype} stands for “finite type”), along with axioms

and for all n \in \mathbb{N}

Having “finite types” in a language has been explored in the context of high-performance array processing in dex-lang and moreover the array support in SciLean is based off of a similar idea: see this localcharts comment for an example of how this works.

We could also allow finite types to be created from namespaces, which are prefix-free finite sets of lists of strings, which we represent as separated via dots. E.g. stuff like \{ \mathtt{a.b}, \mathtt{c}, \mathtt{p.f} \}. This representation of finite types is useful for finite sets of variables. We could also have finite sets of UUIDs, or of generational indices.

The neat thing about this approach is that different representations of finite sets can live together, because ftype is a judgment, not a type.

Then we restrict function type creation to the case when T is an ftype, i.e.

We similarly restrict existential types. The end result would then be something like (in pseudo-Scala syntax)

type WGraph[T] {
  ftype V
  ftype E
  
  def src(e: E): V
  def tgt(e: E): V
  def weight(e: E): T
}

Making this into a type theory would allow us to flexibly compose types. For instance, we could declare

type WGraphHom[T] {
  val dom: WGraph[T]
  val codom: WGraph[T]
  
  def apply(v: dom.V): codom.V
  def apply(e: dom.E): codom.E
}

Something similar to Scala’s support for refinement types would then allow things like

type WGraphCospan[T] {
  val leftfoot: WGraph[T]
  val apex: WGraph[T]
  val rightfoot: WGraph[T]
  
  val leftinj: WGraphHom[T] {
    val dom = leftfoot
    val codom = apex
  }
  
  val rightinj: WGraphHom[T] {
    val dom = rightfoot
    val codom = apex
  }
}

We could then have additional rules which specified that certain types were ftypes if all of their fields were. For instance, if we had

type Pair[S,T] {
  val _1: S
  val _2: T
}

then if E were an ftype, we could also allow Pair[E,E] to be an ftype.

Assuming all of the type theory works out (which I think it does, because this is just a restriction of an existing type theory), it should be straightforward to serialize/deserialize any instance of these types. Namely, the ftype fields are stored as whatever representation of finite set we want (integers, namespaces, finite sets of UUIDs, etc.), and the defs are stored as arrays/tries/hashmaps/etc. val fields are stored as normal.

Extensions

It could be possible to give a category of instances for each type, where the morphisms were “diffs”. The category of diffs for a primitive type like Int would simply be a codiscrete category, but the category of diffs for a ftype would record what elements were added and deleted.

Each of the type formation operations could then act on these categories. For instance, a product type would simply be a product of the categories of diffs. A sum type might be the “connected sum” of two categories, which is the coproduct, but also adds a unique morphism between each pair of objects in the opposite sides. So there would be a unique diff from Left(x) to Right(y), but possibly many diffs from Left(x) to Left(x'), depending on the diffs from x to x'. I’m not totally sure how to interpret the existential types yet; the success of this depends whether or not there is a clean way of doing this.

Along with this, there should also be a notion of migrations between different versions of the same type. For instance, it should be possible to add a new optional field to a record type.

So essentially, I’m hoping that this could form the core type theory of chit.

Another extension is to add laws to the types. The laws could be first-order logical statements in which the only quantification was over ftypes. Thus, the laws would be decidable.

And of course, it becomes very tempting to add function definitions as well. At a certain point, this just becomes its own programming language if we do this too much, and my hope is for this to not be its own programming language, rather this should be compiled into types in existing programming languages. Which is not incompatible with having function definitions, but ideally restricts the scope of function definitions somewhat.

We could also consider ftype to stand for finitely presented types, or add a new judgment fptype. This would add all sorts of complications, but might be possible. The serialization of a finitely presented type would be the list of generators, along with an e-graph expressing the equalities between those generators.

Where to go from here?

I would be very happy if someone picked up on this and fleshed it out, either on the theory side or the implementation side. I can’t focus on this right now, because it is too greenfield and I need to deliver on other projects, but if you are interested in working on this, please contact me.

1 Like

With some judicious use of the subtyping and path types used in the DOT calculus, it might also be possible to recover some of the features of nested acsets. Consider:

type Ports {
  ftype In
  ftype Out
}

type HasSrcTgt[T <: Ports] {
  val src: T
  val tgt: T
  
  val srcPort: src.Out
  val tgtPort: tgt.In
}

type DWD {
  ftype Box <: Ports
  ftype Wire <: SrcTgt[Box]
}

Also, from the F-ing modules paper:

My idea is essentially to have “large, small, and very small types”!

Another extension of this: sorted finite types, with affordances for comparing two elements, getting the elements in sorted order, making a new element that is “larger than everything else”, “smaller than everything else”, “comes right before element x”, “comes right after element x”. This would be implemented by assigning each element an integer key, but those integer keys could be reallocated opaquely.

For instance if the keys were “2,4”, and I wanted to put an element before 4, then I could just put it at 3.

But if the keys were “3,4”, and I wanted to put an element before 4, then I could “garbage collect”, remake the keys to be “0,10” and put the new element at 5. With sufficient spacing, this garbage collection should happen rarely, so it should be efficient.

Simpler alternative to previous wiring diagram schema:

type HasPorts {
  ftype In
  ftype Out
}

type DWD {
  ftype Box <: HasPorts
  ftype Wire

  type OutPort {
    val box: Box
    val port: box.Out
  }
  
  type InPort {
    val box: Box
    val port: box.In
  }
  
  def src(w: Wire): OutPort
  def tgt(w: Wire): InPort
}

Variation with outer ports:

type HasPorts {
  ftype In
  ftype Out
}

type DWD <: HasPorts {
  ftype Box <: HasPorts
  ftype Wire

  enum SrcPort {
    case Inner(box: Box, port: box.Out)
    case Outer(port: In)
  }
  
  enum TgtPort {
     case Inner(box: Box, port: box.In)
     case Outer(port: Out)
  }
  
  def src(w: Wire): SrcPort
  def tgt(w: Wire): TgtPort
}

Putting things in the boxes via refinement types.

type FilledDWD[T <: HasPorts] <: HasPorts {
  val dwd: DWD
  type In = dwd.In
  type Out = dwd.Out
  
  def content(b: dwd.Box): T { ftype In = b.In, ftype Out = b.Out }
}

enum NestedDWD[T <: HasPorts] <: HasPorts {
  case Primitive(value: T) {
    ftype In = value.In
    ftype Out = value.Out
  }
  case Nested(filled: FilledDWD[NestedDWD[T]]) {
    ftype In = filled.In
    ftype Out = filled.Out
  }
}

A putative “Julia-like” syntax for this:

  @record FancyGraph{T} begin
    name::Symbol

    V1::TABLE
    V2::TABLE
    E::TABLE

    label(v::V1)::Point{T}
    label(v::V2)::I64

    src(e::E)::V1
    tgt(e::E)::Either{V1, V2}
    weight(e::E)::Map{I64, I64}
  end

  @record FancyGraphHom{T} begin
    dom::FancyGraph{T}
    codom::FancyGraph{T}

    apply(v::dom.V1)::codom.V1
    apply(v::dom.V2)::codom.V2
    apply(v::dom.E)::codom.E
  end

  @record FancyGraphCospan{T} begin
    leftfoot::FancyGraph{T}
    apex::FancyGraph{T}
    rightfoot::FancyGraph{T}

    leftinj::FancyGraphHom{T} where {
      dom = leftfoot
      codom = apex
    }

    rightinj::FancyGraphHom where {
      dom = rightfoot
      codom = apex
    }
  end

  @record Ports begin
    In::TABLE
    Out::TABLE
  end

  @record InPort{T<:Ports} begin
    val::T
    port::val.In
  end

  @record OutPort{T<:Ports} begin
    val::T
    port::val.Out
  end

  @record DWD <: Ports begin
    Box::TABLE{Ports}
    Wire::TABLE

    src(w::Wire)::OutPort{Box}
    tgt(w::Wire)::InPort{Box}
  end