Capturing syntax trees

A basic syntax tree can be thought of as the positions of the free monad on some polynomial functor. For instance, if you take the polynomial

p = \{+,\times\} y^2 + \mathbb{Q}

then the free monad on this has positions given by basic arithmetic expressions.

However, programming languages have more complex syntax trees than this. Specifically, programming languages have binding expressions, like let x = 2 in blah, where the syntax trees in blah are allowed to refer to x.

We can model this with a endobicomodule on \mathbb{N} y, which looks like

y_n \mapsto \{+,\times\} y^2_n + \mathbb{Q} + \{1,...,n\} + \{\mathrm{let}\}y_n y_{n+1}

This says that a syntax tree with n free variables consists of either

  • a binary operator applied to two subtrees with n free variables
  • a constant
  • a reference to one of the variables
  • a let binding, which has one subtree with n free variables, which is the value that we are binding to a new variable, and then one subtree with n+1 free variables, which is the body of the let binding that we will evaluate with an additional binding.

If you then take the free monad on this endobicomodule, you get syntax trees which are allowed to introduce bindings and use them in subexpressions! And of course, this works for other binding structures.

You could even have this instead be a endobicomodule on \mathsf{FinSet}_{mono}, which would allow you to talk about weakening an expression with n free variables into being an expression with n+k free variables, or think about typed variables by slicing.

Exciting stuff!

1 Like