Wizards as State Machines
In this post, I’m going to lay out a lot of conjectures for a mathematical formalism of wizards. Unfortunately, I don’t mean wizard like this:
I mean wizard like this:
Why do I care about wizards? I think that there is a general pattern in wizards that is applicable to many domains beyond software installation, and moreover this general pattern has compositional structure.
What is a wizard? Well, first of all, a wizard is a state machine. A wizard has a current state, and that state can be advanced by interaction with the wizard.
In order to describe this, we use polynomial coalgebras. To understand how this works, let’s give an example. We want to model a wizard which at each step, displays some text to the user and gives the user a choice of two buttons.
We can model this with the polynomial functor.
where \mathtt{String} is the set of unicode strings, for instance.
A coalgebra of this functor consists of a set S and a function S \to p(S), i.e. a function S \to \mathtt{String} \times S^2. For each state, we get a string, which is the message to display the user, and then a pair of states, one for each button displayed. When the user pushes a button, we change to the corresponding state, and the process is repeated.
But this misses two important parts of a wizard: they begin and they end. We can model this by having a function 1 \to S, which marks the initial state, and changing the function S \to p(S) to a function S \to p(S) + 1, which allows some subset of states to be marked as final.
At this point, we can make an observation: why not allow wizards to begin and end in different ways? That is, instead of a map 1 \to S, we could have a map A \to S, where A is the set of “entry points” into the wizard, and instead of a map S \to p(S) + 1, we could have a map S \to p(S) + B, where B is the set of “ways the wizard can end”. For instance, we could have A = \mathtt{String}, which would allow us to start the wizard pointing at a specific file, and B = 2, which would allow the wizard to end signaling failure or success.
This now makes wizards look suspiciously like morphisms from A to B. Let’s see if this can work for real.
A Category of Wizards
For any polynomial functor p, let \mathsf{Wiz}_p to be the category where
- Objects are sets A, B
- A morphism from A to B consists of a tuple (S, \mathtt{init} \colon A \to S, \mathtt{update} \colon S \to p(S) + B)
To define composition, suppose that we have sets A,B,C and morphisms (S_1, \mathtt{init}_1 \colon A \to S_1, \mathtt{update}_1 \colon S_1 \to p(S_1) + B) and (S_2, \mathtt{init}_2 \colon B \to S_2, \mathtt{update}_2 \colon S_2 \to p(S_2) + C).
Then their composite is the tuple (S, \mathtt{init} \colon A \to S, \mathtt{update} \colon S \to p(S) + C) where
And \mathtt{init} and \mathtt{update} are defined by
where \mathtt{inj}_1 \colon S_1 \to S_1 + S_2 and \mathtt{inj}_2 \colon S_2 \to S_1 + S_2 are the canonical injections.
The intuition behind composition of wizards is that you go through the first wizard and then when it has completed, go on to the second wizard.
The identity on A consists of (A, 1_A \colon A \to A, \mathtt{inj}_2 \colon A \to p(A) + A). This is the “do nothing” wizard, which returns immediately with the value it was passed.
Now, notice that \mathsf{Wiz}_p is not a category, because A + S \neq S so composing with the identity doesn’t give the same morphism. In order to fix this, we need to quotient wizards by their behavior. That is, we consider (S, \mathtt{init}, \mathtt{update}) to be behaviorally equivalent to (S', \mathtt{init}', \mathtt{update}') if the user can’t tell the difference between them; i.e. for any a, the same choices of buttons to push give you the same sequence of texts, and you finish at the same time with the same end value. This can be formalized by defining a wizard to be a function A \to \mathbb{C}_{p+B}(1), where \mathbb{C}_{p + B} is the cofree comonad on p + B. Essentially, an element of \mathbb{C}_{p + B}(1) is a “behavior tree” of p + B.
So really, we want morphisms in \mathsf{Wiz}_p to be maps A \to \mathbb{C}_{p+B}(1). Because these are the same as equivalence classes of state machines, our earlier definition of composition still works; we leave it to the reader to formulate composition in a more “polynomial-ish” manner.
There is now a symmetric monoidal structure on \mathsf{Wiz}_p given on objects by coproduct, and on morphisms A_1 \to \mathbb{C}_{p + B_1}, A_2 \to \mathbb{C}_{p + B_2} by the composite
Therefore, we can consider string diagrams of wizards. We can go further than this and also try to put a traced structure on \mathbf{Wiz}_p; this allows wizards to be run in loops. However, in order for this to work, we probably need to switch to partial wizards, i.e. A \to \mathbb{C}_{p + B}(1) + 1, to allow for infinite loops.
Migrating Wizards
One might wonder if this construction is functorial in p. And I think it is, if we have a map of polynomials p \to q we get a functor \mathsf{Wiz}_p \to \mathsf{Wiz}_q given on objects by the identity, and on morphisms by functoriality of \mathbb{C}_{p + B} in p. So \mathsf{Wiz} is a functor from \mathsf{Poly} to \mathsf{Cat}.
We can think of this functor like a “stateless” effect handler; an effect in (position of) p is mapped to an effect in q, and the response to the effect is carried back.
However, as David Spivak helped me see, one can go even further than this and consider stateful effect handlers. An element m \in \mathbb{C}_{[p, q]}(1) represents a way of translating p effects into q effects, and then updating some internal state, so that the next p effect might be handled in a different way. There is a natural map
So specifically, there is a function
and thus for every m \in \mathbb{C}_{[p,q]}, a function
which is precisely what we need to turn a function A \to \mathbb{C}_{p + B}(1) to a function A \to \mathbb{C}_{q + B}(1).
There is a category \mathsf{Org} where the objects are polynomials, and a morphism from p to q is an element of \mathbb{C}_{[p, q]}(1). Then \mathsf{Wiz} is a functor from this category into \mathsf{Cat} as well, with the action on morphisms given by the above.
All of this needs to be proven in much more detail, and there are probably higher-categorical dimensions to this as well that we are ignoring. But this is enough to start guiding an implementation of wizards that hopefully I’ll be able to show off in Semagrams! If anyone wants to work on formalizing this, it would be very welcome; there’s probably at least a paper’s worth of material here.