How notebooks should work

Love them or hate them, notebooks (Jupyter, Pluto, Livebook, Observable) are here to stay.

What I like about notebooks is that they are a very nice way of presenting computation; intermixing prose, code, and output is a genius idea.

What I hate about notebooks is that they encourage a very ad-hoc programming style, and that state management in notebooks is a nightmare. It seems to me that there are two options for state management. One is to treat the notebook as a glorified REPL, where each cell is executed for its side effects, and you had better get the order of execution right or everything gets messed up. The second is to sequence the computation in a notebook reactively, so that cells rerun whenever their inputs change.

Both of these approaches can be very annoying. In the REPL-based approach, you have to rerun the notebook from scratch in order to get reproducibility. And in the reactive approach, dependency tracking can easily break when the underlying language wasn't built from the ground up to support it.

What would a principled take on notebooks that would actually solve these problems look like?

Let’s start by listing some requirements.

  1. I want intelligent caching of computation. If I change something, and then change it back to the way it was before, I don’t want to have to rerun anything.
  2. I want to be able to save my progress in a notebook, so that if I reboot my computer I don’t have to recompute everything from scratch.
  3. I want to use imperative programming and mutation to compute things.
  4. I want my notebook to be totally reproducible.
  5. I want to use multiple programming languages in a single notebook.

Here’s a design for notebooks that I think could satisfy these requirements.

First of all, the language written inside the notebook should not be Julia/C/Python/etc. Instead, we should have a very basic domain specific language with the following properties.

  1. Each type is serializable and hashable (so specifically, there are no function types!). Basically, it should just be something like algebraic data types + acsets.
  2. Procedures are statically typed and side-effect free.
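To make the first property concrete, here is a minimal Python sketch of content-hashing such values. JSON-serializable dicts, lists, and scalars stand in for a real encoding of algebraic data types + acsets; the `hash_value` name and the JSON encoding are illustrative assumptions, not part of the design:

```python
import hashlib
import json

def hash_value(value):
    """Content-hash a value built from serializable data.
    Keys are sorted so structurally equal values hash identically."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

# Structurally equal values get the same hash, regardless of key order.
assert hash_value({"a": 1, "b": [2, 3]}) == hash_value({"b": [2, 3], "a": 1})
```

The key point is that hashing is defined on the *values*, not on any runtime representation, which is why function types have to be excluded.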

The cells in the notebook should only be written in this language, and they should look something like:

x, y = f(g(a,b), c)

I.e., essentially we support the operations of a cartesian category and not much else.
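As an illustration, a cell in this restricted language could be represented as plain data, with dependency edges recovered from which cells produce which names. The `Cell` dataclass and all names here are hypothetical, just one way to encode the idea:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    """A notebook cell in the restricted DSL: named outputs bound to
    a single procedure call whose arguments are other cells' outputs."""
    outputs: tuple
    proc: str
    args: tuple

# x, y = f(g(a, b), c)  desugars into two primitive cells:
cells = [
    Cell(outputs=("_t1",), proc="g", args=("a", "b")),
    Cell(outputs=("x", "y"), proc="f", args=("_t1", "c")),
]

def dependencies(cell, universe):
    """Cells whose outputs this cell consumes -- the reactive edges."""
    return [c for c in universe if set(c.outputs) & set(cell.args)]
```

Because cells are just tuples of names and a procedure call, the dependency graph is trivially recoverable, with no need to analyze arbitrary host-language code.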

Then there should be “foreign function interfaces” to other languages, which allow declaration of procedures. The definition of these procedures should be done in the normal fashion, i.e. as libraries. This likely necessitates copying of data to and from the “real languages”, but as you aren’t running notebook cells in a hot loop, this overhead shouldn’t matter too much.

Now, here’s the trick. These foreign function interfaces should hash the exported functions. In a compiled language like Rust, this could be a hash of the fully qualified name of the function plus the hash of the source code of the package along with the versions of dependencies used. In a dynamic language like Julia, where functions can be redefined live, this could use the compilation cache somehow, i.e. we hash the code that is generated for the types of the arguments.
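A rough sketch of such a procedure hash for the compiled-language case might look like the following (the function name, and passing the source in as a string, are assumptions made for illustration):

```python
import hashlib

def hash_procedure(qualified_name, source, dep_versions):
    """Digest a foreign procedure: its fully qualified name, its source
    code, and the pinned versions of its dependencies all feed the hash.
    Any change to any of these invalidates downstream cached results."""
    h = hashlib.sha256()
    h.update(qualified_name.encode("utf-8"))
    h.update(source.encode("utf-8"))
    for name, version in sorted(dep_versions.items()):
        h.update(f"{name}=={version}".encode("utf-8"))
    return h.hexdigest()
```

Sorting the dependency list makes the hash independent of declaration order, so two environments with the same pinned versions agree on the procedure's identity.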

More generally, it should be possible to have a “dynamic foreign function interface”, where a running process can notify the notebook runtime of redefined functions.

Then the notebook runs computations and caches the results, where the cache is keyed on the hashes of the input data and the hash of the procedure.
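A minimal sketch of that caching discipline (names like `run_cached` are hypothetical, and a real implementation would use a persistent, shareable store rather than an in-memory dict):

```python
import hashlib

def cache_key(proc_hash, input_hashes):
    """Cache key = digest of the procedure hash plus all input hashes."""
    h = hashlib.sha256()
    h.update(proc_hash.encode("utf-8"))
    for ih in input_hashes:
        h.update(ih.encode("utf-8"))
    return h.hexdigest()

CACHE = {}  # in practice: a persistent, possibly remote, content store

def run_cached(proc, proc_hash, *args, arg_hashes):
    key = cache_key(proc_hash, arg_hashes)
    if key not in CACHE:
        CACHE[key] = proc(*args)  # only executed on a cache miss
    return CACHE[key]
```

Since the key depends only on hashes, changing an input and changing it back lands you on the original key, and nothing reruns (requirement 1 above).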

Now, the brilliant thing is that this cache can actually be shared across notebooks! If two notebooks start off by computing the same thing, the second one can simply notice that its first couple of steps were already computed, and pull those results in instead of recomputing them from scratch. Moreover, the cache could be shared across computers. So you could have some very expensive computation that runs in CI on your server, and your notebook just pulls in the result. Notebooks remain reproducible because in theory everything could be run from scratch, but they take advantage of the cache to be much faster and more responsive.

The key thing here is that procedures can use mutation internally, but as long as they are run in an isolated manner, this doesn’t matter.
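For example, a procedure can freely mutate a private copy of its input and still be observationally pure, which is all the caching scheme needs (this toy function is mine, not from the post):

```python
def sort_descending(xs):
    """Internally imperative, externally pure: the same input always
    yields the same output, so results are safely cacheable."""
    ys = list(xs)          # copy, so the caller's data is never touched
    ys.sort(reverse=True)  # in-place mutation, invisible from outside
    return ys
```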

One way of describing this could be “Nix for scientific notebooks”; Nix has a similar design, where software builders just run arbitrary bash scripts, but it achieves reproducibility by running those bash scripts in isolated environments with hashed inputs. And it would probably be good to use Nix to make the software environment around these notebooks reproducible as well.

Doing this properly would be a lot of work, but I think that ultimately something like this has to exist for notebooks to be a reliable and scalable tool for scientific computing.


Semi-related and cool: GitHub - tweag/jupyenv: Declarative and reproducible Jupyter environments - powered by Nix.

I’ve never really used notebooks, and I don’t think this feature set would sell it to me, so I don’t have that much to say. It reminds me, though, of content-addressed definitions in Unison 💡 The big idea · Unison programming language


Yeah, that’s definitely an inspiration. But this is only useful if it can hook into mainstream scientific computing software, just like Nix would be useless if you could only use it with programs specifically written for it.

Update: I just discovered Home - Polynote, which seems to use some of the principles laid out here! Very neat.

Another notebook software: The Book of Clerk