Positive impact of AlgebraicJulia

@Brendan_Fong asked me, @kris-brown, and @david.jaz to think about ways that AlgebraicJulia concretely helps people, and in conversation we came up with something that vaguely resembles the following post.

As mathematicians, we often like to work from axioms and make all of our assumptions very clear. Moreover, we understand that models are approximations to reality, so we never want to get stuck in only one model. However, this is not how people typically learn science.

Rather, science unfolds as a sequence of well-intentioned lies. In physics, you first learn classical mechanics, and then learn that actually when things go really fast or when things are really small, classical mechanics no longer accurately describes the world.

Why do we structure education like this? Why would you teach something known to be a flawed model? Well, first of all, classical mechanics does do a pretty good job at describing things at “normal scale”, so it’s actually still very useful for reasoning about the world. But more importantly, going straight to relativity would be pedagogically very difficult. One cannot conceive of the defects of the system until one first understands the system on its own terms.

In a similar vein, my friend once took a Mandarin class where a large portion of the assignments consisted of memorization of dialogues, and then recitations of these dialogues with a partner. The idea was that first you learn set forms, then you learn how to improvise slight variations on those set forms, and as you progress you can start making larger and larger variations.

This is often how scientists are taught (explicitly or implicitly) to code. Their community of practice has some software libraries built for computing certain types of models, and then scripts and notebooks using those libraries are socially transmitted via copy-paste. When a student first learns the library, they are taught what seems to be a mystical series of incantations that produces the desired result. They then might become more comfortable making variations on those incantations, still without fully understanding what they mean, to accomplish a variety of tasks. As they progress, they learn that certain incantations encode assumptions about their problem domain, and thus should not be used when those assumptions do not hold.

But at a certain point, the underlying assumptions of the software libraries stop this process. For instance, the software library might assume that your system is continuous, and you need to model a discontinuous shockwave.

In many software systems, breaking an underlying assumption like this would throw you off a cliff in terms of software engineering difficulty. You might have to find a new library with a totally different API, completely incompatible with all of the systems that you have made so far. You might have to modify the library you are working with, diving into an unfamiliar codebase that could even be written in an unfamiliar language like C or Fortran.

In this way, the scientist becomes trapped by the implicit assumptions of the library.

AlgebraicJulia is designed to avoid this trap. In the early stages of use, a scientist should be able to follow “social scripts” for AlgebraicJulia, by copy-pasting and modifying premade models. However, as familiarity with the system increases, guardrails can be progressively taken off, and the library can be used more and more flexibly, until the scientist is constructing wholly new paradigms of models.

Moreover, the design of AlgebraicJulia can encourage this process. As the scientist grows to understand a particular social script, they can start to see the “walls of the fishbowl” that they are in, so to speak. That is, the design of AlgebraicJulia can encourage scientists to progressively question their assumptions, as it becomes clear that the choices embedded in the social script were not necessitated by something fundamental to the software framework.

I recently witnessed an “aha” moment in this vein, when I was teaching Jeff Bezanson how to use the agent-based modeling framework in AlgebraicJulia. He realized that the fact that agents lived on a graph was not actually an underlying assumption of the software system: the vertices and edges of the graph and the agents living on that graph were all modeling assumptions that could be changed. The technological artifact of the AlgebraicJulia codebase sparked a conceptual shift in the meaning of “agent-based model” for Jeff.

The implicit claims, assumptions, and biases embedded within software have shaped, do shape, and will continue to shape the world. Creating software that supports scientists in learning to question and go beyond the assumptions of their predecessors is something that I would point to as a concrete benefit, for scientists and the world at large.


All this is great. But personally when I think about “ways that AlgebraicJulia concretely helps people”, I think about how Nate Osgood is planning to run bootcamps for public health experts using ModelCollab, the modeling software that our team developed based on AlgebraicJulia. This will concretely help people when those public health experts use the software to fight disease! So to me it’s very urgent to get to that point. We are also running a 6-week program in Edinburgh to develop software for agent-based models in AlgebraicJulia, again for public health modeling.

The advantages of this software over previous software packages that do roughly the same job are completely due to category theory and the ability of AlgebraicJulia to handle category-theoretic constructs. In my talk on this stuff (first link) I emphasized three points:

  • functorial semantics for using models in flexible ways,

  • decorated and structured cospans for model composition, and

  • pullbacks for model stratification.

These three points are very general and robust, but we will only concretely help people when we start using category-based modeling tools to make people’s lives better. So I hope a lot of people jump on this particular bandwagon.


I would say that these are good examples of how AlgebraicJulia allows you to flexibly iterate on model building. However, I think that the connection between “being able to make lots of different models” and “fighting disease” is not actually a priori obvious. Being able to make lots of different models also means that if you are a policy maker, and you want to make a certain policy, it is easy to find a model which predicts that that policy is good. Epidemiology is not like electrical engineering; the epistemology of “when does a model actually help make decisions” is highly nontrivial, and something that we should be worrying about.

However, I do think that being able to make lots of different models is a good thing. In my post above, I was trying to argue for this by drawing the line between being able to make lots of different models and increasing the ability of scientists to think about the world. The important thing here is not that the models are necessarily good; it is that the models help you explore the impacts of different assumptions, so that you don’t just blindly follow the 30-year-old Fortran code that your adviser’s adviser wrote and nobody knows how to change.

In the long-term future of AlgebraicJulia, however, I think we have to confront that we are doing not just math, but science. It’s very tempting as a mathematician and as a software engineer to blindly abstract patterns that we see in scientific models, but once we are asking scientists to use these scientific models, we are implicitly asking scientists to trust that these are good scientific models. We have to live up to that trust by also providing tools that let scientists reason about the validity of the models we provide.

So for the short term, you are right that providing scientists with more powerful tools for modeling is a concrete benefit to society. But there are some subtle points to the implication “easier to make models → fighting disease better”, and I was trying to draw out one of them in my post. We should not take this implication for granted, and I think that in the future we have the responsibility to develop more theory and software about not just modeling, but reasoning about the validity of modeling.

In a rather different direction (but perhaps still “concrete”): through AlgebraicJulia I recently learned about John Cartmell’s work on “generalized algebraic theories”. It has filled in a piece of a mental puzzle that I have been working on for some years now, namely how to become a more proficient programmer by learning more ways to model and compute with categorical constructions. For this help I am very grateful!


Saying that categorical techniques “let you make lots of different models” really understates their advantages. They let us make better models, by improving the whole process whereby people interact to create a model. In public health, these people crucially include:

  • programmers who are good at writing software but aren’t experts on epidemiology or the many issues specific to the particular community facing a disease outbreak,

  • epidemiologists trained in building disease models, who typically use off-the-shelf software packages for building these models, who may be able to program but aren’t expert programmers, and who know many of the questions to ask community members but may be oblivious to important facts that some community members know,

  • community members who have access to many pieces of information required to make a good model, but may not always know which pieces of information are important, and typically don’t know programming, epidemiology, or model-building techniques.

So you shouldn’t imagine serious model-building in public health as done by a small team of experts who all understand the software being used. It requires the cooperation of a much larger, much more diffuse group: dozens to hundreds of people, none of whom understands the whole picture!

Community-based system dynamics addresses the difficult challenge of getting all these people to interact in a way that leads to a good model. Community-based system dynamics already realizes that diagrams are important for communicating between different people. You can’t show community members code or differential equations and expect them to understand!

Community-based system dynamics already uses three kinds of diagrams. In order of increasing complexity these are

  • causal loop diagrams
  • system structure diagrams
  • stock and flow diagrams

The last can be translated into code that simulates a set of ordinary differential equations. Community-based system dynamics gives a way to build stock and flow diagrams by

  • starting with causal loop diagrams (which are purely qualitative and easy for anyone to understand after 10 minutes of training),

then

  • building them up to system structure diagrams (which are also purely qualitative but more detailed),

and then

  • adding quantitative information to get stock and flow diagrams.
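To make the last step concrete, here is a minimal sketch in plain Python of how a stock and flow diagram can be turned into code that simulates ordinary differential equations. It does not use AlgebraicJulia or ModelCollab; the class names, the SIR example, and the rate constants are all assumptions chosen for illustration, and forward Euler stands in for a real ODE solver.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, float]


@dataclass
class Flow:
    name: str
    source: Optional[str]  # stock drained by the flow (None = inflow from outside)
    target: Optional[str]  # stock filled by the flow (None = outflow to outside)
    rate: Callable[[State], float]  # the quantitative information added last


@dataclass
class StockFlowDiagram:
    stocks: List[str]
    flows: List[Flow]

    def derivative(self, state: State) -> State:
        """Right-hand side of the ODE: each flow drains its source and fills its target."""
        d = {s: 0.0 for s in self.stocks}
        for f in self.flows:
            r = f.rate(state)
            if f.source is not None:
                d[f.source] -= r
            if f.target is not None:
                d[f.target] += r
        return d

    def simulate(self, initial: State, dt: float, steps: int) -> State:
        """Forward-Euler integration: crude, but enough to show the translation."""
        state = dict(initial)
        for _ in range(steps):
            d = self.derivative(state)
            state = {s: state[s] + dt * d[s] for s in self.stocks}
        return state


# Hypothetical SIR model; BETA and GAMMA are made-up rate constants.
BETA, GAMMA = 0.3, 0.1


def total(u: State) -> float:
    return u["S"] + u["I"] + u["R"]


sir = StockFlowDiagram(
    stocks=["S", "I", "R"],
    flows=[
        Flow("infection", "S", "I", lambda u: BETA * u["S"] * u["I"] / total(u)),
        Flow("recovery", "I", "R", lambda u: GAMMA * u["I"]),
    ],
)

start = {"S": 990.0, "I": 10.0, "R": 0.0}
final = sir.simulate(start, dt=0.1, steps=1000)
```

Since every flow here moves people from one stock to another, the total population is conserved by construction, which is one of the checks a diagram-based representation gives you for free.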

This process requires lots of people talking to each other and showing each other diagrams. Typically this is done in a room with lots of big pads of paper. At the end some poor schlep has to enter the resulting stock and flow model into AnyLogic, a piece of software that

  • only lets one person interact with it at a time,

  • doesn’t allow you to save and reuse smaller models,

  • doesn’t provide support for composing models,

  • doesn’t provide support for ‘stratifying’ models (breaking stocks into smaller stocks based on characteristics like age and sex),

  • doesn’t rigorously relate the three kinds of diagrams,

  • isn’t open source.

AlgebraicJulia can help us fix all these things. What we’ve done is show how each kind of diagram is a morphism in a different category, with forgetful functors going between these categories:

stock and flow diagrams →
system structure diagrams →
causal loop diagrams
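As a rough illustration of what “forgetting” means in this chain, here is a small Python sketch. It is not the AlgebraicJulia implementation and it only acts on individual diagrams (the object part of the story), ignoring composition and the polarity signs that real causal loop diagrams carry. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple


@dataclass
class StockFlowDiagram:
    stocks: Set[str]
    flows: Dict[str, Tuple[str, str]]  # flow name -> (source stock, target stock)
    links: Set[Tuple[str, str]]        # (stock, flow): the stock influences the flow
    rates: Dict[str, Callable[[Dict[str, float]], float]]  # quantitative part


@dataclass
class SystemStructureDiagram:
    stocks: Set[str]
    flows: Dict[str, Tuple[str, str]]
    links: Set[Tuple[str, str]]


@dataclass
class CausalLoopDiagram:
    nodes: Set[str]
    edges: Set[Tuple[str, str]]  # (cause, effect)


def forget_rates(sf: StockFlowDiagram) -> SystemStructureDiagram:
    """Drop the rate functions, keeping only the qualitative structure."""
    return SystemStructureDiagram(set(sf.stocks), dict(sf.flows), set(sf.links))


def forget_flow_structure(ss: SystemStructureDiagram) -> CausalLoopDiagram:
    """Flatten stocks and flows into one graph of 'this influences that'."""
    nodes = set(ss.stocks) | set(ss.flows)
    edges = set(ss.links)
    for flow, (source, target) in ss.flows.items():
        edges.add((flow, source))  # the flow changes its source stock
        edges.add((flow, target))  # the flow changes its target stock
    return CausalLoopDiagram(nodes, edges)


# Hypothetical SIR example with made-up rate functions.
sir = StockFlowDiagram(
    stocks={"S", "I", "R"},
    flows={"infection": ("S", "I"), "recovery": ("I", "R")},
    links={("S", "infection"), ("I", "infection"), ("I", "recovery")},
    rates={
        "infection": lambda u: 0.3 * u["S"] * u["I"],
        "recovery": lambda u: 0.1 * u["I"],
    },
)

structure = forget_rates(sir)
causal = forget_flow_structure(structure)
```

In this toy setting, checking that a detailed diagram fleshes out a simpler one amounts to applying the forgetful map and comparing the result with the simpler diagram, for example `forget_rates(sir) == structure`.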

We’ve programmed these categories and functors into AlgebraicJulia, and we’ve begun building a graphical user interface called ModelCollab that runs on a web browser and lets multiple people work with diagrams and use them to build and run models.

This sets the stage for programmers, epidemiologists and community members to interact in the way already envisioned by community-based system dynamics - but with computer support based on rigorous math aiding every step of the process.

I believe the functors relating different diagram languages should make it possible to rigorously check that the more sophisticated diagrams are correctly fleshing out the less sophisticated ones. The use of pullbacks for model stratification (i.e. refining models by breaking a single stock into many stocks) should make this process less error-prone: it’s usually done by hand in an ad hoc way.
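Here is a minimal Python sketch of stratification as a pullback, under strong simplifying assumptions: models are plain graphs (each flow has one source and one target), all stocks share a single type, and each flow carries a type label. The names `TypedModel` and `stratify` are hypothetical and are not part of AlgebraicJulia. The pullback keeps every pair of stocks, and every pair of flows whose types agree.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Hashable, List, Tuple


@dataclass
class TypedModel:
    """A graph of stocks and flows in which every flow has a type label."""
    stocks: List[Hashable]
    flows: Dict[Hashable, Tuple[Hashable, Hashable, str]]  # name -> (source, target, type)


def stratify(a: TypedModel, b: TypedModel) -> TypedModel:
    """Pullback of two typed graphs over a type graph with a single stock type."""
    stocks = list(product(a.stocks, b.stocks))
    flows = {}
    for (fa, (sa, ta, type_a)), (fb, (sb, tb, type_b)) in product(
        a.flows.items(), b.flows.items()
    ):
        if type_a == type_b:  # only flows of matching type are paired
            flows[(fa, fb)] = ((sa, sb), (ta, tb), type_a)
    return TypedModel(stocks, flows)


# Disease model: real "disease" flows, plus "strata" self-loops saying that
# people in any disease state may change stratum.
disease = TypedModel(
    stocks=["S", "I", "R"],
    flows={
        "infection": ("S", "I", "disease"),
        "recovery": ("I", "R", "disease"),
        "stay_S": ("S", "S", "strata"),
        "stay_I": ("I", "I", "strata"),
        "stay_R": ("R", "R", "strata"),
    },
)

# Age model: a real "strata" flow (aging), plus "disease" self-loops saying
# that disease dynamics happen within each age group.
age = TypedModel(
    stocks=["young", "old"],
    flows={
        "aging": ("young", "old", "strata"),
        "in_young": ("young", "young", "disease"),
        "in_old": ("old", "old", "disease"),
    },
)

stratified = stratify(disease, age)
```

The result has six stocks such as `("S", "young")`, and seven flows: infection and recovery within each age group, and aging within each disease state. Which flows appear is determined entirely by the type labels, which is the bookkeeping that is otherwise done by hand.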

And so on: there are actually many ways that category theory implemented in software can make community-based model building better. It was only when I learned how modeling is currently done that I began to realize the scope for improvement!


I’m glad that I managed to poke you into writing this up; this is a great response to Brendan’s original question!

It’s also lighting the fire under me to build tools for this type of collaboration in AlgebraicJulia and Semagrams; @davidad started writing a version control system for categorical databases that I want to take inspiration from and make work.


Great, I’m glad you liked this! The ideas here were also in my talk at ACT2023, which someday should appear on YouTube. I want everyone to know about this stuff.

But I want to keep getting deeper into the practical aspects of modeling for public health and the human response to climate change. Before I started working with people who do public health modeling for a living, my ideas about modeling were - surprise! - very simplistic. And the neat thing is that the extra complexities I’m learning about are not all ‘negative’: they also reveal more opportunities for category theory and software to do a lot of good!

For example, I now think that compositionality and community-based modeling fit hand in glove: compositionality becomes important when a lot of people are trying to work together.

“It’s also lighting the fire under me to build tools for this type of collaboration in AlgebraicJulia and Semagrams.”

Great! I hope you talk to Nathaniel Osgood and me about this. After all, he’s the one who wants software for manipulating diagrams to teach the next generation of public health modelers - and he teaches hundreds of such people every year! So the practical impact of your work will come faster if you team up with him.
