I’ve been thinking a lot about improving my current technology for scientific publishing, and I wanted to write down some of my thoughts so that I can stop obsessing over them and see what other people think.
Current state of the art
First, I want to list some of the technologies that I have been looking at, which all in my opinion fall short in one way or another. It is as of yet unclear to me whether the correct way forward is to adapt one of these, or to write something from scratch.
LaTeX. In some ways, LaTeX is the gold standard for scientific publishing. However, it has a key failure, which is that it does not produce HTML. There have been many projects over the years to fix this, and they all… kind of work. But I think that good HTML will only really happen via something that natively “thinks” in web technology, and LaTeX is not that.
Gerby. Software for making wikis out of LaTeX files, used for the Stacks project and Kerodon.
Quarto. In some ways, Quarto is the gold standard for scientific publishing on the web. Stand-out features include good cross-referencing and literate programming. But Quarto consists of four different languages (Haskell for pandoc, Lua for pandoc filters, server-side JavaScript for the main bulk, and client-side JavaScript for other stuff) that are angrily duct-taped together, so it is pretty hard to push it to do anything it wasn’t explicitly designed to do.
TeXmacs. TeXmacs is a really cool WYSIWYG (what you see is what you get) editor for scientific publishing. It does have HTML output, but the HTML output is fairly hard-coded for one specific style, and does not compose well with other stuff. This is generally the case with TeXmacs; it does everything itself. This also means it’s hard to publish papers written in TeXmacs.
Pollen. Pollen is a Racket DSL for creating beautiful web books. Having a real language under the hood instead of just a formatting language is incredibly powerful. However, it is fairly customized for the purpose of books; some work is required to support other purposes.
Astro. Astro is a static site generator written in JavaScript. In some ways, Astro is the state of the art for static non-technical content. Its advantage is that a lot of the stuff that other frameworks need client-side JavaScript for (like MathJax) runs at compile time instead, so even though Astro is written in JavaScript, it actually produces pages with the least JavaScript.
LaTeX.css. This just turns an html page into something that looks like LaTeX. However, it uses css counters for theorems, etc., which means that you can’t refer to them by number with an equivalent of \ref.
Typst. Typst is the up-and-coming contender for LaTeX. It boasts a very impressive collaborative editor, and like scribble, it is its own language, so it is very extensible. However, it doesn’t support HTML yet, and was not built to support HTML from the beginning, so I have my doubts about the quality of the eventual HTML output, if they do add HTML output. Also, I don’t know how it would be integrated in an overall website.
ForumMagnum. ForumMagnum is what LessWrong runs on, and its native publishing format has a lot of great features, like cross-references and potentially a collaborative editor. But it’s heavily customized for LessWrong itself; I wanted to run LocalCharts on ForumMagnum originally, but it wouldn’t have worked.
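To make the LaTeX.css limitation above concrete: a CSS counter only exists at render time, so nothing in the page source knows that a given theorem is "Theorem 2". An equivalent of \ref needs the numbers assigned at build time. Here is a minimal sketch of that idea in Python. The `{thm:label}` and `{ref:label}` markers are a made-up syntax for illustration, not something any of the tools above use.

```python
import re


def number_and_resolve(source: str) -> str:
    """Number theorem markers at build time, then resolve references to them.

    Hypothetical syntax: `{thm:label}` declares a theorem and
    `{ref:label}` cites it by number.
    """
    numbers = {}  # label -> theorem number

    def declare(match):
        label = match.group(1)
        if label in numbers:
            raise ValueError(f"duplicate label: {label}")
        numbers[label] = len(numbers) + 1
        return f"Theorem {numbers[label]}"

    # First pass: assign numbers in document order.
    numbered = re.sub(r"\{thm:([\w-]+)\}", declare, source)

    def resolve(match):
        label = match.group(1)
        if label not in numbers:
            raise KeyError(f"undefined reference: {label}")
        return f"Theorem {numbers[label]}"

    # Second pass: every label is known by now, so forward references work.
    return re.sub(r"\{ref:([\w-]+)\}", resolve, numbered)


text = "By {ref:main}, we are done. {thm:aux} A lemma. {thm:main} The result."
print(number_and_resolve(text))
```

The two passes are the point: a reference can appear before the theorem it cites, which is exactly what a render-time counter cannot support.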
Desiderata
In this section, I list features that I want, and under each feature, current technology that has those features.
A document programming language, rather than a markup language.
Nice list! I too would really like to see progress in this area. It has always seemed kind of crazy to me that the mathematical sciences have been stuck with LaTeX for so long. I don’t even want to think how many person-hours have been lost trying to cope with this technology.
But here’s my hot take. I think that to get out of the local optima that are LaTeX or souped-up Markdown, you can’t just have marginally better features or technical design, you need to be radically different and better in some dimension. A very interesting direction for me would be a mathematical authoring system that incorporates some semantic content (while not being anywhere near complete formalization like in a proof assistant). For example, being able to track the logical dependencies between concepts and results, or being able to express hierarchical decompositions of proofs into layers of increasing detail. Getting the design right would be tricky but this seems to me like something that could actually be built.
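As a rough illustration of what "tracking logical dependencies" could mean in practice, here is a small Python sketch. The labels and the `depends_on` table are invented for the example; a real system would extract them from the document rather than have them written by hand.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical document: each result maps to the results it directly uses.
depends_on = {
    "def:group": set(),
    "lem:identity-unique": {"def:group"},
    "lem:inverse-unique": {"def:group", "lem:identity-unique"},
    "thm:cancellation": {"lem:inverse-unique"},
}


def reading_order(deps):
    """Return an order in which every result comes after what it uses."""
    return list(TopologicalSorter(deps).static_order())


def transitive_dependencies(deps, label):
    """Collect everything `label` relies on, directly or indirectly."""
    seen = set()
    stack = [label]
    while stack:
        for dep in deps[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def has_circular_reasoning(deps):
    """True if some result (indirectly) depends on itself."""
    try:
        reading_order(deps)
    except CycleError:
        return True
    return False


print(reading_order(depends_on))
print(sorted(transitive_dependencies(depends_on, "thm:cancellation")))
```

Even this much semantic content would let a tool answer questions like "what breaks if I change this definition?" without going anywhere near full formalization.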
Connected to that, it would be cool to do something similar to the TeX library that Dmitri Pavlov cooked up: https://arxiv.org/pdf/2005.05284.pdf, where technical terms are pervasively hyperlinked.
But I honestly think that the killer feature for Typst was built-in realtime collaboration. When we compete against LaTeX, we’re not just competing against LaTeX, we’re competing against Overleaf. I think that the way to be “radically different and better” is to make collaboration easier. I think that this is compatible with what you are saying though; one killer feature that Overleaf doesn’t have, and that would be almost completely impossible to add without a radical redesign, is cross-document hyperlinked references, i.e., being able to “semantically depend on” specific results in other documents.
Getting this right might also require some sort of built-in versioning, so that you could reference a definition at a specific version, and not have it grow stale, but have the option to update to another version.
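A minimal sketch of what a version-pinned cross-document reference could look like, in Python. Everything here (the `Ref` fields, the in-memory `store`, the document name) is a hypothetical schema for illustration, not a description of any existing tool.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Ref:
    """A pinned reference to one result in another document (hypothetical)."""
    document: str
    label: str
    version: int  # 1-based version of the cited statement


def resolve(store, ref):
    """Return the statement exactly as it read at the pinned version."""
    return store[(ref.document, ref.label)][ref.version - 1]


def is_stale(store, ref):
    """True if the cited document has published a newer version."""
    return ref.version < len(store[(ref.document, ref.label)])


def update(store, ref):
    """Opt in to the latest version; the original Ref is left unchanged."""
    return replace(ref, version=len(store[(ref.document, ref.label)]))


# Hypothetical store: every published version of each statement is kept.
store = {
    ("group-notes", "def:group"): [
        "A group is a set with an associative, unital, invertible operation.",
        "A group is a monoid in which every element is invertible.",
    ],
}

ref = Ref("group-notes", "def:group", version=1)
print(resolve(store, ref))       # still the version-1 wording
print(is_stale(store, ref))      # a newer version exists
print(resolve(store, update(store, ref)))
```

The design choice being illustrated is that old versions are never overwritten, so a citing document keeps seeing the wording it was written against until its author chooses to move the pin.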
I definitely like “levels of increasing detail” for proofs as well though; that’s a feature that people might really love. In general, making it easier to hide/show different parts of a document based on different uses for that document would make for a radically new experience.
Cool! I’ve recently become pretty invested in forester, so I don’t think I will use this any time soon, but I’m glad to know that other people are taking this problem seriously.
@owenlynch - I don’t see literate programming on your list, which I would include, as I often find myself wanting to present code alongside math/text, and sometimes the code is the math.
I haven’t tried Typst yet but it looks very promising. I agree that lack of HTML output is a big drawback but the Typst team say that they are actively working on it. Something to keep an eye on.
And I’m certainly excited for html output, but I’m a little skeptical it will be very good. My feeling is that what I want from something that outputs html is the ability to CSS-style the output, and I think that typst assumes that typst will be fully in control of styling.
Came across this paper, “A Core Calculus for Documents”, and thought it would be of particular interest to those in this discussion, as it gives a type-theoretic taxonomy of document languages. Here’s the abstract:
Passive documents and active programs now widely commingle. Document languages include Turing-complete programming elements, and programming languages include sophisticated document notations. However, there are no formal foundations that model these languages. This matters because the interaction between document and program can be subtle and error-prone. In this paper we describe several such problems, then taxonomize and formalize document languages as levels of a document calculus. We employ the calculus as a foundation for implementing complex features such as reactivity, as well as for proving theorems about the boundary of content and computation. We intend for the document calculus to provide a theoretical basis for new document languages, and to assist designers in cleaning up the unsavory corners of existing languages.