Desiderata for an adequate scientific publishing platform

I’ve been thinking a lot about improving my current technology for scientific publishing, and I wanted to write down some of my thoughts, so I can stop obsessing over them and see what other people think.

Current state of the art

First, I want to list some of the technologies that I have been looking at, which all in my opinion fall short in one way or another. It is as of yet unclear to me whether the correct way forward is to adapt one of these, or to write something from scratch.

  1. LaTeX. In some ways, LaTeX is the gold standard for scientific publishing. However, it has a key failure, which is that it does not produce HTML. There have been many projects over the years to fix this, and they all… kind of work. But I think that good HTML will only really happen via something that natively “thinks” in web technology, and LaTeX is not that.
  2. Gerby. Software for making wikis out of LaTeX files, used for the Stacks project and Kerodon.
  3. Quarto. In some ways, quarto is the gold standard for scientific publishing on the web. Notable stand-out features involve good cross-referencing and literate programming. But quarto consists of 4 different languages (haskell for pandoc, lua for pandoc filters, server-side javascript for the main bulk, and then client-side javascript for other stuff), that are angrily duct-taped together, and so it is pretty hard to push it to do anything it wasn’t explicitly designed to do.
  4. TeXmacs. TeXmacs is a really cool WYSIWYG (what you see is what you get) editor for scientific publishing. It does have html output, but the html output is fairly hard-coded for one specific style, and does not compose well with other stuff. This is generally the case with TeXmacs; it does everything itself. This also means it’s hard to publish papers written in TeXmacs.
  5. Pollen. Pollen is a racket DSL for creating beautiful web books. Having a real language under the hood instead of just a formatting language is incredibly powerful. However it is fairly customized for the purpose of books; some work is required to support other purposes.
  6. Astro. Astro is a static site generator written in javascript. In some ways, astro is the state-of-the-art for static non-technical content. Its advantage is that a lot of the stuff that other frameworks need client-side javascript for (like MathJax) runs at compile-time, so even though astro is written in javascript, it actually produces pages with the least javascript.
  7. LaTeX.css. This just turns an html page into something that looks like LaTeX. However, it uses css counters for theorems, etc., which means that you can’t refer to them by number with an equivalent of \ref.
  8. Typst. Typst is the up-and-coming contender for LaTeX. It boasts a very impressive collaborative editor, and like scribble, it is its own language, so it is very extensible. However, it doesn’t support HTML yet, and was not built to support HTML from the beginning, so I have my doubts about the quality of the eventual HTML output, if they do add HTML output. Also, I don’t know how it would be integrated in an overall website.
  9. Forum Magnum. ForumMagnum is what LessWrong runs on, and its native publishing format has a lot of great features, like crossreferences and potentially a collaborative editor. But it’s heavily customized for LessWrong itself; I wanted to run LocalCharts on ForumMagnum originally, but it wouldn’t have worked.
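
To make the LaTeX.css point concrete: a CSS counter only exists when the browser renders the page, so the text itself never learns that a theorem is "Theorem 2". A generator that numbers things at compile time can resolve references the way \ref does. Here is a minimal sketch in Python, assuming a made-up `{#label}` / `@label` syntax that I invented for illustration (it is not the syntax of any tool listed above):

```python
import re

# Hypothetical syntax, invented for this sketch:
#   "Theorem {#pythagoras}" declares a numbered theorem with a label,
#   "@pythagoras" refers to it elsewhere in the text.
DECL = re.compile(r"Theorem \{#([\w-]+)\}")
REF = re.compile(r"@([\w-]+)")


def resolve_refs(source: str) -> str:
    """Number theorems in document order, then replace references by number."""
    # Pass 1: assign numbers to labels in order of appearance.
    numbers = {}
    for match in DECL.finditer(source):
        numbers.setdefault(match.group(1), len(numbers) + 1)

    # Pass 2: rewrite declarations and references into linked HTML.
    def declare(match):
        label = match.group(1)
        return f'<span id="{label}">Theorem {numbers[label]}</span>'

    def refer(match):
        label = match.group(1)
        if label not in numbers:
            return "??"  # unresolved reference, as LaTeX prints
        return f'<a href="#{label}">Theorem {numbers[label]}</a>'

    return REF.sub(refer, DECL.sub(declare, source))
```

Because the numbers are collected in a first pass, forward references work too. The output is ordinary HTML with `id` attributes, so it can still be styled freely with CSS.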

Desiderata

In this section, I list features that I want, and under each feature, current technology that has those features.

  1. A document programming language, rather than a markup language.
     • LaTeX
     • TeXmacs
     • Pollen
     • Astro (sort of, through mdx)
     • Typst
  2. High-quality cross-referencing and bibliography.
     • LaTeX
     • TeXmacs
     • Quarto (sort of, I have some problems with its cross-referencing)
     • Typst
     • Gerby
  3. High-quality typesetting on web, including math.
     • Astro
     • Pollen
     • Forum Magnum
     • Quarto (sort of… I don’t like Bootstrap)
     • TeXmacs (sort of… not very customizable)
     • LaTeX.css
     • Gerby (sort of… not very customizable)
  4. LaTeX-interoperability, so that a blog post can easily turn into a journal article.
     • LaTeX
     • Quarto
     • Scribble (another racket tool similar to Pollen)
     • TeXmacs (sort of… not very compatible)
     • Gerby
  5. Incremental compilation, so that edits to a single document don’t require recompilation of the whole project
     • LaTeX (with subfiles)
     • Quarto
     • Astro
     • Pollen maybe?
     • Gerby maybe?
  6. Real-time collaboration.
     • LaTeX, with overleaf.
     • Markdown-based solutions with hedgedoc, sort of.
     • Any text-based format with visual studio code.
  7. Something more user friendly for collaboration than git
     • LaTeX, with overleaf
     • Forum Magnum.
     • LocalCharts with docs.localcharts.org (which I’m writing this on now)
     • Astro, via content management systems
  8. TikZ support
     • LaTeX, but no method for LaTeX->HTML supports this
     • Quarto, with custom lua filters
     • TeXmacs (jankily)
     • Astro, with a markdown plugin
     • not Gerby, even though it is LaTeX-based
  9. Large-scale project support, for wikis/blogs/books.
     • Quarto
     • Pollen
     • Astro
     • Forum Magnum
     • Gerby
  10. Presentations.
     • LaTeX
     • Quarto
     • TeXmacs
     • Typst?
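
For the incremental compilation item in the list above, the simplest mechanism I can think of is a content-hash cache: recompile a document only when its source text has changed. This is a rough sketch in Python, not how any of the listed tools actually do it; `rebuild`, `cache`, and `compile_one` are names I made up for illustration.

```python
import hashlib


def digest(text: str) -> str:
    """Content hash used to detect whether a source document changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def rebuild(sources, cache, compile_one):
    """Recompile only documents whose content hash differs from the cache.

    sources: name -> source text
    cache: name -> (hash, compiled output); updated in place
    compile_one: function from source text to compiled output
    Returns the names that were actually recompiled.
    """
    recompiled = []
    for name, text in sources.items():
        h = digest(text)
        cached = cache.get(name)
        if cached is None or cached[0] != h:
            cache[name] = (h, compile_one(text))
            recompiled.append(name)
    return recompiled
```

Note that this treats every document as independent. Once documents cross-reference each other, a change in one can invalidate others, so a real system would also need to track which outputs depend on which sources.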

Other things I have found recently:

  • djot. An improved markdown written by the author of pandoc. Still not quite a document language, but very nice.
  • zettlr. Looks like this is mostly for personal note-taking and only export to LaTeX is well-supported.
  • emanote. A nice architecture and good live-updating but the output seems to be fairly fixed to a single html layout.

Nice list! I too would really like to see progress in this area. It has always seemed kind of crazy to me that the mathematical sciences have been stuck with LaTeX for so long. I don’t even want to think how many person-hours have been lost trying to cope with this technology.

But here’s my hot take. I think that to get out of the local optima that are LaTeX or souped-up Markdown, you can’t just have marginally better features or technical design; you need to be radically different and better in some dimension. A very interesting direction for me would be a mathematical authoring system that incorporates some semantic content (while not being anywhere near complete formalization like in a proof assistant). For example, being able to track the logical dependencies between concepts and results, or being able to express hierarchical decompositions of proofs into layers of increasing detail. Getting the design right would be tricky but this seems to me like something that could actually be built.
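
As a rough illustration of what "tracking logical dependencies" could mean in practice, here is a small Python sketch. It assumes each result simply declares the labels it uses; the labels and the `uses` table are hypothetical, and a real system would extract them from the document source.

```python
from graphlib import TopologicalSorter

# Hypothetical data: each result maps to the results it directly uses.
uses = {
    "def:group": set(),
    "lem:identity-unique": {"def:group"},
    "lem:inverse-unique": {"def:group", "lem:identity-unique"},
    "thm:cancellation": {"lem:inverse-unique"},
}


def all_dependencies(result, graph):
    """Everything `result` relies on, directly or indirectly."""
    seen = set()
    stack = list(graph[result])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep])
    return seen


# An order in which every result appears after the results it uses.
# TopologicalSorter raises CycleError on circular arguments.
reading_order = list(TopologicalSorter(uses).static_order())
```

Even this much would let a tool answer "what breaks if I change this definition?" or flag a circular argument, without going anywhere near full formalization.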


Connected to that, it would be cool to do something similar to the TeX library that Dmitri Pavlov cooked up: https://arxiv.org/pdf/2005.05284.pdf, where technical terms are pervasively hyperlinked.

But I honestly think that the killer feature for typst was built-in realtime collaboration. When we compete against LaTeX, we’re not just competing against LaTeX, we’re competing against overleaf. I think that the thing to be “radically different and better” is to make it easier for collaboration. I think that this is compatible with what you are saying though; one killer feature that overleaf doesn’t have, and would be almost completely impossible to add without a radical redesign, is cross-document hyperlinked references. I.e., being able to “semantically depend on” specific results in other documents.

Getting this right might also require some sort of built-in versioning, so that you could reference a definition at a specific version, and not have it grow stale, but have the option to update to another version.
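
A minimal sketch of what a version-pinned reference might look like, in Python. Everything here (`Ref`, `store`, the integer version scheme) is a hypothetical design for illustration, not an existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A reference to a result in another document, pinned to a version."""
    doc: str
    label: str
    version: int


# Hypothetical store: (doc, label) -> list of statements, one per version.
# Versions are 1-indexed positions in the list.
store = {
    ("categories", "def:functor"): [
        "A functor maps objects to objects.",
        "A functor maps objects and morphisms, preserving composition.",
    ],
}


def resolve(ref, store):
    """Return the statement exactly as it read at the pinned version."""
    return store[(ref.doc, ref.label)][ref.version - 1]


def is_stale(ref, store):
    """True when a newer version exists than the one pinned."""
    return ref.version < len(store[(ref.doc, ref.label)])


def update(ref, store):
    """Opt in to the latest version; the old Ref is left unchanged."""
    return Ref(ref.doc, ref.label, len(store[(ref.doc, ref.label)]))
```

The point of the design is that a reference never changes silently: the pinned version keeps resolving to the same text, and `is_stale` tells the author when there is something newer to review.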

I definitely like “levels of increasing detail” for proofs as well though, that’s a feature that people might really love. In general, making it easier to hide/show different parts of a document based on different uses for that document would make for a radically new experience.


It would be really cool if we had such an expansive notion of “hiding and showing content” that presentations came out as a special case.
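
One way to picture this, as a Python sketch under made-up assumptions: every block of content carries a detail level and a set of tags, and a "view" is just a filter. The `Block` type, the `detail` scale, and the `"slide"` tag are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A piece of content tagged with the views it should appear in."""
    text: str
    detail: int = 0  # 0 = always shown; higher = finer detail
    tags: set = field(default_factory=set)


def render(blocks, max_detail=0, only_tag=None):
    """Keep blocks up to `max_detail`, optionally restricted to one tag."""
    shown = []
    for block in blocks:
        if block.detail > max_detail:
            continue
        if only_tag is not None and only_tag not in block.tags:
            continue
        shown.append(block.text)
    return shown


proof = [
    Block("Theorem: every bounded monotone sequence converges.", tags={"slide"}),
    Block("Proof idea: the supremum is the limit.", detail=1, tags={"slide"}),
    Block("Fix eps > 0 and pick N with a_N > sup - eps.", detail=2),
]

summary = render(proof)  # statement only
full = render(proof, max_detail=2)  # every layer of the proof
slides = render(proof, max_detail=1, only_tag="slide")  # presentation view
```

Here the presentation is not a separate document, just the same source filtered by a different rule than the article.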


Other prior art I forgot to include: patoline, SILE. These are more pdf-focused, and focused on typography rather than content/collaboration.

Lean 4 Verso? Lean is very extensible and uses a lot of linking for blueprint docs, like Tao’s recent proof.


I know about Lean 4, but what is “verso”? Googling lean 4 verso didn’t get me anything.

https://www.youtube.com/watch?v=FZFOJBxzAo0
https://github.com/leanprover/verso
https://github.com/leanprover/doc-gen4


Cool! I’ve recently become pretty invested in forester, so I don’t think I will use this any time soon, but I’m glad to know that other people are taking this problem seriously.


@owenlynch - I don’t see literate programming on your list, which I would include as I often find myself wanting to present code alongside math/text and sometimes the code is the math.

I’ll mention entangled in this regard.


I looked at that just now; does it allow you to take results from running the code and put them into the html generated from the markdown?

I think so in a limited way (example), though I’ve never used the project myself so I don’t know everything it can do.

A note about Typst: the latest release of Quarto now supports it.

I haven’t tried Typst yet but it looks very promising. I agree that lack of HTML output is a big drawback but the Typst team say that they are actively working on it. Something to keep an eye on.

Only as an output format though, I believe.

And I’m certainly excited for html output, but I’m a little skeptical it will be very good. My feeling is that what I want from something that outputs html is the ability to CSS-style the output, and I think that typst assumes that typst will be fully in control of styling.
