Desiderata for an adequate scientific publishing platform

owenlynch · November 27, 2023, 7:53pm

I’ve been thinking a lot about improving my current technology for scientific publishing, and I wanted to write down some of my thoughts, so I can ~~stop obsessing over them~~ see what other people think.

Current state of the art

First, I want to list some of the technologies that I have been looking at, which all in my opinion fall short in one way or another. It is as of yet unclear to me whether the correct way forward is to adapt one of these, or to write something from scratch.

LaTeX. In some ways, LaTeX is the gold standard for scientific publishing. However, it has a key failure, which is that it does not produce HTML. There have been many projects over the years to fix this, and they all… kind of work. But I think that good HTML will only really happen via something that natively “thinks” in web technology, and LaTeX is not that.
Gerby. Software for making wikis out of LaTeX files, used for the Stacks project and Kerodon.
Quarto. In some ways, Quarto is the gold standard for scientific publishing on the web. Stand-out features include good cross-referencing and literate programming. But Quarto consists of 4 different languages (Haskell for pandoc, Lua for pandoc filters, server-side JavaScript for the main bulk, and then client-side JavaScript for other stuff) that are angrily duct-taped together, so it is pretty hard to push it to do anything it wasn’t explicitly designed to do.
TeXmacs. TeXmacs is a really cool WYSIWYG (what you see is what you get) editor for scientific publishing. It does have HTML output, but the HTML output is fairly hard-coded for one specific style, and does not compose well with other stuff. This is generally the case with TeXmacs; it does everything itself. This also means it’s hard to publish papers written in TeXmacs.
Pollen. Pollen is a Racket DSL for creating beautiful web books. Having a real language under the hood instead of just a formatting language is incredibly powerful. However, it is fairly customized for the purpose of books; some work is required to support other purposes.
Astro. Astro is a static site generator written in JavaScript. In some ways, Astro is the state of the art for static non-technical content. Its advantage is that much of the work other frameworks do with client-side JavaScript (like MathJax) runs at compile time, so even though Astro is written in JavaScript, it actually produces pages with the least JavaScript.
LaTeX.css. This just turns an html page into something that looks like LaTeX. However, it uses css counters for theorems, etc., which means that you can’t refer to them by number with an equivalent of \ref.
Typst. Typst is the up-and-coming challenger to LaTeX. It boasts a very impressive collaborative editor, and like Scribble, it is its own language, so it is very extensible. However, it doesn’t support HTML yet, and was not built to support HTML from the beginning, so I have my doubts about the quality of the eventual HTML output, if they do add HTML output. Also, I don’t know how it would be integrated into an overall website.
Forum Magnum. ForumMagnum is what LessWrong runs on, and its native publishing format has a lot of great features, like crossreferences and potentially a collaborative editor. But it’s heavily customized for LessWrong itself; I wanted to run LocalCharts on ForumMagnum originally, but it wouldn’t have worked.
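
To make the LaTeX.css point concrete: CSS counters are computed by the browser at render time, so the number never exists in the HTML where a link could quote it. An equivalent of \ref needs the numbers assigned at build time instead. Below is a minimal sketch of that idea in Python. The `{thm:label}` and `{ref:label}` markers are invented for this example and are not the syntax of any tool listed above.

```python
import re

# Hypothetical markup, invented for this sketch:
#   {thm:label}  declares a numbered theorem
#   {ref:label}  refers to that theorem by number
DECL = re.compile(r"\{thm:([\w-]+)\}")
REF = re.compile(r"\{ref:([\w-]+)\}")


def resolve_references(source: str) -> str:
    """Number theorems at build time and write the numbers into the HTML."""
    # Pass 1: assign numbers in document order, so forward references work.
    numbers = {}
    for match in DECL.finditer(source):
        label = match.group(1)
        if label in numbers:
            raise ValueError(f"duplicate label: {label}")
        numbers[label] = len(numbers) + 1

    # Pass 2: replace declarations and references with concrete markup.
    def render_decl(match):
        label = match.group(1)
        return f'<span class="theorem" id="{label}">Theorem {numbers[label]}</span>'

    def render_ref(match):
        label = match.group(1)
        if label not in numbers:
            raise KeyError(f"undefined reference: {label}")
        return f'<a href="#{label}">Theorem {numbers[label]}</a>'

    return REF.sub(render_ref, DECL.sub(render_decl, source))
```

Because the number is plain text in the output, the styling can still come from CSS while references stay correct.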

Desiderata

In this section, I list features that I want, and under each feature, current technology that has those features.

A document programming language, rather than a markup language.

LaTeX
TeXmacs
Pollen
Astro (sort of, through mdx)
Typst

High-quality cross-referencing and bibliography.

LaTeX
TeXmacs
Quarto (sort of, I have some problems with its cross-referencing)
Typst
Gerby

High-quality typesetting on web, including math.

Astro
Pollen
Forum Magnum
Quarto (sort of… I don’t like Bootstrap)
TeXmacs (sort of… not very customizable)
LaTeX.css
Gerby (sort of… not very customizable)

LaTeX-interoperability, so that a blog post can easily turn into a journal article.

LaTeX
Quarto
Scribble (another racket tool similar to Pollen)
TeXmacs (sort of… not very compatible)
Gerby

Incremental compilation, so that edits to a single document don’t require recompilation of the whole project

LaTeX (with subfiles)
Quarto
Astro
Pollen maybe?
Gerby maybe?
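
One common way to get incremental compilation is to cache a content hash per source file and rebuild only the files whose hash changed. The sketch below assumes exactly that and nothing more: the function names and the JSON cache file are hypothetical, and it is not a description of how any of the tools above implement it.

```python
import hashlib
import json
from pathlib import Path


def content_hash(path: Path) -> str:
    """Hash the file contents, so touching a file without editing it is a no-op."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_documents(sources, cache_file: Path):
    """Return the sources that changed since the last call, and update the cache."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    stale = []
    for src in sources:
        digest = content_hash(src)
        if cache.get(str(src)) != digest:
            stale.append(src)
            cache[str(src)] = digest
    cache_file.write_text(json.dumps(cache, indent=2))
    return stale
```

This deliberately ignores dependencies between documents. A page that cross-references another page would also need rebuilding when its target changes, which is where incremental builds get harder.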

Real-time collaboration.

LaTeX, with overleaf.
Markdown-based solutions with hedgedoc, sort of.
Any text-based format with visual studio code.

Something more user friendly for collaboration than git

LaTeX, with overleaf
Forum Magnum.
LocalCharts with docs.localcharts.org (which I’m writing this on now)
Astro, via content management systems

TikZ support

LaTeX, but no method for LaTeX->HTML supports this
Quarto, with custom lua filters
TeXmacs (jankily)
Astro, with a markdown plugin
not Gerby, even though it is LaTeX-based

Large-scale project support, for wikis/blogs/books.

Quarto
Pollen
Astro
Forum Magnum
Gerby

Presentations.

LaTeX
Quarto
TeXmacs
Typst?

owenlynch · November 27, 2023, 9:21pm

Other things I have found recently:

djot. An improved markdown written by the author of pandoc. Still not quite a document language, but very nice.
zettlr. Looks like this is mostly for personal note-taking and only export to LaTeX is well-supported.
emanote. A nice architecture and good live-updating but the output seems to be fairly fixed to a single html layout.

epatters · November 28, 2023, 5:31am

Nice list! I too would really like to see progress in this area. It has always seemed kind of crazy to me that the mathematical sciences have been stuck with LaTeX for so long. I don’t even want to think how many person-hours have been lost trying to cope with this technology.

But here’s my hot take. I think that to get out of the local optima that are LaTeX or souped-up Markdown, you can’t just have marginally better features or technical design, you need to be radically different and better in some dimension. A very interesting direction for me would be a mathematical authoring system that incorporates some semantic content (while not being anywhere near complete formalization like in a proof assistant). For example, being able to track the logical dependencies between concepts and results, or being able to express hierarchical decompositions of proofs into layers of increasing detail. Getting the design right would be tricky but this seems to me like something that could actually be built.
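
As a rough illustration of what "tracking logical dependencies" could look like without full formalization, here is a minimal Python sketch. It assumes each definition or result simply declares the labels it relies on; the labels and the graph below are made up for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency data: each item maps to the items it relies on.
depends_on = {
    "def:group": set(),
    "def:subgroup": {"def:group"},
    "lem:cosets-partition": {"def:subgroup"},
    "thm:lagrange": {"lem:cosets-partition", "def:group"},
}


def reading_order(graph):
    """Order items so that everything appears after what it depends on.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(graph).static_order())


def transitive_dependencies(graph, item):
    """Collect everything that `item` relies on, directly or indirectly."""
    seen = set()
    stack = [item]
    while stack:
        for dep in graph.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen
```

Even this much would let a system detect circular arguments and answer "what do I need to read before this theorem?".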

owenlynch · November 28, 2023, 5:47am

Connected to that, it would be cool to do something similar to the TeX library that Dmitri Pavlov cooked up: https://arxiv.org/pdf/2005.05284.pdf, where technical terms are pervasively hyperlinked.

But I honestly think that the killer feature for typst was built-in realtime collaboration. When we compete against LaTeX, we’re not just competing against LaTeX, we’re competing against overleaf. I think that the thing to be “radically different and better” is to make it easier for collaboration. I think that this is compatible with what you are saying though; one killer feature that overleaf doesn’t have, and would be almost completely impossible to add without a radical redesign, is cross-document hyperlinked references. I.e., being able to “semantically depend on” specific results in other documents.

Getting this right might also require some sort of built-in versioning, so that you could reference a definition at a specific version, and not have it grow stale, but have the option to update to another version.
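
A minimal sketch of versioned cross-document references, in Python. It assumes every published statement keeps its full history and that a reference pins one version. The `Ref` and `Library` names are hypothetical, and a real system would need persistent storage rather than an in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A reference pinned to one version of a labelled statement."""
    document: str
    label: str
    version: int


class Library:
    """Hypothetical in-memory store of every version of every statement."""

    def __init__(self):
        self._history = {}  # (document, label) -> list of texts, oldest first

    def publish(self, document: str, label: str, text: str) -> Ref:
        versions = self._history.setdefault((document, label), [])
        versions.append(text)
        return Ref(document, label, len(versions))

    def resolve(self, ref: Ref) -> str:
        # A pinned reference keeps resolving to the text it was written against.
        return self._history[(ref.document, ref.label)][ref.version - 1]

    def is_stale(self, ref: Ref) -> bool:
        return ref.version < len(self._history[(ref.document, ref.label)])

    def update(self, ref: Ref) -> Ref:
        # Updating is opt-in: the author chooses when to move to the latest version.
        latest = len(self._history[(ref.document, ref.label)])
        return Ref(ref.document, ref.label, latest)
```

The design choice here is that staleness is reported but never fixed silently, which matches the "option to update" described above.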

I definitely like “levels of increasing detail” for proofs as well though, that’s a feature that people might really love. In general, making it easier to hide/show different parts of a document based on different uses for that document would make for a radically new experience.

owenlynch · November 28, 2023, 5:52am

It would be really cool if we had such an expansive notion of “hiding and showing content” that presentations came out as a special case.
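
One way to picture this is a document whose blocks are tagged with the views they belong to, so that "slides" is just another view. This is a minimal Python sketch under that assumption; the `Block` type and the view names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A piece of content plus the set of views in which it is shown."""
    text: str
    views: frozenset


def render(blocks, view: str):
    """Keep only the blocks that are visible in the requested view."""
    return [block.text for block in blocks if view in block.views]


EVERYWHERE = frozenset({"article", "outline", "slides"})

document = [
    Block("Theorem. Every group of prime order is cyclic.", EVERYWHERE),
    Block("Idea: apply Lagrange's theorem to a non-identity element.", EVERYWHERE),
    Block(
        "Details: the subgroup generated by a non-identity element has order "
        "dividing p and greater than 1, so it is the whole group.",
        frozenset({"article"}),
    ),
]
```

Here `render(document, "slides")` yields the statement and the idea, while `render(document, "article")` yields all three blocks, so the presentation falls out of the same source.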

owenlynch · November 28, 2023, 6:05am

Other prior art I forgot to include: patoline, SILE. These are more pdf-focused, and focused on typography rather than content/collaboration.

alok · January 14, 2024, 4:32am

Lean 4 verso? Lean is very extensible and uses a lot of linking for blueprint docs, like Tao’s recent proof.

owenlynch · January 16, 2024, 6:04pm

I know about Lean 4, but what is “verso”? Googling lean 4 verso didn’t get me anything.

alok · January 16, 2024, 8:01pm

https://www.youtube.com/watch?v=FZFOJBxzAo0
https://github.com/leanprover/verso
https://github.com/leanprover/doc-gen4

owenlynch · January 16, 2024, 10:54pm

Cool! I’ve recently become pretty invested in forester, so I don’t think I will use this any time soon, but I’m glad to know that other people are taking this problem seriously.

bsaul · January 23, 2024, 4:31pm

@owenlynch - I don’t see literate programming on your list, which I would include, as I often find myself wanting to present code alongside math/text, and sometimes the code is the math.

I’ll mention entangled in this regard.

owenlynch · January 23, 2024, 5:34pm

I looked at that just now; does it allow you to take results from running the code and put them into the html generated from the markdown?

bsaul · January 23, 2024, 5:48pm

I think so, in a limited way (example), though I’ve never used the project myself, so I don’t know everything it can do.

epatters · March 6, 2024, 1:36am

A note about Typst: the latest release of Quarto now supports it.

I haven’t tried Typst yet but it looks very promising. I agree that lack of HTML output is a big drawback but the Typst team say that they are actively working on it. Something to keep an eye on.

owenlynch · March 6, 2024, 1:40am

Only as an output format though, I believe.

And I’m certainly excited for html output, but I’m a little skeptical it will be very good. My feeling is that what I want from something that outputs html is the ability to CSS-style the output, and I think that typst assumes that typst will be fully in control of styling.

evanwashington · July 9, 2024, 7:59pm

Came across this paper, “A Core Calculus for Documents”, and thought it would be of particular interest to those in this discussion, as it gives a type-theoretic taxonomy of document languages. Here’s the abstract:

Passive documents and active programs now widely commingle. Document languages include Turing-complete programming elements, and programming languages include sophisticated document notations. However, there are no formal foundations that model these languages. This matters because the interaction between document and program can be subtle and error-prone. In this paper we describe several such problems, then taxonomize and formalize document languages as levels of a document calculus. We employ the calculus as a foundation for implementing complex features such as reactivity, as well as for proving theorems about the boundary of content and computation. We intend for the document calculus to provide a theoretical basis for new document languages, and to assist designers in cleaning up the unsavory corners of existing languages.

Also, it comes from the Cognitive Engineering Lab, a research group at Brown, which you all might find interesting.
