
If you enjoyed the Mathematical Structures in Logic course from last semester, you are going to love Nick’s rebranding for the new year: Mathematical Structures in Programming Languages! Alright, you got me, I haven’t managed to convince Nick yet. Still, don’t you find sticking exclusively to logics kind of limiting? There are so many more formal languages out there. What do their semantics look like? What kinds of dualities emerge there? Here, let’s stick to a very simple programming language, Hutton’s razor, defined by the following grammar.
$$L ::= \star\, \mathbb{N} \mid L \oplus L$$Observe that specifying a grammar is akin to presenting a signature $\Sigma$ (in the sense of first order logic, and of universal algebra): we are stating that $\star$ represents an $\mathbb{N}$-indexed family of 0-ary operators, i.e. constants ($L$ does not appear next to it), and that $\oplus$ is a binary operator. We can think of $\Sigma$ as a “syntax” functor mapping each possible choice of a domain (in first order logic, or carrier in universal algebra) to the type of “ingredients” needed to construct a program using our grammar.
$$\Sigma(X) = \mathbb{N} \uplus X \times X$$The above can be read as “You can construct programs by either specifying a natural number (and using $\star$) or by specifying two programs (and combining them with $\oplus$)”.
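Throughout, it may help to see these gadgets rendered in a functional language. Here is a minimal Haskell sketch of the syntax functor and the language (the names `SigmaF`, `Mu`, and `L` are chosen here for illustration, with `Integer` standing in for $\mathbb{N}$; later sections will identify `Mu SigmaF` with $\mu \Sigma$).

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- The syntax functor Σ(X) = ℕ ⊎ X × X: one layer of program,
-- either a literal ⋆n or two subprograms joined by ⊕.
data SigmaF x
  = Lit Integer  -- ⋆n: an ℕ-indexed family of constants
  | Plus x x     -- t ⊕ u: a binary operator
  deriving (Functor, Show)

-- Tying the recursive knot: the language is the least fixpoint of Σ.
newtype Mu f = In { out :: f (Mu f) }

type L = Mu SigmaF
```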
Denotational and Operational Semantics
A logician would probably conclude that the semantics of this language should be given by a function $[\cdot] : L \to \mathbb{N}$ like the following.
$$[x] = \begin{cases} n & \text{if $x = \star n$} \\ [t_1] + [t_2] & \text{if $x = t_1 \oplus t_2$} \end{cases}$$As this function expresses what object each program is meant to denote, we call this kind of semantics denotational. Let’s spell out the mental process here: a domain, $\mathbb{N}$, was chosen, and our semantics was specified as a recursively defined function from our grammar to said domain. Equivalently, this map of type $\Sigma(X) \to X$ (here with $X = \mathbb{N}$) is an algebra for the signature mentioned above (in the universal algebra sense, or term-level semantics for our first order theory): we fixed a carrier, $\mathbb{N}$, an $\mathbb{N}$-indexed family of constants, specified by the base case, and a binary operator, addition on natural numbers, as given by the inductive case. Visually, evaluating such a structurally recursive function (algebra) looks like folding the input in on itself until we obtain the result. So denotational semantics are algebras for the syntax functor. We use these algebras to know *how* to fold a program into a single value from our semantic domain. So, much like logics, programming languages also have algebraic semantics! The operation of folding according to our algebra is itself a computable function translating programs written in a high level language (the object language given by our syntax) into a low level language (the metalanguage). Computer scientists call such translators compilers.
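Continuing the sketch above, the denotational semantics is literally a fold of the algebra just described (`denAlg`, `fold`, and `denote` are names chosen here).

```haskell
-- The Σ-algebra with carrier ℕ: base case and inductive case
-- become the two clauses.
denAlg :: SigmaF Integer -> Integer
denAlg (Lit n)    = n
denAlg (Plus m n) = m + n

-- Folding: structural recursion over μΣ, parametric in the algebra.
fold :: Functor f => (f a -> a) -> Mu f -> a
fold alg = alg . fmap (fold alg) . out

-- The "compiler" from the object language into the metalanguage.
-- e.g. denote (In (Plus (In (Lit 1)) (In (Lit 2)))) evaluates to 3.
denote :: L -> Integer
denote = fold denAlg
```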
Believe it or not, programming languages can be so tricky that coming up with this kind of nice compositional semantics is, to date, still quite difficult, and requires learning hard maths. To avoid doing so, computer scientists come up with a proof system instead, and lazily defer the semantics problem to derivability in such a system. This is the same approach as proof-theoretic semantics, which can save you from learning hard maths as well. In the case of our programming language, the judgements of the proof system will be pairs written $t \rightsquigarrow r$: the left member is a program $t$, while the right member $r$ is either a natural number $n$ (indicating that $t$ cannot be simplified any further, and that its result is $n$) or a second program $u$ (meaning that $t$ reduces to $u$ after one step of computation). In each rule, we use $t$, $u$, $v$ as program variables, and $n$, $m$ as natural number variables.
$$\frac{}{\star n \rightsquigarrow n} \qquad \frac{t \rightsquigarrow n \qquad u \rightsquigarrow m}{t \oplus u \rightsquigarrow \star(n + m)}$$
$$\frac{t \rightsquigarrow u}{t \oplus v \rightsquigarrow u \oplus v} \qquad \frac{t \rightsquigarrow u}{v \oplus t \rightsquigarrow v \oplus u}$$
The binary relation $\rightsquigarrow \subseteq L \times (\mathbb{N} \uplus L)$ of all pairs derivable in the proof system describes what kind of operations a machine can perform to manipulate our program, so we refer to these semantics as operational. In fact, $\rightsquigarrow$ can be represented as the function $\rightsquigarrow[\cdot] : L \to \mathbb{N} \uplus \mathcal{P}_{\text{fin}} L$ mapping irreducible programs to their result, and reducible programs to the finite collection of programs they can reduce to in one step. To us,
$$b(X) = \mathbb{N} \uplus \mathcal{P}_{\text{fin}} (X)$$is the “behaviour” functor of our programming language. That is, it describes how programs behave at runtime: they either return a result or reduce non-deterministically, but still with finite branching. We called functions of type $\Sigma X \to X$ for some $X$ “algebras” over the syntax functor $\Sigma$. Hence, it is only natural to call all functions that, like $\rightsquigarrow[\cdot]$, have type $X \to b(X)$ for some $X$ “coalgebras” over the behaviour functor $b$. Visually, any of these coalgebras tells us how to unfold an initial program into a trace of possible execution paths. Hence, operational semantics are coalgebras for the behaviour functor. We use them to know *how* to unfold a program repeatedly until (hopefully) we terminate and obtain the collection of all its possible output values. If we call $f$ the operation of repeatedly unfolding according to the $\rightsquigarrow[\cdot]$ coalgebra, we have, for example:
$$\begin{align*} &f((\star 3 \oplus \star 2) \oplus (\star 1 \oplus \star 0)) \\ &= \{f(\star 5 \oplus (\star1 \oplus \star0)), f((\star 3 \oplus \star 2) \oplus \star 1)\} \\ &= \{\{f(\star 5 \oplus \star 1)\}\} \\ &= \{\{\{f(\star 6)\}\}\} \\ &= \{\{\{6\}\}\} \\ \end{align*}$$This is a trace, or execution tree, for our program (think of the syntax tree of $\{\{\{6\}\}\}$). Because of how $b$ is defined, its internal nodes are not labelled. In addition, duplicated branches are not allowed, so this trace conflates the two possible reduction chains. At any rate, this time around, the carrier of our coalgebra is not semantic (the denotational domain), but rather syntactic: it is the language itself! Unfolding programs from such a language into their traces using the specified coalgebra is akin to a procedure simulating their execution step-by-step. Computer scientists call such programs interpreters. The duality between denotational and operational semantics is now apparent. A gentler exposition on this is given by (Hutton 2023).
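In code, the behaviour functor, the coalgebra read off the proof rules, and the interpreter as an unfold might look as follows. This is again a sketch with names chosen here (`BehF`, `step`, `unfold`, `interp`); lists stand in for $\mathcal{P}_{\text{fin}}$, so duplicate branches are kept rather than conflated as in the trace above.

```haskell
-- The behaviour functor b(X) = ℕ ⊎ P_fin(X), with lists
-- approximating finite sets.
data BehF x = Done Integer | Steps [x]
  deriving (Functor, Show)

-- ⇝[·] : L → b(L), one clause per group of proof rules.
step :: L -> BehF L
step (In (Lit n))    = Done n                       -- ⋆n ⇝ n
step (In (Plus t u)) = Steps (sums ++ lefts ++ rights)
  where
    -- t ⊕ u ⇝ ⋆(n + m): the premises t ⇝ n and u ⇝ m are only
    -- derivable when both operands are literals
    sums   = case (out t, out u) of
               (Lit n, Lit m) -> [In (Lit (n + m))]
               _              -> []
    -- t ⇝ t'  gives  t ⊕ u ⇝ t' ⊕ u
    lefts  = [In (Plus t' u) | Steps ts <- [step t], t' <- ts]
    -- u ⇝ u'  gives  t ⊕ u ⇝ t ⊕ u'
    rights = [In (Plus t u') | Steps us <- [step u], u' <- us]

-- Unfolding: corecursion parametric in a coalgebra. Thanks to
-- laziness, Mu doubles as the greatest fixpoint ν in Haskell.
unfold :: Functor f => (a -> f a) -> a -> Mu f
unfold coalg = In . fmap (unfold coalg) . coalg

-- The "interpreter": unfold a program into its execution tree.
interp :: L -> Mu BehF
interp = unfold step
```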
Full Abstraction
Syntax, algebraic semantics, and coalgebraic semantics: our triptych is now complete. To relate $[\cdot]$ and $\rightsquigarrow[\cdot]$, we introduce a property known as full abstraction. A function whose domain is our language is fully abstract if it maps two programs to the same object just in case those programs are unfolded to the same traces by our operational semantics. Ideally, we would want folding using our denotational semantics to be fully abstract. This would guarantee our denotational semantics are abstract enough to equate two programs with the same traces, but still informative enough to map programs with different traces to different objects. As a simple exercise, can you prove that full abstraction does not hold between $[\cdot]$ and $\rightsquigarrow[\cdot]$?
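If you would rather experiment than prove, here is a hedged sketch of the property, continuing the code above (`eqTrace` and `fullyAbstract` are names chosen here; since lists stand in for finite sets, trace equality below is only up to the branch ordering that `step` happens to produce).

```haskell
-- Structural equality of traces.
eqTrace :: Mu BehF -> Mu BehF -> Bool
eqTrace (In a) (In b) = case (a, b) of
  (Done n,   Done m)   -> n == m
  (Steps ts, Steps us) -> length ts == length us
                            && and (zipWith eqTrace ts us)
  _                    -> False

-- Full abstraction as a testable statement: same denotation
-- exactly when same trace. Hunting for a pair of programs on
-- which this returns False is the exercise.
fullyAbstract :: L -> L -> Bool
fullyAbstract t u =
  (denote t == denote u) == (interp t `eqTrace` interp u)
```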
Distributivity Laws
Ever worried about coming up with mismatching algebraic and topological semantics for your logic, and then wasting time trying to prove a representation theorem that does not hold? With programming languages, distributivity laws (Paviotti and Wu 2023) can help us mitigate the analogous problem for denotational and operational semantics, while providing a satisfying bird’s-eye view of our duality. These laws describe how our syntax functor $\Sigma$ distributes over the behaviour functor $b$. They are parametric in an arbitrary type $X$, and take the form of a family of functions
$$(\lambda_X : \Sigma (b X) \to b (\Sigma X))_{X : \text{Type}}$$specifying how our syntax interacts with our behaviour. In particular, by defining how $\Sigma$ can be “pushed past” $b$, these functions witness how behaviour is compositional with respect to our syntax: the behaviour of a program can be obtained by composing the behaviours of its immediate subprograms. Note that our language $L$, being inductively defined using $\Sigma$, is the smallest object closed under it, i.e. the least fixpoint $\mu \Sigma$. So what $b$-coalgebra of type $\mu \Sigma \to b(\mu \Sigma)$ shall act as our operational semantics? Because the domain of the desired function is our language, the result is, unsurprisingly, a fold. Specifically, the fold using the following $\Sigma$-algebra.
$$b(\text{In}) \circ \lambda_{\mu \Sigma} : \Sigma (b (\mu \Sigma)) \to b (\mu \Sigma)$$What is going on here? Using a fold over a $\Sigma$-algebra means we are trying to define our operational semantics recursively on the syntax of the program. The recursive clauses of this definition are obtained by composing two operations. First, we use distributivity to compose behaviours for the immediate subterms into a single behaviour for a “standalone” object storing such subterms. Now, $\text{In} : \Sigma (\mu \Sigma) \to \mu \Sigma$ is a constructor for our inductively defined language. Under normal conditions, it would take the standalone object mentioned above as its input and produce a program. Our second and last step is lifting $\text{In}$ using the action on maps of functor $b$ so we can have this construction happen at the behaviour level. The algebra is now complete. Isn’t it funny how operational semantics are meant to be consumed by an unfolding, and we are defining ours via folding? Anyway, time to operate the dual construction (both constructions are rendered in code after the next paragraph). The type of traces, being defined coinductively using $b$, is the biggest object closed under it, i.e. the greatest fixpoint $\nu b$. Our denotational semantics are the unfold using the following $b$-coalgebra, where $\text{Out}: \nu b \to b(\nu b)$ is a destructor observing the very beginning of our trace.
$$\lambda_{\nu b} \circ \Sigma(\text{Out}) : \Sigma (\nu b) \to b (\Sigma (\nu b))$$So coming up with a distributivity law is always enough to get both semantics for free! Of course, folding using these denotational semantics (or any other algebra, really) is compositional, since all folds are compositional by construction. Similarly, unfolding using these operational semantics is fully abstract with respect to them (trivially, by definition of full abstraction). The punchline here is that it can be shown that folding a program using these denotational semantics and unfolding it using these operational semantics are actually the same function, which deserves the name of universal semantics. So folding using the denotational semantics, a.k.a. our universal semantics, is always fully abstract, while still being compositional!
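To make the bird’s-eye view concrete, here is a minimal Haskell sketch of the generic recipe, continuing the code above (the names `DistLaw`, `opSem`, `denSem`, and `univSem` are chosen here, and the record field `out` plays the role of $\text{Out}$; this sketches the construction as stated in this post, not Paviotti and Wu’s actual development).

```haskell
{-# LANGUAGE RankNTypes #-}

-- A distributivity law: one function for every choice of x, i.e. a
-- natural transformation Σ ∘ b ⇒ b ∘ Σ.
type DistLaw sig b = forall x. sig (b x) -> b (sig x)

-- Operational semantics for free: fold the Σ-algebra b(In) ∘ λ to
-- obtain a b-coalgebra on μΣ.
opSem :: (Functor sig, Functor b) => DistLaw sig b -> Mu sig -> b (Mu sig)
opSem lam = fold (fmap In . lam)

-- Denotational semantics for free: fold using the Σ-algebra on νb
-- obtained by unfolding the b-coalgebra λ ∘ Σ(Out).
denSem :: (Functor sig, Functor b) => DistLaw sig b -> Mu sig -> Mu b
denSem lam = fold (unfold (lam . fmap out))

-- The punchline, reached from the other side: unfold the whole
-- program using the operational coalgebra. The theory promises
-- this is the same function as denSem: the universal semantics.
univSem :: (Functor sig, Functor b) => DistLaw sig b -> Mu sig -> Mu b
univSem lam = unfold (fold (fmap In . lam))
```

As before, `Mu` doubles as the greatest fixpoint $\nu$ thanks to laziness. The recipe itself is this short; instantiating `DistLaw` for a concrete syntax and behaviour is where the real work lies.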
Bibliography
Hutton, Graham. 2023. “Programming Language Semantics: It’s Easy as 1,2,3.” Journal of Functional Programming 33. https://doi.org/10.1017/s0956796823000072.
Paviotti, Marco, and Nicolas Wu. 2023. “Full Abstraction for Free.” CoRR abs/2303.09358. https://doi.org/10.48550/arXiv.2303.09358.