
A MoL dinner
After a week of intense study, four MoL students - $E$, $M$, $S$, and $J$ - met at a renowned Amsterdam pizzeria. The restaurant offered a special deal:
Any group of students ordering pizza from the menu (excluding pineapple) would get 10% off and free dessert.
The chef, $B$, considered pineapple to be the worst topping: he explained that it makes for a very bad combination of flavors and does not correspond to traditional standards. So he deliberately excluded it from the deal in order to discourage people from ordering it. The menu is divided into two sections, Traditional and Non-Traditional. Apart from the Pineapple Pizza, every pizza on the menu qualifies for the deal, so the students can choose freely. Furthermore, we note that the number of Traditional and Non-Traditional pizzas is the same, once the pineapple one is excluded.
All the students, being Logicians, think of themselves as rational agents in the Bayesian sense, so each of them has a coherent initial distribution, i.e. their credences form a probability distribution. In particular, given the set of toppings, they assign their preferences as degrees of belief in sentences of the form
I like the pizza (with) $A$
respecting Kolmogorov’s Axioms.
We call sentences of this type $\varphi_{A}$, where $A$ is the flavor of the pizza. To study the distributions we use a credence function $\mathsf{cr}$ that maps each sentence $\varphi_{A}$ to a value in $[0,1]$. The distributions are the following:
- $E$ is a food anarchist who loves pizzas with unusual flavors, like Natto (which, unfortunately, is not on the menu), and would rather eat a pizza with no conventional topping than one of the “boring” traditional ones. Therefore, she has the following distribution over sentences of type $\varphi$, where $A$ is the section of Traditional Pizzas on the menu and $B$ is the section of Non-Traditional Pizzas:
- $\mathsf{cr}(\varphi_{\text{Natto}})=0.5$
- $\mathsf{cr}(\varphi_{b\in\text{B}})=0.3/|\text{B}|$
- $\mathsf{cr}(\varphi_{a \in \text{A}})=0.2/|\text{A}|$
- $M$ loves Sauerkraut pizza (not on the menu), a pizza his mom always cooked for him when he was a child. He also loves Italy and traditional cuisine, so he prefers traditional ingredients on his pizzas. Therefore, he has the following distribution over sentences of type $\varphi$:
- $\mathsf{cr}(\varphi_{\text{Sauerkraut}})=0.5$
- $\mathsf{cr}(\varphi_{a\in\text{A}})=0.4/|\text{A}|$
- $\mathsf{cr}(\varphi_{b\in\text{B}})=0.1/|\text{B}|$
- $S$ is an open-minded guy who loves traditional food but also likes experimenting: he has a slight preference for the traditional tastes he is used to, but doesn’t mind in principle trying new food. Therefore, he has the following distribution over sentences of type $\varphi$:
- $\mathsf{cr}(\varphi_{a\in\text{A}})=0.6/|\text{A}|$
- $\mathsf{cr}(\varphi_{b\in\text{B}})=0.4/|\text{B}|$
- $J$ is an elegant and sophisticated person who loves the flavor combinations at this pizzeria, where she is a regular customer. The only pizzas she will not eat are those with meat, since she is a vegetarian. Therefore, she has the following distribution over sentences of type $\varphi$:
- $\mathsf{cr}(\varphi_{p\in\text{NoMeat}})=1/|\text{NoMeat}|$, where $\text{NoMeat}$ is the set of meat-free pizzas on the menu
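As a sanity check, the four distributions above can be built and verified with a minimal Python sketch. The menu size and pizza names are placeholders; we take $\mathsf{cr}(\varphi_{\text{Sauerkraut}})=0.5$ so that $M$'s credences normalize, and we assume for simplicity that every pizza on this hypothetical menu is meat-free, so $J$ spreads her credence uniformly over all of it:

```python
# A minimal sketch of the four credence distributions. Menu sizes and pizza
# names are placeholders; we use cr(Sauerkraut) = 0.5 so that M's credences
# normalize, and we assume every pizza on this hypothetical menu is meat-free.

def credences(n_trad, n_nontrad):
    """Return each agent's credences as a dict: sentence -> value in [0, 1]."""
    A = [f"trad_{i}" for i in range(n_trad)]        # Traditional section
    B = [f"nontrad_{i}" for i in range(n_nontrad)]  # Non-Traditional section
    E = {"Natto": 0.5,
         **{b: 0.3 / len(B) for b in B},
         **{a: 0.2 / len(A) for a in A}}
    M = {"Sauerkraut": 0.5,
         **{a: 0.4 / len(A) for a in A},
         **{b: 0.1 / len(B) for b in B}}
    S = {**{a: 0.6 / len(A) for a in A},
         **{b: 0.4 / len(B) for b in B}}
    no_meat = A + B                                 # assumption: all meat-free
    J = {p: 1 / len(no_meat) for p in no_meat}
    return {"E": E, "M": M, "S": S, "J": J}

# Each initial distribution is coherent: the credences sum to 1.
for name, cr in credences(3, 3).items():
    print(name, round(sum(cr.values()), 10))
```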
Bayesian Principles and Irrationality Measures
According to the Bayesian framework we are using, an agent should align their credence with the credence of an expert. This constraint is called the Expert Principle (Elga 2007), and it is formally expressed by the following conditional probability:
$$\mathsf{cr}(\varphi\mid\mathsf{cr}_E(\varphi)=x)=x,$$where $\mathsf{cr}_E$ is the credence distribution of the expert. This principle is one of the many rationality constraints possible in the Bayesian framework.
Now, let’s return to our group of students and present an example of how principles of this kind can be applied within the theory.
Let our Logicians update their credence distributions according to the credence of our expert, namely the chef $B$. All of them align their credence in the proposition $\varphi_{\text{Pineapple}}$ with the chef’s credence in it, that is $0$, leaving the other propositions unchanged. This update leads our Logicians to behave as irrational agents: in the Bayesian Probabilistic framework (i.e. the Bayesian epistemological account with the smallest set of principles an agent must respect to be rational, namely only Kolmogorov’s Axioms), an agent’s credence distribution must always be normalized. In our example this means that each agent needs credences such that $\mathsf{cr}(\varphi_{\text{Margherita}})+\mathsf{cr}(\varphi_{\text{Figs}})+\mathsf{cr}(\varphi_{\text{Pineapple}})=1$ and all credences must be non-negative real values (a simple application of Kolmogorov’s Axioms). If our agents do not renormalize their credences during the update, they can no longer be coherent and are therefore irrational.
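The effect of this update can be sketched in a few lines of Python. We take $S$, whose credences over a hypothetical three-pizza menu (Margherita, Pineapple, Figs, as in the example further below) initially sum to $1$; setting the Pineapple credence to the chef’s value of $0$, without renormalizing, breaks coherence:

```python
# Sketch of the Expert Principle update: align the credence in one proposition
# with the expert's credence, leaving all other propositions untouched.

def expert_update(cr, expert_cr, prop):
    """Return a copy of `cr` with `prop` set to the expert's credence."""
    updated = dict(cr)
    updated[prop] = expert_cr[prop]
    return updated

# S's initial credences over a hypothetical three-pizza menu: coherent (sum = 1).
S = {"Margherita": 0.6, "Pineapple": 0.2, "Figs": 0.2}
chef = {"Pineapple": 0.0}  # the expert's credence in the pineapple proposition

S_updated = expert_update(S, chef, "Pineapple")
print(round(sum(S_updated.values()), 10))  # 0.8 -> no longer normalized
```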
Now, the question that we raise, and that we are interested in investigating, is the following:
Which of our agents is the least irrational?
This is equivalent to asking which of our agents is closer to a coherent distribution.
In order to study this matter, we consider the theory presented in (Staffel 2019), whose general aim is to construct a theoretical framework for computing the distance between an irrational agent and the closest coherent position. We will not consider the whole theory, but only a naïve version of it, in which we take as our distance measure the Absolute Distance between a point representing a rational distribution and a point representing an irrational one; (Staffel 2019) argues that this is also the best measure for this purpose.
More formally, each agent has credences over a set of sentences $\mathcal{B}=\{\varphi_1, \dots, \varphi_n\}$ for some $n\in\mathbb{N}$, and we can represent these credences as a point in an $n$-dimensional Cartesian space. We thus uniquely assign each agent to a point and, based on the principles of probability, construct the region of the space that corresponds to rational distributions. Using the Absolute Distance we can then calculate the minimum distance between this region and the points representing the agents’ irrational credence distributions. We can also refine the selected region using Scores, such as the Brier Score or the Absolute Score (Titelbaum 2022), which are distance measures between credences and possible worlds (this also requires a minimization step).
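For our naïve version, this distance admits a simple closed form: for credences in $[0,1]$, the minimal total absolute adjustment needed to reach a normalized distribution is just $|1-\sum_i \mathsf{cr}(\varphi_i)|$, since the missing (or excess) probability mass can always be absorbed by the coordinates. A small Python sketch:

```python
# Naive sketch: a credence assignment is a point in n-dimensional space; its
# Absolute (L1) distance to the nearest coherent distribution is the missing
# or excess probability mass, |1 - sum of credences|.

def l1_distance_to_coherence(point):
    """L1 distance from a credence point (each coordinate in [0,1]) to the
    nearest normalized probability distribution."""
    assert all(0 <= x <= 1 for x in point)
    return abs(1 - sum(point))

# S's post-update credences (Margherita, Figs, Pineapple): mass 0.2 is missing.
print(round(l1_distance_to_coherence((0.6, 0.2, 0.0)), 10))  # 0.2
```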
We can see an example as follows:
Consider the sets $A=\{\varphi_{\text{Margherita}}\}$ and $B=\{\varphi_{\text{Pineapple}}, \varphi_{\text{Figs}}\}$. Accordingly, after the update the distributions will be the following:
- $E$ will have a credence of $0.2$ in $\varphi_{\text{Margherita}}$ and $0.15$ in $\varphi_{\text{Figs}}$ (Pizza Natto is not on the menu, so we will not consider it)
- $M$ will have a credence of $0.4$ in $\varphi_{\text{Margherita}}$ and $0.05$ in $\varphi_{\text{Figs}}$ (Pizza Sauerkraut is not on the menu, so we will not consider it)
- $S$ will have a credence of $0.6$ in $\varphi_{\text{Margherita}}$ and $0.2$ in $\varphi_{\text{Figs}}$
- $J$ will have a credence of $1/3$ both in $\varphi_{\text{Margherita}}$ and $\varphi_{\text{Figs}}$
Applying our further constraint, we have $\mathsf{cr}(\varphi_{\text{Pineapple}})=0$. Since all our agents, having updated according to the Expert Principle, have credence $0$ in $\varphi_{\text{Pineapple}}$, we can represent the measures in a plane.

It is easy to see that any rational distribution (given all the constraints involved) must be a point on the line $y=-x+1$ for $x\in[0,1]$. Minimizing the Absolute Score (chosen so that we use the same distance measure throughout), we obtain that the best point is the uniform distribution over the two propositions, i.e. the point $(0.5,0.5)$ in our plane. A quick calculation then shows that the agent closest to this point is $J$. Also, $S$ is really close to a coherent distribution if we do not consider the Score, but his would not be the optimal distribution given all the possible worlds (in our case $2$, since exactly one pizza must be chosen, so exactly one of the two propositions is true).
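The comparison can be reproduced with a short sketch, taking each agent’s post-update credences in $\varphi_{\text{Margherita}}$ and $\varphi_{\text{Figs}}$ as coordinates in the plane:

```python
# Post-update credences in the (Margherita, Figs) plane, compared by
# (i) L1 distance to the coherent line x + y = 1, and
# (ii) L1 distance to the score-optimal point (0.5, 0.5).

agents = {
    "E": (0.2, 0.15),
    "M": (0.4, 0.05),
    "S": (0.6, 0.2),
    "J": (1/3, 1/3),
}

def dist_to_line(p):
    """Minimal L1 adjustment needed to reach the line x + y = 1."""
    return abs(1 - sum(p))

def dist_to_point(p, q=(0.5, 0.5)):
    """L1 distance to the score-optimal uniform point."""
    return sum(abs(a - b) for a, b in zip(p, q))

for name, p in agents.items():
    print(name, round(dist_to_line(p), 4), round(dist_to_point(p), 4))
```

Running this confirms the claim: $J$ minimizes the distance to $(0.5,0.5)$, while $S$ minimizes the distance to the coherent line.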
Computational cost of the Bayesian Framework
This framework is really nice, but there is an issue: the computational cost of Bayesian epistemology grows with the number of principles we consider and with the number of propositions in the input (and thus the number of possible worlds). Even in the simple case presented in this article, one can easily imagine that computing the most rational agent becomes computationally demanding as the number of students and pizzas increases. In fact, many algorithms have been presented, using different types of measures, whose running time grows exponentially with the input size. So our last question is: can a real agent realistically use this theory to guide their reasoning in daily situations?
In (Kwisthout, Wareham, and Van Rooij 2011), this computational cost has been studied, and it has been shown that many problems in the field are $\mathsf{NP}$-complete. One such problem is the Most Probable Explanation $(\textsf{MPE})$: given a set of hypotheses and a set of observed pieces of evidence, along with a probabilistic model that describes how these are related (such as a Bayesian network), the task is to determine which truth assignment to the hypotheses is most likely to be correct, given the observed evidence. In other words, we are looking for the overall combination of truth values for the hypotheses that has the highest conditional probability according to the model.
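To make $\textsf{MPE}$ concrete, here is a toy brute-force sketch over a made-up two-hypothesis model (the model and its numbers are invented purely for illustration). The point is that exact search enumerates all $2^n$ truth assignments, which is exactly what becomes infeasible as $n$ grows:

```python
# Toy brute-force MPE: enumerate every truth assignment to the hypotheses
# and keep the one with the highest probability given the evidence.
# The 2**n loop is what makes the exact problem intractable as n grows.
from itertools import product

def mpe(hypotheses, joint_prob, evidence):
    """Return the truth assignment maximizing joint_prob(assignment, evidence)."""
    best, best_p = None, -1.0
    for values in product([False, True], repeat=len(hypotheses)):
        assignment = dict(zip(hypotheses, values))
        p = joint_prob(assignment, evidence)
        if p > best_p:
            best, best_p = assignment, p
    return best

# Invented model: the evidence e makes h1 likely, while h2 is unlikely a priori.
def joint_prob(assign, evidence):
    p = 1.0
    p *= 0.9 if assign["h1"] == evidence["e"] else 0.1
    p *= 0.2 if assign["h2"] else 0.8
    return p

print(mpe(["h1", "h2"], joint_prob, {"e": True}))  # {'h1': True, 'h2': False}
```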
Problems of the kind we presented here, and more generally throughout Bayesian Epistemology, can be proven to be $\mathsf{NP}$-complete as well. Such tasks can quickly become too demanding for a real agent, because all known exact algorithms for them run in time that grows exponentially with the amount of information involved. In other words, as the number of hypotheses and observations increases, finding the most probable assignment becomes increasingly hard from a computational point of view.
Therefore, while this theory offers a comprehensive framework for assessing both rationality and irrationality, its computational demands make it impractical for use by real agents.
Bibliography
Elga, Adam. 2007. “Reflection and Disagreement.” Noûs 41 (3): 478–502. https://doi.org/10.1111/j.1468-0068.2007.00656.x.
Kwisthout, Johan, Todd Wareham, and Iris Van Rooij. 2011. “Bayesian Intractability Is Not an Ailment That Approximation Can Cure.” Cognitive Science 35 (5): 779–84.
Staffel, Julia. 2019. Unsettled Thoughts: A Theory of Degrees of Rationality. Oxford University Press.
Titelbaum, Michael G. 2022. Fundamentals of Bayesian Epistemology 1: Introducing Credences. Oxford University Press.