I came up with some more criticisms of Bayesianism:
- B-ism requires Evidence Individuation for every update, which it offers no guidance on; the criticism relates to the theory-ladenness of evidence.
- A hypothesis space scope problem (note: distinct from the catch-all problem), which concerns what a hypothesis space is about and which hypothesis spaces you have. The hypothesis spaces themselves are dependent on prior theories/hypotheses. This is similarly a CR-inspired criticism. It also makes the complexity of bayesianism much higher. It interacts with EI too.
- One solution might be to have just one hypothesis space over everything and the hypotheses in there are sets/combinations of hypotheses from other more particular spaces (example: H_k: general relativity AND genomic theory AND not astrology AND lizardmen control the US government AND koalamen control the Aussie government AND (lizardmen and koalamen get along on easter iff lunch is provided) AND …).
- Naturally, this seems like a bad answer and most bayesians probably wouldn’t like it.
(Note: I’m only going into detail about the first in this post; I’ll expand on the second one later when I have more time)
Also, offtopic, I think there’s still lots of philosophy left to do in the world (things to discover, new thoughts to have), even about topics that might seem boring and worn out. Some of my thoughts here might be novel, and others (or similar enough ideas) I found in the literature with 202X publication dates. That’s significant (to me) because they happened after the last time I thought about bayesianism (like 2020), so if I’d had some of these thoughts earlier, then they’d be novel enough to publish apparently (not that I would have done the work to publish them but yeah).
I don’t think the stuff I thought of here was that hard to conceive, so it’s reasonable that there’s still more lowish hanging fruit. (Or maybe I’m unaware of prior art from earlier philosophers (like Popper) because they used different phrasing or something that makes it harder to search for)
Also also offtopic, I got AI to do some expansion of these ideas and some mathematical proofs. One thing to come out of that is a trilemma of existing problems (the trilemma appears novel). Repo is here. One thing that occurs to me is that I didn’t do a good job of isolating my inputs (both data and prompts) or saving raw copies, so it’s hard to tell now what I came up with vs what was deduced. I steered the AI plenty while planning and doing the research, but it also just found and connected things on its own. I ran the stuff I generated through some adversarial chatgpt loops and even when I framed it with social bias and in a tone that I expected it to be slop, it still said that there were some good new ideas in there.
Evidence Individuation (EI)
There is prior work around double counting evidence, but maybe not from this angle.
Foundational claims: evidence/data is theory-laden, and interpreting data requires an explanatory theory or equivalent (these are often called ‘models’ in bayesianism).
Claim: Bayesianism requires and depends on explanations before any update can be performed. This is because, before evidence, data, or an event can be used for a bayesian update, it needs to be individuated (like epistemically deduplicated). Even simply counting events might be impossible (in B-ism) without an extension to B-ism that explains how to count, and what an event is, etc.
Additional: learning about a hypothesis can force you to update any/all hypothesis spaces (and thus priors) for any/all things you believe. This is because it can arbitrarily recontextualize events and data.
Additional: disagreements about EI can lead to divergence between bayesians even when they observe the same events (not just different rates of convergence). This means that the bayesian claim of convergence depends on an unknown and unspecified model of EI.
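To make the divergence claim concrete, here’s a toy sketch (all the likelihood numbers are invented assumptions, not from any real model): two bayesian reasoners see the same raw data, but one individuates a burst of flashes as three independent events while the other treats the burst as one composite event, and their posteriors move in opposite directions.

```python
# Toy demo: same raw data, different Evidence Individuation, divergent
# posteriors. All likelihood numbers below are made-up assumptions.

def update(prior, lik_h, lik_not_h):
    """One Bayes update over a binary hypothesis space {H, not-H}."""
    post = prior * lik_h
    return post / (post + (1 - prior) * lik_not_h)

prior = 0.5
# Assumed likelihoods: H makes a single flash a bit more likely...
p_flash_h, p_flash_not = 0.6, 0.4
# ...but H makes a correlated *triple burst* less likely than not-H does.
p_burst_h, p_burst_not = 0.2, 0.5

# Agent A individuates the burst as 3 independent flash events.
post_a = prior
for _ in range(3):
    post_a = update(post_a, p_flash_h, p_flash_not)

# Agent B individuates the same burst as 1 composite event.
post_b = update(prior, p_burst_h, p_burst_not)

print(f"Agent A (3 events): P(H) = {post_a:.3f}")  # rises above 0.5
print(f"Agent B (1 event):  P(H) = {post_b:.3f}")  # falls below 0.5
```

Both agents are doing valid Bayesian updates; the disagreement lives entirely in the (extra-Bayesian) individuation step.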
Example
Say you were researching supernovae or some other bright astronomical event, and you observe rare double/triple/quadruple (super)novae or similar. Every time you see a group like this, the timing between the members is different, and the group is in a different spot in the sky.
How should a B-ist use these events to do bayesian updates? Are they individual data points? What about when you have a hypothesis that it’s some interaction (like a chain reaction) between binary/ternary star systems? And what about when you add the hypothesis that it’s due to gravitational lensing? I think in these kinds of cases, it’s not trivial or obvious how to calculate the update.
Without the right hypothesis/explanation, a B-ist might treat them as independent data points, and part of the problem here is that this artificially increases their confidence/posterior and you need external (to B-ism) error checking to detect it.
Moreover, if you accidentally double counted some evidence for this or other matters (eg in testing predictions of how frequently supernovas should occur based on nuclear theories about star internals), then learning about a new hypothesis in one area can retroactively affect other topics in unpredictable ways. You might learn that, if a named hypothesis H_i is true, then you’ve been double-counting for decades. Those other topics/hypothesis spaces got contaminated by the incorrect prior understanding of what was being observed. Updates must therefore be non-local at least some of the time, and therefore the output of every update is technically an input to all other updates.
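The inflation mechanism above can be shown with a minimal sketch (the likelihood numbers are invented): one physical event gets recorded by two surveys, and a reasoner who doesn’t know the records share a source counts it twice.

```python
# Toy sketch of double-counting: the same physical event, counted once
# vs twice. Likelihood numbers are made-up assumptions.

def update(prior, lik_h, lik_not_h):
    post = prior * lik_h
    return post / (post + (1 - prior) * lik_not_h)

prior = 0.5
lik_h, lik_not_h = 0.8, 0.2  # the event favors H

once = update(prior, lik_h, lik_not_h)   # counted correctly
twice = update(once, lik_h, lik_not_h)   # same event counted again

print(f"counted once:   P(H) = {once:.2f}")   # 0.80
print(f"double-counted: P(H) = {twice:.2f}")  # 0.94 -- spurious confidence
```

Nothing inside the updating machinery flags the second update as illegitimate; only an explanation of where the two records came from can do that.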
Possible Solutions
One way I can think of for B-ism to handle this is to partition priors conditional on hypotheses.[1] So you get multiple sets of priors, where each set assumes some hypothesis is true for the purpose of event individuation, and then you hope the sets eventually converge (intuition: it’s possible to mathematically prove that ‘it always converges’ is false). This might be a lot of computation (compared to normal B-ism), but it seems feasible for finite hypothesis spaces (eg isolated coin-flip type stuff).
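Here’s a sketch of the partition idea (structure and numbers are my own assumptions, not a standard method): each partition assumes one individuation model, runs its own updates over the same raw stream, and you check whether the partitions agree afterwards. In this toy case they head toward opposite corners, which is the ‘it always converges’ failure mode.

```python
# Sketch: partition priors by individuation hypothesis and update each
# partition independently on the same raw data. Models and likelihoods
# below are invented for illustration.

def update(prior, lik_h, lik_not_h):
    post = prior * lik_h
    return post / (post + (1 - prior) * lik_not_h)

# Each model maps the same raw stream to (events per burst, likelihoods).
individuation_models = {
    "M1: burst = 3 independent flashes": (3, 0.6, 0.4),
    "M2: burst = 1 composite event":     (1, 0.2, 0.5),
}

raw_bursts = 10  # identical raw data fed to every partition
for name, (events_per_burst, lik_h, lik_not_h) in individuation_models.items():
    post = 0.5
    for _ in range(raw_bursts * events_per_burst):
        post = update(post, lik_h, lik_not_h)
    print(f"{name}: P(H) = {post:.5f}")
# M1's posterior goes to ~1 while M2's goes to ~0 on the same raw data,
# so the hoped-for convergence across partitions fails here.
```

The partition bookkeeping also grows with the number of individuation models, which is the computational cost mentioned above.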
However, this formulation of B-ism completely breaks down with open hypothesis spaces (like science) because you need catch-all hypotheses (though there are attempts, like Solomonoff induction, to address this). Catch-alls are a problem because they are opaque (not explanatory) and thus cannot tell you how to individuate events, so you can’t partition over catch-alls. Catch-alls encapsulate what you don’t know, and EI requires an explanation of the structure and meaning behind the data. If you could expand catch-alls out into actual theories, the computational requirement would explode. That said, figuring out how to do the expansion would be an achievement in and of itself – it’d require something like iterating over all possible ideas, e.g., as computer programs.
Another way B-ism could handle EI is to formulate the hypothesis space so that any and all data observed is an event. Don’t worry about individuation at all (since all data is atomic and each datum is a unique event), and just feed in data at max bandwidth. This seems kind of like what LLMs do during training. It maybe kinda works, but the tradeoff is that you lose all explanatory power. Using the B-ist’s ‘ideal bayesian reasoner’ fallback: the best an ideal B-ist reasoner can be, under this formulation, is a kind of black-box oracle.
The problem for B-ism is that it doesn’t do either of these, and, practically speaking, people don’t do the necessary steps (often they seem unaware of the problem) and probably don’t want to. People like working at the level of explanations and conceptualized events rather than raw data. Both solutions, at the very least, move the process used by the ‘ideal bayesian reasoner’ further away from what people do and what is practical.
Also posted this to my blog: https://xk.io/n/10031
(I’m trying to generate some incoming links so google doesn’t forget about me)
Related prior works (AI-generated list but should be very accurate)
- Fitelson (2001), “A Bayesian Account of Independent Evidence with Applications,” Philosophy of Science 68: S123-S140. DOI — Defines confirmational independence as hypothesis-relative via screening-off. The formal foundation for why independence isn’t a property of the evidence itself.
- Novack (2007), “Does Evidential Variety Depend on How the Evidence Is Described?”, Philosophy of Science 74(5): 701-711. Cambridge — Shows Bayesian diversity measures track description artifacts, not genuine variety.
- Jones (2018), “Critical Epistemology for Analysis of Competing Hypotheses,” Intelligence and National Security 33(2): 273-289. DOI — Demonstrates that splitting or combining evidence items changes hypothesis rankings. Explicitly calls for “structured methods for individuating items of evidence.”
- Moretti & Akiba (2007), “Probabilistic Measures of Coherence and the Problem of Belief Individuation,” Synthese 154(1): 73-95. DOI — The belief-side analogue: all major coherence measures are sensitive to how beliefs are individuated.
- Wheeler & Scheines (2013), “Coherence and Confirmation through Causation,” Mind 122(485): 135-170. — Same coherence among evidence can boost, do nothing, or reduce confirmation depending on causal structure.
- Gelman & Shalizi (2013), “Philosophy and the Practice of Bayesian Statistics,” BJMSP 66(1): 8-38. DOI — The likelihood function embeds independence assumptions that can’t be justified by Bayesian updating. Model checking is a frequentist/falsificationist procedure, not a Bayesian one.
- Longino (1979), “Evidence and Hypothesis,” Philosophy of Science 46(1): 35-56. — The same data constitutes different evidence under different theoretical backgrounds. Classic theory-ladenness paper.
- Hurlbert (1984), “Pseudoreplication and the Design of Ecological Field Experiments,” Ecological Monographs 54(2): 187-211. — Defined pseudoreplication; whether measurements are independent depends on theoretical understanding of the system.
- Lazic (2010), “The Problem of Pseudoreplication in Neuroscientific Studies,” BMC Neuroscience 11: 5. DOI — Quantified the damage: within-group correlation of 0.30 inflates Type I error from 5% to 37%.
- Shimony (1970), “Scientific Inference,” in The Nature and Function of Scientific Theories, pp. 79-172. — Introduced the catch-all hypothesis and the “tempering condition.” The original statement of the problem you’re calling hypothesis space individuation.
- Stanford (2006), Exceeding Our Grasp: Science, History, and the Problem of Unconceived Alternatives, Oxford UP. — Historical induction that scientists routinely fail to conceive of successors to their best theories. The empirical case for why hypothesis spaces are always incomplete.
- Karni & Vierø (2013), “Reverse Bayesianism: A Choice-Based Theory of Growing Awareness,” American Economic Review 103(7): 2790-2810. AER — Axiomatized belief revision when the state space expands. Proposes proportional redistribution but notes it doesn’t determine probabilities for new events.
- Broessel & Huber (2015), “Bayesian Confirmation: A Means With No End,” BJPS 66(4): 737-763. — Argues Bayesian confirmation presupposes what it aims to establish – the full probability measure must be specified before confirmation can proceed.
- Stegenga & Menon (2017), “Robustness and Independent Evidence,” Philosophy of Science 84(3): 414-435. DOI — Conditional probabilistic independence is hypothesis-relative; ontic independence from different instruments/methods is not sufficient for robustness.
- Senn (2009), “Overstating the Evidence – Double Counting in Meta-Analysis,” BMC Medical Research Methodology 9: 10. — Identifies species of evidence double-counting in practice.
- Earman (1992), Bayes or Bust?, MIT Press. — Chapter 8 confronts theory-ladenness with Bayesian confirmation theory. Acknowledges that paradigm shifts create gaps in the Bayesian formalism.
I need to look into hierarchical Bayesian models which apparently do this sort of thing but then give you one number (a nice smooshy average) out. I’m not sure that works for this though, because the events are different and might be native to different hypothesis spaces. ↩︎