REBUS as a Machine Learning Method

2026-06-11

rebus
reinforcement-learning
belief-updating

The REBUS model (Relaxed Beliefs Under Psychedelics; Carhart-Harris & Friston, 2019) proposes that psychedelics relax the precision-weighting of high-level priors, allowing prediction errors to propagate upward and revise beliefs more readily. It is a theory of belief updating in the brain under uncertainty.

It can also be read as a description of a machine learning parameter, one that can be formalised, implemented, and tested.

The IRIS framework uses a GRU-based recurrent belief-state estimator to maintain a distribution over partner types. In the standard configuration, belief update speed is fixed: the learning rate does not change, whether the agent's predictions are accurate or wildly wrong. This is computationally efficient but biologically implausible and operationally limiting. An agent that updates its beliefs at the same speed whether it is colliding every episode or sailing through smoothly cannot speed up its adaptation when the regime changes.
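To make the limitation concrete, here is a minimal sketch of a fixed-rate belief update after a partner switch. It replaces the GRU estimator with a simple exponential moving average over two partner types, which is an assumption made purely for illustration and not the IRIS implementation. The function names `fixed_rate_update` and `steps_to_recover`, the starting belief, and the rates are all hypothetical.

```python
def fixed_rate_update(belief, observed_type, alpha):
    """Move the belief a fixed fraction alpha toward the observed partner type."""
    return [
        (1.0 - alpha) * p + alpha * (1.0 if i == observed_type else 0.0)
        for i, p in enumerate(belief)
    ]


def steps_to_recover(alpha, start_belief=(0.99, 0.01), new_type=1, max_steps=1000):
    """Count updates until the belief favours the new partner type after a switch."""
    belief = list(start_belief)
    for step in range(1, max_steps + 1):
        belief = fixed_rate_update(belief, new_type, alpha)
        if belief[new_type] > 0.5:
            return step
    return max_steps


# The agent starts out confident in type 0, then the partner switches to type 1.
slow = steps_to_recover(alpha=0.05)  # small fixed rate (illustrative value)
fast = steps_to_recover(alpha=0.5)   # large fixed rate (illustrative value)
print(slow, fast)
```

In this toy setting the small fixed rate leaves the belief lagging the switch for many more updates than the large one. A large fixed rate recovers quickly, but it would presumably also overreact to noise during stable periods, which is the trade-off a surprise-dependent rate is meant to ease.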

REBUS suggests a different approach. When prediction errors spike — collisions, unexpected partner switches, persistent deadlock — the precision of existing priors should relax, and the effective learning rate should increase. The agent should update its beliefs faster when it discovers its model of the world is wrong.

This is implementable as a dynamic learning rate parameter on the belief logits:

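$$\alpha_t = \alpha_{\text{base}} + k \cdot \sigma\left(\left|\text{collision}_t - \text{expected}\right|\right)$$

Here $\alpha_{\text{base}}$ is the baseline learning rate, $k$ scales the surprise-driven increase, and $\sigma$ is a squashing function (taken here to be the logistic sigmoid). The absolute difference between the observed collision signal at step $t$ and its expected value is the prediction error.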

The prediction error signal, already computed in the training loop as the cross-entropy between the belief state and the true partner type, sets how strongly beliefs are updated. High surprise means a high update rate. Low surprise means stable priors.
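The formula above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the IRIS training loop: $\sigma$ is taken to be the logistic sigmoid, surprise is the cross-entropy against a one-hot true partner type, and the rate is applied as a plain gradient step on the belief logits. The function `rebus_step` and the default values of `alpha_base` and `k` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rebus_step(logits, true_type, alpha_base=0.05, k=0.5):
    """One belief update with a surprise-modulated learning rate.

    Returns (new_logits, alpha_t, surprise). Default values are illustrative.
    """
    belief = softmax(logits)
    # Surprise: cross-entropy between the one-hot true type and the belief.
    surprise = -math.log(max(belief[true_type], 1e-12))
    # alpha_t = alpha_base + k * sigmoid(|surprise|)
    alpha_t = alpha_base + k * sigmoid(abs(surprise))
    # Gradient step on the cross-entropy w.r.t. the logits (grad = belief - one_hot).
    new_logits = [
        z - alpha_t * (p - (1.0 if i == true_type else 0.0))
        for i, (z, p) in enumerate(zip(logits, belief))
    ]
    return new_logits, alpha_t, surprise


logits = [3.0, 0.0, 0.0]  # the agent is confident the partner is type 0
_, alpha_ok, s_ok = rebus_step(logits, true_type=0)    # prediction confirmed
_, alpha_bad, s_bad = rebus_step(logits, true_type=1)  # partner has switched
print(f"confirmed: surprise={s_ok:.3f}, alpha={alpha_ok:.3f}")
print(f"switched:  surprise={s_bad:.3f}, alpha={alpha_bad:.3f}")
```

One property of the formula as written is worth noting: because the sigmoid's argument is non-negative, its value never drops below 0.5, so the effective rate stays between `alpha_base + k/2` and `alpha_base + k`. If rates close to `alpha_base` are wanted in low-surprise periods, the squashing function would need to be shifted or rescaled, which is a design choice the formula leaves open.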

Importantly, the mechanism is general. It does not depend on the presence of any pharmacological agent. It is a formal description of how a system should update its beliefs under uncertainty, applicable whether the system is a biological brain, a POMDP policy, or a multi-agent orchestrator. The psychedelic science provides the mechanistic hypothesis and the empirical parameters: Kanen et al. (2023) measured LSD's effect on prediction-error sensitivity in a computational RL model. The IRIS framework provides the testbed.

The experimental design compares three conditions: a standard fixed-rate belief update, a REBUS-informed agent with precision-modulated belief update, and a combined condition in which all agents in the environment follow REBUS principles. The question here is not whether REBUS holds in the brain, which is a matter for the neuroscience literature. The question is whether a REBUS parameterisation of a machine learning belief update produces measurably better coordination outcomes than a fixed-rate alternative.
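The three conditions could be encoded as a small configuration helper along the following lines. This is only a sketch: the names `Condition` and `update_rules` are hypothetical, and it assumes that in the second condition a single focal agent uses the REBUS update while its partners keep the fixed rate, which the description above does not spell out.

```python
from enum import Enum


class Condition(Enum):
    FIXED = "fixed"                # every agent uses a fixed-rate belief update
    REBUS_SINGLE = "rebus_single"  # one focal agent uses the REBUS update (assumed)
    REBUS_ALL = "rebus_all"        # every agent uses the REBUS update


def update_rules(condition, n_agents, focal_agent=0):
    """Return the belief-update rule ("fixed" or "rebus") for each agent."""
    if condition is Condition.FIXED:
        return ["fixed"] * n_agents
    if condition is Condition.REBUS_ALL:
        return ["rebus"] * n_agents
    # REBUS_SINGLE: only the focal agent gets the precision-modulated update.
    return ["rebus" if i == focal_agent else "fixed" for i in range(n_agents)]


for condition in Condition:
    print(condition.value, update_rules(condition, n_agents=3))
```

Keeping the condition as an explicit value makes it straightforward to run the same environment and seeds under each setting, so that differences in coordination outcomes can be attributed to the update rule.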