Dopamine reports reward prediction errors, but does not update policy, during inference-guided choice
Blanco-Pozo M., Akam T., Walton M.
Dopamine is thought to carry reward prediction errors (RPEs), which update values and hence modify future behaviour. However, updating values is not always the most efficient way of adapting to change. If previously encountered situations will be revisited in future, inferring that the state of the world has changed allows prior experience to be reused when situations are reencountered. To probe dopamine’s involvement in such inference-based behavioural flexibility, we measured and manipulated dopamine while mice solved a sequential decision task using state inference. Dopamine was strongly influenced by the value of states and actions, consistent with RPE signalling, using value information that respected task structure. However, though dopamine responded strongly to rewards, stimulating dopamine at the time of trial outcome had no effect on subsequent choice. Therefore, when inference guides choice, rewards have a dopamine-independent influence on policy through the information they carry about the world’s state.
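The contrast between value updating and state inference can be illustrated with a minimal sketch. It is not the authors' task or model: the two-option world, the function names `rpe_update` and `infer_state`, and the parameters `alpha`, `p_good` and `p_bad` are all hypothetical, chosen only to show how the same outcome might either drive an incremental RPE-based update or serve as evidence about a hidden state.

```python
def rpe_update(value, reward, alpha=0.1):
    """Incremental value learning: move the estimate toward the reward.

    alpha is an assumed learning rate, not a value from the study.
    """
    rpe = reward - value          # reward prediction error
    return value + alpha * rpe, rpe


def infer_state(belief_a, chose_a, reward, p_good=0.8, p_bad=0.2):
    """Bayesian update of P(hidden state = 'option A is the good one').

    The outcome is treated as evidence about the hidden state; no
    per-option value is learned incrementally. p_good and p_bad are
    assumed reward probabilities for the good and bad option.
    """
    # Probability of reward on the chosen option under each hidden state
    p_rew_if_a = p_good if chose_a else p_bad
    p_rew_if_b = p_bad if chose_a else p_good
    like_a = p_rew_if_a if reward else 1.0 - p_rew_if_a
    like_b = p_rew_if_b if reward else 1.0 - p_rew_if_b
    posterior_a = like_a * belief_a
    return posterior_a / (posterior_a + like_b * (1.0 - belief_a))


# One unrewarded choice of option A, starting from a confident estimate.
value_a, rpe = rpe_update(0.8, reward=0)
belief_a = infer_state(0.9, chose_a=True, reward=0)
print(f"RPE = {rpe:.2f}, updated value of A = {value_a:.2f}")
print(f"P(A is good) after inference = {belief_a:.3f}")
```

In this toy setting the RPE update changes only the estimate for the chosen option, whereas the inferred belief also implies a revised expectation for the unchosen option, which is one way that prior experience could be reused when a situation recurs.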