© 2017 IEEE. An important issue in reinforcement learning (RL) systems for autonomous agents is whether it makes sense to have separate systems for predicting rewards and punishments. In robotics, learning and control are typically achieved by a single controller, with punishments coded as negative rewards. However, in biological systems, some evidence suggests that the brain has a separate system for punishment. Although this may in part be due to the biological constraints of implementing negative quantities, it raises the question of whether there is any computational rationale for keeping reward and punishment prediction operationally distinct. Here we outline a basic argument supporting this idea, based on the proposition that learning best-case predictions (as in Q-learning) does not always achieve the safest behaviour. We introduce a modified RL scheme involving a new algorithm, which we call 'MaxPain', that backs up worst-case predictions in parallel and then scales the two predictions in a multiattribute RL policy, i.e. it independently learns 'what to do' as well as 'what not to do' and then combines this information. We show how this scheme can improve performance in benchmark RL environments, including a grid-world experiment and a delayed version of the mountain car experiment. In particular, we demonstrate how early exploration and learning are substantially improved, leading to much 'safer' behaviour. In conclusion, the results illustrate the importance of independent punishment prediction in RL, and provide a testable framework for better understanding punishment (such as pain) and avoidance in humans, in both health and disease.
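
The abstract describes the scheme only in words, so the following is a minimal tabular sketch of the general idea and not the paper's implementation. It keeps one table of best-case reward predictions (the usual Q-learning backup) and a second, independently learned table of worst-case punishment predictions, then mixes them when choosing an action. Everything specific here is an assumption made for illustration: the function names, the convention that pain is a non-negative magnitude, the use of a max over next-state pain values as the worst-case backup, and the linear weighting `weight * Q_reward - (1 - weight) * Q_pain`.

```python
def make_table(n_states, n_actions):
    """Return an n_states x n_actions table of zeros."""
    return [[0.0] * n_actions for _ in range(n_states)]


def update(q_reward, q_pain, s, a, reward, pain, s_next, done,
           alpha=0.1, gamma=0.9):
    """Apply one backup to each table for the transition (s, a) -> s_next.

    Assumption: `pain` is a non-negative punishment magnitude, so the
    worst case over next actions is the largest predicted pain.
    """
    # Best-case reward prediction, as in ordinary Q-learning.
    best_next = 0.0 if done else max(q_reward[s_next])
    q_reward[s][a] += alpha * (reward + gamma * best_next - q_reward[s][a])

    # Worst-case punishment prediction, learned independently of reward.
    worst_next = 0.0 if done else max(q_pain[s_next])
    q_pain[s][a] += alpha * (pain + gamma * worst_next - q_pain[s][a])


def combined_scores(q_reward, q_pain, s, weight=0.5):
    """Mix 'what to do' and 'what not to do' into one score per action.

    Assumption: a simple linear trade-off; `weight` is hypothetical.
    """
    return [weight * qr - (1.0 - weight) * qp
            for qr, qp in zip(q_reward[s], q_pain[s])]


def greedy_action(q_reward, q_pain, s, weight=0.5):
    """Pick the action with the highest combined score in state s."""
    scores = combined_scores(q_reward, q_pain, s, weight)
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch, lowering `weight` makes the agent more cautious, because predicted pain then counts for more than predicted reward when actions are ranked.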

Original publication

DOI: 10.1109/DEVLRN.2017.8329799

Type: Conference paper

Publication Date: 02/04/2018

Volume: 2018-January

Pages: 140-147