Reluctant Reinforcement Learning

Chris Jones, Malcolm Crowe

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review


This paper presents an approach to Reinforcement Learning that seems to work very well in changing environments. The experiments are based on an unmanned vehicle problem in which the vehicle is equipped with navigation cameras and is controlled by a multilayer perceptron (MLP). The route can change and obstacles can be added without warning. In the steady state, no learning takes place, but the system maintains a small cache of recent inputs and rewards. When a negative reward occurs, learning restarts, based not on the immediate situation but on the cached memory that has generated the greatest error, and the updated strategy is quickly reviewed against the cache of recent memories in an accelerated learning phase. In the resulting Reluctant Learning algorithm, the repeated use of a small number of previous experiences to validate updates to the strategy moves the MLP towards convergence and balances exploration of improvements to the strategy against exploitation of previous learning.
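The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `ReluctantLearner`, its parameters (`cache_size`, `lr`, `review_passes`), and the use of a linear reward predictor in place of the MLP are all assumptions made to keep the example short. "Error" is assumed here to mean the absolute difference between the predicted and the observed reward, which the abstract does not specify.

```python
from collections import deque


class ReluctantLearner:
    """Hypothetical sketch: weights change only after a negative reward."""

    def __init__(self, n_inputs, cache_size=8, lr=0.1, review_passes=5):
        self.w = [0.0] * n_inputs              # linear stand-in for the MLP (assumption)
        self.cache = deque(maxlen=cache_size)  # small cache of recent (inputs, reward)
        self.lr = lr
        self.review_passes = review_passes
        self.updates = 0                       # number of weight updates performed

    def predict(self, x):
        # Predicted reward for an input vector.
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def _error(self, memory):
        # Assumed error measure: |observed reward - predicted reward|.
        x, reward = memory
        return abs(reward - self.predict(x))

    def _train(self, x, reward):
        # One gradient step on the squared prediction error.
        err = reward - self.predict(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.updates += 1

    def observe(self, x, reward):
        """Record an experience; return True if learning was triggered."""
        self.cache.append((x, reward))
        if reward >= 0:
            return False  # steady state: cache only, no learning
        # Learning restarts from the cached memory with the greatest error,
        # not necessarily the experience that produced the negative reward.
        worst = max(self.cache, key=self._error)
        self._train(*worst)
        # Accelerated phase: review the update against the whole cache.
        for _ in range(self.review_passes):
            for x_mem, r_mem in list(self.cache):
                self._train(x_mem, r_mem)
        return True


learner = ReluctantLearner(n_inputs=2)
learner.observe([1.0, 0.0], 1.0)    # positive reward: weights stay untouched
learner.observe([0.0, 1.0], -1.0)   # negative reward: learning restarts
```

The fixed number of review passes is a simplification; the abstract says only that the updated strategy is reviewed quickly using the cache, so a real system might instead stop once the cached errors fall below a threshold.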
Original language: English
Title of host publication: Research and Development in Intelligent Systems XXXI
Subtitle of host publication: Incorporating Applications and Innovations in Intelligent Systems XXII
Editors: Max Bramer, Miltos Petridis
Place of Publication: Cham
Publisher: Springer International Publishing AG
Number of pages: 15
ISBN (Electronic): 978-3-319-12069-0
ISBN (Print): 978-3-319-12068-3
Publication status: Published - 2014

