Abstract
This paper presents an approach to Reinforcement Learning that performs well in changing environments. The experiments are based on an unmanned-vehicle problem in which the vehicle is equipped with navigation cameras and controlled by a multilayer perceptron (MLP); the route can change and obstacles can be added without warning. In the steady state no learning takes place, but the system maintains a small cache of recent inputs and rewards. When a negative reward occurs, learning restarts, based not on the immediate situation but on the cached memory that generated the greatest error, and the updated strategy is quickly validated against the cache of recent memories in an accelerated learning phase. In the resulting Reluctant Learning algorithm, the repeated use of a small number of previous experiences to validate updates to the strategy moves the MLP towards convergence and strikes a balance between exploration of improvements to the strategy and exploitation of previous learning.
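The control flow described in the abstract — cache in the steady state, learn only on negative reward, seed learning from the highest-error memory, then validate against the whole cache — can be sketched as follows. This is a minimal illustration only; the class name, cache size, and the stand-in for the MLP weight update are assumptions, not the authors' implementation.

```python
from collections import deque


class ReluctantLearner:
    """Illustrative sketch of the Reluctant Learning loop (not the paper's code)."""

    def __init__(self, cache_size=8):
        # Small cache of recent (input, reward, error) memories.
        self.cache = deque(maxlen=cache_size)
        self.updates = 0  # counts stand-in "weight updates" for illustration

    def observe(self, state, reward, error):
        self.cache.append((state, reward, error))
        if reward >= 0:
            return None  # steady state: cache only, no learning takes place
        # Negative reward: restart learning, seeded not by the immediate
        # situation but by the cached memory with the greatest error.
        worst = max(self.cache, key=lambda m: m[2])
        # Accelerated phase: revisit every cached memory to validate the
        # updated strategy (a real system would apply MLP updates here).
        for _ in self.cache:
            self.updates += 1
        return worst
```

Reusing the same small cache for both seeding and validation is what gives the algorithm its "reluctant" character: no learning happens until a negative reward forces it.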
Original language | English |
---|---|
Title of host publication | Research and Development in Intelligent Systems XXXI |
Subtitle of host publication | Incorporating Applications and Innovations in Intelligent Systems XXII |
Editors | Max Bramer, Miltos Petridis |
Place of Publication | Cham |
Publisher | Springer International Publishing AG |
Pages | 85-99 |
Number of pages | 15 |
ISBN (Electronic) | 978-3-319-12069-0 |
ISBN (Print) | 978-3-319-12068-3 |
DOIs | |
Publication status | Published - 2014 |