Reluctant Reinforcement Learning

Chris Jones, Malcolm Crowe

    Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

    Abstract

    This paper presents an approach to Reinforcement Learning that works well in changing environments. The experiments are based on an unmanned vehicle problem in which the vehicle is equipped with navigation cameras and uses a multilayer perceptron (MLP). The route can change and obstacles can be added without warning. In the steady state, no learning takes place, but the system maintains a small cache of recent inputs and rewards. When a negative reward occurs, learning restarts, based not on the immediate situation but on the memory that has generated the greatest error, and the updated strategy is quickly reviewed against the cache of recent memories in an accelerated learning phase. In the resulting Reluctant Learning algorithm, the repeated use of a small set of previous experiences to validate updates to the strategy moves the MLP towards convergence and strikes a balance between exploration of improvements to the strategy and exploitation of previous learning.
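    The control flow described in the abstract — no updates in the steady state, a small cache of recent experiences, learning restarted from the worst-predicted memory on a negative reward, then an accelerated review over the cache — can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: a linear value model stands in for the MLP, and all names (`ReluctantLearner`, `observe`, `cache_size`) are invented for this sketch.

    ```python
    from collections import deque

    class ReluctantLearner:
        """Sketch of the Reluctant Learning idea: stay inert while
        rewards are non-negative; on a negative reward, restart
        learning from the cached memory with the greatest prediction
        error, then replay the whole cache as an accelerated review."""

        def __init__(self, n_features, cache_size=8, lr=0.1):
            self.w = [0.0] * n_features            # linear model (stand-in for the MLP)
            self.cache = deque(maxlen=cache_size)  # small cache of recent experiences
            self.lr = lr

        def predict(self, x):
            return sum(wi * xi for wi, xi in zip(self.w, x))

        def _update(self, x, target):
            # One gradient step towards the observed reward.
            err = target - self.predict(x)
            for i, xi in enumerate(x):
                self.w[i] += self.lr * err * xi

        def observe(self, x, reward):
            """Store the experience; learn only when the reward is negative."""
            self.cache.append((x, reward))
            if reward >= 0:
                return False  # steady state: reluctant, no learning
            # Restart learning from the memory with the greatest error ...
            worst = max(self.cache, key=lambda m: abs(m[1] - self.predict(m[0])))
            self._update(*worst)
            # ... then quickly review the updated strategy on the cached memories.
            for mem_x, mem_r in self.cache:
                self._update(mem_x, mem_r)
            return True
    ```

    The key design point is that a handful of cached experiences is reused many times to validate each strategy update, rather than learning continuously from every new observation.
    
    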
    Original language: English
    Title of host publication: Research and Development in Intelligent Systems XXXI
    Subtitle of host publication: Incorporating Applications and Innovations in Intelligent Systems XXII
    Editors: Max Bramer, Miltos Petridis
    Place of Publication: Cham
    Publisher: Springer International Publishing AG
    Pages: 85-99
    Number of pages: 15
    ISBN (Electronic): 978-3-319-12069-0
    ISBN (Print): 978-3-319-12068-3
    DOIs
    Publication status: Published - 2014
