Hameed, Mohammed Sharafath Abdul; Chadha, Gavneet Singh; Schwung, Andreas; Ding, Steven X.:
Gradient Monitored Reinforcement Learning
In: IEEE Transactions on Neural Networks and Learning Systems, Vol. 34 (2023), Issue 8, pp. 4106 - 4119
2023 · Journal article · Open Access (Green)
Electrical Engineering
Faculty of Engineering » Electrical Engineering and Information Technology » Automation Technology and Complex Systems
Title in English:
Gradient Monitored Reinforcement Learning
Author(s):
Hameed, Mohammed Sharafath Abdul (corresponding author)
ORCID: 0000-0002-1792-7748
Chadha, Gavneet Singh
ORCID: 0000-0002-9374-9074
Schwung, Andreas
ORCID: 0000-0001-8405-0977
Ding, Steven X. (UDE; author affiliated with this university)
GND: 134302427
LSF ID: 2347
ORCID: 0000-0002-5149-5918
Year of publication:
2023
Open Access?:
Green OA
Language of the text:
English
Keywords, topic:
Atari games ; deep neural networks (DNNs) ; Games ; gradient monitoring (GM) ; Monitoring ; MuJoCo ; multirobot coordination ; Neural networks ; OpenAI GYM ; Optimization ; Reinforcement learning ; reinforcement learning (RL) ; Task analysis ; Training

Abstract in English:

This article presents a novel neural network training approach for faster convergence and better generalization in deep reinforcement learning (RL). In particular, we focus on enhancing the training and evaluation performance of RL algorithms by systematically reducing the variance of the gradients and thereby providing a more targeted learning process. The proposed method, which we term gradient monitoring (GM), steers the learning of the weight parameters of a neural network based on the dynamic development of, and feedback from, the training process itself. We propose different variants of the GM method that we prove to increase the underlying performance of the model. One of the proposed variants, momentum with GM (M-WGM), allows for a continuous adjustment of the amount of backpropagated gradient in the network based on certain learning parameters. We further enhance the method with adaptive M-WGM (AM-WGM), which automatically trades off focused learning of certain weights against more dispersed learning, depending on the feedback from the rewards collected. As a by-product, it also allows the required deep network size to be derived automatically during training, since the method progressively freezes trained weights. The method is applied to two discrete tasks (a real-world multirobot coordination problem and Atari games) and one continuous control task (MuJoCo), using advantage actor-critic (A2C) and proximal policy optimization (PPO), respectively. The results obtained particularly underline the applicability of the methods and their performance improvements in terms of generalization capability.
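As a rough illustration of the gradient-monitoring idea summarized in the abstract, the sketch below masks out all but the largest-magnitude weight gradients after backpropagation, so that updates concentrate on a subset of parameters; a fixed keep fraction stands in for the adaptive adjustment that the paper derives from training feedback. This is a minimal sketch assuming PyTorch; the function name, the top-k thresholding scheme, and the keep_fraction parameter are illustrative assumptions and not the paper's actual M-WGM or AM-WGM algorithms.

```python
# Illustrative gradient-monitoring-style mask (hypothetical sketch, not the
# paper's M-WGM/AM-WGM): after backpropagation, keep only the largest-magnitude
# gradients in each parameter tensor so updates focus on a subset of weights.
import torch
import torch.nn as nn

def monitor_gradients(module: nn.Module, keep_fraction: float = 0.5) -> None:
    """Zero out all but the top `keep_fraction` of gradients by magnitude."""
    for param in module.parameters():
        if param.grad is None:
            continue
        grad = param.grad
        k = max(1, int(keep_fraction * grad.numel()))
        # Threshold = k-th largest absolute gradient value in this tensor.
        threshold = torch.topk(grad.abs().flatten(), k).values[-1]
        grad.mul_((grad.abs() >= threshold).float())

# Usage inside a generic policy-gradient training step (names are placeholders):
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

loss = policy(torch.randn(8, 4)).pow(2).mean()  # stand-in for an RL loss
optimizer.zero_grad()
loss.backward()
monitor_gradients(policy, keep_fraction=0.5)    # filter gradients before the update
optimizer.step()
```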