TITLE:
Offline Robustness of Distributional Actor-Critic Ensemble Reinforcement Learning
AUTHORS:
Zhongcui Ma, Dandan Lai, Jianxiang Zhu, Yaxin Peng
KEYWORDS:
Offline Reinforcement Learning, Distributional Reinforcement Learning, Robustness
JOURNAL NAME:
Advances in Pure Mathematics, Vol.15 No.4, April 18, 2025
ABSTRACT: Offline reinforcement learning (RL) focuses on learning policies from static datasets without further exploration. With the introduction of distributional reinforcement learning into offline RL, current methods excel at quantifying risk and ensuring the safety of learned policies. However, these algorithms cannot effectively balance mitigating distribution shift with robustness, and even a minor perturbation in observations can significantly impair policy performance. In this paper, we propose Offline Robustness of Distributional actor-critic Ensemble Reinforcement learning (ORDER) to improve the robustness of learned policies. In ORDER, we introduce two approaches to enhance robustness: 1) applying a smoothing technique to the policies and distribution functions for states near the dataset; 2) strengthening the quantile network. In addition to improving robustness, we theoretically prove that ORDER converges to a conservative lower bound, which alleviates distribution shift. In our experiments, we validate the effectiveness of ORDER on the D4RL benchmark through comparative experiments and ablation studies.
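The abstract's first robustness ingredient, smoothing the policy for states near the dataset, can be illustrated with a regularizer that penalizes changes in the policy's output under small state perturbations. The sketch below is not the authors' code; the network architecture, the perturbation radius `epsilon`, and the weight `beta` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Policy(nn.Module):
    """A simple deterministic actor mapping states to bounded actions."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

def smoothing_loss(policy, states, epsilon=0.01, beta=1.0):
    """Penalize the gap between actions on dataset states and on randomly
    perturbed states within an L_inf ball of radius epsilon (hypothetical
    hyperparameters, not taken from the paper)."""
    noise = torch.empty_like(states).uniform_(-epsilon, epsilon)
    clean_actions = policy(states)
    perturbed_actions = policy(states + noise)
    return beta * ((clean_actions - perturbed_actions) ** 2).mean()

# Usage: add the smoothing term to the usual actor loss during training.
policy = Policy(state_dim=17, action_dim=6)
states = torch.randn(256, 17)  # a batch of states drawn from the offline dataset
loss = smoothing_loss(policy, states)
loss.backward()
```

An analogous penalty can be applied to the distributional critic (e.g., the quantile network's outputs on clean versus perturbed states), which corresponds to the smoothing of distribution functions mentioned in the abstract.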