TITLE:
Double Sarsa and Double Expected Sarsa with Shallow and Deep Learning
AUTHORS:
Michael Ganger, Ethan Duryea, Wei Hu
KEYWORDS:
Double Sarsa, Double Expected Sarsa, Reinforcement Learning, Deep Learning
JOURNAL NAME:
Journal of Data Analysis and Information Processing,
Vol. 4, No. 4,
October 17, 2016
ABSTRACT: Double Q-learning has been shown to be effective in reinforcement learning scenarios
when the reward system is stochastic. We apply the idea of double learning that
this algorithm uses to Sarsa and Expected Sarsa, producing two new algorithms
called Double Sarsa and Double Expected Sarsa that are shown to be more robust
than their single counterparts when rewards are stochastic. We find that these algorithms
add a significant amount of stability in the learning process at only a minor
computational cost, which leads to higher returns when using an on-policy algorithm.
We then use shallow and deep neural networks to approximate the action-value
function, and show that Double Sarsa and Double Expected Sarsa are much more stable
after convergence and can collect larger rewards than the single versions.