Playing against Hedge


Hedge has been proposed as an adaptive scheme that guides the player's choices in a full-information multi-armed bandit game. Applications of this game arise in network path selection, load distribution, and network interdiction. We perform a worst-case analysis of the Hedge algorithm by introducing an adversary who consistently selects penalties so as to maximize the player's loss, subject to a limited penalty budget. We further explore the performance of binary penalties, and we prove that the optimal binary strategy for the adversary is to make greedy decisions.
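For reference, the standard Hedge update in the full-information setting can be sketched as follows. This is a minimal illustration of the generic multiplicative-weights scheme, not the paper's exact formulation; the learning rate `eta` and the penalty sequence are assumptions made for the example.

```python
import math

def hedge(num_arms, penalties, eta=0.5):
    """Run Hedge over a sequence of full-information penalty vectors.

    penalties: list of rounds, where penalties[t][i] in [0, 1] is the
    penalty of arm i at round t. Returns the player's cumulative
    expected loss under the randomized Hedge strategy.
    """
    weights = [1.0] * num_arms
    total_loss = 0.0
    for loss_vec in penalties:
        z = sum(weights)
        probs = [w / z for w in weights]
        # Expected loss of the randomized play this round.
        total_loss += sum(p * l for p, l in zip(probs, loss_vec))
        # Multiplicative update: full information means the player
        # observes every arm's penalty, not just the chosen arm's.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, loss_vec)]
    return total_loss
```

For example, if an adversary always penalizes arm 0 and never arm 1, Hedge rapidly shifts probability mass to arm 1, so the cumulative expected loss stays small; this is the behavior a budget-limited adversary must work against.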

Share and Cite:

Anagnostou, M. and Lambrou, M. (2014) Playing against Hedge. International Journal of Communications, Network and System Sciences, 7, 497-507. doi: 10.4236/ijcns.2014.712050.

Conflicts of Interest

The authors declare no conflicts of interest.


