Fluctuation-Model-Based Discrete Probability Estimation for Small Samples

A robust method is proposed for estimating discrete probability functions from small samples. The approach introduces and minimizes a parameterized objective function analogous to the free energy functions of statistical physics. A key feature of the method is that it models the parameter controlling the trade-off between likelihood and robustness as a response to the degree of fluctuation, so the value of the parameter does not have to be selected manually. It is proved that the estimator approaches the maximum likelihood estimator in the asymptotic limit. The robustness of the method is demonstrated by experimental studies on point estimation for probability distributions with various entropies.
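As a rough illustration of the likelihood/robustness trade-off described in the abstract, the sketch below assumes one particular free-energy-like objective, F(p) = D(p‖f) − T·H(p), where f is the vector of relative frequencies, D is the Kullback-Leibler divergence, and H is the Shannon entropy. This form, the function name `tempered_estimate`, and the parameter `beta` = 1/(1 + T) are assumptions made for illustration and are not taken from the abstract. Under that assumption the minimizer is p_i ∝ f_i^beta, so beta = 1 recovers the maximum likelihood estimate and smaller values of beta flatten the estimate toward higher entropy. The paper's fluctuation-based model for choosing the parameter is not reproduced here; beta is simply passed in by the caller.

```python
def tempered_estimate(counts, beta):
    """Hypothetical sketch: return p_i proportional to f_i**beta.

    counts -- non-negative observed counts, one per category
    beta   -- assumed trade-off parameter in (0, 1]; beta = 1 gives the
              maximum likelihood estimate (the relative frequencies)
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    n = sum(counts)
    if n <= 0:
        raise ValueError("at least one observation is required")

    # Relative frequencies f_i (the maximum likelihood estimate).
    freqs = [c / n for c in counts]

    # Unnormalized weights f_i**beta; categories with zero count stay at zero.
    weights = [f ** beta for f in freqs]

    # Normalize so the estimate sums to one.
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    counts = [3, 1, 0]
    print(tempered_estimate(counts, 1.0))  # relative frequencies
    print(tempered_estimate(counts, 0.5))  # flatter estimate on the observed support
```

In this sketch a category with zero observations keeps probability zero for every admissible beta; whether and how the published estimator treats unobserved categories cannot be determined from the abstract.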

Cite as:

Isozaki, T. (2015) Fluctuation-Model-Based Discrete Probability Estimation for Small Samples. Open Journal of Statistics, 5, 465-474. doi: 10.4236/ojs.2015.55048.

Conflicts of Interest

The authors declare no conflicts of interest.
