Practical Guidelines for Learning Bayesian Networks from Small Data Sets


Model learning is the process of extracting, analysing and synthesising information from data sets. Graphical models are a suitable framework for probabilistic modelling. A Bayesian Network (BN) is a probabilistic graphical model that represents joint distributions in an intuitive and efficient way. It encodes the probability density (or mass) function of a set of variables by specifying a number of conditional independence statements in the form of a directed acyclic graph. Specifying the structure of the model is one of the most important design choices in graphical modelling. Despite the potential of BNs, learning them from small data sets is subject to several limitations. In this paper, we introduce a set of practical guidelines that a modeller can use to deal with these limitations. The main goal of the guidelines is to increase awareness of the underlying assumptions and the tacit implications of several learning techniques. Unsurprisingly, one of our findings is that learning BNs from small data sets is a complex and challenging task, yet potentially very rewarding. The paper also draws attention to the amount of subjective input needed from the modeller and the need to tailor solutions to the particularities of the application.

Share and Cite:

Bookholt, F. , Stuurman, P. and Hanea, A. (2014) Practical Guidelines for Learning Bayesian Networks from Small Data Sets. Open Access Library Journal, 1, 1-13. doi: 10.4236/oalib.1100481.

Conflicts of Interest

The authors declare no conflicts of interest.


Copyright © 2023 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.