Knowledge Discovery for Query Formulation for Validation of a Bayesian Belief Network
Gursel Serpen, Michael Riesen
DOI: 10.4236/jilsa.2010.23019

Abstract

This paper proposes machine learning techniques to discover knowledge in a dataset in the form of if-then rules for the purpose of formulating queries for validation of a Bayesian belief network model of the same data. Although domain expertise is often available, the query formulation task is tedious and laborious, and hence automation of query formulation is desirable. In an effort to automate the query formulation process, a machine learning algorithm is leveraged to discover knowledge in the form of if-then rules in the data from which the Bayesian belief network model under validation was also induced. The set of if-then rules is processed and filtered through domain expertise to identify a subset of “interesting” and “significant” rules. This subset is formulated into corresponding queries to be posed, for validation purposes, to the Bayesian belief network induced from the same dataset. The promise of the proposed methodology was assessed through an empirical study performed on a real-life dataset, the National Crime Victimization Survey, which has over 250 attributes and well over 200,000 data points. The study demonstrated that the proposed approach is feasible and partially automates the query formulation process for validation of a complex probabilistic model, which results in substantial savings in the human expert involvement and investment required.
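The rule-to-query step described in the abstract can be illustrated with a minimal sketch. It assumes that each discovered rule has the form IF (attribute = value, ...) THEN (attribute = value), that the rule is posed to the network as the conditional query P(consequent | antecedent), and that agreement is judged by comparing the network's posterior with the rule's confidence in the data within a hypothetical tolerance. The names (`Rule`, `rule_to_query`, `posterior`, `agrees`), the two-node network, its probabilities, and the ten records are all hypothetical and are not taken from the paper or from the NCVS data. Exact inference by brute-force enumeration stands in for whatever inference engine is actually used and only scales to toy networks.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical toy representation: node -> (parents, domain, cpt), where cpt maps
# a tuple of parent values to a distribution over this node's domain.
Network = dict[str, tuple[tuple[str, ...], tuple[str, ...], dict]]


@dataclass
class Rule:
    """IF every antecedent attribute=value pair holds THEN consequent (attribute, value) holds."""
    antecedent: dict[str, str]
    consequent: tuple[str, str]


def support_and_confidence(records: list[dict], rule: Rule) -> tuple[float, float]:
    """Fraction of records covered by the antecedent, and fraction of those matching the consequent."""
    covered = [r for r in records
               if all(r.get(a) == v for a, v in rule.antecedent.items())]
    if not covered:
        return 0.0, 0.0
    attr, val = rule.consequent
    hits = sum(1 for r in covered if r.get(attr) == val)
    return len(covered) / len(records), hits / len(covered)


def rule_to_query(rule: Rule) -> dict:
    """Formulate the rule as the conditional query P(consequent | antecedent)."""
    return {"target": rule.consequent, "evidence": dict(rule.antecedent)}


def joint_prob(bn: Network, assignment: dict[str, str]) -> float:
    """Chain-rule product of CPT entries for one full assignment of all nodes."""
    p = 1.0
    for node, (parents, _domain, cpt) in bn.items():
        p *= cpt[tuple(assignment[q] for q in parents)][assignment[node]]
    return p


def posterior(bn: Network, query: dict) -> float:
    """Exact P(target | evidence) by enumerating every joint assignment (toy networks only)."""
    nodes = list(bn)
    t_attr, t_val = query["target"]
    num = den = 0.0
    for values in product(*(bn[n][1] for n in nodes)):
        a = dict(zip(nodes, values))
        if any(a[k] != v for k, v in query["evidence"].items()):
            continue  # assignment is inconsistent with the evidence
        p = joint_prob(bn, a)
        den += p
        if a[t_attr] == t_val:
            num += p
    if den == 0.0:
        raise ValueError("evidence has zero probability under the network")
    return num / den


def agrees(bn: Network, records: list[dict], rule: Rule, tol: float = 0.05) -> bool:
    """Hypothetical validation check: network posterior within tol of the rule's confidence."""
    _support, confidence = support_and_confidence(records, rule)
    return abs(posterior(bn, rule_to_query(rule)) - confidence) <= tol


# Hypothetical two-node network and ten hypothetical records (not NCVS data).
bn: Network = {
    "weapon": ((), ("yes", "no"), {(): {"yes": 0.3, "no": 0.7}}),
    "injured": (("weapon",), ("yes", "no"),
                {("yes",): {"yes": 0.6, "no": 0.4},
                 ("no",): {"yes": 0.2, "no": 0.8}}),
}
records = ([{"weapon": "yes", "injured": "yes"}] * 3
           + [{"weapon": "yes", "injured": "no"}] * 2
           + [{"weapon": "no", "injured": "yes"}] * 1
           + [{"weapon": "no", "injured": "no"}] * 4)

rule = Rule({"weapon": "yes"}, ("injured", "yes"))
print(support_and_confidence(records, rule))  # support 0.5, confidence 0.6
print(posterior(bn, rule_to_query(rule)))     # approximately 0.6
print(agrees(bn, records, rule))              # True
```

With these made-up numbers the forward rule agrees with the network, whereas the reverse rule IF injured = yes THEN weapon = yes has confidence 0.75 in the records but a posterior of 0.5625 in the network, so a check like this would flag it for expert review.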

Share and Cite:

G. Serpen and M. Riesen, "Knowledge Discovery for Query Formulation for Validation of a Bayesian Belief Network," Journal of Intelligent Learning Systems and Applications, Vol. 2 No. 3, 2010, pp. 156-166. doi: 10.4236/jilsa.2010.23019.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.