CLUBAS: An Algorithm and Java Based Tool for Software Bug Classification Using Bug Attributes Similarities

Abstract

This paper presents CLUBAS (Classification of Software Bugs Using Bug Attribute Similarity), a software bug classification algorithm. CLUBAS is a hybrid algorithm that combines text clustering, frequent term calculation, and taxonomic term mapping, and it is an example of classification by clustering. The algorithm works in three major steps. In the first step, text clusters are created from the textual attributes of software bugs. In the second step, a label is generated for each cluster using label induction. In the third step, the cluster labels are mapped against bug taxonomic terms to identify the appropriate category for each bug cluster. The cluster labels are built from the frequent and meaningful terms that occur in the attributes of the bugs belonging to each cluster. The algorithm is evaluated using F-measure and accuracy, and these values are compared with those of standard classification techniques: Naïve Bayes, Naïve Bayes Multinomial, J48, Support Vector Machine, and Weka's classification using clustering algorithms. A GUI (Graphical User Interface) based tool implementing the CLUBAS algorithm has also been developed in Java.
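The three steps described in the abstract can be illustrated with a minimal Java sketch. It is not the authors' implementation. The class name `ClubasSketch`, the Jaccard similarity measure, the seed-based grouping, the similarity threshold, and the two-category taxonomy are all assumptions chosen only to keep the example short and runnable; the abstract does not specify these details.

```java
import java.util.*;

// Illustrative sketch only; all names and parameters here are hypothetical.
class ClubasSketch {

    // Lowercased set of words (longer than two characters) from a bug's text.
    static Set<String> tokens(String text) {
        Set<String> out = new TreeSet<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (t.length() > 2) out.add(t);
        }
        return out;
    }

    // Jaccard similarity, used as a stand-in for the paper's similarity measure.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    // Step 1: place each bug in the first cluster whose seed bug is similar enough.
    static List<List<String>> cluster(List<String> bugs, double threshold) {
        List<List<String>> clusters = new ArrayList<>();
        for (String bug : bugs) {
            List<String> home = null;
            for (List<String> c : clusters) {
                if (jaccard(tokens(bug), tokens(c.get(0))) >= threshold) {
                    home = c;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(bug);
        }
        return clusters;
    }

    // Step 2: label a cluster with its k most frequent terms (ties stay alphabetical).
    static List<String> label(List<String> cluster, int k) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String bug : cluster) {
            for (String t : tokens(bug)) freq.merge(t, 1, Integer::sum);
        }
        List<String> terms = new ArrayList<>(freq.keySet());
        terms.sort((x, y) -> freq.get(y) - freq.get(x)); // stable sort
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }

    // Step 3: pick the category whose taxonomic terms overlap most with the label.
    static String categorize(List<String> label, Map<String, Set<String>> taxonomy) {
        String best = "unclassified";
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> e : taxonomy.entrySet()) {
            int hits = 0;
            for (String t : label) {
                if (e.getValue().contains(t)) hits++;
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> bugs = Arrays.asList(
            "Login page crashes with null pointer exception",
            "Null pointer exception on login button click",
            "Slow query when loading report table",
            "Report table query is slow on large database");

        // Hypothetical taxonomy: category name -> taxonomic terms.
        Map<String, Set<String>> taxonomy = new LinkedHashMap<>();
        taxonomy.put("crash", new HashSet<>(Arrays.asList("null", "pointer", "exception")));
        taxonomy.put("performance", new HashSet<>(Arrays.asList("slow", "query", "latency")));

        for (List<String> c : cluster(bugs, 0.3)) {
            List<String> lbl = label(c, 3);
            System.out.println(lbl + " -> " + categorize(lbl, taxonomy) + " : " + c);
        }
    }
}
```

In this sketch a bug joins a cluster by comparison with the cluster's first member only, which keeps the code short but makes the result depend on input order. A fuller treatment would compare against all members or a cluster centroid and would filter stop words before computing term frequencies.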

N. Kumar Nagwani and S. Verma, "CLUBAS: An Algorithm and Java Based Tool for Software Bug Classification Using Bug Attributes Similarities," Journal of Software Engineering and Applications, Vol. 5 No. 6, 2012, pp. 436-447. doi: 10.4236/jsea.2012.56050.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.