Accuracies and Training Times of Data Mining Classification Algorithms: An Empirical Comparative Study

Abstract

Two important performance indicators for data mining algorithms are the accuracy of classification or prediction and the time taken for training. These indicators are useful for selecting the best algorithm for a given classification or prediction task. Empirical studies of these indicators in data mining are few. This study was therefore designed to determine how data mining classification algorithms perform as input data size increases. Three classification algorithms were subjected to simulated data sets of varying sizes: Decision Tree, Multi-Layer Perceptron (MLP) Neural Network, and Naïve Bayes. The training times and classification accuracies of the algorithms were analyzed for the different data sizes. Results show that Naïve Bayes takes the least time to train but has the lowest accuracy compared with the MLP and Decision Tree algorithms.
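The measurement protocol described in the abstract (train on simulated data of increasing size, record training time and classification accuracy) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not state the tool, data generator, or train/test split used, so the two-class Gaussian data in `make_data`, the 70/30 split, and the small `GaussianNB` class are all assumptions made for the example. Only a Naïve Bayes stand-in is shown for brevity; a Decision Tree or MLP exposing the same hypothetical `fit`/`predict` interface could be passed to `benchmark` in the same way.

```python
import math
import random
import time


def make_data(n, seed=0):
    """Simulate n labeled points: two Gaussian classes in 2-D (an assumption)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 * label  # class 0 near (0, 0), class 1 near (2, 2)
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows


class GaussianNB:
    """Tiny Gaussian Naive Bayes, used here only as a stand-in classifier."""

    def fit(self, rows):
        self.stats = {}
        for label in {y for _, y in rows}:
            xs = [x for x, y in rows if y == label]
            means = [sum(col) / len(xs) for col in zip(*xs)]
            # Small constant keeps the variance strictly positive.
            variances = [
                sum((v - m) ** 2 for v in col) / len(xs) + 1e-9
                for col, m in zip(zip(*xs), means)
            ]
            log_prior = math.log(len(xs) / len(rows))
            self.stats[label] = (log_prior, means, variances)
        return self

    def predict(self, x):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances)
            )

        return max(self.stats, key=log_posterior)


def benchmark(model_factory, sizes, test_fraction=0.3):
    """Return (size, training_seconds, accuracy) for each data size."""
    results = []
    for n in sizes:
        rows = make_data(n, seed=n)
        split = int(n * (1 - test_fraction))
        train, test = rows[:split], rows[split:]
        start = time.perf_counter()  # time only the training step
        model = model_factory().fit(train)
        elapsed = time.perf_counter() - start
        correct = sum(model.predict(x) == y for x, y in test)
        results.append((n, elapsed, correct / len(test)))
    return results


if __name__ == "__main__":
    for n, seconds, acc in benchmark(GaussianNB, [1000, 5000, 20000]):
        print(f"n={n:6d}  train={seconds:.4f}s  accuracy={acc:.3f}")
```

Timing only the `fit` call keeps data generation and evaluation out of the training-time figure, so the numbers for different algorithms would be comparable at each data size.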

Cite as:

Akinola, S. and Oyabugbe, O. (2015) Accuracies and Training Times of Data Mining Classification Algorithms: An Empirical Comparative Study. Journal of Software Engineering and Applications, 8, 470-477. doi: 10.4236/jsea.2015.89045.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.