Knowledge Discovering in Corporate Securities Fraud by Using Grammar Based Genetic Programming

Abstract

Securities fraud is a common problem worldwide and causes serious harm to securities markets each year. Securities regulatory commissions in many countries attach great importance to detecting and preventing it. In China, securities fraud is increasing with the rapid expansion of the securities market. Data mining techniques could assist the China Securities Regulatory Commission (CSRC) in detecting securities fraud. In this paper, we investigate the usefulness of logistic regression, Neural Networks (NNs), Sequential Minimal Optimization (SMO), Radial Basis Function (RBF) networks, Bayesian networks and Grammar Based Genetic Programming (GBGP) for classifying a real, large and recent China Corporate Securities Fraud (CCSF) database. We compare the performance of the six techniques and find that GBGP outperforms the others. The paper describes in detail how GBGP is applied to the CCSF problem. In addition, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to generate synthetic minority-class examples for the imbalanced CCSF dataset.
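To illustrate the oversampling step mentioned in the abstract, the following is a minimal sketch of the SMOTE idea: each synthetic example is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. It is not the authors' implementation. The function name `smote`, its parameters, and the toy data are assumptions made for illustration, and the sketch picks base samples at random rather than looping over every minority sample as the original algorithm does.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: two numeric features per fraud-class record.
fraud = [(0.10, 1.2), (0.15, 1.0), (0.20, 1.4), (0.12, 1.1)]
new_points = smote(fraud, n_synthetic=6, k=3)
```

Because every synthetic point lies between two existing minority samples, the generated examples stay inside the region already occupied by the minority class. Real datasets with categorical attributes would need a variant that handles non-numeric features.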

Share and Cite:

Li, H. and Wong, M. (2014) Knowledge Discovering in Corporate Securities Fraud by Using Grammar Based Genetic Programming. Journal of Computer and Communications, 2, 148-156. doi: 10.4236/jcc.2014.24020.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.