Cyclomatic Complexity and Lines of Code: Empirical Evidence of a Stable Linear Relationship

Abstract

Researchers have often commented on the high correlation between McCabe’s Cyclomatic Complexity (CC) and lines of code (LOC). Many have believed this correlation high enough to justify adjusting CC by LOC or even substituting LOC for CC. However, from an empirical standpoint, the relationship between CC and LOC remains an open question. We undertake the largest statistical study of this relationship to date. Employing modern regression techniques, we find the linearity of this relationship has been severely underestimated, so much so that CC can be said to have absolutely no explanatory power of its own. This research presents evidence that LOC and CC have a stable, practically perfect linear relationship that holds across programmers, languages, code paradigms (procedural versus object-oriented), and software processes. Linear models are developed relating LOC and CC. These models are verified against over 1.2 million randomly selected source files from the SourceForge code repository. These files represent software projects from three target languages (C, C++, and Java) and a variety of programmer experience levels, software architectures, and development methodologies. The models developed are found to successfully predict roughly 90% of CC’s variance from LOC alone. This suggests not only that the linear relationship between LOC and CC is stable, but also that the aspects of code complexity that CC measures, such as the size of the test case space, grow linearly with source code size across languages and programming paradigms.
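As a rough illustration of the kind of model the abstract describes, the sketch below fits a straight line CC ≈ a·LOC + b by ordinary least squares and reports the fraction of CC’s variance explained (R²). The (LOC, CC) pairs are made up for illustration, and the study itself used per-file metrics extracted from over 1.2 million SourceForge files together with robust regression techniques, so this is only a minimal stand-in, not the authors’ actual pipeline.

```python
# Minimal sketch (not the authors' pipeline): fit CC ~ a*LOC + b by ordinary
# least squares and report R^2, the share of CC's variance explained by LOC.
import numpy as np

# Hypothetical per-file measurements: lines of code and cyclomatic complexity.
loc = np.array([12, 40, 75, 120, 200, 310, 450, 600, 900, 1500], dtype=float)
cc = np.array([2, 6, 11, 18, 31, 47, 70, 92, 140, 230], dtype=float)

# Least-squares fit of cc = a*loc + b (np.polyfit returns slope, then intercept).
a, b = np.polyfit(loc, cc, deg=1)

# Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
pred = a * loc + b
ss_res = np.sum((cc - pred) ** 2)
ss_tot = np.sum((cc - cc.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"CC ~= {a:.3f} * LOC + {b:.3f}   (R^2 = {r_squared:.3f})")
```

An R² near 0.9 on real data, as the paper reports, would indicate that LOC alone accounts for most of the variation in CC.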

Share and Cite:

G. Jay, J. Hale, R. Smith, D. Hale, N. Kraft and C. Ward, "Cyclomatic Complexity and Lines of Code: Empirical Evidence of a Stable Linear Relationship," Journal of Software Engineering and Applications, Vol. 2, No. 3, 2009, pp. 137-143. doi: 10.4236/jsea.2009.23020.

Conflicts of Interest

The authors declare no conflicts of interest.

