Measuring Whitespace Pattern Sequences as an Indication of Plagiarism

Abstract

There are several methods and technologies for comparing the statements, comments, strings, identifiers, and other visible elements of source code in order to identify similarity efficiently. In a prior paper [21], we found that comparing whitespace patterns alone was not precise enough to identify copying. However, that paper presented several possible methods for improving the precision of a whitespace pattern comparison, the most promising of which was an examination of sequences of lines with matching whitespace patterns. This paper presents a method for evaluating sequences of matching whitespace patterns, together with a detailed study of the method's reliability.
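The abstract does not spell out the algorithm, so the following Python sketch is only a rough illustration of what "examining sequences of lines with matching whitespace patterns" could look like. The pattern encoding in `whitespace_pattern`, the helper `matching_sequences`, and the `min_len` threshold are hypothetical choices made for this example, not the authors' method. Each line is reduced to its whitespace skeleton (every run of non-whitespace becomes `X`), and the sketch reports maximal runs of consecutive lines whose skeletons match in both files.

```python
import re
from typing import List, Tuple


def whitespace_pattern(line: str) -> str:
    """Hypothetical encoding of a line's whitespace pattern: every run of
    non-whitespace characters becomes 'X', while spaces and tabs are kept
    verbatim so that indentation and spacing style are preserved."""
    return re.sub(r"\S+", "X", line.rstrip("\r\n"))


def matching_sequences(a: List[str], b: List[str],
                       min_len: int = 3) -> List[Tuple[int, int, int]]:
    """Return (start_a, start_b, length) for every maximal run of consecutive
    lines whose whitespace patterns match, keeping runs of >= min_len lines.
    min_len is an arbitrary illustrative threshold."""
    pa = [whitespace_pattern(s) for s in a]
    pb = [whitespace_pattern(s) for s in b]
    n, m = len(pa), len(pb)
    # run[i][j] = length of the matching run that ends at a[i-1] and b[j-1]
    run = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if pa[i - 1] == pb[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
    results = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            length = run[i][j]
            # A run is maximal if the next pair of lines does not also match.
            extendable = i < n and j < m and pa[i] == pb[j]
            if length >= min_len and not extendable:
                results.append((i - length, j - length, length))
    return results


# Hypothetical example files: identical whitespace except one reformatted line.
original = [
    "int main(void) {",
    "    int total = 0;",
    "    for (int i = 0; i < 10; i++) {",
    "        total += i;",
    "    }",
    "    return total;",
    "}",
]
suspect = [
    "int start(void) {",
    "    int sum = 0;",
    "    for (int k=0; k<10; k++) {",
    "        sum += k;",
    "    }",
    "    return sum;",
    "}",
]

if __name__ == "__main__":
    print(matching_sequences(original, suspect))  # [(3, 3, 4)]
```

In this example the suspect's reformatted `for` line breaks the run, so only the four lines after it are reported; the two matching lines before it fall below `min_len`. Blank lines and very short lines produce common patterns (such as a lone `X`), so a practical comparison would likely need to discount them; this sketch does not.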

How to Cite:

N. Baer and R. Zeidman, "Measuring Whitespace Pattern Sequences as an Indication of Plagiarism," Journal of Software Engineering and Applications, Vol. 5, No. 4, 2012, pp. 249-254. doi:10.4236/jsea.2012.54029.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] E. Brady and C. Morris, “Whitespace,” 2004. http://compsoc.dur.ac.uk/whitespace
[2] G. Cosma and M. Joy, “Source-Code Plagiarism: A UK Academic Perspective,” Research Report, University of Warwick, Coventry, 2006, pp. 116-120.
[3] G. Cosma, “An Approach to Source-Code Plagiarism Detection and Investigation Using Latent Semantic Analysis,” Ph.D. Thesis, University of Warwick, Coventry, 2008.
[4] P. J. Plauger, “Fingerprints,” Embedded Systems Programming, Miller Freeman, San Francisco, 1994, pp. 84-87.
[5] S. Schleimer, D. Wilkerson and A. Aiken, “Winnowing: Local Algorithms for Document Fingerprinting,” Proceedings of the 2003 SIGMOD International Conference on Management of Data, San Diego, 9-12 June 2003, pp. 76-85.
[6] B. Cui, L. Han, Y. Hao, Z. Li, J. Wang and R. Zhang, “Type Redefinition Plagiarism Detection of Token-Based Comparison,” Proceedings of the 2010 International Conference on Multimedia Information Networking and Security of the IEEE Computer Society, Nanjing, 4-6 November 2010, pp. 351-355.
[7] G. Malpohl, M. Philippsen and L. Prechelt, “Finding Plagiarisms among a Set of Programs with JPlag,” Journal of Universal Computer Science, Vol. 8, No. 11, 2002, pp. 1016-1038.
[8] M. Wise, “YAP3: Improved Detection of Similarities in Computer Program and Other Texts,” Proceedings of the 27th SIGCSE Technical Symposium on Computer Science Education, Philadelphia, 15-18 February 1996, pp. 130-134.
[9] C. Anderson and M. Ellis, “Plagiarism Detection in Computer Code,” Rose-Hulman Institute of Technology, Terre Haute, 2005.
[10] H. T. Jankowitz, “Detecting Plagiarism in Student Pascal Programs,” The Computer Journal, Vol. 31, No. 1, 1988, pp. 1-8. doi:10.1093/comjnl/31.1.1
[11] E. Merlo, “Detection of Plagiarism in University Projects Using Metrics-Based Spectral Similarity,” Dagstuhl Seminar Proceedings, Dagstuhl, Saarland, 2007.
[12] R. Zeidman, “Software Source Code Correlation,” Proceedings of the 5th IEEE/ACIS International Workshop on Component-Based Software Engineering, Honolulu, 10-12 July 2006, pp. 383-392. doi:10.1109/ICIS-COMSAR.2006.79
[13] R. Zeidman, “Multidimensional Correlation of Software Source Code,” Proceedings of the 3rd International Workshop on Systematic Approaches to Digital Forensic Engineering, Oakland, 22 May 2008, pp. 144-156. doi:10.1109/SADFE.2008.9
[14] H. Li, Z. J. Li, H. H. Yan and H. Xiong, “BUAA_AntiPlagiarism: A System to Detect Plagiarism for C Source Code,” Proceedings of the International Conference on Computational Intelligence and Software Engineering, Wuhan, 11-13 December 2009, pp. 1-5. doi:10.1109/CISE.2009.5366790
[15] U. Bandara and G. Wijayarathna, “A Machine Learning Based Tool for Source Code Plagiarism Detection,” International Journal of Machine Learning and Computing, Vol. 1, No. 4, 2011, pp. 337-343.
[16] J. Hamblen and A. Parker, “Computer Algorithms for Plagiarism Detection,” IEEE Transactions on Education, Vol. 32, No. 2, 1989, pp. 94-99. doi:10.1109/13.28038
[17] C. Daly and J. Horgan, “A Technique for Detecting Plagiarism in Computer Code,” The Computer Journal, Vol. 48, No. 6, 2005, pp. 662-666. doi:10.1093/comjnl/bxh139
[18] S. Aliefendic, “Using Whitespace Patterns to Detect Plagiarism in Program Code,” School of Computer Science and Informatics, University College Dublin, Dublin, 2003.
[19] R. Zeidman, “The Software IP Detective’s Handbook: Measurement, Comparison, and Infringement Detection,” Prentice Hall, Boston, 2011.
[20] B. Baker, “On Finding Duplication and Near-Duplication in Large Software Systems,” Proceedings of the Second Working Conference on Reverse Engineering, Washington DC, 1995, pp. 86-95.
[21] I. Shay, N. Baer and R. Zeidman, “Measuring Whitespace Patterns as an Indication of Plagiarism,” Proceedings of the ADFSL Conference on Digital Forensics, Security and Law, St. Paul, 20 May 2010, pp. 63-72.
[22] N. Baer and R. Zeidman, “Measuring Software Evolution with Changing Lines of Code,” Proceedings of the 24th International Conference on Computers and Their Applications, New Orleans, 8-10 April 2009, pp. 264-270.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.