TITLE:
Measuring Whitespace Pattern Sequences as an Indication of Plagiarism
AUTHORS:
Nikolaus Baer, Robert Zeidman
KEYWORDS:
Plagiarism; Source Code; Source Code Similarity; Whitespace; Obfuscation; Indentation; Maintainability; Copyright Infringement; Intellectual Property; Litigation; Open Source
JOURNAL NAME:
Journal of Software Engineering and Applications, Vol. 5, No. 4, April 23, 2012
ABSTRACT: Several methods and technologies exist for comparing the statements, comments, strings, identifiers, and other visible elements of source code to identify similarity efficiently. In a prior paper we found that comparing whitespace patterns alone was not precise enough to identify copying. However, that paper presented several possible methods for improving the precision of a whitespace pattern comparison, the most promising of which was an examination of sequences of lines with matching whitespace patterns. This paper presents a method for evaluating sequences of matching whitespace patterns, together with a detailed study of the method’s reliability.
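The abstract does not specify how whitespace patterns are encoded or how sequences are evaluated, so the following is only a rough illustration of the general idea, not the authors' method. It assumes a hypothetical encoding in which every run of non-whitespace characters on a line becomes a single `X` while all spaces and tabs are preserved, and it finds runs of consecutive lines whose patterns match by applying the standard longest-common-substring dynamic program to the two files' pattern sequences. The names `whitespace_pattern`, `matching_runs`, and `min_len` are illustrative.

```python
import re
from typing import List, Tuple


def whitespace_pattern(line: str) -> str:
    """Reduce a source line to its whitespace skeleton.

    Hypothetical encoding: each run of non-whitespace characters becomes a
    single 'X'; spaces and tabs (leading, internal, trailing) are kept as-is,
    so only the line's layout survives.
    """
    return re.sub(r"\S+", "X", line.rstrip("\r\n"))


def matching_runs(
    a: List[str], b: List[str], min_len: int = 2
) -> List[Tuple[int, int, int]]:
    """Find maximal runs of consecutive lines with matching whitespace patterns.

    Returns (start_in_a, start_in_b, length) tuples with 0-based line indices
    for every run of at least `min_len` lines. This is the longest-common-
    substring dynamic program applied to sequences of whitespace patterns.
    """
    pa = [whitespace_pattern(line) for line in a]
    pb = [whitespace_pattern(line) for line in b]
    runs: List[Tuple[int, int, int]] = []
    # prev[j]: length of the matching run ending at a's previous line and b[j-1].
    prev = [0] * (len(pb) + 1)
    for i in range(1, len(pa) + 1):
        cur = [0] * (len(pb) + 1)
        for j in range(1, len(pb) + 1):
            if pa[i - 1] != pb[j - 1]:
                continue
            # Extend the diagonal run that ended one line earlier in both files.
            cur[j] = prev[j - 1] + 1
            # Report the run only where it ends, i.e. the next pair of lines
            # (if both exist) does not continue it.
            extends = i < len(pa) and j < len(pb) and pa[i] == pb[j]
            if not extends and cur[j] >= min_len:
                runs.append((i - cur[j], j - cur[j], cur[j]))
        prev = cur
    return runs
```

For example, `def f(x):` followed by `    return x + 1` and `def g(y):` followed by `    return y * 2` produce the same two patterns and therefore one run of length 2, even though no identifier matches. The same property shows why whitespace alone is imprecise: common layouts such as blank lines (empty pattern) match trivially, so a practical measure would presumably need to weight or filter such patterns and judge run lengths against what unrelated code produces.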