Similarity Texter

Additional Information



Software Profile

ID: C12-07
Product: SIM_TEXT
Company: University of Amsterdam
Web Site:
Software Type: Desktop
Costs: Free
Test Date: 2012-03-27


Ranking for all tests: 3
Ranking for Source Code Test: 3
Ranking for Text Test: 6
Summary: useful


SIM_TEXT is installed locally on a computer system by downloading the C source code files from the home page and starting the program from a terminal [Alternatively, a version implemented in JavaScript can be used in any browser without installation.]. Despite the excellent technical documentation, the program is not easy to understand for non-computer scientists. The results are, however, excellent.

The reports are difficult to read at first, as they are plain text with no highlighting. However, the reported line numbers make it easy to locate and mark the similar passages in an editor or in a printed copy of the offending documents. The similar passages are not sorted, so one must at times jump back and forth between the files that were compared. At one point the results showed only the differences between the files rather than the similarities, and it remained unclear what had been changed to cause this.
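Marking the reported passages by hand can be approximated with a few lines of Python. The following is only a minimal sketch: the function names `mark_lines` and `sort_ranges` are hypothetical, and it assumes that the line ranges have already been copied out of the report as 1-based, inclusive (start, end) pairs, which may not match the actual report layout.

```python
def sort_ranges(ranges):
    """Order (start, end) pairs by position so they can be read top to bottom."""
    return sorted(ranges)


def mark_lines(lines, ranges, marker=">> "):
    """Prefix every line that falls inside one of the given ranges.

    Assumption: ranges are 1-based and inclusive, e.g. (4, 5) means lines 4 and 5.
    """
    flagged = set()
    for start, end in ranges:
        flagged.update(range(start, end + 1))
    pad = " " * len(marker)  # keeps unmarked lines aligned with marked ones
    return [
        (marker if number in flagged else pad) + text
        for number, text in enumerate(lines, start=1)
    ]


if __name__ == "__main__":
    document = ["alpha", "beta", "gamma", "delta", "epsilon"]
    reported = [(4, 5), (1, 2)]  # unsorted, as in the reports described above
    for line in mark_lines(document, sort_ranges(reported)):
        print(line)
```

Sorting the ranges first addresses the jumping around mentioned above, at least within a single file.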

This system also does not work with German umlauts [Handling them is possible, however: an online implementation at the VroniPlag Wiki, for example, deals with umlauts properly.]. If individual words are changed too often in the text, the reported similarity drops to 0 %. In general, though, the system does an excellent job of finding similarities and can certainly be integrated into other systems.
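One plausible explanation for the drop to 0 % is that a lexical matcher only counts runs of identical consecutive words above some minimum length. The sketch below illustrates that idea only. It is not SIM_TEXT's actual algorithm, and the function name `covered_fraction`, the `min_run` parameter and the sample sentences are assumptions made for the example.

```python
def covered_fraction(a, b, min_run):
    """Fraction of tokens in `a` that lie inside a run of at least `min_run`
    consecutive tokens that also occurs contiguously in `b`."""
    if not a:
        return 0.0
    covered = [False] * len(a)
    # prev[j] holds the length of the common run ending at a[i-2], b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] >= min_run:
                    # mark the whole run a[i - cur[j] : i] as matched
                    for k in range(i - cur[j], i):
                        covered[k] = True
        prev = cur
    return sum(covered) / len(a)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog near the old mill".split()
    # every fourth word replaced, so no run of four identical words survives
    edited = "the quick brown cat jumps over the idle dog near the new mill".split()
    print(covered_fraction(original, original, min_run=4))  # 1.0
    print(covered_fraction(original, edited, min_run=4))    # 0.0
```

Under this assumption, only three of the thirteen words were changed, yet the measured similarity is zero because the changes are spaced closely enough to break every run.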

In summary, the system is difficult to install and use, but it produces good results for text collusion and useful results for program collusion, and is thus one of the systems that is useful for detecting collusion.


Screenshot 1: Overview

Business Promotion

“SIM tests lexical similarity in texts in C, Java, Pascal, Modula-2, Lisp, Miranda, and natural language. It is used

  • to detect potentially duplicated code fragments in large software projects, in program text, in shell scripts and in documentation
  • to detect plagiarism in software projects, educational and otherwise”

