Similarity Texter

Additional Information

Summary 2008

There are programs that can help find plagiarism. Nevertheless, one has to be aware that this cannot always be done beyond doubt and that not all plagiarism can be found. In particular, plagiarism taken from material that is not available on the internet cannot be detected. And even content from the internet is not found reliably, as our tests have shown.

As a teacher, one still has to rely on one's intuition. Is the author capable of writing in this style? Is there something awkward about the essay? If there is doubt, one should consult a search engine, but one should not pursue such a concern indefinitely. Software for finding plagiarism can be helpful, or not. In 2008 we used a set of 31 essays (2 original essays and 29 plagiarized in various ways) as well as 9 collusion cases to test the programs available on the market in September and October 2008.

Theoretically, 93 points were attainable; based on experience, however, we based the ranking on 80 points. Test cases that were not recognized as plagiarism by any of the products were excluded from the evaluation.

In 2008 we not only rated the degree of plagiarism detection on a scale from 0 to 3, but also rated the usability of the systems in the respective categories.

Very good systems (72–80 points)

Unfortunately, as in 2007, no system scored in this category.

Good software (60-71 points)

  • 1st place goes to Copyscape Premium from the company Indigo Stream Technologies Ltd. The company, founded by Gideon Greenspan, also has ‘Google Alerts’ in its product line. The simple interface is easy to use and the plagiarism found looked plausible. In total we awarded 70 points to this system, as the reports were slightly difficult to handle; the usability mark of 2.7 corresponds to 3rd place. Only a single URL could be attached to a file, and there was no quantification of the degree of similarity. Every test costs 5 US cents, so the system is better suited to smaller numbers of files.
  • 2nd place goes to a new system, Plagiarism-Detector, with 68 points. We became aware of the system through blog spam that left an “anonymous” message on our website “Plagiatsblog”. They place links to their system in many internet forums, but the new version of Firefox warns that the download contains viruses. The results are very similar to those of the 1st place, because they use Google’s API as well; only in 2 test cases were slight deviations detectable. It was one of the three candidates in the test that had to be installed locally, and it required .NET on the computer, not a trivial thing for some users. According to its advertising, PDF documents can be analyzed; they were, however, rejected by the system. The search always operates on blocks of 8 words: 8 words are used, the next 7 are skipped, and then the next 8 words are used again. Because found passages could not be merged, the report had a patchwork-like character. The usability mark was nevertheless 2.3, which is 2nd place, because the handling integrated well into the workflow. The selling company advertises an Australian support number, which is diverted to a Ukrainian cell phone. According to the company, the system was programmed by a Ukrainian English lecturer. Emails that we sent to contact them were not answered. The contact address was the University of Sydney in Australia, but an inquiry on our part revealed that the company is unknown to the university and that there is no cooperation; the university seemed quite irritated to find its helpdesk in the company’s advertising (response from 2008-11-14).
  • Third place was awarded to Copyscape Free, the freeware version of the first-place system. It allows ten tests per month, reached 64 points, and received the same usability mark of 2.7; the differences were only seen in the new test cases. If you share an IP address or work in a large organization, your ten tests may already have been used up, so it is not obvious whether you can actually use the system when you open the website. There were also strange problems with special characters: sometimes the system could handle them, sometimes not, with no discernible pattern.
  • Fourth place goes to the Swedish system Urkund, with 62 points and a usability mark of 3.7. The latter was due to the fact that some of the wording was in Swedish and that the overall layout and the reports need strong improvement. The tests also took quite long and were not finished by the end of the testing day; we could not measure the exact time, but they were ready the next morning.
  • 5th place is shared by two systems with 61 points, both from Germany.
    • Docoloc, from the University of Braunschweig, nevertheless achieved only 4.0 as its usability mark, because it is practically impossible to get an overview of its reports, and its “Terms of Service” could neither be understood nor printed. The reports start by listing “spots” with lots of URLs, but it is not shown how similar these “spots” actually are to their sources. There were other figures as well, but they did not make much sense either. The reports highlight the spots with supposed plagiarism and mention possible sources in a pop-up window, but if you click on one, the source is displayed in a new window without the supposed matching passage being marked. In a “source” of 262 pages this is nearly as laborious as searching for the sources in a search engine yourself. Many of the sources mentioned make no sense and are spam websites.
    • PlagAware, a commercial system for monitoring websites online, received the usability mark 1.7 for its new system (we tested the beta version), although the system does not really integrate into the workflow of a university: it demands that the tested pages be online and carry a logo linking to the company’s website. The system offered an upload function, which we tested once, but those reports were unclear and not clickable. The reports of the online test, however, are quite readable. The only bothersome point is that many links are listed, which is demanding because the figures used in the ranking are hard to interpret. Only in the report itself are there usable remarks on how much of the tested document was plagiarized. Sometimes “very strong” plagiarism was reported (99%) but could not be found in the presented sources; once, results from the fracture website Djembe were given as sources. Apart from these small problems, the system was awarded a good mark.
  • 7th place with 60 points was given to a program that only recently had received the mark “good” as the winner of the 2007 test: Ephorus. Its usability, at 3.3, is average. The system offers three settings: strict, standard and resilient, although it was not clear what these actually meant; the default setting “Standard” was tested. The tested documents were stored without permission on one of the company’s servers. We were not asked which option we wanted when uploading: visible and tested (the default), visible but not tested, or invisible. Only by reading the documentation did it become evident that choosing the option “visible” saves the documents on the server. The handling was also quite difficult: each document had to be given a name before uploading, although when uploading a zip file the names were assigned automatically.
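
The 8-word block search described above for Plagiarism-Detector can be sketched in a few lines (our own illustration; the vendor's actual code is unknown, and all names here are ours): sample a block of 8 words, skip the next 7, and check each sampled block against the source text.

```python
def word_blocks(text, block=8, skip=7):
    """Yield blocks of `block` words, skipping `skip` words between samples."""
    words = text.split()
    step = block + skip
    for i in range(0, len(words) - block + 1, step):
        yield " ".join(words[i:i + block])

def overlap(suspect, source, block=8, skip=7):
    """Fraction of the suspect's sampled blocks found verbatim in the source."""
    blocks = list(word_blocks(suspect, block, skip))
    if not blocks:
        return 0.0
    hits = sum(1 for b in blocks if b in source)
    return hits / len(blocks)
```

Because seven consecutive words are never sampled, a short rewritten stretch can slip through unnoticed, and matched blocks show up as isolated islands, which fits the patchwork-like character of the reports.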

Satisfactory systems (48–59 points)

  • 8th place goes to a system that refused to be tested in 2007 because it was just being updated to the latest standards. The former MyDropBox system was bought by Blackboard in 2007, integrated into its learning management system, and is now marketed under the name SafeAssign. We did not get any answer to our emails, but a fellow university granted us access and even created a separate account for us to test the system. The system reached 57 points, and its usability mark of 2.7 placed it in the middle of the field. The website has quite an informal style (“Wanna learn more”) and, like Blackboard, uses only a fraction of the screen for the actual document: only 25 percent remains for the plagiarism check after seven (!) different navigation blocks have spread across the screen.
  • 9th place was awarded to the Polish system strikeplagiarism with 55 points. In the 2007 test the system could not even reach a quarter of the points; now it is more than half. Nevertheless, the usability mark of 3.3 is still very bad. There are many orthographical mistakes in a very Polish English. One is forced to save all essays on their servers, and author and title have to be entered for every document. The matching passages are not marked in the sources, which makes finding them very difficult. Some links in the reports are clickable, others are not, so it is not easy to get an overview. Two coefficients are calculated that sound quite scientific but have no real relevance. As soon as the system has found a passage in one source, it no longer looks for it in other sources.
  • 10th place with 54 points was given to a German system called PlagiatCheck. The system demands that the tested data be available online and carry an integrated banner referring to its homepage, an easy way to raise its Google ranking. The usability mark of 3.0 was average, although the system was not usable with older Mozilla versions. Every search took between 2 and 5 minutes, and no other test can be run at the same time. For the developers it might be quite interesting to watch the output showing what kind of document is currently being analyzed. The term „relevance“ is used, but it is not clear what is meant by it. Unfortunately, the relevant passages are not marked in the sources.
  • 11th place was awarded to AntiPlag with 51 points. This is not a commercial system, but a student project in business informatics at the University of Dortmund under Prof. Dr. Lackes. The Java-based system was already difficult to install, because the Java version on our system was outdated. We had to update Java and shut down our firewall (why, we do not know) to get the system to start. It could only test one document at a time, and each test took quite a while. One test case, Telnet, never finished, even after several attempts. Because of the experimental character of the system, we did not give a usability mark.
  • 12th place with 49 points was given to the old PlagAware system, which we had tested at the beginning of the survey.

These systems can be compared to tossing a coin to decide whether there is plagiarism or not. Nevertheless, they do produce results, and when they find plagiarism, that is at least something. What is problematic is only when clear plagiarism is missed.

Sufficient systems (40–47 points)

  • 13th place with 45 points goes to the self-styled market leader TurnItIn. The usability of the system was rated an average 3.0, because registration and display, tasks that should normally take place in an LMS, are complicated. This may well be due to the complexity of the system. Students’ names always had to be typed in and could not be changed afterwards. The six or more navigation bars often had identical labels that led to different results (notions such as “Instructor” and “Accounts” were misleading). There was also no cost transparency on the website; a license is always negotiable. The system was one of the few that was at least able to recognize some collusion, but in the reports these collusions could not be opened to view the underlying sources. The first 20 test cases were already stored in TurnItIn’s database. They were not ours, but those of a company that we caught in 2007 using TurnItIn as its own search engine (see “Eine kuriose Geschichte”, “A Curious Story”). The marking of the degree of plagiarism was suspicious: by default, the percentages of the found sources were simply added up. Not much was found by the system, though; this is probably because it is made for the Anglo-American market and cannot handle umlauts such as those found in German texts. When we were able to exclude the overlapping documents, we realized once again that Wikipedia was seldom mentioned as a source. Amazon was also missing from the list, and some found sources led to a 404 web error; presumably those sites were no longer online. The system is still susceptible to countless spam sites, which first had to be removed from the analysis in the hope of finding something. For the test case FTF, more than 22 spam sites (which often did not even include the Wikipedia text) had to be excluded. Umlauts caused problems, as always, but more on this later.
    The whole test was a rather tiring process. Because TurnItIn is nevertheless quite frequently used, we ran the test again at the end of the survey to exclude the possibility of marking mistakes. We had awarded only 43 points in the first run, so we used the slightly better result of the second run.
  • 14th place with 42 points was given to the system XXXX, the last of the still sufficient systems. The company, which had explicitly asked us to test its system, has since set a lawyer on us. The company claims that we tested a beta version, but we only used the access that the supplier gave us. This is why this system remains unnamed here, and hints that might reveal it are set off in quotation marks. The usability of the system was rated 4.3. We had to create folders into which it was impossible to upload anything. We had to upload and start the tests individually and only occasionally received reports. The reports were nearly unusable because of their “complicated arrangement”. For more information one had to press a “button”, which also took a long time and was not really usable. If the system did not find any sources, it returned to the main page without comment; this also happened during the analysis of documents, so we were often not sure what stage we were actually at. It only gave us the number of hits, with no percentage. We did not evaluate “green”-colored results, even when they contained plagiarism that had been found.
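
The practice of simply adding up per-source percentages, criticized above for TurnItIn, is easy to see through with a deliberately trivial sketch (our own illustration, not the vendor's code): two sources that contain the same copied passage are each counted in full.

```python
def naive_total(per_source_percentages):
    """The questionable default: just sum the percentage of each found source."""
    return sum(per_source_percentages)

# One plagiarized paragraph making up 40% of the document, found on
# three mirrors of the same page:
print(naive_total([40, 40, 40]))  # reports 120% for a document that is 40% copied
```

A sound score would instead measure the union of the matched passages in the tested document, so that overlapping sources are counted only once.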

Not sufficient systems (0–39 points)

These systems were also tested, but their results were not sufficient.

  • Place 15 with 28 points was awarded to the once-good system Plagiarism-Finder. In 2007 the distributor had refused to let us test their new version. We nevertheless contacted them again and asked to test the new system, but received no response; the homepage also still has “news” from 2004 on it. Since many small newspapers were nevertheless reporting that the system was effective, we definitely wanted to test it. A distribution partner gave us a full version: they sent us version 1.2, which we had tested in 2004! After the installation we were offered an upgrade to 1.3, which we accepted. Our registration did not work, which was normally necessary to reach the reports within the system; but as the reports lay unprotected on our hard disk, we were still able to mark them. Producing the reports was already quite demanding: after 3 to 5 tests a popup appeared saying “ist kein gültiger Integerwert” (“is not a valid integer value”), which caused the system to freeze. Only restarting the system allowed further tests. When tests were repeated, which at first happened unintentionally, the results differed. A colleague from the University of Marburg contacted us during the test and asked whether we could help her: she had encountered similar problems when testing the system and thought she had operated it wrongly. We therefore ran several repeated tests and indeed had to realize that runs made within 10 minutes of each other reported between 43% and 80% for a text that was 100% plagiarized! We marked the first results that we received, but such a system cannot be used.
  • The last place (place 16) with 19 points was given to the new TurnItIn Global. The company advertises that the new system can process texts in more than 30 languages. As our test material was German, we used a special link to conduct our tests. In the normal test cases, points were essentially only earned for the originals (for not finding sources) and for finding two sources from the internet: Telnet and Blogs. The result is sobering, but becomes clear when one remembers that TurnItIn works with hash functions. Its large database of hashes was built without the umlauts being recognized. We believe this is also why recognition with the standard TurnItIn was so low: the „chain of similarities“ was always interrupted by words with umlauts. For TurnItIn Global, the umlauts were apparently dropped from the hash calculations for our test cases, but then there are of course no hits in the database, since the stored data had been hashed differently. This system can therefore not be used for German.
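
The umlaut problem described above can be made concrete with a small sketch (our own illustration; TurnItIn's actual hashing scheme is not public and all names here are ours): if the database side and the submission side normalize umlauts differently before hashing, every word containing an umlaut fails to match, breaking the chain of similarities.

```python
import hashlib

def fingerprint(words, strip_umlauts):
    """Hash each word; optionally delete umlauts/ß first (on one side only)."""
    table = str.maketrans("", "", "äöüÄÖÜß")
    hashes = set()
    for w in words:
        if strip_umlauts:
            w = w.translate(table)
        hashes.add(hashlib.md5(w.encode("utf-8")).hexdigest())
    return hashes

sentence = ["Müller", "prüft", "alle", "Texte"]
stored = fingerprint(sentence, strip_umlauts=False)  # database side
query = fingerprint(sentence, strip_umlauts=True)    # submission side
print(len(stored & query))  # only the umlaut-free words still match
```

Here the identical German sentence yields hits only for the two umlaut-free words; the words with umlauts hash differently on the two sides and are lost, exactly the failure mode suggested by the test results.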

Aborted Tests

The following systems could not be fully evaluated, for various reasons:

  • ArticleChecker could not be tested through to the end. The user display was full of advertisements (as a curiosity, one was even for Plagiarism-Detector) and completely confusing. The results were generally just Google results that one had to evaluate oneself. This was not what we regard as an anti-plagiarism system, so the test was aborted.
  • CatchItFirst was already peculiar in the 2007 test, as it always delivered “100% original” no matter what was submitted. This time, uploaded texts simply vanished and no reports were created. Where there are no reports, there is nothing to evaluate.
  • is the new domain of iPlagiarismCheck. The old site is still online and has a new layout. The sample report is still very similar to that of TurnItIn, and the found percentages are likewise added up. The 24/7 support is, as with iPlagiarismCheck, unfortunately always offline. We officially asked for access, but received no answer. As we wanted to know whether they still use TurnItIn as a source, we chose 5 test cases that TurnItIn had found and paid 15 € (20 $) via PayPal in order to run those five tests. The money was transferred immediately and we were able to log in soon afterwards. The reports carried a time stamp marked “Canadian Daylight Time”, although their offices are supposedly in New York and London. Although the reports still look very similar to TurnItIn’s, the results found were nevertheless different. Running the remaining test cases would have cost us 65 $ more, so we refrained and aborted the test.
  • paperseek had a recently expired Google API key and therefore did not deliver any results. The latest message on the site was from 2005. We aborted the test.
  • ProfNet is a service/online solution by the author Uwe Kamenz that is advertised especially in Germany. He sent emails to several universities, offering, at first for 1000 € and later for 500 € annual fee, to take over the tiring work of looking for plagiarism, creating quality reports and eradicating plagiarism. We were very keen to include his system in our survey and had a very intensive email exchange. He insisted that he did not offer a system, but a service. We had no problem with testing a service, but this was not granted to us: it was feared that competitors could take over ideas if we tested it. In the end we were not allowed to test, but a cooperation was offered to us instead, which we rejected. We asked to be allowed at least to publish our email exchange, but even that was not granted. This is why we cannot present a test here.
  • WebMasterLabor still has a link to a paper mill on its website. We nevertheless tried the test case Tibet. It took ages, and then nothing was found. We refrained from further tests.

Collusion-Detection will be discussed on another page.