
THE HAUGHTY JACKDAW AND THE PEACOCK

Once there lived a jackdaw, full of vain pride, who stole feathers that had fallen from the peacock and adorned herself with them. Despising her own jackdaw kind, she joined the ranks of the beautiful peacocks. There they tear the feathers from the shameless bird and chase her off with their beaks. And the jackdaw, badly battered, now sadly wants to return to her own kind. But they drive her away with bitter scorn. And one of those she had previously despised said to her: “Had our way of life suited you before, had you accepted what nature gave you, then neither would that disgrace have befallen you, nor would you now, to your misfortune, be cast out.”

This version of Aesop's fable is from Wilfried Stroh's collection of translations of Jan Novák's “Aesopia”, which is based on stories by Phaedrus.

Test 2010: S10-31 ProfNet


Software Profile

ID: S10-31
Product: ProfNet
Company: Prof. Dr. Uwe Kamenz
  Scientific Director (Wissenschaftlicher Direktor)
  ProfNet Institut für Internet-Marketing
  Klosterstr. 3
  48143 Münster
  0251 – 48 42 245 (Tel.)
  0251 – 48 42 246 (Fax)
Web Site: http://www.profnet.de/hs
Software Type:
Costs:
Test Date: submitted: September 4, 2010
  report received: September 16, 2010

Summary

ProfNet

Uwe Kamenz, professor of business studies at the FH Dortmund, offers a plagiarism detection service through his Institute for Internet Marketing. We had wanted to test this service in 2008, but we were denied access. We requested access again for 2010, and Kamenz permitted us to submit 5 test cases, on the condition that they be real student papers rather than our own test cases.

In 2001 Weber-Wulff began working on plagiarism after a class of 32 students submitted papers, 12 of which turned out to be plagiarisms. All of the papers have been kept, so we chose 4 papers that were known to be plagiarisms and 1 paper that had been suspected of being a plagiarism but for which no source was found in 2001.

We scanned the papers and ran optical character recognition (OCR) over the PDFs. We replaced the students' names with fictitious ones and set up freemail accounts in these names, because the online submission form, which has about 20 fields that must be filled out before a paper can be submitted, requires an email address. We wanted to see whether students are informed that their papers are being tested. They are not.

We submitted the following papers:

  1. A paper with a few unsourced quotes from a book that are properly quoted elsewhere, and which uses an English word (“inculcate”) that even many native speakers do not know.
  2. A paper in which two pages had been found to be plagiarisms from two sources in 2001.
  3. A paper that had been determined to be a complete plagiarism in 2001.
  4. A paper that used long passages from a book that Weber-Wulff had recognized and found without the help of a search engine.
  5. A paper that was suspicious because of the language used and the extensive but old literature list.

We submitted the papers on September 4, 2010. The reports are marked as tested on Sept. 8 and as produced on Sept. 16. It thus took 12 days, almost two weeks, for just these five papers to be tested, an impractically long time for general university use. We then analyzed the reports sent to us in detail.
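
The turnaround follows directly from the three dates given above. A minimal sketch of the arithmetic in Python, in which the variable names are our own and only the dates come from the reports:

```python
from datetime import date

# Dates taken from the submission and from the reports themselves.
submitted = date(2010, 9, 4)
marked_tested = date(2010, 9, 8)
report_produced = date(2010, 9, 16)

# Subtracting two dates yields a timedelta; .days is the whole-day difference.
days_until_tested = (marked_tested - submitted).days
days_until_report = (report_produced - submitted).days

print(f"Marked as tested after {days_until_tested} days")
print(f"Report produced after {days_until_report} days")
```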

The reports look professional, with many tables, numbers, and a glossary, but it is often unclear what exactly the numbers mean. On closer inspection the reports are overly long: they use half a page for each phrase found, and the phrases could often be combined. The results for the five papers are discussed below.

  • Report 1 gives a 5% probability of the entire text being a plagiarism. It is not clear why this would be an interesting number, as even lifting a single paragraph without attribution is a clear case of plagiarism. The numbers in the tables are “-15% for the subject area for text analysis” and “-80% for the subject area for text comparison”. We have no idea how to interpret these numbers. Three possible plagiarisms are reported. The first is a 60-word excerpt from a book that is not marked as a quotation in the paper but is properly quoted in the source listed, and which includes the word “inculcate”. This is correctly marked “100 % plagiarism possibility”. The second is a run of 9 words listed as “50% plagiarism possibility”. A larger portion was in fact lifted, but words were dropped from the source, so the phrase identified is just the stretch between two dropped words. The third is a 19-word phrase, also given as “50% plagiarism possibility”, that is taken from a book and properly quoted in the source given by ProfNet. The student paper had taken a 130-word paragraph from the book and had dropped or replaced words in every sentence; this was just the longest unchanged passage.
  • Report 2 gives a 52% probability of the entire text being a plagiarism, and again has tables with numbers that are not understandable, “+ 30%” and “87%” for the subject area. This time the passages attributed to the test candidate did not match the candidate's text exactly: “andpractices”, “itcontributes”, “thewhole”, etc. are reported as being in the candidate, with “and practices” (with a blank), “it contributes”, “the whole” in the source. The candidate did, however, include a blank as well, so it appears that the reports are not generated automatically but created by hand, perhaps by copying from PDFs. 30 possible plagiarisms are reported, although some are multiple reports of the same text. This is not easy to see in the report. The smallest reported plagiarism is 8 words. Larger amounts are also reported, but they break off when a word is missing or added, when we missed an error during character recognition, or at page breaks. All of the portions are reported as being “100% possible plagiarisms”; it would be more useful to know how many words have been copied and perhaps what percentage of the total document this is. The URLs for the sources are not always readable: long URLs just end in “…”, so one has to google the text in order to find the given source. One of the sources found, however, was golden: the CIA World Factbook was the basis for many of the other plagiarisms on the Internet that were listed as sources in this report, and it turned out that the entire paper was a plagiarism of the Factbook, not just 2 pages.
  • Report 3 gives a 70% plagiarism possibility overall. It was known that this paper was almost completely taken from an online source, so we spent time measuring the exact amount of plagiarism. The source, which was listed for 29 of the 31 possible plagiarisms reported, was actually the basis for 82% of the paper, based on word count. Again, any time the student changed or dropped a word, or there was a page break, the ProfNet report stops reporting the phrase and resumes after the changed or dropped word.
  • Report 4 gives a 55% plagiarism possibility overall, with -24% and -25% in the subject area. Twelve possible plagiarisms are reported. Most are from an excerpt of the book that is published online, and again the report restarts at word changes. Four of the possible plagiarisms are not reported as 100% plagiarisms, but as 50, 60 (twice), or 80%. They consist of only 14, 13, 16, and 21 words. The one reported as 50% similar actually differs in only one word (“terms” instead of “reference”), and one of the 60% passages includes a “.”, but the phrases are otherwise identical. This makes the numbers even more confusing.
  • Report 5, for the paper that was suspected of being a plagiarism but for which no sources were found by hand, gives a 6% overall plagiarism possibility. Three passages are reported that are, indeed, exact copies and not sourced in the paper. Each of the passages is from a different online source. Attempting to find one of the sources while preparing this report led to another online source that gives its source properly: Microsoft Encarta 1999. And indeed, all of the references in the paper are older than 1999, so it must be assumed that major portions of this paper were lifted from Encarta, which is unfortunately not online. The paper was 26 pages long; the three paragraphs found would not have led to a failing grade, only to a lowered grade.
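
Several of the reports break a copied passage into separate findings whenever a word is changed or dropped, and none of them states what share of the paper was copied. The following Python sketch illustrates both points under stated assumptions. It is not ProfNet's method, which is not known to us, but a plain exact word-run comparison built on `difflib` from the standard library. The function names, the whitespace tokenization, and the 8-word minimum (chosen to mirror the smallest finding in Report 2) are our own illustrative choices.

```python
from difflib import SequenceMatcher


def exact_runs(candidate: str, source: str, min_words: int = 8):
    """Return (start_word, length) for each verbatim word run shared with source.

    Hypothetical helper: tokenizes on whitespace and ignores case.
    """
    cand_words = candidate.lower().split()
    src_words = source.lower().split()
    matcher = SequenceMatcher(None, cand_words, src_words, autojunk=False)
    # get_matching_blocks() ends with a zero-length sentinel, which the
    # length filter below discards along with all short runs.
    return [(m.a, m.size) for m in matcher.get_matching_blocks()
            if m.size >= min_words]


def copied_share(candidate: str, source: str, min_words: int = 8) -> float:
    """Fraction of the candidate's words that lie inside a reported run."""
    total = len(candidate.split())
    copied = sum(size for _, size in exact_runs(candidate, source, min_words))
    return copied / total if total else 0.0


# Made-up data: a 20-word "source" and a copy with a single word replaced.
source_text = " ".join(f"w{i}" for i in range(20))
candidate_text = source_text.replace("w10", "changed")

for start, length in exact_runs(candidate_text, source_text):
    print(f"run of {length} words starting at word {start}")
print(f"copied share: {copied_share(candidate_text, source_text):.0%}")
```

In this toy case one replaced word splits a single copied passage into two findings, which is the fragmentation seen in Reports 2 to 4. Summing the run lengths still gives a word-count share of the kind we measured by hand for Report 3. A tool that merged runs separated by only a word or two would report such passages as one finding.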

In four of the five cases, a search by hand using Google would have been sufficient to find enough plagiarism to fail the student, and would have been much faster. In the fifth case, the system did turn up a minor amount of plagiarism that had not been found by hand.


Company Statement


Screenshots


Business Promotion


Links

official website http://www.profnet.de/hs