Test 2010: S10-30 plagium

Software Profile

Number	s10-30
Product	plagium
Manufacturer	Dr. Benjamin Epstein, Septet Systems Inc., 6747 Ingram St., Forest Hills, New York 11375, bepstein@septetsystems.com
Website	http://www.plagium.com
Software type	online
Cost	free
Test date	7.02.2010

Test Overview

Rank for all tests:	25
Rank for tests 10-40:	25
Rank for tests 31-40:	15
Usability:	10
Professionalism:	14
Average rank:	17.8
Effectiveness (grade):	5
Overall placement:	19 (useless)
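The "Average rank" row appears to be the arithmetic mean of the five rank rows above it (all tests, tests 10-40, tests 31-40, usability, professionalism). This is our reading of the table, not a formula stated on the page. A minimal check under that assumption:

```python
# Assumption: "Average rank" is the plain arithmetic mean of the five rank rows.
# The effectiveness grade and overall placement are not included in the mean.
ranks = {
    "all tests": 25,
    "tests 10-40": 25,
    "tests 31-40": 15,
    "usability": 10,
    "professionalism": 14,
}


def average_rank(rank_table: dict[str, int]) -> float:
    """Return the arithmetic mean of the given ranks, rounded to one decimal place."""
    return round(sum(rank_table.values()) / len(rank_table), 1)


print(average_rank(ranks))  # 17.8
```

The result matches the 17.8 in the table (89 / 5), which supports this reading without proving it.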

Plagium is a new system that is still in beta and uses Yahoo as its search engine. Checking takes a long time: our first attempt crashed after 15 minutes with the status "98% complete". Numerous other test cases crashed as well, but after about 3 hours we had nevertheless been able to check all of them. Only one test case can be checked at a time, and the reports consist only of the text with a link and a "rank" in percent. There is a bizarre "text usage timeline" on which red circles of varying diameter are plotted on a grid of months versus time; as teachers, we could make no sense of it. It may be of interest to people who track plagiarism of their own web pages. In the first test cases, a completely copied page was reported as only a partial plagiarism; in the later cases, results were found only when at least one of the sources was Wikipedia, although this may be coincidence. As a result, very little was found, which led to an unacceptable 26% for effectiveness.

Manufacturer's Statement

The manufacturer insists that our entire correspondence be reproduced here, because he is unhappy about having been placed in the "useless" category. He believes that many people work well with his system.

Letter 1:

On 03.01.11 17:50, Ben Epstein wrote:
Thank you for sharing your Plagiarism Detection System Test with us.
Needless to say, we disagree with the conclusions of your so-called "study" on a number of grounds:
1) The study is clearly biased in favor of Germanic languages and companies. If you were more forthcoming about this, maybe your study would have more credibility.

On 03.01.11 21:45, Weber-Wulff responds:
In this report we are clear about the test cases up to 30 being in German and 31-40 in English. However, as we have seen since 2004, the only problem is with the representation of the umlauts. We have had systems that did a lot of stemming and manipulations before starting that worked very well on German text, surprisingly.

On 03.01.11 17:50, Ben Epstein wrote:
2) There is absolutely no description of the methodology used to perform the tests. No descriptions are provided of the texts, their sources, and how the results are analyzed other than some subjective remarks. In short, this report is far from any academically acceptable study based on sound statistical principles. Given your commendable background, it really surprises me that you would attach your name and institution to such a poor report.

On 03.01.11 21:45, Weber-Wulff responds:
Oh, then you have not understood that this is not the only paper - we post on our web site (like http://plagiat.htw-berlin.de/software/2008) a large collection of materials including all of the test cases, all of the detailed points awarded and the grading system, and individual pages for each system. This is not a paper intended for publishing, it is far too large. Instead, it gives a short overview of our methodology and discusses the results briefly. Since there are so many systems, it is very long.

You are invited to contribute your own rejoinder, either now or at some time in the future. I believe very strongly in all data being in the open, and being discussable.

On 03.01.11 17:50, Ben Epstein wrote:
3) Contrary to what you imply, Septet Systems is a real company with real people behind it. We apologize if we do not have a phone number for Germany. It is hard to serve German customers from NY City and we admit that we have not fine tuned our system to operate with the German language (and very few of our users are from Germany).

On 03.01.11 21:45, Weber-Wulff responds:
That's fine, and I understand that. But we are asked by universities in Germany what systems are available and how well they work. There is a current EU statute stating that all web pages that sell goods in the EU must have a person named on the page and a real street address listed, among other things. That is why we have included this in our test. That you were not able to answer the phone when we called is unfortunate, but it is not the only criterion.

On 03.01.11 17:50, Ben Epstein wrote:
4) Septet does not earn any money for its service, other than the voluntary contributions that it receives from some of its users. Clearly, there are some users that like what we are doing. I find it ironic that a for-profit German site ranks as #1. Any astute reader will question what is going on. Likewise, our own tests with German texts applied to this site are inconsistent with your findings.

On 03.01.11 21:45, Weber-Wulff responds:
Why would that be a problem? A university is interested in finding a system that finds the most plagiarism, not a system that is just cheap to use. We inform about the price, but it is not part of any of our criteria. If you have a look at our past tests, the previous #1 was Copyscape, a Gibraltar/Israeli outfit, and before that Ephorus, a Dutch system. The first winner was indeed German.

I don't quite understand what is inconsistent here, but you are welcome to send us text or links to your material, and we will include it on our web page.

On 03.01.11 17:50, Ben Epstein wrote:
In short, your study does a disservice for the many users or would-be users of plagiarism software. The industry has a long way to go towards perfecting the tracking of usage, but questionable efforts such as yours only set everybody back. Finally, permit me to state that I would prefer that you release the report without any mention of Plagium.

On 03.01.11 21:45, Weber-Wulff responds:
I understand that you are unhappy that your system did not do well on our test, but as you have just questioned my scientific methodology in #2, I find it strange that I am requested to remove data from the study. Again - we offer all of the companies the possibility to include a text (we will translate it into German for the German pages) in which you are free to declare our study invalid! I feel that this is an important part of science, being able to question what others do. We are all seeking truth!

Letter 2:

On 04.01.11 17:02, Ben Epstein wrote:
Thank you for your response. I am afraid I still have to question your scientific methodology given that the site you refer to http://plagiat.htw-berlin.de/software/2008 is mainly in German, which I cannot read well.

On 05.01.11 11:04, Debora Weber-Wulff wrote:
Well, I have mostly been writing for a German target group, who often have problems with English. I am sending you an English version of the description of the methodology and the test cases for the 2008 test; I will be extending this for the website by Friday.

On 04.01.11 17:02, Ben Epstein wrote:
From the little German that I know, it is hard to find any description of the methodology. I also remain firm in stating that no direct references to the methodology are in the text of your report.

On 05.01.11 11:04, Debora Weber-Wulff wrote:
We have test cases with a known amount of plagiarism that we feed to the systems. We evaluate the reports according to a schema, I enclose it (although it is in German), awarding points on a scale of 0-3.

On 04.01.11 17:02, Ben Epstein wrote:
But to get right to the matter, permit me to refute your statements about Plagium:
1) Claiming it to be useless is an incendiary statement. Thousands of people are finding it useful every day. Maybe it is "useless" in the limited context of what you are looking for in plagiarism software, but this context is not clear and in the end leads to outright insulting remarks about others' efforts.

On 05.01.11 11:04, Debora Weber-Wulff wrote:
That is a good point - it is useless for our purposes, but there are many other purposes for which it could be useful, although your system did not find much in the way of plagiarism.

On 04.01.11 17:02, Ben Epstein wrote:
2) Your claims that it crashes are baseless given that we were never notified of such crashes. The conditions in which the crashes occurred were not even stated in your report. The very fact that you did not attempt to reach out to us indicates an interest more to deride than to help.

On 05.01.11 11:04, Debora Weber-Wulff wrote:
We do not notify anyone about the crashes. We are being normal users, who are mystified when strange things happen. We always provide information to all of the companies *after* the results are published. I enclose screenshots of one search for a plagiarism of three sources that found nothing; a timeline that I do not understand; a crash; and a screen about waiting that never completed. We have noted that you are in beta!

On 04.01.11 17:02, Ben Epstein wrote:
3) Comments about the need for Wikipedia entries are outright false.

On 05.01.11 11:04, Debora Weber-Wulff wrote:
That is our observation - you only found something when Wikipedia was among the sources. There is perhaps a different explanation.

On 04.01.11 17:02, Ben Epstein wrote:
4) Your comments about the time line are rather arrogant. Our system is used not just for tracking plagiarism, but for tracking general and legitimate usage as well. Many users find this feature useful.

On 05.01.11 11:04, Debora Weber-Wulff wrote:
We are looking for systems that are useful for universities. We do not track usage, so this would make no sense to a teacher. Teachers need to know which systems are useful for their purposes.

On 04.01.11 17:02, Ben Epstein wrote:
Permit me to take up your offer of including a text in your report to defend our name. To maintain the "scientific credibility" of your study, leave the Plagium description as is -- but include this email thread to show that there are others out there that do not agree with your highly questionable work and conclusions.

On 05.01.11 11:04, Debora Weber-Wulff wrote:
Of course! We will use this thread, and if you want to include any other material we will be glad to have that included as well.

Letter 3:


Letter 4:

On 07.01.11 13:32, Ben Epstein wrote:
I really do not like:
1) Your choice of words such as "useless" since Plagium has been used, and can continue to be used, for university work. In your mind you may be targeting the paper towards university usage, but the reader will walk away with the impression that your results apply to general use (even the title of the paper makes no mention of the targeted reader). You can therefore try to hide behind your caveat of university use, but your intentions are clearly otherwise.

On 07.01.11 13:57, Debora Weber-Wulff wrote:
That is your prerogative to not like my choice of words. I am not "hiding" - that is my area of investigation. It is entirely possible that your system finds much more plagiarism when tested with different material. With my material, it was awarded only 26.67% of the points for effectiveness, i.e. the correct identification of what constitutes a plagiarized paper. You either found nothing, or only one source for a paper that had three, or you signaled plagiarism on an original paper.

On 07.01.11 13:32, Ben Epstein wrote:
2) We do not appreciate being binned into a category that contains dead links and ad spam sites.

On 07.01.11 13:57, Debora Weber-Wulff wrote:
I understand. But no matter how we slice it, that's where your system is ranked. You have been tested for the first time - note that turnitin was always very far down on the list. Their system has improved over the years, and I am sure that yours will, too.

On 07.01.11 13:32, Ben Epstein wrote:
3) You continue to make unsubstantiated claims, where I am now sure the test cases were cherry picked to produce a bias in the results.

On 07.01.11 13:57, Debora Weber-Wulff wrote:
On what scientific evidence do you base this claim? We produce the test material before the test, and we are surprised by the results every time. We are not affiliated with any of the systems that we test.

On 07.01.11 13:32, Ben Epstein wrote:
In short, you are engaging on an unfortunate path that can interfere with Septet's ability to conduct its business in a fair and ethical manner. I insist that all correspondences be included in whatever you are going to publish.

On 07.01.11 13:57, Debora Weber-Wulff wrote:
I will be glad to do so. Do you have a street address other than "New York, New York"? Your domain registration hides behind GoDaddy, with the technical contact being Domains by Proxy.

On 07.01.11 13:32, Ben Epstein wrote:
I find it very unfortunate that nowadays anybody can publish what on the surface appears to be an "academic paper", send out press releases about it, and have it disseminated worldwide without one bit of peer review. Let your own reputation roast on what you are about to do.

On 07.01.11 13:57, Debora Weber-Wulff wrote:
I'm afraid you will find quite the opposite. I have been publishing on plagiarism detection since 2002, and I stand by my results for 2010.

Note: I received no answer to my request for a street address, so I researched the Department of Commerce database. There is an entry from 2005 for a Septet Systems registered by Benjamin Epstein, so I have included the address here, although it appears to be a residential neighborhood. The company may now be located somewhere else.

Screenshots

Screenshot 1: Start page

Screenshot 2: Unfortunately nothing was found, although this is a shake & paste plagiarism of three sources

Screenshot 3: Report

Screenshot 4: We do not understand what the red circles are supposed to mean

Screenshot 5: We often had to wait

Screenshot 6: The server was also frequently overloaded

Company Advertisement

"Plagium is a service of Septet Systems Inc. – a New York-based company that specializes in advanced search solutions for industry, the public sector, and government. We have aimed to provide an easy to use service that applies to a broad base of users."

Links

Official website http://www.plagium.com
