Test 2010: S10-30 plagium

Software Profile

ID: S10-30
Product: plagium
Company: Dr. Benjamin Epstein, Septet Systems Inc., 6747 Ingram St., Forest Hills, New York, 11375
Web Site: http://www.plagium.com
Software Type: Online
Costs: free
Test Date: 2010-02-07

Test Overview

Ranking for all tests: 25
Ranking for tests 10-40: 25
Ranking for tests 31-40: 15
Usability: 10
Professionalism: 14
Average Ranking: 17.8
Effectiveness (Grade): F
Overall Ranking: 19

useless


Summary

plagium is a new system that is still in beta testing. It uses Yahoo as its search engine. Checking takes a long time: our first attempt crashed after 15 minutes of reporting 98% finished. Numerous other tests crashed as well, but we were able to finish all test cases in about 3 hours. Only one test can be run at a time, and the reports just give some text with a link and a “rank” with a percentage. A “Text Usage Timeline” is offered, which plots red balls of different diameters on a grid of months vs. time. This makes no sense to us as teachers, but it can apparently be useful for people tracking plagiarism of their sites. On the older tests it would often report only a partial plagiarism for a page that was a complete copy. On the newer tests, results were found only when at least one of the sources was Wikipedia; otherwise nothing was found, although this may be just a chance result. Overall, plagium obtained only a dismal 26% of the effectiveness points across all test cases.


Company Statement

The manufacturer has asked that all of our correspondence be reproduced here because he is unhappy to be in the category “useless”. He thinks that many people can cope well with his system.

Mail 1:

On 03.01.11 17:50, Ben Epstein wrote:
Thank you for sharing your Plagiarism Detection System Test with us.
Needless to say, we disagree with the conclusions of your so called “study” on a number of grounds:
1) The study is clearly biased in favor of Germanic languages and companies. If you were more forthcoming about this, maybe your study would have more credibility.
On 03.01.11 21:45, Weber-Wulff responds:
In this report we are clear about the test cases up to 30 being in German and 31-40 in English. However, as we have seen since 2004, the only problem is with the representation of the umlauts. We have had systems that did a lot of stemming and manipulations before starting that worked very well on German text, surprisingly.
On 03.01.11 17:50, Ben Epstein wrote:
2) There is absolutely no description of the methodology used to perform the tests. No descriptions are provided of the texts, their sources, and how the results are analyzed other than some subjective remarks. In short, this report is far from any academically acceptable study based on sound statistical principles. Given your commendable background, it really surprises me that you would attach your name and institution to such a poor report.
On 03.01.11 21:45, Weber-Wulff responds:
Oh, then you have not understood that this is not the only paper - we post on our web site (like http://plagiat.htw-berlin.de/software/2008) a large collection of materials including all of the test cases, all of the detailed points awarded and the grading system, and individual pages for each system. This is not a paper intended for publishing, it is far too large. Instead, it gives a short overview of our methodology and discusses the results briefly. Since there are so many systems, it is very long.

You are invited to contribute your own rejoinder, either now or at some time in the future. I believe very strongly in all data being in the open, and being discussable.
On 03.01.11 17:50, Ben Epstein wrote:
3) Contrary to what you imply, Septet Systems is a real company with real people behind it. We apologize if we do not have a phone number for Germany. It is hard to serve German customers from NY City and we admit that we have not fine tuned our system to operate with the German language (and very few of our users are from Germany).
On 03.01.11 21:45, Weber-Wulff responds:
That's fine, and I understand that. But we are asked by universities in Germany what systems are available and how well they work. There is a current EU statute stating that all web pages that sell goods in the EU must have a person named on the page and a real street address listed, among other things. That is why we have included this in our test. That you were not able to answer the phone when we called is unfortunate, but not the only criterion.
On 03.01.11 17:50, Ben Epstein wrote:
4) Septet does not earn any money for its service, other than the voluntary contributions that it receives from some of its users. Clearly, there are some users that like what we are doing. I find it ironic that a for-profit German site ranks as #1. Any astute reader will question what is going on. Likewise, our own tests with German texts applied to this site are inconsistent with your findings.
On 03.01.11 21:45, Weber-Wulff responds:
Why would that be a problem? A university is interested in finding a system that finds the most plagiarism, not a system that is just cheap to use. We inform about the price, but it is not part of any of our criteria. If you have a look at our past tests, the previous #1 was Copyscape, a Gibraltar/Israeli outfit, and before that Ephorus, a Dutch system. The first winner was indeed German.

I don't quite understand what is inconsistent here, but you are welcome to send us text or links to your material, and we will include it on our web page.
On 03.01.11 17:50, Ben Epstein wrote:
In short, your study does a disservice for the many users or would-be users of plagiarism software. The industry has a long way to go towards perfecting the tracking of usage, but questionable efforts such as yours only set everybody back. Finally, permit me to state that I would prefer that you release the report without any mention of Plagium.
On 03.01.11 21:45, Weber-Wulff responds:
I understand that you are unhappy that your system did not do well on our test, but as you have just questioned my scientific methodology in #2, I find it strange that I am requested to remove data from the study. Again - we offer all of the companies the possibility to include a text (we will translate it into German for the German pages) in which you are free to declare our study invalid! I feel that this is an important part of science, being able to question what others do. We are all seeking truth!

Mail 2:

On 04.01.11 17:02, Ben Epstein wrote:
Thank you for your response. I am afraid I still have to question your scientific methodology given that the site you refer to http://plagiat.htw-berlin.de/software/2008 is mainly in German, which I cannot read well.
On 05.01.11 11:04, Debora Weber-Wulff wrote:
Well, I have mostly been writing for a German target group, which often has problems with English. I am sending you an English version of the description of the methodology and the test cases for the 2008 test; I will be extending this for the web site by Friday.
On 04.01.11 17:02, Ben Epstein wrote:
From the little German that I know, it is hard to find any description of the methodology. I also remain firm in stating that no direct references to the methodology are in the text of your report.
On 05.01.11 11:04, Debora Weber-Wulff wrote:
We have test cases with a known amount of plagiarism that we feed to the systems. We evaluate the reports according to a schema, I enclose it (although it is in German), awarding points on a scale of 0-3.
On 04.01.11 17:02, Ben Epstein wrote:
But to get right to the matter, permit me to refute your statements about Plagium:
1) Claiming it to be useless is an incendiary statement. Thousands of people are finding it useful every day. Maybe it is "useless" in the limited context of what you are looking for in plagiarism software, but this context is not clear and in the end leads to outright insulting remarks about others' efforts.
On 05.01.11 11:04, Debora Weber-Wulff wrote:
That is a good point - it is useless for our purposes, but there are many other purposes for which it could be useful, although your system did not find much in the way of plagiarism.
On 04.01.11 17:02, Ben Epstein wrote:
2) Your claims that it crashes are baseless given that we were never notified of such crashes. The conditions in which the crashes occurred were not even stated in your report. The very fact that you did not attempt to reach out to us indicates an interest more to deride than to help.
On 05.01.11 11:04, Debora Weber-Wulff wrote:
We do not notify anyone about the crashes. We are being normal users, who are mystified when strange things happen. We always provide information to all of the companies *after* the results are published. I enclose screenshots of one search for a plagiarism of three sources that found nothing; a timeline that I do not understand; a crash; and a screen about waiting that never completed. We have noted that you are in beta!
On 04.01.11 17:02, Ben Epstein wrote:
3) Comments about the need for Wikipedia entries are outright false.
On 05.01.11 11:04, Debora Weber-Wulff wrote:
That is our observation - you only found something when Wikipedia was among the sources. There is perhaps a different explanation.
On 04.01.11 17:02, Ben Epstein wrote:
4) Your comments about the time line are rather arrogant. Our system is used not just for tracking plagiarism, but for tracking general and legitimate usage as well. Many users find this feature useful.
On 05.01.11 11:04, Debora Weber-Wulff wrote:
We are looking for systems that are useful for universities. We do not track usage, so this would make no sense to a teacher. Teachers need to know which systems are useful for their purposes.
On 04.01.11 17:02, Ben Epstein wrote:
Permit me to take up your offer of including a text in your report to defend our name. To maintain the "scientific credibility" of your study, leave the Plagium description as is -- but include this email thread to show that there are others out there that do not agree with your highly questionable work and conclusions.
On 05.01.11 11:04, Debora Weber-Wulff wrote:
Of course! We will use this thread, and if you want to include any other material we will be glad to have that included as well.

Mail 3:


Mail 4:

On 07.01.11 13:32, Ben Epstein wrote:
I really do not like:
1) Your choice of words such as "useless" since Plagium has been used, and can continue to be used, for university work. In your mind you may be targeting the paper towards university usage, but the reader will walk away with the impression that your results apply to general use (even the title of the paper makes no mention of the targeted reader). You can therefore try to hide behind your caveat of university use, but your intentions are clearly otherwise.
On 07.01.11 13:57, Debora Weber-Wulff wrote:
That is your prerogative to not like my choice of words. I am not "hiding" - that is my area of investigation. It is entirely possible that your system finds much more plagiarism when tested with different material. With my material, it only was awarded 26.67% of the points for effectiveness, i.e. the correct identification of what constitutes a plagiarized paper. You either found nothing, or only one source for a paper that had three, or you signaled plagiarism on an original paper.
On 07.01.11 13:32, Ben Epstein wrote:
2) We do not appreciate being binned into a category that contains dead links and ad spam sites.
On 07.01.11 13:57, Debora Weber-Wulff wrote:
I understand. But no matter how we slice it, that's where your system is ranked. You have been tested for the first time - note that turnitin was always very far down on the list. Their system has improved over the years, and I am sure that yours will, too.
On 07.01.11 13:32, Ben Epstein wrote:
3) You continue to make unsubstantiated claims, where I am now sure the test cases were cherry picked to produce a bias in the results.
On 07.01.11 13:57, Debora Weber-Wulff wrote:
On what scientific evidence do you base this claim? We produce the test material before the test, and we are surprised every time at the results. We are not affiliated with any of the systems that we test.
On 07.01.11 13:32, Ben Epstein wrote:
In short, you are engaging on an unfortunate path that can interfere with Septet's ability to conduct its business in a fair and ethical manner. I insist that all correspondences be included in whatever you are going to publish.
On 07.01.11 13:57, Debora Weber-Wulff wrote:
I will be glad to do so. Do you have a street address other than "New York, New York"? Your domain registration hides behind GoDaddy, with the technical contact being Domains by Proxy.
On 07.01.11 13:32, Ben Epstein wrote:
I find it very unfortunate that nowadays anybody can publish what on the surface appears to be an "academic paper", send out press releases about it, and have it disseminated worldwide without one bit of peer review. Let your own reputation roast on what you are about to do.
On 07.01.11 13:57, Debora Weber-Wulff wrote:
I'm afraid you will find quite the opposite. I have been publishing on plagiarism detection since 2002, and I stand by my results for 2010.

Note: I received no answer to my request for a street address, so I researched the Department of Commerce database. There is an entry from 2005 for a Septet Systems registered by Benjamin Epstein, so I have included the address here, although it appears to be a residential neighborhood. The company may now be located somewhere else.


Screenshots


Screenshot 1: Home


Screenshot 2: Nothing found, although it is a shake & paste plagiarism of three sources


Screenshot 3: Report

Screenshot 4: We do not understand what the red balls are supposed to represent.

Screenshot 5: We often had to wait.

Screenshot 6: The server was also often overloaded.


Business Promotion

“Plagium is a service of Septet Systems Inc. – a New York-based company that specializes in advanced search solutions for industry, the public sector, and government. We have aimed to provide an easy to use service that applies to a broad base of users.”


Links

Official website: http://www.plagium.com