
ALLEA KWAN symposium Amsterdam 2011-12-14

Slides from the ALLEA KWAN symposium, Amsterdam, 2011-12-14, on the technical possibilities of detecting plagiarism: a comparative analysis of detection tools.

  1. Technical possibilities of detecting plagiarism: Comparative analysis of detection tools • Katrin Köhler (B.Sc.) • Symposium “Plagiarism: legal, moral and educational aspects”, Amsterdam, 2011-12-14 • Slides based on Debora Weber-Wulff, edited by Katrin Köhler (Photo: Flickr, cc-by-nc, jobadge, 2011)
  2. About me • Research assistant to Prof. Dr. Weber-Wulff since 2007 • Software tests in 2008 and 2010 • Master's thesis on “Cryptographic Watermarking for Texts”
  3. Contents • Plagiarism Detection Test 2010 • Doctoral thesis of Karl-Theodor zu Guttenberg • Discovering plagiarism
  4. Teachers and administrators want a simple solution (Photo: Flickr, cc-by-nc-sa, xtrarant, 2008; art installation: Jamie Pawlus, Indianapolis, Indiana, 2003)
  5. Many software companies are glad to help
  6. Plagiarism detection software • Can be extremely expensive! • Teachers want all papers marked as original or plagiarized before they start reading them. • Students are afraid of being wrongly labeled plagiarists. • Only a teacher can decide whether it is indeed plagiarism! Software cannot be used to solve social problems. • Prof. Dr. Weber-Wulff has tested plagiarism detection software 4.5 times: 2004, 2007, 2008, 2010 and zu Guttenberg’s thesis
  7. Test process 2010 • 9 months of work by 2 people • 42 test cases in English, German and Japanese • Different types of plagiarism, plus a few originals • Market survey • Access to the systems • 48 systems found, 26 of which could be fully evaluated
  8. Evaluation metric: Effectiveness • Plagiarism or not: what was found? • Total • Without the first 10 tests (Google accident) • English cases • Japanese cases as an additional challenge ➡ No winner; scores were spread continuously between 55% and 64% (Photo: Flickr, cc-by, arthit, 2005)
  9. Evaluation metric: Usability • Design, language consistency, navigation, labelling, print quality of the reports, fit with university processes • Support by email: speed, good answers ➡ Top: PlagScan, followed by PlagiarismFinder, Ephorus, PlagAware and TurnItIn (Photo: Flickr, cc-by, Quapan, 2008)
  10. Evaluation metric: Professionalism • Street address with town, telephone number, name of a contact person • Domain registered in the company's own name • No parallel offers of term papers or pornography, and no advertising for such services • German-speaking telephone support during German working hours • No installation of viruses ➡ PlagiarismFinder, followed by PlagAware, Strike Plagiarism, TurnItIn, Docoloc, PlagScan, Blackboard (Photo: Flickr, cc-by-sa, sludgegulper, 2008)
  11. Problems: Effectiveness • Nothing found from books, not even if they are in Google Books! • One case of 100% plagiarism from Google Books was reported at less than 25% • Translations are not found
  12. Problems: Effectiveness • Umlauts cause problems, although less so than in earlier tests • Edited texts are found less often • Many systems are very difficult to use • Not all companies are trustworthy • Some keep copies and grant themselves rights to use the text!
  13. Problems: Usability • Language mix • Workflow problems • The reports are generally not useful
  14. Problems: Professionalism • No information, no names • The address listed is a parking lot • Support questions are not answered, nobody picks up the telephone • Term papers or pornography offered in parallel; all rights to the texts go to the company
  15. How to rank? • No system was best on all of the metrics • We set up a ranking for each of the five criteria (three for effectiveness, one for usability, one for professionalism) • We calculated the average rank
  16. Results: Useful • There were no systems in this category; only humans are able to reach this level of effectiveness. (Photo: Flickr, cc-by-nc, dianejp, 2009)
  17. Results: Partially useful systems
  18. Partially useful: PlagAware • German system • Good documentation • Average effectiveness: 61% • But: each file must be submitted individually (5 clicks!), which does not fit the workflow • Looks for plagiarism in online texts
  19. PlagAware
  20. Partially useful: turnitin • Best results for material stored in its own database • Translation problems • Umlaut problems • Returns Wikipedia copies carrying porn ads • The source URLs reported are often no longer valid • Simply adds up the percent values for the “originality” report • Only system to handle Japanese properly
  21. turnitin Originality Report
  22. turnitin: How colorful!
  23. (slide without text)
  24. turnitin stores texts
  25. turnitin remembers for a long time
  26. (slide without text)
  27. Partially useful: Ephorus • Dutch system • Direct mail-in using a Hand-In-Code • Reports by email • Stores texts aggressively • Problems with umlauts
  28. Ephorus: Umlauts
  29. (slide without text)
  30. Partially useful: PlagScan • Newcomer from Germany • Users purchase “PlagPoints” • Useful: sub-accounts for teachers • First place in usability • Three kinds of report, none of them side-by-side • Only 60% effectiveness
  31. PlagScan
  32. PlagScan: Report
  33. Partially useful: Urkund • Swedish system • Second in overall effectiveness • 13th in usability and professionalism • Language problems • Complex navigation • Catastrophic layout • Unusable reports • Cryptic error messages • Test cases from 2008 were still stored
  34. (slide without text)
  35. Urkund: Report
  36. Barely useful systems • They find something, but miss a lot • They are not really easy to use • They have professionalism problems • Docoloc, Copyscape, Blackboard/SafeAssign, Plagiarism Finder, Plagiarisma, Compilatio, StrikePlagiarism, The Plagiarism Checker
  37. Strange tales • checkforplagiarism.net • Viper (Photo: cc-by-sa, D. Weber-Wulff, 2009)
  38. checkforplagiarism.net • In 2007 it was called iPlagiarismcheck.com • Was itself a plagiarism of turnitin, but they said: “These are the sources!” • Charges €15 for 5 tests; students are the target group • turnitin set up a honeypot
  39. (slide without text)
  40. Viper • Is installed on a PC • Terms of use: you give us irrevocable rights to use your text as we see fit • Also runs a paper mill • Complicated reports • Only 24% effectiveness: better to toss a coin! • Advertises in the UK by power-cleaning sidewalks
  41. Viper
  42. GuttenPlag: collaborative documentation of plagiarism
  43. The extent of the plagiarism • 135 sources • 94% of pages • 63% of lines
  44. Test results • 38 of the 131 sources known at the time of the test were found by at least one of the systems • Many of these sources are not (or no longer) online • Share of all known sources found by each system: iThenticate 30 (23%), PlagScan 19 (15%), Urkund 16 (12%), PlagAware 7 (5%), Ephorus 6 (5%)
  45. We tested these systems on zu Guttenberg’s thesis • Usability for such large works was extremely poor • The numbers appear to be random • With iThenticate, many sources return a 404 “file not found” error • Nothing from books (or from the Bundestag) was found
  46. The major problem: • They don’t find plagiarism! Just (marginally changed) copies of text, even properly referenced ones! (Photo: Flickr, cc-by-nc, Leeks, 2006)
  47. So let’s have a look ourselves... • But doesn’t the thesis have to be available digitally? • And isn’t the thesis very long? • And isn’t the Internet extremely large? (Photo: Flickr, cc-by-nc-nd, t_buchtele, 2009)
  48. Suspicion • On careful reading you find the text nicely written, but... • The style is too polished, the vocabulary not that of your students • There is some strange formatting • Interesting spelling errors • Lurching breaks in style (Photo: Flickr, cc-by, redcctshirt, 2009)
  49. Searching with Google & Co. • Exact phrase in quotes ("...") • 3-5 nouns • Search for the typo • Check the second page of hits • Set a time limit (Photo: Flickr, cc-by-nc-nd, Athena1970, 2008)
  50. Three words suffice!
  51. Really!
  52. Thank you! • Plagiarism portal: http://plagiat.htw-berlin.de • Plagiarism blog: http://copy-shake-paste.blogspot.com/ • Homepage: http://www.f4.htw-berlin.de/~weberwu/ • Contact: katrin.koehler@student.htw-berlin.de (Image: c. 2011, Axel Völcker, DerWedding.de)
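
Slides 12, 20, 27 and 28 report umlaut problems but do not say what causes them. One plausible cause is that the same visible word can be stored as different Unicode code-point sequences, so a naive character comparison misses the match. This is an assumption, not a finding of the test. The sketch below uses Python's standard `unicodedata` module. `normalize_for_matching` is a hypothetical helper and is not part of any tested system.

```python
import unicodedata


def normalize_for_matching(text: str) -> str:
    """Hypothetical preprocessing step (assumption): compose to NFC, then casefold."""
    return unicodedata.normalize("NFC", text).casefold()


# "Prüfung" with 'ü' stored as a single code point (U+00FC) ...
precomposed = "Pr\u00fcfung"
# ... and the same visible word stored as 'u' + combining diaeresis (U+0308).
decomposed = unicodedata.normalize("NFD", precomposed)

print(precomposed == decomposed)  # False: the raw strings differ
print(normalize_for_matching(precomposed) == normalize_for_matching(decomposed))  # True
```

Normalization only merges different encodings of the same character. A transliterated spelling such as "Pruefung" still does not match "Prüfung", so handling that case would need a further transliteration step, which is also hypothetical here.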
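
Slide 15 outlines the ranking procedure: one ranking per criterion, then the average rank across criteria. Below is a minimal sketch of that procedure. It assumes rank 1 is best and uses made-up system names and ranks, not the 2010 results.

```python
from statistics import mean

# HYPOTHETICAL ranks (1 = best) for three made-up systems on the five criteria
# from slide 15 (three for effectiveness, one for usability, one for professionalism).
# These are illustrative numbers, not data from the 2010 test.
CRITERIA = ["effectiveness_1", "effectiveness_2", "effectiveness_3",
            "usability", "professionalism"]
ranks = {
    "System A": [2, 1, 3, 1, 2],
    "System B": [1, 3, 1, 3, 3],
    "System C": [3, 2, 2, 2, 1],
}


def average_rank(per_criterion: list[int]) -> float:
    """Mean of a system's per-criterion ranks; lower is better."""
    return mean(per_criterion)


def overall_order(table: dict[str, list[int]]) -> list[str]:
    """System names sorted by average rank, best first."""
    return sorted(table, key=lambda name: average_rank(table[name]))


# No single system is best on every criterion ...
for i, criterion in enumerate(CRITERIA):
    best = min(ranks, key=lambda name: ranks[name][i])
    print(f"best on {criterion}: {best}")

# ... so the overall order comes from the average rank.
for name in overall_order(ranks):
    print(f"{name}: average rank {average_rank(ranks[name]):.1f}")
```

The slides do not say how ties in the average were broken. Python's `sorted` is stable, so equal averages simply keep their input order here, which is only one of several possible choices.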
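
Slide 20 criticizes turnitin for simply adding up percent values in its originality report. The sketch below uses a hypothetical document and hypothetical match positions to show why that can overstate the overlap. When two sources match the same passage, summing the per-source percentages counts that passage twice. Taking the union of the matched character ranges counts it once. This illustrates the arithmetic only and does not reconstruct turnitin's actual report logic.

```python
def covered_fraction(doc_length: int, spans: list[tuple[int, int]]) -> float:
    """Share of the document covered by the union of half-open [start, end) spans."""
    covered: set[int] = set()
    for start, end in spans:
        covered.update(range(start, end))
    return len(covered) / doc_length


# HYPOTHETICAL example: a 1,000-character submission in which two sources
# match overlapping passages (characters 0-299 and 100-399).
DOC_LENGTH = 1000
matches = {
    "source_1": [(0, 300)],
    "source_2": [(100, 400)],
}

per_source = {name: covered_fraction(DOC_LENGTH, spans) for name, spans in matches.items()}
summed = sum(per_source.values())  # adds up per-source values: 0.3 + 0.3
all_spans = [span for spans in matches.values() for span in spans]
union = covered_fraction(DOC_LENGTH, all_spans)  # characters 0-399, each counted once

print(f"summed: {summed:.0%}, union: {union:.0%}")  # summed: 60%, union: 40%
```

With enough overlapping sources, a summed value can even exceed 100%, while the union never can.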
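
The percentages on slide 44 match a simple calculation: each system's count divided by the 131 sources known at the time of the test, rounded to a whole percent. The counts come from the slide; the interpretation of the percentages is our assumption. A short check of that reading:

```python
# Counts from slide 44; 131 = sources known at the time of the test.
KNOWN_SOURCES = 131
found = {
    "iThenticate": 30,
    "PlagScan": 19,
    "Urkund": 16,
    "PlagAware": 7,
    "Ephorus": 6,
}


def share_percent(count: int, total: int = KNOWN_SOURCES) -> int:
    """Count as a whole-number percentage of all known sources."""
    return round(100 * count / total)


for system, count in found.items():
    print(f"{system}: {count}/{KNOWN_SOURCES} = {share_percent(count)}%")
```

By the same reading, the 38 sources found by at least one system correspond to about 29% of the 131 known sources.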
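
Slide 46 argues that the tools find copies of text rather than plagiarism. The slides do not describe how the systems work internally. The following is therefore a hypothetical sketch of a generic text-overlap measure: word 3-grams ("shingles") with a containment score, applied to a sentence from slide 6. A purely textual measure like this scores a quoted and cited passage almost as high as an unattributed copy, because it treats the quotation marks and the reference as ordinary text (here they are simply discarded).

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of lowercased word n-grams ("shingles") in a text; punctuation is dropped."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(suspect: str, source: str, n: int = 3) -> float:
    """Share of the suspect text's shingles that also occur in the source."""
    suspect_shingles = shingles(suspect, n)
    if not suspect_shingles:
        return 0.0
    return len(suspect_shingles & shingles(source, n)) / len(suspect_shingles)


source = "Software cannot be used to solve social problems"
unattributed = "Software cannot be used to solve social problems."
quoted_and_cited = '"Software cannot be used to solve social problems" (Weber-Wulff)'

print(containment(unattributed, source))      # 1.0
print(containment(quoted_and_cited, source))  # 0.75
```

Deciding whether such an overlap is plagiarism, for example whether the quotation is properly marked and referenced, remains the teacher's job, as slide 6 puts it.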
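
Slides 49 and 50 recommend searching for short exact phrases, a few distinctive nouns, or the typo itself. The helpers below are hypothetical. They only build query strings from a suspicious passage and do not call any search engine API. The default of three words per phrase follows slide 50 ("Three words suffice!"); the step size is an arbitrary assumption.

```python
import re


def phrase_queries(passage: str, words_per_phrase: int = 3, step: int = 3) -> list[str]:
    """Cut a suspicious passage into short exact-phrase queries in quotes."""
    words = re.findall(r"[\w'-]+", passage)
    return [
        '"' + " ".join(words[i:i + words_per_phrase]) + '"'
        for i in range(0, len(words) - words_per_phrase + 1, step)
    ]


def keyword_query(nouns: list[str]) -> str:
    """Join 3-5 hand-picked nouns into a single (unquoted) query."""
    if not 3 <= len(nouns) <= 5:
        raise ValueError("slide 49 suggests 3-5 nouns")
    return " ".join(nouns)


print(phrase_queries("The style is too polished for a student"))
# ['"The style is"', '"too polished for"']
print(keyword_query(["watermarking", "cryptography", "texts"]))
```

Choosing which nouns are distinctive, and spotting the telling typo, is left to the reader of the text, as on slides 48 and 49.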
