flickr, cc by-nc jobadge, 2011




Technical possibilities of detecting plagiarism -
Comparative analysis of detection tools




Katrin Köhler (B.SC.)
Plagiarism - legal, moral and educational aspects, Amsterdam, 2011-12-14

                                 Slides based on Debora Weber-Wulff, edited by Katrin Köhler
About me


   • Research assistant of Prof. Dr. Weber-Wulff
     since 2007
   • Software test in 2008 and 2010
   • Master's thesis on “Cryptographic Watermarking
     for Texts”




2 / 52
Contents


   • Plagiarism Detection Test 2010
   • Doctoral thesis of Karl-Theodor zu Guttenberg
   • Discovering plagiarism




3 / 52
Teachers and administrations
   want a simple solution




                 Photo: Flickr cc-by-nc-sa: xtrarant, 2008
                 Art Installation: Jamie Pawlus, Indianapolis, Indiana, 2003




4 / 52
Many software companies are glad to help




5 / 52
Plagiarism detection software

   • Can be extremely expensive!
   • Teachers want to have all papers
     marked original or plagiarism before
     they start reading them.
   • Students are afraid of wrongly being
     labeled plagiarists.
   • Only a teacher can decide if it is indeed
     plagiarism! Software cannot be used to solve
     social problems.
   • Prof. Dr. Weber-Wulff has tested plagiarism
     detection software 4.5 times: 2004, 2007, 2008,
     2010 and zu Guttenberg’s thesis

6 / 52
Test process 2010

         • 9 months of work by 2 people
         • 42 test cases in English, German
           and Japanese
         • Different types of plagiarism,
           a few originals
         • Market survey
         • Access to the systems
         • 48 systems found, 26 could be
           completely evaluated




7 / 52
Evaluation metric: Effectiveness

    • Plagiarism or not:
      What was found?
       • Total
       • Without the first 10 tests
         (Google accident)
       • English cases
       • Japanese cases as additional
         challenge

                                        Flickr, cc-by, arthit, 2005


         ➡ No winner:
           results ranged continuously between 55% and 64%


8 / 52
Evaluation metric: Usability


   • Design, language consistency, navigation,
     labelling, print quality of the reports, fit with
     university processes
   • Support by email:
     Speed, good answers
   • Top: PlagScan, followed by
     PlagiarismFinder, Ephorus,
     PlagAware and TurnItIn


                                                Flickr, cc-by, Quapan, 2008




9 / 52
Evaluation metric: Professionalism


   • Street address with town, telephone
     number, name of a person
   • Domain registration in own name
   • No parallel offers of term papers or
     pornography or advertising for such services
   • German-speaking availability by telephone
     during German working hours
   • No installation of viruses
   ➡ PlagiarismFinder, followed by PlagAware,
     Strike Plagiarism, TurnItIn, Docoloc,
     PlagScan, Blackboard

                                   Flickr, cc-by-sa, sludgegulper, 2008

10 / 52
Problems: Effectiveness


   • Nothing found from books - not
     even if they are in Google
     Books!
   • One test case copied 100% from
     Google Books registered at
     less than 25%
   • Translations are not found




11 / 52
Problems: Effectiveness


   • Umlauts cause problems, although less so than
     in earlier tests
   • Edited texts are found less often
   • Many systems very
     difficult to use
   • Not all companies
     trustworthy
   • Some keep copies - and
     award themselves
     rights to use the text!
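
On the umlaut point above: the slides do not say why umlauts cause trouble. One plausible cause, stated here purely as an assumption, is that the same character can be stored in different Unicode forms, so a naive string comparison fails. A minimal sketch of normalizing text before comparing it (the function name `normalize_for_matching` is illustrative and not taken from any tested system):

```python
import unicodedata


def normalize_for_matching(text):
    """Return NFC-normalized, case-folded text so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text).casefold()


composed = "Pr\u00fcfung"     # "ü" stored as a single code point
decomposed = "Pru\u0308fung"  # "u" followed by a combining diaeresis

# The two strings look identical on screen but differ as raw code points.
print(composed == decomposed)
print(normalize_for_matching(composed) == normalize_for_matching(decomposed))
```

Whether any of the tested systems fails for this reason is not known from the slides; the sketch only shows one way such mismatches can arise and be avoided.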



12 / 52
Problems: Usability


   • Language mix
   • Workflow problems
   • The reports are generally not useful




13 / 52
Problems: Professionalism


   • No info, no names
   • The address listed is a parking lot
   • Support questions not answered, nobody
     answers the telephone
   • Offer term papers or
     pornography in parallel,
     all rights given
     to the company




14 / 52
How to rank?


   • No system was best in all of the metrics
   • We set up a ranking for each of the five criteria
     (three effectiveness, one usability, one
     professionalism)
   • Calculated the average ranking
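
The ranking procedure above can be sketched roughly as follows. The system names and scores are invented for illustration, the criterion names are placeholders, and tie handling is an assumption, since the slides do not describe it:

```python
from statistics import mean


def rank_positions(scores, higher_is_better=True):
    """Map each system to its rank (1 = best); tied systems share the better rank."""
    ordered = sorted(scores.values(), reverse=higher_is_better)
    return {name: ordered.index(value) + 1 for name, value in scores.items()}


def average_ranking(criteria):
    """criteria: {criterion: {system: score}} -> [(system, mean rank)], best first."""
    per_criterion = [rank_positions(scores) for scores in criteria.values()]
    systems = per_criterion[0].keys()
    averages = {s: mean(r[s] for r in per_criterion) for s in systems}
    return sorted(averages.items(), key=lambda item: item[1])


# Hypothetical scores for three made-up systems (not data from the test).
criteria = {
    "effectiveness_1": {"A": 0.61, "B": 0.60, "C": 0.55},
    "effectiveness_2": {"A": 0.62, "B": 0.58, "C": 0.57},
    "effectiveness_3": {"A": 0.60, "B": 0.64, "C": 0.56},
    "usability": {"A": 6, "B": 9, "C": 5},
    "professionalism": {"A": 7, "B": 4, "C": 8},
}

for system, avg_rank in average_ranking(criteria):
    print(system, avg_rank)
```

Averaging ranks rather than raw scores keeps criteria measured on different scales comparable, at the cost of discarding how large the gaps between systems are.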




15 / 52
Results: Useful


   • There were no systems in
     this category - only humans
     are able to reach this level of
     effectiveness.




                                       Flickr, cc-by-nc, dianejp, 2009




16 / 52
Results: Partially useful systems




17 / 52
Partially useful: PlagAware


   • German System
   • Good documentation
   • Average effectiveness: 61%
   • But: each file must be submitted by itself (5
     clicks!), this does not fit with the workflow
   • Looks for plagiarism in online texts




18 / 52
PlagAware




19 / 52
Partially useful: turnitin

   • Best results for material that is stored in their
     database
   • Translation problems
   • Umlaut problems
   • Returns copies of Wikipedia pages carrying ads for porn
   • The source URLs reported are often no longer
     valid
   • Just adds up the percent values for the
     “originality” report
   • Only system to deal with Japanese properly



20 / 52
turnitin Originality Report




21 / 52
turnitin: How colorful!




22 / 52
23 / 52
turnitin stores Texts




24 / 52
turnitin remembers for a long time




25 / 52
26 / 52
Partially useful: Ephorus

   •      Dutch system
   •      Direct mail-in using Hand-In-Code
   •      Reports by E-Mail
   •      Stores texts aggressively
   •      Problems with umlauts




27 / 52
ephorus: Umlauts




28 / 52
29 / 52
Partially useful: PlagScan


   • Newcomer from Germany
   • One purchases “PlagPoints”
   • Useful: Subaccounts for teachers
   • First place in usability
   • Three kinds of report, none of which are a
     side-by-side report
   • Only 60% in effectiveness




30 / 52
PlagScan




31 / 52
PlagScan - Report




32 / 52
Partially useful: Urkund


   •      Swedish system
   •      Second in overall effectiveness
   •      13th in usability and professionalism
   •      Language problems
   •      Complex navigation
   •      Catastrophic layout
   •      Unusable reports
   •      Cryptic error messages
   •      Test cases from 2008 were still stored



33 / 52
34 / 52
Urkund: Report




35 / 52
Barely useful Systems


   • They find something, but miss a lot
   • They are not really easy to use
   • They have professionalism problems

   • Docoloc, Copyscape, Blackboard/Safe Assign,
     Plagiarism Finder, Plagiarisma, Compilatio,
     StrikePlagiarism, The Plagiarism Checker




36 / 52
Strange tales


   • checkforplagiarism.net
   • Viper




                              cc-by-sa D. Weber-Wulff, 2009




37 / 52
checkforplagiarism.net


   • In 2007 it was called
     iPlagiarismcheck.com
   • Was a plagiarism of
     turnitin, but they said:
     These are the sources!
   • Charges 15 €
     for 5 tests, students
     are the target group
   • turnitin set up a
     honeypot



38 / 52
39 / 52
Viper


   • Is installed on a PC
   • In the terms of use: You give us
     irrevocable rights to use your text
     as we see fit
   • Also runs a paper mill
   • Complicated reports
   • Only 24% effectiveness -
     better to toss a coin!
   • Advertises in the UK by power-
     cleaning the sidewalks



40 / 52
Viper




41 / 52
GuttenPlag
   Collaborative documentation of plagiarism




42 / 52
The Extent of
    the Plagiarism

    • 135 sources
    • 94% of pages
    • 63% of lines




43 / 52
Test Results
   • 38 of the (at the time of the test) 131 known
     sources were found by at least one of the
     systems
   • Many of these sources are not (or no longer) online
   • Of all known sources, each system found:

                    System        Sources found   Share of 131
                    iThenticate        30             23 %
                    PlagScan           19             15 %
                    Urkund             16             12 %
                    PlagAware           7              5 %
                    Ephorus             6              5 %
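
The percentages in the table above follow from dividing each count by the 131 sources known at the time of the test. A short sketch that reproduces them (the counts and the total come from the slide; the rounding to whole percent is an assumption that happens to match the published figures):

```python
KNOWN_SOURCES = 131  # sources known at the time of the test (from the slide)

found = {
    "iThenticate": 30,
    "PlagScan": 19,
    "Urkund": 16,
    "PlagAware": 7,
    "Ephorus": 6,
}


def share_found(count, total=KNOWN_SOURCES):
    """Percentage of known sources found, rounded to a whole percent."""
    return round(100 * count / total)


shares = {name: share_found(count) for name, count in found.items()}
for name, pct in shares.items():
    print(f"{name:<12} {found[name]:>3} {pct:>3} %")
```

By the same calculation, the 38 sources found by at least one system amount to about 29% of the 131 known sources.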



44 / 52
We tested these systems on
   zu Guttenberg’s thesis

   • The usability for such large
     works was extremely poor
   • The numbers appear to be
     random
   • Many sources reported by
     iThenticate return a 404
     “file not found” error
   • Nothing from books (or the
     Bundestag) was found




45 / 52
The major problem is:

     • They don’t find plagiarism! Just (marginally
       changed) copies of text - even when it is
       properly referenced!
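
To illustrate the point, here is a minimal sketch of plain text-overlap matching using word trigrams. It is a generic technique chosen for illustration, not the method of any system in the test; the function names and sample sentences are made up:

```python
import re


def shingles(text, n=3):
    """Return the set of lowercase word n-grams in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(suspect, source, n=3):
    """Fraction of the suspect's n-grams that also occur in the source."""
    suspect_grams = shingles(suspect, n)
    if not suspect_grams:
        return 0.0
    return len(suspect_grams & shingles(source, n)) / len(suspect_grams)


source = "Software cannot be used to solve social problems."
quoted = 'As the slides put it, "Software cannot be used to solve social problems."'
paraphrase = "Technology is unable to fix issues that are social in nature."

print(overlap_ratio(quoted, source))      # high, although the text is properly quoted
print(overlap_ratio(paraphrase, source))  # zero, although the idea is taken over
```

A matcher of this kind flags a correctly attributed quotation and misses a paraphrase entirely, which is why the decision about plagiarism has to stay with a human reader.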




                           Flickr, cc-by-nc, Leeks, 2006




46 / 52
So let’s have a look ourselves....


   • But doesn’t the thesis have to be available
     digitally?
   • And the thesis is so long?
   • And the Internet
     is extremely
     large?




             Flickr, cc-by-nc-nd, t_buchtele, 2009




47 / 52
Suspicion


   • Upon careful reading you find it nicely written,
     but .....
   • The style is too polished, the vocabulary not that
     of your students.
   • There is some
     strange formatting
   • Interesting spelling
     errors
   • Abrupt breaks in style
                               Flickr, cc-by, redcctshirt, 2009




48 / 52
Searching with Google & Co


   • An exact phrase in quotation marks ("...")
   • 3-5 nouns
   • The typo
   • Check the second page
     of hits
   • Set a time limit

                                Flickr, cc-by-nc-nd, Athena1970, 2008
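
The first two search tips above can be sketched as small helpers that build a query string. The helper names are hypothetical, and the Google search URL is used only as an example endpoint:

```python
from urllib.parse import quote_plus


def phrase_query(phrase):
    """Wrap a phrase in double quotes for an exact-phrase search."""
    return '"' + " ".join(phrase.split()) + '"'


def keyword_query(nouns):
    """Join 3 to 5 distinctive nouns into a single query."""
    if not 3 <= len(nouns) <= 5:
        raise ValueError("expected 3 to 5 nouns")
    return " ".join(nouns)


def search_url(query, base="https://www.google.com/search?q="):
    """URL-encode the query; the base URL is an assumption for illustration."""
    return base + quote_plus(query)


print(search_url(phrase_query("three words suffice")))
print(search_url(keyword_query(["plagiarism", "thesis", "watermarking"])))
```

The helpers only build the query; judging the hits, checking the second page and stopping at the time limit remain manual steps.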




49 / 52
Three words suffice!




50 / 52
Really!




51 / 52
Thank you!


   • Portal Plagiarism
     http://plagiat.htw-berlin.de

   • Plagiarism-Blog:
     http://copy-shake-paste.blogspot.com/

                                              c. 2011: Axel Völcker,
                                              DerWedding.de




   • Homepage:
     http://www.f4.htw-berlin.de/~weberwu/

   • Contact: katrin.koehler@student.htw-berlin.de



52 / 52

ALLEA KWAN symposium Amsterdam 2011-12-14
