Plagiarism Detection Techniques
Nimisha T.
13MCA11030
Contents:
- Introduction
- Definition of Plagiarism
- Avoiding Plagiarism
- Text-Based Plagiarism Detection Techniques
- Tools Used for Text-Based Plagiarism Detection
- Source-Code-Based Plagiarism Detection Techniques
- Tools Used for Source-Code-Based Plagiarism Detection
- Disadvantages of Plagiarism Detection Technology
- Conclusion
Introduction
- Plagiarism is a significant problem on almost every college and university campus.
- The problem extends beyond the campus: plagiarism has also become an issue in industry, journalism and government.
Definition of Plagiarism
According to the Merriam-Webster Online Dictionary, to plagiarize is:
- To steal and pass off the ideas or words of another as one's own.
- To use another's production without crediting the source.
- To commit literary theft.
- To present as new and original an idea or product derived from an existing source.
The following are considered plagiarism:
- Turning in someone else's work as your own.
- Copying words or ideas from someone else without giving credit.
- Failing to put a quotation in quotation marks.
- Giving incorrect information about the source of a quotation.
- Changing words but copying the sentence structure.
- Copying so many words or ideas from a source that they make up the majority of your work, whether or not you give credit.
Deliberate and Accidental Plagiarism
Deliberate (intentional) plagiarism:
Someone steals another person's work and claims it as their own.
Accidental (unintentional) plagiarism:
Someone unknowingly uses a phrase or copies words without acknowledging the author of the material.
Avoiding Plagiarism
There are two approaches:
- Plagiarism prevention
- Plagiarism detection
Plagiarism Prevention
- A collaborative effort to recognize and counter plagiarism at every level.
- Educating students about the appropriate use of intellectual material.
- Minimizing the possibility that plagiarized content is submitted.
- Plagiarism prevention is difficult to achieve and takes a long time.
Plagiarism Detection
[Figure: Culwin and Lancaster's four stages of detecting plagiarism]
Plagiarism detection techniques fall into two groups:
- Text-based plagiarism detection techniques
- Source-code-based plagiarism detection techniques
Text-Based Plagiarism Detection Techniques
- Substring matching
- Keyword similarity
- Exact fingerprint match
- Text parsing
Substring Matching
- Identifies maximal matching substrings in pairs of documents; these matches are then used as plagiarism indicators.
- Typically, the substrings are represented in suffix trees.
- Graph-based measures are employed to capture the fraction of plagiarized sections.
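The core of substring matching can be sketched in Java. This is a minimal sketch, assuming documents are compared as sequences of lowercase words; for brevity it finds the longest shared run of words with an O(n × m) dynamic-programming table instead of the suffix tree mentioned above (a suffix tree finds maximal matches faster but takes far more code). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class SubstringMatch {
    /** Splits text into lowercase words, treating punctuation as whitespace. */
    static List<String> words(String text) {
        return Arrays.asList(text.toLowerCase().replaceAll("[^a-z0-9\\s]", " ").trim().split("\\s+"));
    }

    /** Longest run of consecutive words that occurs in both documents. */
    static List<String> longestCommonRun(List<String> a, List<String> b) {
        // len[i][j] = length of the common run ending at a[i-1] and b[j-1]
        int[][] len = new int[a.size() + 1][b.size() + 1];
        int best = 0, endA = 0;
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                    if (len[i][j] > best) {
                        best = len[i][j];
                        endA = i;
                    }
                }
            }
        }
        return a.subList(endA - best, endA);
    }

    /** Share of document a's words covered by its longest match with b (a crude indicator). */
    static double matchFraction(String a, String b) {
        List<String> wa = words(a);
        return (double) longestCommonRun(wa, words(b)).size() / wa.size();
    }
}
```

For example, comparing "The quick brown fox jumps" with "A quick brown fox sleeps" yields the run [quick, brown, fox], which covers 3 of the 5 words (0.6) of the first document.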
Keyword Similarity
- Extract topic-identifying keywords from a document.
- Compare them with the keywords of another document.
- If the similarity exceeds a threshold, the candidate documents are divided into smaller pieces, which are then compared recursively.
- This approach assumes that plagiarism usually happens in topically similar documents.
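The recursive keyword comparison can be sketched as follows. This is a minimal sketch under assumptions that are not part of the technique as described: keywords are simply the k most frequent words outside a tiny hypothetical stopword list, similarity is the Jaccard coefficient of the two keyword sets, and each candidate piece is split into halves. All names and parameters (k, threshold, minWords) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class KeywordSimilarity {
    // Hypothetical, deliberately tiny stopword list for illustration only.
    static final Set<String> STOPWORDS = Set.of("the", "a", "an", "and", "of", "to", "in", "is", "are", "was");

    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    /** The k most frequent non-stopwords, used as topic-identifying keywords. */
    static Set<String> keywords(List<String> words, int k) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) {
            if (!STOPWORDS.contains(w)) freq.merge(w, 1, Integer::sum);
        }
        List<String> ranked = new ArrayList<>(freq.keySet());
        ranked.sort((x, y) -> {
            int byFreq = Integer.compare(freq.get(y), freq.get(x)); // most frequent first
            return byFreq != 0 ? byFreq : x.compareTo(y);           // ties: alphabetical
        });
        return new HashSet<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    /** Jaccard similarity: size of the intersection divided by size of the union. */
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return (double) inter.size() / union.size();
    }

    /**
     * If the keyword similarity reaches the threshold, split both pieces in half and
     * compare the halves recursively; pieces of at most minWords words (minWords >= 1)
     * are reported as suspicious pairs.
     */
    static List<String> compare(List<String> a, List<String> b, int k, double threshold, int minWords) {
        List<String> hits = new ArrayList<>();
        if (jaccard(keywords(a, k), keywords(b, k)) < threshold) return hits;
        if (a.size() <= minWords || b.size() <= minWords) {
            hits.add(String.join(" ", a) + " <-> " + String.join(" ", b));
            return hits;
        }
        for (List<String> pa : halves(a)) {
            for (List<String> pb : halves(b)) {
                hits.addAll(compare(pa, pb, k, threshold, minWords));
            }
        }
        return hits;
    }

    static List<List<String>> halves(List<String> w) {
        int mid = w.size() / 2;
        return List.of(w.subList(0, mid), w.subList(mid, w.size()));
    }
}
```

Pairs that fall below the threshold are discarded at once, so only topically similar pieces are split and compared further; this is where the approach saves work.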
Exact Fingerprint Match
- The documents are partitioned into term sequences called chunks.
- Digital digests are computed from the chunks; together they form the document's fingerprint.
- The digests are inserted into a hash table, and collisions indicate matching sequences, which are then used as plagiarism indicators.
- Computing the fingerprint with standard hashing suffers from two severe problems:
  - It is computationally expensive.
  - A small chunk size (3-10 words) must be chosen to identify matching passages.
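A minimal sketch of chunk-based fingerprinting, assuming overlapping chunks of n consecutive words. Java's String.hashCode() stands in for the standard hash function; because it can map different chunks to the same digest, the sketch confirms each collision by comparing the chunk text. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class Fingerprint {
    /** Splits a document into overlapping chunks of n consecutive lowercase words. */
    static List<String> chunks(String text, int n) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!w.isEmpty()) words.add(w);
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= words.size(); i++) {
            out.add(String.join(" ", words.subList(i, i + n)));
        }
        return out;
    }

    /** The fingerprint as a hash table: digest of each chunk mapped to the chunk text(s). */
    static Map<Integer, Set<String>> fingerprint(String text, int n) {
        Map<Integer, Set<String>> table = new HashMap<>();
        for (String c : chunks(text, n)) {
            table.computeIfAbsent(c.hashCode(), h -> new HashSet<>()).add(c);
        }
        return table;
    }

    /** Chunks of the suspicious document whose digests collide with the source's digests. */
    static Set<String> matches(String source, String suspicious, int n) {
        Map<Integer, Set<String>> table = fingerprint(source, n);
        Set<String> found = new TreeSet<>();
        for (String c : chunks(suspicious, n)) {
            Set<String> bucket = table.get(c.hashCode());
            // A digest collision only marks a candidate; compare text to rule out hash clashes.
            if (bucket != null && bucket.contains(c)) found.add(c);
        }
        return found;
    }
}
```

With n = 3, "Plagiarism is a significant problem on almost every campus." and "They said plagiarism is a significant problem today." share the chunks "plagiarism is a", "is a significant" and "a significant problem". A smaller n finds shorter matching passages but produces more digests to compute and store, which is the trade-off noted above.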
Text Parsing
- Any sentence of a text can be automatically represented as a tree that reflects the structure of the sentence.
- Example: the phrase "the monkey ate the banana" would be parsed by such software as:

    SENTENCE
    ├── SUBJECT
    │   ├── ARTICLE: the
    │   └── NOUN: monkey
    ├── VERB: ate
    └── OBJECT
        ├── ARTICLE: the
        └── NOUN: banana
Text Parsing (Continued…)
- Once a parse tree has been created, a tree-matching procedure can be invoked.
- First, the algorithm builds a flowchart-style parse tree for each file to be analyzed.
- Then, for each pair of files, the algorithm performs a rough “abstract comparison” in which only the types of the parse tree elements (such as Assignment, Loop and Branching) are taken into account.
- This is done recursively for each level of tree nodes. If the similarity percentage falls too low at any level, the trees are immediately treated as not similar.
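The abstract comparison step can be sketched as a recursive walk over two trees whose nodes carry only their element type. This is a minimal sketch, assuming that the parse trees already exist, that a level's similarity is the share of child types the two nodes have in common, and that children are paired by position; the Node record, the node() helper and the threshold parameter are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AbstractCompare {
    /** Hypothetical parse-tree node that keeps only its element type (Assignment, Loop, Branch, ...). */
    record Node(String type, List<Node> children) {}

    static Node node(String type, Node... children) {
        return new Node(type, List.of(children));
    }

    /** How many child types the two lists share, counting repeated types separately. */
    static int commonTypes(List<Node> a, List<Node> b) {
        Map<String, Integer> counts = new HashMap<>();
        for (Node n : a) counts.merge(n.type(), 1, Integer::sum);
        int common = 0;
        for (Node n : b) {
            int c = counts.getOrDefault(n.type(), 0);
            if (c > 0) {
                common++;
                counts.put(n.type(), c - 1);
            }
        }
        return common;
    }

    /**
     * Abstract comparison: at each level, only the types of the child nodes are compared.
     * If the share of matching types is below the threshold, the trees are rejected
     * immediately without descending further.
     */
    static boolean similar(Node a, Node b, double threshold) {
        if (!a.type().equals(b.type())) return false;
        List<Node> ca = a.children(), cb = b.children();
        if (ca.isEmpty() && cb.isEmpty()) return true;
        double share = 2.0 * commonTypes(ca, cb) / (ca.size() + cb.size());
        if (share < threshold) return false; // early rejection
        // Descend into children that line up by position and have the same type.
        for (int i = 0; i < Math.min(ca.size(), cb.size()); i++) {
            if (ca.get(i).type().equals(cb.get(i).type()) && !similar(ca.get(i), cb.get(i), threshold)) {
                return false;
            }
        }
        return true;
    }
}
```

Pairs that pass this cheap type-only check would then go on to the more detailed micro comparison.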
Text Parsing (Continued…)
- If the abstract comparison indicates enough similarity, a special low-level “micro comparison” procedure is invoked.
- Each node represents an individual statement.
- Each tree node becomes a separate subtree that is compared with the corresponding subtree from the other file.
- For example, the phrases "the monkey ate the banana" and "the banana was eaten by the monkey" will be very close after tokenization.
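The tokenization step in the example can be sketched as follows. This is a simplification rather than the subtree comparison itself: each sentence is reduced to a sorted bag of normalized tokens, and the two bags are compared with the Dice coefficient. The stopword list and the three-entry lemma table are hypothetical stand-ins for a real stopword list and lemmatizer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MicroCompare {
    // Hypothetical stopword list and lemma table; a real system would use a full
    // stopword list and a morphological analyser or lemmatizer.
    static final Set<String> STOPWORDS = Set.of("the", "a", "an", "was", "were", "by", "is");
    static final Map<String, String> LEMMAS = Map.of("ate", "eat", "eaten", "eat", "eats", "eat");

    /** Lowercases, drops function words, reduces words to a base form and sorts them. */
    static List<String> normalize(String sentence) {
        List<String> out = new ArrayList<>();
        for (String w : sentence.toLowerCase().split("[^a-z]+")) {
            if (w.isEmpty() || STOPWORDS.contains(w)) continue;
            out.add(LEMMAS.getOrDefault(w, w));
        }
        Collections.sort(out); // word order (active vs. passive voice) is ignored
        return out;
    }

    /** Dice coefficient over the two token multisets, between 0.0 and 1.0. */
    static double similarity(String s1, String s2) {
        List<String> a = normalize(s1), b = normalize(s2);
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        List<String> rest = new ArrayList<>(b);
        int common = 0;
        for (String t : a) {
            if (rest.remove(t)) common++; // removes one matching occurrence
        }
        return 2.0 * common / (a.size() + b.size());
    }
}
```

Under these assumptions both example phrases reduce to [banana, eat, monkey], so their similarity is 1.0 even though word order and voice differ.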
Tools Used for Text-Based Plagiarism Detection
Some tools are:
- PlagAware
- PlagScan
- CheckForPlagiarism.net
- iThenticate
- PlagiarismDetection.org
PlagAware
- An online service used for plagiarism detection.
- It can search for, find, analyze and trace plagiarism of a given text in documents on similar topics.
- PlagAware works as a search engine.
- It provides different types of reports that help users decide whether their document has been plagiarized.
- Mainly used in the academic field.
- Supports multiple-document comparison.
- Does not support synonym or sentence-structure checking.
PlagScan
- Online software for checking textual plagiarism.
- Uses complex algorithms to check and analyze uploaded documents.
- A unique signature is extracted from the document's structure and then compared with the PlagScan database and millions of online documents.
- Detects most types of plagiarism, whether direct copy-and-paste or word switching.
- Supports all languages that use the international UTF-8 encoding, and all languages written in Latin or Arabic characters.
CheckForPlagiarism.net
- One of the best online plagiarism checkers for stopping and preventing online plagiarism.
- Uses a fingerprint-based approach that analyzes and summarizes a collection of documents and creates a kind of fingerprint for it.
- Uses its own database, which includes millions of documents and articles from the World Wide Web.
- Supports synonym and sentence-structure checking.
- Can compare a set of documents simultaneously with other documents.
iThenticate
- A service designed especially for researchers, authors and publishers.
- It has its own database containing millions of documents.
- Users with an account can compare submitted documents against it, online or offline, to identify plagiarized content.
- Considered the first online plagiarism checker.
- Supports document-to-document and multiple-document checking.
- Supports more than 30 languages.
- Does not support synonym or sentence-structure checking.
PlagiarismDetection.org
- An online service that provides highly accurate plagiarism detection results.
- Uses its own database containing millions of documents.
- Supports English and all other languages that use Latin characters.
- Does not support multiple-document comparison.
- Does not support synonym or sentence-structure checking.
Source-Code-Based Plagiarism Detection Techniques
- Lexical similarities
- Parse tree similarities
- Program dependence graphs
- Metrics
Lexical Similarities
- Converts source code into a stream of lexical tokens, the representation from which a compiler extracts meaning from the source.
- During the lexical analysis phase, the source code undergoes a series of transformations.
- Some of these transformations, such as the identification of reserved words and identifiers, are beneficial for plagiarism detection.
Lexical Similarities (Continued…)
- Consider the following two snippets of Java code:
Snippet 1:
int[] A = {1,2,3,4};
for(int i = 0; i < A.length; i++)
{
    A[i] = A[i] + 1;
}
Snippet 2:
int[] B = {1, 2, 3, 4};
for(int j = 0; j < B.length; j++)
{
    B[j] = B[j] + 1;
}
Lexical Similarities (Continued…)
The lexical stream of both snippets is:
LITERAL_int LBRACK RBRACK IDENT ASSIGN
LCURLY NUM_INT COMMA NUM_INT COMMA
NUM_INT COMMA NUM_INT RCURLY SEMI
LITERAL_for LPAREN LITERAL_int IDENT ASSIGN
NUM_INT SEMI IDENT LT IDENT DOT IDENT SEMI
IDENT INC RPAREN LCURLY IDENT LBRACK IDENT
RBRACK ASSIGN IDENT LBRACK IDENT RBRACK PLUS
NUM_INT SEMI RCURLY
Both Java snippets produce exactly the same lexical stream, even though the variable names differ (A/B, i/j) and the spacing in the array initializer differs.
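The tokenization of the two snippets can be reproduced with a small hand-written lexer. This is a minimal sketch: it recognizes only the keywords and symbols that occur in the snippets, emits the same token names as the stream above, and the class name JavaLexer is hypothetical; a real detector would use a complete Java lexer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JavaLexer {
    // Only the keywords and symbols needed for the two snippets are covered.
    static final Set<String> KEYWORDS = Set.of("int", "for", "if", "while", "return");
    static final Map<String, String> SYMBOLS = Map.ofEntries(
        Map.entry("++", "INC"), Map.entry("=", "ASSIGN"), Map.entry("<", "LT"),
        Map.entry("+", "PLUS"), Map.entry(";", "SEMI"), Map.entry(",", "COMMA"),
        Map.entry(".", "DOT"), Map.entry("(", "LPAREN"), Map.entry(")", "RPAREN"),
        Map.entry("{", "LCURLY"), Map.entry("}", "RCURLY"),
        Map.entry("[", "LBRACK"), Map.entry("]", "RBRACK"));
    // "++" is listed before the single-character symbols so it is matched as one token.
    static final Pattern TOKEN =
        Pattern.compile("\\+\\+|[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[=<+;,.(){}\\[\\]]|\\S");

    /** Converts source text to token types; identifier names and literal values are discarded. */
    static List<String> tokens(String source) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) { // whitespace is skipped because no alternative matches it
            String t = m.group();
            char c = t.charAt(0);
            if (KEYWORDS.contains(t)) out.add("LITERAL_" + t);
            else if (Character.isLetter(c) || c == '_') out.add("IDENT");
            else if (Character.isDigit(c)) out.add("NUM_INT");
            else if (SYMBOLS.containsKey(t)) out.add(SYMBOLS.get(t));
            else throw new IllegalArgumentException("Unsupported token: " + t);
        }
        return out;
    }
}
```

Running tokens() on both snippets returns identical lists of 45 token types, because identifier names, literal values and whitespace are all discarded.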
Parse Tree Similarities
- The parse tree (or derivation tree) built from the lexical tokens of a program also exhibits the structure of that program.
- During compilation, a compiler builds a parse tree that represents the program.
- Because their lexical streams are the same, the parse trees of the two snippets above have the same structure.
- An algorithm for detecting plagiarism using this method would:
  - First, parse each program.
  - Next, for each pair of parse trees, find as many common subtrees as possible.
  - Use this number as a measure of similarity between the two programs.
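The common-subtree count can be sketched by giving every subtree a canonical string and intersecting the two multisets of strings. This is a minimal sketch, assuming the parse trees have already been built with identifier names dropped (hypothetical labels such as Assign, Index, Ident); it counts exact subtree matches only, and the record and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CommonSubtrees {
    /** Hypothetical parse-tree node: a grammar label (identifier names dropped) and ordered children. */
    record Node(String label, List<Node> children) {}

    static Node node(String label, Node... children) {
        return new Node(label, List.of(children));
    }

    /** Adds a canonical string for every subtree of n to counts and returns n's own string. */
    static String collect(Node n, Map<String, Integer> counts) {
        StringBuilder key = new StringBuilder(n.label()).append('(');
        for (Node c : n.children()) key.append(collect(c, counts)).append(',');
        String s = key.append(')').toString();
        counts.merge(s, 1, Integer::sum);
        return s;
    }

    /** Number of subtrees occurring in both trees (each occurrence matched at most once). */
    static int commonSubtrees(Node a, Node b) {
        Map<String, Integer> ca = new HashMap<>(), cb = new HashMap<>();
        collect(a, ca);
        collect(b, cb);
        int common = 0;
        for (Map.Entry<String, Integer> e : ca.entrySet()) {
            common += Math.min(e.getValue(), cb.getOrDefault(e.getKey(), 0));
        }
        return common;
    }
}
```

If the loop bodies A[i] = A[i] + 1 and B[j] = B[j] + 1 are both modelled as Assign(Index(Ident, Ident), Plus(Index(Ident, Ident), Num)), all 9 subtrees match.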
Program Dependence Graphs (PDG)
- A PDG is a graph representation of source code.
- It is a directed, labeled graph that represents the data and control dependencies within one procedure.
- Basic statements such as variable declarations, assignments and procedure calls are represented by vertices.
- It depicts how data flows between statements and how statements are controlled by other statements.
- The data and control dependencies between statements are represented by edges between vertices.
- Data and control dependencies are drawn as solid and dashed lines, respectively.
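The dependence edges for a small fragment can be sketched as below. This is a minimal sketch under strong assumptions: each statement is summarized by hypothetical sets of the variables it defines and uses plus the index of the statement that controls it, and each use is linked only to the most recent earlier definition, so loop-carried dependences and alternative branch paths are ignored. The record and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PdgSketch {
    /** Hypothetical statement summary: variables defined and used, and the controlling statement (-1 if none). */
    record Stmt(String text, Set<String> defs, Set<String> uses, int controlledBy) {}

    /** A PDG edge; data edges name the variable whose value flows along them. */
    record Edge(String kind, int from, int to, String variable) {}

    /**
     * Adds a control edge from each statement's controlling statement, and a data edge
     * from the most recent earlier definition of every variable the statement uses.
     */
    static List<Edge> build(List<Stmt> stmts) {
        List<Edge> edges = new ArrayList<>();
        Map<String, Integer> lastDef = new HashMap<>();
        for (int i = 0; i < stmts.size(); i++) {
            Stmt s = stmts.get(i);
            if (s.controlledBy() >= 0) edges.add(new Edge("control", s.controlledBy(), i, ""));
            for (String v : s.uses()) {
                Integer d = lastDef.get(v);
                if (d != null) edges.add(new Edge("data", d, i, v));
            }
            // Uses are resolved before this statement's own definitions take effect.
            for (String v : s.defs()) lastDef.put(v, i);
        }
        return edges;
    }
}
```

For the first Java snippet (statement 0: array declaration, 1: for header, 2: loop body), this yields data edges 0 -> 1 and 0 -> 2 on A, a data edge 1 -> 2 on i, and a control edge 1 -> 2 from the loop header to its body.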
Program Dependence Graphs (Continued…)
Example: [Figure: an example program dependence graph]
Metrics
- Detects plagiarism through similarity analysis based on software metrics.
- Software metrics include:
  - Number of function calls
  - Number of used or defined local variables
  - Number of used or defined non-local variables
  - Number of parameters
  - Number of statements
  - Number of branches
  - Number of loops
Metrics (Continued…)
- Each code fragment is characterized by a set of features measured by the metrics.
- Computing metrics requires parsing the source code to identify the fragments of interest.
- Metrics are simple to calculate and can be compared quickly.
- False positives: two fragments with the same scores on a set of metrics may do entirely different things.
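Comparing fragments by their metric vectors can be sketched as follows. This is a minimal sketch, assuming the seven metrics listed above have already been computed by a parser and that similarity is measured as Euclidean distance with a chosen tolerance; the record, the method names and the tolerance parameter are hypothetical.

```java
class MetricsCompare {
    /** One fragment's metric vector, in the order the metrics are listed above. */
    record Metrics(int functionCalls, int localVars, int nonLocalVars, int parameters,
                   int statements, int branches, int loops) {
        int[] asArray() {
            return new int[] {functionCalls, localVars, nonLocalVars, parameters, statements, branches, loops};
        }
    }

    /** Euclidean distance between two metric vectors; 0 means identical scores. */
    static double distance(Metrics a, Metrics b) {
        int[] x = a.asArray(), y = b.asArray();
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Flags a pair as suspicious when its distance is within the tolerance. */
    static boolean suspicious(Metrics a, Metrics b, double tolerance) {
        return distance(a, b) <= tolerance;
    }
}
```

A distance of 0 only says that two fragments have the same counts; as the false-positive point above notes, it does not show that they do the same thing.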
Tools Used for Source-Code-Based Plagiarism Detection
- MOSS
- JPlag
- CodeMatch
MOSS (Measure of Software Similarity)
- Can be applied to a range of programming languages.
- Registered instructors can submit batches of programs to the MOSS server.
- The result is placed on a web page on the MOSS web server, and a link to that page is returned when the check is finished.
- The MOSS database stores an internal representation of the programs and then looks for similarities between them.
JPlag
- JPlag compares submitted programs in pairs.
- It assumes that plagiarists may change the names of variables or classes but are unlikely to change the control structure of a program.
- It presents its results as a set of HTML pages, which are sent back to the client and stored locally.
- JPlag is easier to use than MOSS but supports fewer languages.
CodeMatch
- Compares thousands of source code files in multiple directories and subdirectories.
- Determines which files are closely correlated.
- Useful for finding open source code within proprietary code.
- Useful for discovering common standard algorithms within different programs.
Disadvantages of Plagiarism Detection Technology
- Plagiarism detection systems are built for only a few languages, so checking texts in other languages with the same software can be difficult.
- Most detection software checks documents against a repository held within one organization; other people cannot access it to verify plagiarism.
- As the number of digital copies grows, the repository must become larger, and the detection software must be able to handle it.
- Some software asks users to upload a file through its website. The file is copied into the provider's database, which can lead to the data being leaked or misused for other purposes.
Conclusion
- Plagiarism is now rampant. With most data available in digital form, the avenues for plagiarism keep opening up.
- To prevent this kind of cheating and to acknowledge the originality of authors, new detection techniques need to be created.
- To protect intellectual property such as source code, new techniques need to be developed and implemented.
10
Plagiarism Detection Techniques
?
10
Plagiarism Detection Techniques

Weitere ähnliche Inhalte

Was ist angesagt?

Plagiarism and its detection
Plagiarism and its detectionPlagiarism and its detection
Plagiarism and its detection
ankit_saluja
 

Was ist angesagt? (20)

h-index
h-indexh-index
h-index
 
journal and impact factor
journal and impact factorjournal and impact factor
journal and impact factor
 
COPE General Intro Core Practices
COPE General Intro Core PracticesCOPE General Intro Core Practices
COPE General Intro Core Practices
 
Plagiarism:-Types and Causes
Plagiarism:-Types and CausesPlagiarism:-Types and Causes
Plagiarism:-Types and Causes
 
Introduction to COPE and Publication Ethics
Introduction to COPE and Publication EthicsIntroduction to COPE and Publication Ethics
Introduction to COPE and Publication Ethics
 
Plagiarism and its detection
Plagiarism and its detectionPlagiarism and its detection
Plagiarism and its detection
 
Avoid salami slicing and duplicate publication
Avoid salami slicing and duplicate publicationAvoid salami slicing and duplicate publication
Avoid salami slicing and duplicate publication
 
Publication ethics
Publication ethicsPublication ethics
Publication ethics
 
Selective Reporting and Misrepresentation of Data
Selective Reporting and Misrepresentation of DataSelective Reporting and Misrepresentation of Data
Selective Reporting and Misrepresentation of Data
 
Publication ethics
Publication ethicsPublication ethics
Publication ethics
 
Open Access Publishing
Open Access PublishingOpen Access Publishing
Open Access Publishing
 
Research and publication ethics
Research and publication ethicsResearch and publication ethics
Research and publication ethics
 
IDENTIFICATION OF PUBLICATION MISCONDUCT, COMPLAINTS & APPEALS IN ETHICS
IDENTIFICATION OF PUBLICATION MISCONDUCT, COMPLAINTS & APPEALS IN ETHICSIDENTIFICATION OF PUBLICATION MISCONDUCT, COMPLAINTS & APPEALS IN ETHICS
IDENTIFICATION OF PUBLICATION MISCONDUCT, COMPLAINTS & APPEALS IN ETHICS
 
Open Access Initiatives
Open Access Initiatives Open Access Initiatives
Open Access Initiatives
 
Publication ethics
Publication ethics Publication ethics
Publication ethics
 
Scientific misconduct
Scientific misconductScientific misconduct
Scientific misconduct
 
What is salami slicing
What is salami slicingWhat is salami slicing
What is salami slicing
 
Plagiarism
PlagiarismPlagiarism
Plagiarism
 
Predatory publishers and journals
Predatory publishers and journalsPredatory publishers and journals
Predatory publishers and journals
 
REDUNDANT PUBLICATION IN RESEARCH
REDUNDANT PUBLICATION IN RESEARCHREDUNDANT PUBLICATION IN RESEARCH
REDUNDANT PUBLICATION IN RESEARCH
 

Ähnlich wie plagiarism detection tools and techniques

RESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docx
RESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docxRESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docx
RESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docx
infantkimber
 
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
Waqas Tariq
 
csmalware_malware
csmalware_malwarecsmalware_malware
csmalware_malware
Joshua Saxe
 
FNC Corporate Protect Workshop
FNC Corporate Protect WorkshopFNC Corporate Protect Workshop
FNC Corporate Protect Workshop
forensicsnation
 
03.fnc corporate protect workshop new
03.fnc corporate protect workshop new03.fnc corporate protect workshop new
03.fnc corporate protect workshop new
forensicsnation
 

Ähnlich wie plagiarism detection tools and techniques (20)

A Survey On Plagiarism Detection
A Survey On Plagiarism DetectionA Survey On Plagiarism Detection
A Survey On Plagiarism Detection
 
A Review Of Plagiarism Detection Based On Lexical And Semantic Approach
A Review Of Plagiarism Detection Based On Lexical And Semantic ApproachA Review Of Plagiarism Detection Based On Lexical And Semantic Approach
A Review Of Plagiarism Detection Based On Lexical And Semantic Approach
 
Plag detection
Plag detectionPlag detection
Plag detection
 
A framework for plagiarism
A framework for plagiarismA framework for plagiarism
A framework for plagiarism
 
‘CodeAliker’ - Plagiarism Detection on the Cloud
‘CodeAliker’ - Plagiarism Detection on the Cloud ‘CodeAliker’ - Plagiarism Detection on the Cloud
‘CodeAliker’ - Plagiarism Detection on the Cloud
 
RESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docx
RESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docxRESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docx
RESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docx
 
Cognitive Security: How Artificial Intelligence is Your New Best Friend
Cognitive Security: How Artificial Intelligence is Your New Best FriendCognitive Security: How Artificial Intelligence is Your New Best Friend
Cognitive Security: How Artificial Intelligence is Your New Best Friend
 
Malwise-Malware Classification and Variant Extraction
Malwise-Malware Classification and Variant ExtractionMalwise-Malware Classification and Variant Extraction
Malwise-Malware Classification and Variant Extraction
 
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
 
Research Report
Research ReportResearch Report
Research Report
 
Importance of String in Programming Languages.pptx
Importance of String in Programming Languages.pptxImportance of String in Programming Languages.pptx
Importance of String in Programming Languages.pptx
 
EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...
EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...
EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...
 
csmalware_malware
csmalware_malwarecsmalware_malware
csmalware_malware
 
A Tool to Detect Plagiarism in Java Source Code.pdf
A Tool to Detect Plagiarism in Java Source Code.pdfA Tool to Detect Plagiarism in Java Source Code.pdf
A Tool to Detect Plagiarism in Java Source Code.pdf
 
Combining Approximate String Matching Algorithms and Term Frequency In The De...
Combining Approximate String Matching Algorithms and Term Frequency In The De...Combining Approximate String Matching Algorithms and Term Frequency In The De...
Combining Approximate String Matching Algorithms and Term Frequency In The De...
 
Plagiarism Preventive Initiatives and AI Tools......docx
Plagiarism Preventive Initiatives and AI Tools......docxPlagiarism Preventive Initiatives and AI Tools......docx
Plagiarism Preventive Initiatives and AI Tools......docx
 
FNC Corporate Protect Workshop
FNC Corporate Protect WorkshopFNC Corporate Protect Workshop
FNC Corporate Protect Workshop
 
Classification with R
Classification with RClassification with R
Classification with R
 
03.fnc corporate protect workshop new
03.fnc corporate protect workshop new03.fnc corporate protect workshop new
03.fnc corporate protect workshop new
 
FNC Corporate Protect
FNC Corporate ProtectFNC Corporate Protect
FNC Corporate Protect
 

Kürzlich hochgeladen

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

plagiarism detection tools and techniques

  • 2. Contents:  Introduction  Definition of Plagiarism  Avoiding plagiarism  Text based plagiarism detection techniques  Tools used for text based plagiarism  Source code based plagiarism detection techniques  Tools used for code based plagiarism  Disadvantages of the plagiarism detection technology  Conclusion 1 Plagiarism Detection Techniques
  • 3. Introduction  Plagiarism is a significant problem on almost every college and university campus.  The problems of plagiarism go beyond the campus, and have become an issue in industry, journalism, and government activities. 2 Plagiarism Detection Techniques
  • 4. Definition of Plagiarism Plagiarize according to the Merriam-Webster Online dictionary is:  To steal and pass off the idea or words of another as one’s own.  To use another’s production without crediting the source  To commit literary theft  To present as new and original idea or product derived from an existing source. Plagiarism Detection Techniques 3
  • 5. The following are considered as Plagiarism:  Turning in someone else’s work as your own.  Copying words or ideas from someone else without giving credit.  Failing to put a quotation in quotation marks  Giving incorrect information about the source of a quotation.  Changing words but copying sentence structure.  Copying so many words or ideas from a source that it makes up the majority of your work, even though by credit. 4 Plagiarism Detection Techniques
  • 6. Deliberate and Accidental Plagiarism Deliberate (intentional) Plagiarism : Steals the property of somebody else and claims it to be his own. Accidental (unintentional) Plagiarism : Somebody unknowingly cites a phrase or copies words without acknowledging the author of the material. 5 Plagiarism Detection Techniques
  • 8. Avoiding plagiarism Two methods :  Plagiarism prevention  Plagiarism detection 7 Plagiarism Detection Techniques
  • 9. Plagiarism Prevention  Collaborative effort for recognize and counter plagiarism at every level.  Educate students about the appropriate use of intellectual material.  Minimize the possibility of submission of plagiarized content.  Plagiarism prevention is difficult to achieve & also take a long time. 8 Plagiarism Detection Techniques
  • 10. Plagiarism Detection 9 Plagiarism Detection Techniques Culwin and Lancaster’s four stages of detecting plagiarism:
  • 11. Plagiarism Detection technique 10 Plagiarism Detection Techniques  Text based plagiarism detection techniques  Source code based plagiarism detection techniques
  • 12. Text based plagiarism detection techniques  Substring matching  Keyword similarity  Exact fingerprint match  Text parsing 11 Plagiarism Detection Techniques
  • 13. Substring Matching  Try to identify maximal matches in pairs  which then are used as plagiarism indicators.  Typically, the substrings are represented in suffix trees.  Graph-based measures are employed to capture the fraction of the plagiarized sections. 12 Plagiarism Detection Techniques
  • 14. Keyword Similarity  Extract topic identifying keywords from a document.  Compare with keywords of other document.  If the similarity exceeds a threshold, the candidate documents are divided into smaller pieces.  which are then compared recursively.  This approach assumes that plagiarism usually happens in topically similar documents. 13 Plagiarism Detection Techniques
  • 15. Exact Fingerprint Match  The documents are partitioned into term sequences called chunks.  which then are used as plagiarism indicators.  from which digital digests are computed that form the document’s fingerprint.  digests are inserted into a hash table then collisions indicate matching sequences.  For the fingerprint computation some standard hashing suffers from two severe problems:  Computationally expensive,  A small chunk size (3-10 words) must be chosen to identify matching passages 14 Plagiarism Detection Techniques
  • 16. Text parsing  Any sentence of the text can be automatically represented in the form of the tree.  which reflects the structure of the sentence  Example: The phrase the monkey ate the banana will be parsed by such software as, ┌──────SENTENCE─────┐ SUBJECT └─VERB OBJECT ARTICLE └─ ate ARTICLE └─ the └─ the NOUN NOUN └─monkey └─ banana 15 Plagiarism Detection Techniques
  • 17. Text parsing (Continue…)  Once a parse tree is created, we can invoke a tree matching procedure  Initially the algorithm builds a flowchart-styled parse tree for each file to be analyzed  Then for each pair of files, the algorithm performs a rough “abstract comparison”, when only types of the parse tree elements ( like Assignment, Loop, Branching) are taken into account.  This is done recursively for the each level of tree nodes If the similarity percentage becomes lower, the trees are immediately treated as not similar. 16 Plagiarism Detection Techniques
  • 18. Text parsing (Continued…)
     If the abstract comparison indicates enough similarity, a special low-level "micro comparison" procedure is invoked.
     Each node represents an individual statement.
     Each tree node becomes a separate subtree that has to be compared with the corresponding subtree from the other file.
     For example, the phrases "the monkey ate the banana" and "the banana was eaten by the monkey" will be very close after tokenization.
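The two-stage tree matching described above can be sketched as follows. Everything concrete here is an assumption:
- node kinds are compared level by level, counted with multiplicity;
- the cut-off threshold is a parameter;
- the micro comparison scores each statement by token overlap (Jaccard), averaged with its corresponding child subtrees.

Class, method and node-kind names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: node kinds, the threshold and the scoring rules are hypothetical.
final class TreeCompare {
    static final class Node {
        final String kind;          // element type, e.g. "Assignment", "Loop", "Branching"
        final String text;          // statement text, used only by the micro comparison
        final List<Node> children;

        Node(String kind, String text, Node... children) {
            this.kind = kind;
            this.text = text;
            this.children = List.of(children);
        }
    }

    /** Share of node kinds (counted with multiplicity) that two tree levels have in common. */
    static double kindOverlap(List<Node> a, List<Node> b) {
        int total = Math.max(a.size(), b.size());
        if (total == 0) return 1.0;
        Map<String, Integer> counts = new HashMap<>();
        for (Node n : a) counts.merge(n.kind, 1, Integer::sum);
        int shared = 0;
        for (Node n : b) {
            int c = counts.getOrDefault(n.kind, 0);
            if (c > 0) {
                shared++;
                counts.put(n.kind, c - 1);
            }
        }
        return (double) shared / total;
    }

    static List<Node> nextLevel(List<Node> level) {
        List<Node> out = new ArrayList<>();
        for (Node n : level) out.addAll(n.children);
        return out;
    }

    /** Abstract comparison: level by level, node kinds only; stop as soon as similarity drops. */
    static boolean abstractSimilar(Node t1, Node t2, double threshold) {
        List<Node> l1 = List.of(t1);
        List<Node> l2 = List.of(t2);
        while (!l1.isEmpty() || !l2.isEmpty()) {
            if (kindOverlap(l1, l2) < threshold) return false; // treated as not similar
            l1 = nextLevel(l1);
            l2 = nextLevel(l2);
        }
        return true;
    }

    static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.toLowerCase().split("\\s+")) if (!t.isEmpty()) out.add(t);
        return out;
    }

    /** Micro comparison: token overlap of a statement, averaged with its corresponding subtrees. */
    static double microSimilarity(Node a, Node b) {
        Set<String> ta = tokens(a.text), tb = tokens(b.text);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        Set<String> shared = new HashSet<>(ta);
        shared.retainAll(tb);
        double own = union.isEmpty() ? 1.0 : (double) shared.size() / union.size();
        int pairs = Math.min(a.children.size(), b.children.size());
        if (pairs == 0) return own;
        double sum = 0;
        for (int i = 0; i < pairs; i++) sum += microSimilarity(a.children.get(i), b.children.get(i));
        return (own + sum / pairs) / 2;
    }

    /** The micro comparison runs only if the abstract comparison passes. */
    static double compare(Node t1, Node t2, double threshold) {
        return abstractSimilar(t1, t2, threshold) ? microSimilarity(t1, t2) : 0.0;
    }
}
```

With a threshold of 0.7, two loops that differ only in variable names pass the abstract comparison. A tree whose second level holds a branch instead of a loop is rejected before any micro comparison runs.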
  • 19. Tools used for text based plagiarism
    Some tools are:
     PlagAware
     PlagScan
     CheckForPlagiarism.net
     iThenticate
     PlagiarismDetection.org
  • 20. PlagAware
     An online service used for plagiarism detection.
     It can search for, find, analyze and trace plagiarism in documents on topics similar to the specified one.
     PlagAware works as a search engine.
     Provides different types of reports that help users decide whether their document has been plagiarized.
     Mainly used in the academic field.
     Supports multiple-document comparison.
     Does not support synonym or sentence-structure checking.
  • 21. PlagScan
     Online software for checking textual plagiarism.
     Uses complex algorithms to check and analyze uploaded documents.
     A unique signature is extracted from the document's structure and then compared with the PlagScan database and millions of online documents.
     Detects most types of plagiarism, whether direct copy-and-paste or word switching.
     Supports all languages that use the international UTF-8 encoding, and all languages written in Latin or Arabic characters.
  • 22. CheckForPlagiarism.net
     One of the best online plagiarism checkers, used to stop or prevent online plagiarism.
     Uses a fingerprint-based approach that analyzes and summarizes a collection of documents to create a kind of fingerprint for it.
     Uses its own database, which includes millions of documents and articles from the World Wide Web.
     Supports synonym and sentence-structure checking.
     Can compare a set of different documents with other documents simultaneously.
  • 23. iThenticate
     A service designed especially for researchers, authors and publishers.
     Has its own database containing millions of documents.
     Users with an account can compare submitted documents against it, online or offline, to identify plagiarized content.
     Considered the first online plagiarism checker.
     Supports document-to-document and multiple-document checking.
     Supports more than 30 languages.
     Does not support synonym or sentence-structure checking.
  • 24. PlagiarismDetection.org
     An online service that provides highly accurate plagiarism detection results.
     Uses its own database containing millions of documents.
     Supports English and all languages that use Latin characters.
     Does not support multiple-document comparison.
     Does not support synonym or sentence-structure checking.
  • 25. Source code based plagiarism detection techniques
     Lexical Similarities
     Parse Tree Similarities
     Program Dependence Graphs
     Metrics
  • 26. Lexical Similarities
     Converts source code into a stream of lexical tokens, the same tokens from which a compiler extracts meaning from the source.
     During the lexical analysis phase, the source code undergoes a series of transformations.
     Some of these transformations, such as the identification of reserved words and identifiers, are beneficial for plagiarism detection.
  • 27. Lexical Similarities (Continued…)
     Consider the following two snippets of Java code:

        int[] A = {1,2,3,4};
        for(int i = 0; i < A.length; i++) {
            A[i] = A[i] + 1;
        }

        int[] B = {1, 2, 3, 4};
        for(int j = 0; j < B.length; j++) {
            B[j] = B[j] + 1;
        }
  • 28. Lexical Similarities (Continued…)
    The lexical stream of each of the two snippets is:
        LITERAL_int LBRACK RBRACK IDENT ASSIGN LCURLY NUM_INT COMMA NUM_INT COMMA NUM_INT COMMA NUM_INT RCURLY SEMI
        LITERAL_for LPAREN LITERAL_int IDENT ASSIGN NUM_INT SEMI IDENT LT IDENT DOT IDENT SEMI IDENT INC RPAREN
        LCURLY IDENT LBRACK IDENT RBRACK ASSIGN IDENT LBRACK IDENT RBRACK PLUS NUM_INT SEMI RCURLY
     Both Java snippets have exactly the same lexical stream: the differing names (A/B, i/j) all become IDENT tokens, and whitespace is discarded.
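To show how renaming disappears at the token level, here is a minimal regex-based lexer for just the constructs in the two snippets. It handles `int`, `for`, array literals and indexing, `++`, `+`, `<` and `.`. It is a hypothetical sketch, not a full Java lexer. The token names are those listed on the slide, plus PLUS for `+`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: a regex lexer for just the Java subset used in the two snippets.
final class MiniLexer {
    private static final Set<String> KEYWORDS = Set.of("int", "for");
    private static final Map<String, String> SYMBOLS = Map.ofEntries(
            Map.entry("++", "INC"), Map.entry("+", "PLUS"), Map.entry("=", "ASSIGN"),
            Map.entry("<", "LT"), Map.entry(".", "DOT"), Map.entry(",", "COMMA"),
            Map.entry(";", "SEMI"), Map.entry("[", "LBRACK"), Map.entry("]", "RBRACK"),
            Map.entry("{", "LCURLY"), Map.entry("}", "RCURLY"),
            Map.entry("(", "LPAREN"), Map.entry(")", "RPAREN"));
    // Leading whitespace is skipped; "++" is tried before "+" so it becomes one token.
    private static final Pattern TOKEN = Pattern.compile(
            "\\s*(?:([A-Za-z_]\\w*)|(\\d+)|(\\+\\+|[+=<.,;\\[\\]{}()]))");

    static List<String> tokenize(String source) {
        String src = source.strip();
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            if (!m.region(pos, src.length()).lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at " + pos);
            }
            if (m.group(1) != null) {          // word: reserved word or identifier
                out.add(KEYWORDS.contains(m.group(1)) ? "LITERAL_" + m.group(1) : "IDENT");
            } else if (m.group(2) != null) {   // integer literal
                out.add("NUM_INT");
            } else {                           // operator or separator
                out.add(SYMBOLS.get(m.group(3)));
            }
            pos = m.end();
        }
        return out;
    }
}
```

Both snippets produce the same 45-token stream. A token-level comparison therefore reports them as identical, even though every identifier was renamed.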
  • 29. Parse Tree Similarities
     The parse tree (derivation tree) built from the lexical stream of a program also exhibits the program's structure.
     During compilation, a compiler builds a parse tree that represents the program.
     The parse tree will have the same structure for both snippets of code, since their lexical streams are the same.
     An algorithm for detecting plagiarism with this method first parses each program. Next, for each pair of parse trees, it tries to find as many common subtrees as possible, and uses this number as a measure of similarity between the two programs.
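Below is a sketch of the common-subtree count. It assumes the parse trees are already built and labelled with node types, with identifiers reduced to IDENT as in the lexical stream. Each subtree is serialized to a canonical string, and the two trees' multisets of subtrees are intersected. Only exactly identical subtrees are counted. The class name, method names and node labels are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: counts subtrees that occur in both trees, matched by canonical serialization.
final class CommonSubtrees {
    static final class Node {
        final String label;        // node type, e.g. "ForStmt" or a token type such as "IDENT"
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = List.of(children);
        }
    }

    /** Adds every subtree of n to bag, keyed by its canonical string; returns n's own string. */
    static String collect(Node n, Map<String, Integer> bag) {
        StringBuilder sb = new StringBuilder(n.label).append('(');
        for (Node c : n.children) sb.append(collect(c, bag)).append(',');
        String key = sb.append(')').toString();
        bag.merge(key, 1, Integer::sum);
        return key;
    }

    /** Number of subtrees the two trees have in common (multiset intersection). */
    static int commonSubtreeCount(Node t1, Node t2) {
        Map<String, Integer> b1 = new HashMap<>();
        Map<String, Integer> b2 = new HashMap<>();
        collect(t1, b1);
        collect(t2, b2);
        int common = 0;
        for (Map.Entry<String, Integer> e : b1.entrySet()) {
            common += Math.min(e.getValue(), b2.getOrDefault(e.getKey(), 0));
        }
        return common;
    }
}
```

Renamed variables do not change the count, because every identifier carries the same IDENT label. Changes to the program's structure do change it.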
  • 31. Program Dependence Graphs (PDG)
     A PDG is a graph representation of source code.
     It is a directed, labeled graph that represents the data and control dependencies within one procedure.
     Basic statements such as variable declarations, assignments and procedure calls are represented by vertices in PDGs.
     It depicts how data flows between statements and how statements are controlled by other statements.
     The data and control dependencies between statements are represented by edges between vertices in PDGs.
     Data and control dependencies are drawn as solid and dashed lines, respectively.
  • 32. Program Dependence Graphs (Continued…)
    [Figure: example of a program dependence graph]
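A PDG as described above can be represented as a list of statement vertices plus labelled edges. The sketch below hand-builds the graph for a small summation loop, chosen for illustration and not taken from the slides. A real tool would derive the edges by data-flow and control-dependence analysis. Comparing two PDGs for plagiarism would also need graph matching, which is not shown. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: a hand-built PDG; a real tool would derive these edges by analysis.
final class Pdg {
    enum Kind { DATA, CONTROL } // drawn as solid and dashed lines respectively

    static final class Edge {
        final int from, to;
        final Kind kind;
        final String variable; // variable carried by a data edge, null for control edges

        Edge(int from, int to, Kind kind, String variable) {
            this.from = from; this.to = to; this.kind = kind; this.variable = variable;
        }
    }

    final List<String> vertices = new ArrayList<>(); // one vertex per basic statement
    final List<Edge> edges = new ArrayList<>();

    int addVertex(String statement) {
        vertices.add(statement);
        return vertices.size() - 1;
    }

    void data(int from, int to, String variable) { edges.add(new Edge(from, to, Kind.DATA, variable)); }

    void control(int from, int to) { edges.add(new Edge(from, to, Kind.CONTROL, null)); }

    long count(Kind kind) { return edges.stream().filter(e -> e.kind == kind).count(); }

    /** Vertices that v depends on through edges of the given kind. */
    Set<Integer> dependsOn(int v, Kind kind) {
        Set<Integer> out = new TreeSet<>();
        for (Edge e : edges) if (e.to == v && e.kind == kind) out.add(e.from);
        return out;
    }

    /** PDG of: sum = 0; i = 0; while (i < n) { sum = sum + i; i = i + 1; } print(sum); */
    static Pdg sumLoop() {
        Pdg g = new Pdg();
        int s0 = g.addVertex("sum = 0");
        int s1 = g.addVertex("i = 0");
        int s2 = g.addVertex("while (i < n)");
        int s3 = g.addVertex("sum = sum + i");
        int s4 = g.addVertex("i = i + 1");
        int s5 = g.addVertex("print(sum)");
        // Data edges: a definition -> each use it can reach (s3->s3 and s4->s4 go around the loop).
        g.data(s0, s3, "sum"); g.data(s3, s3, "sum"); g.data(s0, s5, "sum"); g.data(s3, s5, "sum");
        g.data(s1, s2, "i"); g.data(s4, s2, "i"); g.data(s1, s3, "i");
        g.data(s4, s3, "i"); g.data(s1, s4, "i"); g.data(s4, s4, "i");
        // Control edges: the loop predicate decides whether the body statements run.
        // (Edges from the procedure's entry vertex, and the predicate's dependence
        // on itself, are omitted for brevity.)
        g.control(s2, s3); g.control(s2, s4);
        return g;
    }
}
```

In this graph, `print(sum)` receives data from `sum = 0` (in case the loop never runs) and from `sum = sum + i`. The two loop-body statements are control-dependent on the `while` predicate.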
  • 33. Metrics
     Plagiarism detection by similarity analysis using software metrics.
     Software metrics are:
        Number of function calls
        Number of used or defined local variables
        Number of used or defined non-local variables
        Number of parameters
        Number of statements
        Number of branches
        Number of loops
  • 34. Metrics (Continued…)
     Each fragment is characterized by a set of features measured by metrics.
     Computing the metrics requires parsing the source code to identify interesting fragments.
     Metrics are simple to calculate and can be compared quickly.
     False positives: two fragments with the same scores on a set of metrics may do entirely different things.
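Below is a sketch of metric-based comparison. It uses four of the listed metrics (function calls, statements, branches, loops) and counts them with regular expressions instead of a parser. Counting `;` as statements and using Euclidean distance are simplifications chosen for illustration. The names are hypothetical.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: counts four of the listed metrics with regular expressions. A real tool
// would parse the code; counting ';' as statements is a deliberate simplification.
final class CodeMetrics {
    private static final Set<String> NOT_CALLS =
            Set.of("if", "for", "while", "switch", "catch", "return", "synchronized");
    private static final Pattern CALL = Pattern.compile("\\b([A-Za-z_]\\w*)\\s*\\(");
    private static final Pattern STATEMENT = Pattern.compile(";");
    private static final Pattern BRANCH = Pattern.compile("\\b(if|case)\\b");
    private static final Pattern LOOP = Pattern.compile("\\b(for|while|do)\\b");

    private static int count(Pattern p, String src) {
        Matcher m = p.matcher(src);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    /** Metric vector {function calls, statements, branches, loops} for one fragment. */
    static int[] metrics(String src) {
        int calls = 0;
        Matcher m = CALL.matcher(src);
        while (m.find()) if (!NOT_CALLS.contains(m.group(1))) calls++; // skip keywords followed by '('
        return new int[] {calls, count(STATEMENT, src), count(BRANCH, src), count(LOOP, src)};
    }

    /** Euclidean distance between two metric vectors; 0 means identical scores. */
    static double distance(int[] a, int[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (double) (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }
}
```

The array-increment loop from the lexical example and `while (n > 0) { n = n - 1; s = s + n; t = t * 2; u = 0; }` both score {0, 4, 0, 1}. Their distance is therefore 0 even though they do different things, which is the false-positive case noted above.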
  • 35. Tools used for code based plagiarism
     MOSS
     JPlag
     CodeMatch
  • 36. MOSS (Measure of Software Similarity)
     Can be applied to a range of programming languages.
     Registered instructors can submit batches of programs to the MOSS server.
     The results are placed on a web page on the MOSS web server.
     When checking is finished, a link to that web page is returned.
     The MOSS database stores an internal representation of the programs and then looks for similarities between them.
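According to its authors' published description, MOSS's internal representation is a set of fingerprints selected by "winnowing". Every k-gram is hashed, and the minimum hash in each window of w consecutive hashes is kept. The sketch below is a simplified illustration, not MOSS's code. It differs from MOSS in three ways:
- it uses character k-grams of whitespace-free, lower-cased text;
- it uses Java's `String.hashCode` in place of a rolling hash;
- it records the selected hash values without their positions.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: simplified winnowing (k-gram hashes, minimum per window); not MOSS's code.
final class Winnowing {
    /** Hash of every character k-gram after removing whitespace and lower-casing. */
    static int[] kgramHashes(String text, int k) {
        String s = text.replaceAll("\\s+", "").toLowerCase();
        int n = Math.max(0, s.length() - k + 1);
        int[] h = new int[n];
        for (int i = 0; i < n; i++) h[i] = s.substring(i, i + k).hashCode();
        return h;
    }

    /** Keeps the rightmost minimal hash of every window of w consecutive k-gram hashes. */
    static Set<Integer> fingerprint(String text, int k, int w) {
        int[] h = kgramHashes(text, k);
        Set<Integer> selected = new HashSet<>();
        int windows = h.length == 0 ? 0 : Math.max(1, h.length - w + 1); // short texts: one window
        for (int start = 0; start < windows; start++) {
            int end = Math.min(start + w, h.length);
            int min = start;
            for (int i = start; i < end; i++) if (h[i] <= h[min]) min = i;
            selected.add(h[min]);
        }
        return selected;
    }

    /** Jaccard resemblance of two fingerprints. */
    static double resemblance(String a, String b, int k, int w) {
        Set<Integer> fa = fingerprint(a, k, w), fb = fingerprint(b, k, w);
        Set<Integer> union = new HashSet<>(fa);
        union.addAll(fb);
        if (union.isEmpty()) return 0.0;
        Set<Integer> shared = new HashSet<>(fa);
        shared.retainAll(fb);
        return (double) shared.size() / union.size();
    }
}
```

Each window's selection depends only on the hashes inside it. As a result, any shared passage of at least w + k - 1 characters (after whitespace removal) yields at least one common fingerprint, while far fewer hashes need to be stored than with one digest per chunk.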
  • 37. JPlag
     JPlag compares submitted programs in pairs.
     It assumes that plagiarists may change the names of variables or classes, but are least likely to change the control structure of a program.
     It presents its results as a set of HTML pages.
     The pages are sent back to the client and stored locally.
     JPlag was easier to use but supported fewer languages than MOSS.
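JPlag's published description compares the token streams of two programs with Greedy String Tiling. The algorithm repeatedly finds the longest common runs of unmarked tokens and marks them as tiles. It stops when no run of at least a minimum length remains. The sketch below is a simplified version without JPlag's Karp-Rabin speed-up. The minimum match length and all names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: simplified Greedy String Tiling over token streams (no Karp-Rabin speed-up).
final class GreedyStringTiling {
    /** Returns tiles as {startInA, startInB, length}. */
    static List<int[]> tiles(List<String> a, List<String> b, int minMatch) {
        minMatch = Math.max(1, minMatch); // zero-length tiles would be meaningless
        boolean[] markedA = new boolean[a.size()];
        boolean[] markedB = new boolean[b.size()];
        List<int[]> tiles = new ArrayList<>();
        int maxMatch;
        do {
            maxMatch = minMatch;
            List<int[]> matches = new ArrayList<>();
            for (int p = 0; p < a.size(); p++) {
                if (markedA[p]) continue;
                for (int t = 0; t < b.size(); t++) {
                    if (markedB[t]) continue;
                    int j = 0; // length of the unmarked common run starting at (p, t)
                    while (p + j < a.size() && t + j < b.size()
                            && !markedA[p + j] && !markedB[t + j]
                            && a.get(p + j).equals(b.get(t + j))) j++;
                    if (j == maxMatch) {
                        matches.add(new int[] {p, t, j});
                    } else if (j > maxMatch) { // a longer run replaces all shorter candidates
                        matches.clear();
                        matches.add(new int[] {p, t, j});
                        maxMatch = j;
                    }
                }
            }
            for (int[] m : matches) {
                boolean free = true; // skip runs overlapping a tile placed earlier this round
                for (int k = 0; k < m[2] && free; k++) free = !markedA[m[0] + k] && !markedB[m[1] + k];
                if (!free) continue;
                for (int k = 0; k < m[2]; k++) {
                    markedA[m[0] + k] = true;
                    markedB[m[1] + k] = true;
                }
                tiles.add(m);
            }
        } while (maxMatch > minMatch);
        return tiles;
    }

    /** JPlag-style similarity: share of all tokens covered by tiles. */
    static double similarity(List<String> a, List<String> b, int minMatch) {
        int covered = 0;
        for (int[] t : tiles(a, b, minMatch)) covered += t[2];
        return a.isEmpty() && b.isEmpty() ? 1.0 : 2.0 * covered / (a.size() + b.size());
    }
}
```

The similarity is the share of tokens covered by tiles, 2 · covered / (|a| + |b|). For the streams A B C D E and X A B C Y with a minimum match of 2, one tile of length 3 gives 0.6.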
  • 38. CodeMatch
     Compares thousands of source code files in multiple directories and subdirectories.
     Determines which files are closely correlated.
     Useful for finding open source code within proprietary code.
     Useful for discovering common standard algorithms within different programs.
  • 39. Disadvantages of the plagiarism detection technology
     Plagiarism detection systems are built for only a few languages, so checking documents in other languages with the same software can be difficult.
     Most detection software checks documents against a repository located within an organization. Other people are unable to access it to check for plagiarism.
     As the number of digital copies grows, the repository must become larger, and the plagiarism detection software must be able to handle it.
     Some software asks users to upload a file to its site. The file is copied into the provider's database, which can lead to the data being leaked or misused for other purposes.
  • 40. Conclusion
     Plagiarism is now rampant. With most data available in digital format, the avenues for plagiarism are opening up.
     To prevent this kind of cheating and to acknowledge the originality of authors, new detection techniques need to be created.
     To protect intellectual property in source code, new techniques need to be developed and implemented.