Similarity of Source Code in the Presence of Pervasive Modifications
Chaiyong Ragkhitwetsagul, Jens Krinke, David Clark
Centre for Research on Evolution, Search and Testing (CREST)
Dept. of Computer Science, UCL, London, UK
Similarity of Source Code in the Presence of Pervasive Modifications — C. Ragkhitwetsagul, J. Krinke, D. Clark — CREST, UCL, UK
Pervasive Modifications
/* ORIGINAL */
private static int partition(Comparable[] a, int lo, int hi) {
    int i = lo;
    int j = hi+1;
    Comparable v = a[lo];
    while (true) {
        while (less(a[++i], v)) {
            if (i == hi) break;
        }
        while (less(v, a[--j])) {
            if (j == lo) break;
        }
        if (i >= j) break;
        exch(a, i, j);
    }
    exch(a, lo, j);
    return j;
}

/* PERVASIVELY MODIFIED CODE */
private static int partition(int[] bob, int left, int right){
    int x = left;
    int y = right+1;
    for (;;) {
        while (less(bob[left],bob[--y]))
            if (y == left) break;
        while (less(bob[++x],bob[left]))
            if (x == right) break;
        if (x >= y) break;
        swap(bob, y, x);
    }
    swap(bob, y, left);
    return y;
}
From: https://www.princeton.edu/pr/pub/integrity/pages/plagiarism/
Pervasive Modifications
Changes affecting many locations in the whole method, file, or project
Examples: layout changes, identifier renaming, API changes, refactoring
Occur in code cloning, software plagiarism, and software evolution
Do not include (strong) code obfuscation
When source code is pervasively modified, which similarity detection techniques or tools get the most accurate results?
30 Similarity Analysers
Clone detectors: CCFinderX, iClones, Simian, NiCad, Deckard
Plagiarism detectors: JPlag, Plaggie, Sherlock, Sim
Compression: 7zncd, bzip2ncd, gzipncd, xz-ncd, icd, ncd
Others: diff, bsdiff, difflib, fuzzywuzzy, jellyfish, ngram, sklearn
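Several of the compression tools listed on this slide carry "ncd" in their names, which stands for normalised compression distance: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The following is a minimal sketch of that formula only, assuming DEFLATE from java.util.zip as the compressor; the class and method names are hypothetical and it is not the implementation of any tool in the study.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class NcdSketch {
    // Compressed size in bytes of the input under DEFLATE (zlib wrapper).
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    // Values near 0 suggest similar inputs; values near 1 suggest unrelated inputs.
    static double ncd(String x, String y) {
        int cx = compressedSize(x.getBytes(StandardCharsets.UTF_8));
        int cy = compressedSize(y.getBytes(StandardCharsets.UTF_8));
        int cxy = compressedSize((x + y).getBytes(StandardCharsets.UTF_8));
        return (double) (cxy - Math.min(cx, cy)) / Math.max(cx, cy);
    }

    public static void main(String[] args) {
        // Two short fragments in the spirit of the partition example.
        String a = "int i = lo; int j = hi+1; while (true) { exch(a, i, j); }";
        String b = "int x = left; int y = right+1; for (;;) { swap(bob, y, x); }";
        System.out.println("NCD(a, a) = " + ncd(a, a));
        System.out.println("NCD(a, b) = " + ncd(a, b));
    }
}
```

NCD is a distance, so a tool that reports a similarity score has to convert it; how each tool in the study does that is not stated on the slides.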
Test Data Generation
Input programs: InfixConverter.java, SqrtAlgorithm.java, Hanoi.java, Queens.java, MagicSquare.java
Pipeline: original source -> source obfuscator (ARTIFICE) -> compiler (javac) -> bytecode obfuscator (ProGuard) -> decompilers (Krakatau, Procyon) -> pervasively modified code
The pervasively modified code is used in the detection phase.
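The generation pipeline can be read as a small combinatorial scheme. The sketch below enumerates the variant labels for one program, assuming the naming used by the rows of the similarity report (orig or artifice, then no or pg, then krakatau or procyon). Reading "no" as "no bytecode obfuscation" and "pg" as "ProGuard" is an interpretation of those labels, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class VariantSketch {
    // Labels for all variants of one program, in the order used by the
    // similarity report: the two source versions first, then every
    // source / bytecode step / decompiler combination.
    static List<String> variants(String program) {
        String[] sources = {"orig", "artifice"};         // without / with ARTIFICE
        String[] bytecodeSteps = {"no", "pg"};           // assumed: without / with ProGuard
        String[] decompilers = {"krakatau", "procyon"};
        List<String> result = new ArrayList<>();
        for (String source : sources) {
            result.add(program + "/" + source);
        }
        for (String source : sources) {
            for (String step : bytecodeSteps) {
                for (String decompiler : decompilers) {
                    result.add(program + "/" + source + "_" + step + "_" + decompiler);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] programs = {"InfixConverter", "SqrtAlgorithm", "Hanoi", "Queens", "MagicSquare"};
        List<String> all = new ArrayList<>();
        for (String program : programs) {
            all.addAll(variants(program));
        }
        // 5 programs x 10 variants each
        System.out.println(all.size() + " files, " + all.size() * all.size() + " ordered pairs");
    }
}
```

Under this reading each program yields 10 variants, so the five programs give 50 files to compare pairwise.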
Parameter Settings
Similarity Report
Columns appear in the same order as the rows (InfConv/orig, InfConv/artifice, InfConv/orig_no_krakatau, ..., Square/artifice_pg_procyon).

InfConv/orig                   100   55   36   63   32   43   34   60   31   43   20   20    …   14   17
InfConv/artifice                55  100   35   54   33   39   37   56   32   39   19   30    …   14   17
InfConv/orig_no_krakatau        36   35  100   38   60   26   80   35   59   26   13   14    …   28   17
InfConv/orig_no_procyon         63   54   38  100   34   58   37   80   34   58   21   20    …   15   21
InfConv/orig_pg_krakatau        32   33   60   34  100   33   61   33   82   33   17   17    …   29   20
InfConv/orig_pg_procyon         43   39   26   58   33  100   26   59   33  100   19   20    …   14   21
InfConv/artifice_no_krakatau    34   37   80   37   61   26  100   36   59   26   14   14    …   28   17
InfConv/artifice_no_procyon     60   56   35   80   33   59   36  100   32   59   19   20    …   15   19
InfConv/artifice_pg_krakatau    31   32   59   34   82   33   59   32  100   33   15   16    …   28   17
InfConv/artifice_pg_procyon     43   39   26   58   33  100   26   59   33  100   19   20    …   14   21
Sqrt/orig                       20   19   13   21   17   19   14   19   15   19  100   32    …   14   16
Sqrt/artifice                   20   30   14   20   17   20   14   20   16   20   32  100    …   15   18
…                                …    …    …    …    …    …    …    …    …    …    …    …    …    …    …
Square/artifice_pg_krakatau     14   14   28   15   29   14   28   15   28   14   14   15    …  100   32
Square/artifice_pg_procyon      17   17   17   21   20   21   17   19   17   21   16   18    …   32  100
Similarity Threshold = 50
The same similarity report as on the Similarity Report slide, with a similarity threshold of 50 applied to decide which pairs are reported as similar.
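Applying a threshold turns each similarity score into a yes/no decision that can be checked against the expected answer. The sketch below does this for the first twelve entries of the InfConv/orig row of the report. It makes two assumptions that the slides do not state: a pair counts as reported when its score is at least the threshold, and a pair is expected to be similar when both files derive from the same program (the name before the slash). Class and method names are hypothetical.

```java
public class ThresholdSketch {
    // First twelve column labels and the InfConv/orig row from the similarity report.
    static final String[] COLUMNS = {
        "InfConv/orig", "InfConv/artifice",
        "InfConv/orig_no_krakatau", "InfConv/orig_no_procyon",
        "InfConv/orig_pg_krakatau", "InfConv/orig_pg_procyon",
        "InfConv/artifice_no_krakatau", "InfConv/artifice_no_procyon",
        "InfConv/artifice_pg_krakatau", "InfConv/artifice_pg_procyon",
        "Sqrt/orig", "Sqrt/artifice"
    };
    static final int[] SCORES = {100, 55, 36, 63, 32, 43, 34, 60, 31, 43, 20, 20};

    // Assumed ground truth: two files are similar if they share the program name.
    static boolean sameProgram(String a, String b) {
        return a.substring(0, a.indexOf('/')).equals(b.substring(0, b.indexOf('/')));
    }

    // Returns {TP, FP, FN, TN} for one row of the matrix at the given threshold.
    static int[] confusion(String rowLabel, String[] columns, int[] scores, int threshold) {
        int tp = 0, fp = 0, fn = 0, tn = 0;
        for (int k = 0; k < scores.length; k++) {
            boolean reported = scores[k] >= threshold;   // assumption: inclusive comparison
            boolean expected = sameProgram(rowLabel, columns[k]);
            if (reported && expected) tp++;
            else if (reported) fp++;
            else if (expected) fn++;
            else tn++;
        }
        return new int[] {tp, fp, fn, tn};
    }

    public static void main(String[] args) {
        int[] c = confusion("InfConv/orig", COLUMNS, SCORES, 50);
        System.out.println("TP=" + c[0] + " FP=" + c[1] + " FN=" + c[2] + " TN=" + c[3]);
    }
}
```

For this row a threshold of 50 keeps only four of the ten InfConv variants, which illustrates why a fixed threshold can miss many pervasively modified copies.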
Best Threshold
Plot: F-measure (y-axis, 0.00 to 0.90) against threshold value T (x-axis, 0 to 100). The marked best threshold is T = 31, with F-measure = 0.8282.
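The search for a best threshold can be sketched as a sweep over T from 0 to 100 that keeps the value with the highest F-measure. The example below runs the sweep on twelve entries of the InfConv/orig row of the similarity report, assuming that a pair is reported when its score is at least T and that the first ten entries (the InfConv variants) are the true matches. The names are hypothetical, and because only one row is used, the result says nothing about the best threshold found in the study.

```java
public class BestThresholdSketch {
    // InfConv/orig row of the similarity report: ten InfConv variants, then two Sqrt files.
    static final int[] SCORES = {100, 55, 36, 63, 32, 43, 34, 60, 31, 43, 20, 20};
    // Assumed ground truth: the InfConv entries are true matches, the Sqrt entries are not.
    static final boolean[] EXPECTED = {
        true, true, true, true, true, true, true, true, true, true, false, false
    };

    // F-measure (F1) at one threshold: harmonic mean of precision and recall.
    static double f1(int[] scores, boolean[] expected, int threshold) {
        int tp = 0, fp = 0, fn = 0;
        for (int k = 0; k < scores.length; k++) {
            boolean reported = scores[k] >= threshold;   // assumption: inclusive comparison
            if (reported && expected[k]) tp++;
            else if (reported) fp++;
            else if (expected[k]) fn++;
        }
        if (tp == 0) {
            return 0.0;                                  // avoids division by zero
        }
        double precision = (double) tp / (tp + fp);
        double recall = (double) tp / (tp + fn);
        return 2 * precision * recall / (precision + recall);
    }

    // Smallest threshold in 0..100 that reaches the highest F1.
    static int bestThreshold(int[] scores, boolean[] expected) {
        int best = 0;
        double bestF1 = -1.0;
        for (int t = 0; t <= 100; t++) {
            double f = f1(scores, expected, t);
            if (f > bestF1) {
                bestF1 = f;
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int t = bestThreshold(SCORES, EXPECTED);
        System.out.println("best T = " + t + ", F1 = " + f1(SCORES, EXPECTED, t));
    }
}
```

Ties are broken in favour of the smallest threshold, which is a design choice of this sketch rather than something the slides specify.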
Optimal Configuration
The optimal configuration of a tool combines its best parameter settings with its best threshold.
Results
Tool         Settings    T   Acc     Prec    Rec     AUC     Prec@n  F1
ccfx         b=20,t=1    4   0.9640  0.9145  0.9040  0.9468  0.9040  0.9095
simjava      r=22        5   0.9568  0.8769  0.9120  0.9490  0.8840  0.8941
jplag-text   t=8         2   0.9408  0.8235  0.8960  0.9453  0.8440  0.8582
py-difflib   noautojunk  35  0.9392  0.8901  0.7940  0.9147  0.8080  0.8393
7zncd-BZip2  mx=1        39  0.9368  0.8977  0.7720  0.9419  0.8180  0.8301
ncd-bzlib    -           31  0.9336  0.8584  0.8000  0.9482  0.8200  0.8282
jplag-java   t=3         43  0.9160  0.7526  0.8640  0.9667  0.7860  0.8045
py-sklearn   -           33  0.8488  0.5894  0.8040  0.9146  0.6200  0.6802
Chart: F1 score (axis from 0.5 to 1) for each of the 30 tools, grouped by category.
Clone det.: ccfx, deckard, iclones, nicad, simian
Plag det.: jplag-java, jplag-text, plaggie, sherlock, simjava, simtext
Comp.: 7zncd-BZip2, 7zncd-LZMA, 7zncd-LZMA2, 7zncd-Deflate, 7zncd-Deflate64, 7zncd-PPMd, bzip2ncd, gzipncd, icd, ncd-bzlib, ncd-zlib, xz-ncd
Others: bsdiff, diff, py-difflib, py-fuzzywuzzy, py-jellyfish, py-ngram, py-sklearn
Highly specialised source code similarity detection techniques and tools can perform better than more general, textual similarity measures.
Normalisation by Decompilation
Normalisation: pervasively modified code -> compile (javac) -> decompile (Krakatau, Procyon) -> normalised code
Code Before Decompilation
Code After Decompilation
Chart: F1 score (axis from 0.5 to 1) for each of the 30 tools on the original code (Orig.) and on the decompiled code (Dec.), with the tools grouped into clone detectors, plagiarism detectors, compression tools, and others.
Compilation and decompilation can be used as an effective normalisation method that greatly improves similarity detection on Java source code.
More info: http://crest.cs.ucl.ac.uk/resources/cloplag/

Similarity of Source Code in the Presence of Pervasive Modifications [SCAM'16]

  • 1. Similarity of Source Code
 in the Presence of Pervasive Modifications Chaiyong Ragkhitwetsagul, Jens Krinke, David Clark Centre for Research on Evolution, Search and Testing (CREST) Dept. of Computer Science, UCL, London, UK
  • 2. Similarity of Source Code in the Presence of Pervasive Modifications — C. Ragkhitwetsagul, J. Krinke, D. Clark — CREST, UCL, UK Pervasive Modifications 2 /* ORIGINAL */ private static int partition
 (Comparable[] a, int lo, int hi) {
 int i = lo;
 int j = hi+1;
 Comparable v = a[lo];
 while (true) {
 while (less(a[++i], v)) {
 if (i == hi) break;
 }
 while (less(v, a[--j])) {
 if (j == lo) break;
 }
 if (i >= j) break;
 exch(a, i, j);
 }
 exch(a, lo, j);
 return j;
 } /* PERVASIVELY MODIFIED CODE */ private static int partition (int[] bob, int left, int right){
 int x = left;
 int y = right+1;
 for (;;) {
 while (less(bob[left],bob[--y]))
 if (y == left) break;
 while (less(bob[++x],bob[left]))
 if (x == right) break;
 if (x >= y) break;
 swap(bob, y, x);
 }
 swap(bob, y, left);
 return y;
 } From: https://www.princeton.edu/pr/pub/integrity/pages/plagiarism/
  • 3. Similarity of Source Code in the Presence of Pervasive Modifications — C. Ragkhitwetsagul, J. Krinke, D. Clark — CREST, UCL, UK Pervasive Modifications 3 Changes affecting many locations in the whole method, file, or project Examples: layout changes, identifier renaming, API changes, refactoring Code cloning, software plagiarism, software evolution But do not include (strong) code obfuscation
  • 4. Similarity of Source Code in the Presence of Pervasive Modifications — C. Ragkhitwetsagul, J. Krinke, D. Clark — CREST, UCL, UK 4 When source code is pervasively modified, which similarity detection techniques or tools get the most accurate results?
  • 5. Similarity of Source Code in the Presence of Pervasive Modifications — C. Ragkhitwetsagul, J. Krinke, D. Clark — CREST, UCL, UK 30 Similarity Analysers 5 CCFinderX iClones Simian, NiCad Deckard Clone detectors JPlag Plaggie, Sherlock Sim Plagiarism detectors 7zncd, bzip2ncd gzipncd, xz-ncd icd, ncd Compression diff, bsdiff difflib, fuzzywuzzy jellyfish, ngram, sklearn Others
  • 6. Similarity of Source Code in the Presence of Pervasive Modifications — C. Ragkhitwetsagul, J. Krinke, D. Clark — CREST, UCL, UK Test Data Generation 6 original source obfuscator bytecode obfuscator decompilers InfixConverter.java SqrtAlgorithm.java Hanoi.java Queens.java MagicSquare.java pervasively modified code to be used in detection phase pervasively modified code compiler javac ARTIFICE ProGuard Krakatau Procyon
  • 7. Similarity of Source Code in the Presence of Pervasive Modifications — C. Ragkhitwetsagul, J. Krinke, D. Clark — CREST, UCL, UK Parameter Settings 7
  • 8. Similarity of Source Code in the Presence of Pervasive Modifications — C. Ragkhitwetsagul, J. Krinke, D. Clark — CREST, UCL, UK Similarity Report 8 InfC/ orig InfC/ artfc InfC/ orig no kraka tau InfC/ orig no procy on InfC/ orig pg kraka tau InfC/ orig pg procy on InfC/ artfc no kraka tau InfC/ artfc no procy on InfC/ artfc pg kraka tau InfC/ artfc pg procy on Sqrt/ orig Sqrt/ artfc … Squr/ artfc pg kraka tau Squr/ artfc pg procy on InfConv/orig 100 55 36 63 32 43 34 60 31 43 20 20 … 14 17 InfConv/artifice 55 100 35 54 33 39 37 56 32 39 19 30 … 14 17 InfConv/orig_no_krakatau 36 35 100 38 60 26 80 35 59 26 13 14 … 28 17 InfConv/orig_no_procyon 63 54 38 100 34 58 37 80 34 58 21 20 … 15 21 InfConv/orig_pg_krakatau 32 33 60 34 100 33 61 33 82 33 17 17 … 29 20 InfConv/orig_pg_procyon 43 39 26 58 33 100 26 59 33 100 19 20 … 14 21 InfConv/artific_no_krakatau 34 37 80 37 61 26 100 36 59 26 14 14 … 28 17 InfConv/artifice_no_procyon 60 56 35 80 33 59 36 100 32 59 19 20 … 15 19 InfConv/artifice_pg_krakatau 31 32 59 34 82 33 59 32 100 33 15 16 … 28 17 InfConv/artifice_pg_procyon 43 39 26 58 33 100 26 59 33 100 19 20 … 14 21 Sqrt/orig 20 19 13 21 17 19 14 19 15 19 100 32 … 14 16 Sqrt/artifice 20 30 14 20 17 20 14 20 16 20 32 100 … 15 18 … … … … … … … … … … … … … … … … Square/artifice_pg_krakatau 14 14 28 15 29 14 28 15 28 14 14 15 … 100 32 Square/artifice_pg_procyon 17 17 17 21 20 21 17 19 17 21 16 18 … 32 100
  • 9. Similarity of Source Code in the Presence of Pervasive Modifications — C. Ragkhitwetsagul, J. Krinke, D. Clark — CREST, UCL, UK Similarity Threshold = 50 9 InfC/ orig InfC/ artfc InfC/ orig no kraka tau InfC/ orig no procy on InfC/ orig pg kraka tau InfC/ orig pg procy on InfC/ artfc no kraka tau InfC/ artfc no procy on InfC/ artfc pg kraka tau InfC/ artfc pg procy on Sqrt/ orig Sqrt/ artfc … Squr/ artfc pg kraka tau Squr/ artfc pg procy on InfConv/orig 100 55 36 63 32 43 34 60 31 43 20 20 … 14 17 InfConv/artifice 55 100 35 54 33 39 37 56 32 39 19 30 … 14 17 InfConv/orig_no_krakatau 36 35 100 38 60 26 80 35 59 26 13 14 … 28 17 InfConv/orig_no_procyon 63 54 38 100 34 58 37 80 34 58 21 20 … 15 21 InfConv/orig_pg_krakatau 32 33 60 34 100 33 61 33 82 33 17 17 … 29 20 InfConv/orig_pg_procyon 43 39 26 58 33 100 26 59 33 100 19 20 … 14 21 InfConv/artific_no_krakatau 34 37 80 37 61 26 100 36 59 26 14 14 … 28 17 InfConv/artifice_no_procyon 60 56 35 80 33 59 36 100 32 59 19 20 … 15 19 InfConv/artifice_pg_krakatau 31 32 59 34 82 33 59 32 100 33 15 16 … 28 17 InfConv/artifice_pg_procyon 43 39 26 58 33 100 26 59 33 100 19 20 … 14 21 Sqrt/orig 20 19 13 21 17 19 14 19 15 19 100 32 … 14 16 Sqrt/artifice 20 30 14 20 17 20 14 20 16 20 32 100 … 15 18 … … … … … … … … … … … … … … … … Square/artifice_pg_krakatau 14 14 28 15 29 14 28 15 28 14 14 15 … 100 32 Square/artifice_pg_procyon 17 17 17 21 20 21 17 19 17 21 16 18 … 32 100
  • 10. Best Threshold [Plot: F-measure (y-axis, 0.00 to 0.90) against threshold value T (x-axis, 0 to 100); the best point is marked at T = 31, F-measure = 0.8282]
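    The search for the best threshold can be sketched as a sweep over T from 0 to 100 that keeps the value with the highest F-measure. The scores and ground-truth labels below are hypothetical toy data, not the study's data. The `>=` comparison and the choice to keep the lowest T on ties are also assumptions.

    ```python
    def f_measure(tp, fp, fn):
        """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
        if tp == 0:
            return 0.0
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)


    def best_threshold(scored_pairs):
        """Find the threshold T in 0..100 with the highest F-measure.

        scored_pairs is a list of (similarity score, truly_similar) tuples.
        Ties keep the lowest T, which is an arbitrary choice.
        """
        best_t, best_f = 0, -1.0
        for t in range(101):
            tp = sum(1 for s, truth in scored_pairs if s >= t and truth)
            fp = sum(1 for s, truth in scored_pairs if s >= t and not truth)
            fn = sum(1 for s, truth in scored_pairs if s < t and truth)
            f = f_measure(tp, fp, fn)
            if f > best_f:
                best_t, best_f = t, f
        return best_t, best_f


    # Hypothetical toy data: (score, ground truth).
    toy = [(55, True), (63, True), (36, True), (20, False), (28, False), (14, False)]
    print(best_threshold(toy))
    ```

    On this toy data the sweep stops improving at the first T above the highest-scoring dissimilar pair, where every truly similar pair is still reported.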
  • 11. Optimal Configuration: Best Threshold, Best Parameter Settings
  • 12. Results (T = threshold, Acc = accuracy, Prec = precision, Rec = recall, F1 = F-measure)

      Tool          Settings     T    Acc     Prec    Rec     AUC     Prec@n  F1
      ccfx          b=20,t=1     4    0.9640  0.9145  0.9040  0.9468  0.9040  0.9095
      simjava       r=22         5    0.9568  0.8769  0.9120  0.9490  0.8840  0.8941
      jplag-text    t=8          2    0.9408  0.8235  0.8960  0.9453  0.8440  0.8582
      py-difflib    noautojunk   35   0.9392  0.8901  0.7940  0.9147  0.8080  0.8393
      7zncd-BZip2   mx=1         39   0.9368  0.8977  0.7720  0.9419  0.8180  0.8301
      ncd-bzlib     -            31   0.9336  0.8584  0.8000  0.9482  0.8200  0.8282
      jplag-java    t=3          43   0.9160  0.7526  0.8640  0.9667  0.7860  0.8045
      py-sklearn    -            33   0.8488  0.5894  0.8040  0.9146  0.6200  0.6802
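    As a consistency check on the results, F1 is the harmonic mean of precision and recall. The ncd-bzlib row works out as follows, with P and R as shorthand symbols introduced here for the Prec and Rec values:

    ```latex
    F_1 = \frac{2PR}{P + R}
        = \frac{2 \times 0.8584 \times 0.8000}{0.8584 + 0.8000}
        = \frac{1.37344}{1.6584}
        \approx 0.8282
    ```

    This agrees with the F-measure of 0.8282 at T = 31 shown on the Best Threshold slide.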
  • 14. Highly specialised source code similarity detection techniques and tools can perform better than more general, textual similarity measures.
  • 15. Normalisation by Decompilation [Diagram: pervasively modified code → compile (javac) → decompile (Krakatau, Procyon) → normalised code]
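    The compile-then-decompile normalisation can be sketched as a two-step pipeline. This is a rough outline under stated assumptions, not the authors' tooling. The function names and directory layout are hypothetical, and the decompiler command line is supplied by the caller so that no particular Krakatau or Procyon options are assumed. Only the `javac -d` option is relied on.

    ```python
    import subprocess
    from pathlib import Path


    def compile_command(source, class_dir):
        """Build a javac invocation that writes .class files into class_dir."""
        return ["javac", "-d", str(class_dir), str(source)]


    def normalise(source, work_dir, decompile_command):
        """Compile `source`, then decompile the resulting classes.

        decompile_command is a caller-supplied function (a hypothetical hook)
        mapping (class_dir, out_dir) to the decompiler's argument list.
        Returns the directory holding the normalised code.
        """
        class_dir = Path(work_dir) / "classes"
        out_dir = Path(work_dir) / "normalised"
        class_dir.mkdir(parents=True, exist_ok=True)
        out_dir.mkdir(parents=True, exist_ok=True)
        # Step 1: compile the pervasively modified source.
        subprocess.run(compile_command(source, class_dir), check=True)
        # Step 2: decompile the bytecode back to source.
        subprocess.run(decompile_command(class_dir, out_dir), check=True)
        return out_dir
    ```

    The similarity tools would then be run on the contents of the returned directory instead of on the original files.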
  • 16. Code Before Decompilation
  • 17. Code After Decompilation
  • 19. Compilation and decompilation can be used as an effective normalisation method that greatly improves similarity detection on Java source code.
  • 20. Compilation and decompilation can be used as an effective normalisation method that greatly improves similarity detection on Java source code. Highly specialised source code similarity detection techniques and tools can perform better than more general, textual similarity measures. Similarity of Source Code in the Presence of Pervasive Modifications. Chaiyong Ragkhitwetsagul, Jens Krinke, David Clark, CREST, UCL. More info: http://crest.cs.ucl.ac.uk/resources/cloplag/