SlideShare ist ein Scribd-Unternehmen logo
1 von 15
Downloaden Sie, um offline zu lesen
Evaluating a bot detection model
on git commit messages
Mehdi Golzadeh Alexandre Decan Tom Mens
University of Mons - Belgium
BENEVOL 2020
THE 19TH BELGIUM-NETHERLANDS-LUXEMBURG SOFTWARE EVOLUTION WORKSHOP
Socio-technical background
1/14
Detecting bots in distributed
software development
activities is very important
identity merging, team productivity, development effort,
developer onboarding or abandonment, …
Pull requests are more than twice as likely to get accepted if bots are
present in their commenting activity
2/14
Idea behind the study
Mean distance between comments
3/14
Idea behind the study
c c
c
c c
c
c
c
Comment patterns
4/14
Bot identification
Available ground-truth
datasets
Download comments from GitHub
136,529 repositories
10,874,611 issue and PR commentsSelect an account
Rate (by 2nd rater)Rate (by 1st rater)
Discussion
Rate (3rd rater)
[disagreement]
[difficult case]
[disagreement]
[agreement][agreement]
[include account]
[exclude account]
Current methods and Tools
5/14
Classification of accounts
Number of
comments
Grid-search cross-validation,
Random forest classifier – Best score
k-nearest neighbors
Support vector machine
Decision tree classifier
Logistic regression
Number of empty
comments
Comment
patterns
Inequality between
comments in patterns
Ground truth dataset
of 527 bots and 4473 humans
An approach based on characteristics of comments
6/14
Evaluation of the classifier
We identified 4 categories
of misclassified humans
Classification model for detecting bots
Based on issue and PR comments
Precision: 0.94 Precision: 0.99 Precision: 0.98
Recall: 0.91 Recall: 0.99 Recall: 0.98
F1-score: 0.92 F1-score: 0.99 F1-score: 0.98
We identified 3 categories
of misclassified bots
M. Golzadeh, A. Decan, D. Legay, and T. Mens, “A ground-truthdataset and classification model for detecting bots in GitHub issueand PR comments,”Journal of Systems and Software [Submitted for review], 2020.
7/14
Bots in commits
8/14
Detecting and Characterizing
Bots that Commit Code[2]
2- T. Dey, S. Mousavi, E. Ponce, T. Fry, B. Vasilescu, A. Filipova, andA. Mockus, “Detecting and characterizing bots that commit code,” inInt’l Conf. Mining Software Repositories, 2020.
Ground-truth dataset of 13,150 bots and 13,150 humans
BIMAN
is validated on a dataset of 67 bots and 67 humans
58 cases were correctly detected (87%) AUC-ROC: 0.89
BIN
The presence of the
string “bot” at the end
of the author name
Precision: 0.99
Recall: 0.37
BIM
Commit messages
AUC-ROC:0.70
Precision:0.57
Recall:0.67
BICA
Files changed by each commit
The projects that commit is associated with
timestamp and time zone of the commits
AUC-ROC: 0.89
BIMAN
9/14
Evaluationofthemodels Ground-truth dataset
Activities of an account in a single repository
Only consider contributors that have at least 10 commit messages
3,380 3,542
Compute features for each contributor
Model trained on PR and issues comments
Model trained on commit messages
Precision: 0.76 Precision: 0.78 Precision: 0.77
Recall: 0.78 Recall: 0.76 Recall: 0.77
F1-score: 0.77 F1-score: 0.77 F1-score: 0.77
Precision: 0.82 Precision: 0.78 Precision: 0.80
Recall: 0.75 Recall: 0.84 Recall: 0.80
F1-score: 0.78 F1-score: 0.81 F1-score: 0.80 10/14
A bot detector based on git commits
Command-line tool in Python.
Given a git repository, predicts the type of
Authors/Committers, Export to CSV or JSON
https://github.com/mehdigolzadeh/BoDeGiC
[ …. ]
Inputs BoDeGic Output
Extract
Git commit
messages
Extracting
features
Pre-trained
classifier
Git
repository
Authors
list
Parameters
Bot prediction
[ ]
BoDeGiC
11/14
Discussions and Threats
Better definition of
“what a bot is” is required
Identify bots not at the level of a contributor
but at the level of activities
The ground-truth dataset
A source of threat to construct validity
19 out of 100 cases wrongly labeled in the original dataset
13 Bots 6 Humans
Presence of mixed accounts
12/14
• Detecting bots is important to avoid bias in socio-technical studies
• An approach and a model to identify bots
• To which extent this approach performs on git commit messages
• Evaluation of the existing model
• Train a new classifier on commit messages and evaluate the approach
• A tool to identify bots in git repositories
• Substitute BIM by our classification model in the BIMAN approach[2]
Sum up
2- T. Dey, S. Mousavi, E. Ponce, T. Fry, B. Vasilescu, A. Filipova, andA. Mockus, “Detecting and characterizing bots that commit code,” inInt’l Conf. Mining Software Repositories, 2020.
13/14
The FNRS-FWO Excellence of Science research project SECO-ASSIST
Thank you
14/14

Weitere ähnliche Inhalte

Ähnlich wie Evaluating Bot Detection Models on Code Commits

Detection of Botnets using Honeypots and P2P Botnets
Detection of Botnets using Honeypots and P2P BotnetsDetection of Botnets using Honeypots and P2P Botnets
Detection of Botnets using Honeypots and P2P BotnetsCSCJournals
 
Build a mobile chatbot with Xamarin
Build a mobile chatbot with XamarinBuild a mobile chatbot with Xamarin
Build a mobile chatbot with XamarinLuis Beltran
 
Tracing Back The Botmaster
Tracing Back The BotmasterTracing Back The Botmaster
Tracing Back The BotmasterIJERA Editor
 
Towards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classificationTowards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classificationIJERA Editor
 
Recognising bot activity in collaborative software development
Recognising bot activity in collaborative software developmentRecognising bot activity in collaborative software development
Recognising bot activity in collaborative software developmentTom Mens
 
Bot net detection by using ssl encryption
Bot net detection by using ssl encryptionBot net detection by using ssl encryption
Bot net detection by using ssl encryptionAcad
 
Lab3code.c#include stdio.h#include stdlib.h#include.docx
Lab3code.c#include stdio.h#include stdlib.h#include.docxLab3code.c#include stdio.h#include stdlib.h#include.docx
Lab3code.c#include stdio.h#include stdlib.h#include.docxsmile790243
 
PFHub: Phase Field Community Hub
PFHub: Phase Field Community HubPFHub: Phase Field Community Hub
PFHub: Phase Field Community HubDaniel Wheeler
 
All you know about Botnet
All you know about BotnetAll you know about Botnet
All you know about BotnetNaveen Titare
 
Machine Learning Based Botnet Detection
Machine Learning Based Botnet DetectionMachine Learning Based Botnet Detection
Machine Learning Based Botnet Detectionbutest
 
Botnet detection by Imitation method
Botnet detection  by Imitation methodBotnet detection  by Imitation method
Botnet detection by Imitation methodAcad
 
Clever data building a chatbot from your database
Clever data building a chatbot from your databaseClever data building a chatbot from your database
Clever data building a chatbot from your databaseLuis Beltran
 
Detecting Victim Systems In Client Networks Using Coarse Grained Botnet Algor...
Detecting Victim Systems In Client Networks Using Coarse Grained Botnet Algor...Detecting Victim Systems In Client Networks Using Coarse Grained Botnet Algor...
Detecting Victim Systems In Client Networks Using Coarse Grained Botnet Algor...IRJET Journal
 
Human vs Bot: Giocare a Sasso-Carta-Forbici - Matteo Valoriani, Antimo Musone...
Human vs Bot: Giocare a Sasso-Carta-Forbici - Matteo Valoriani, Antimo Musone...Human vs Bot: Giocare a Sasso-Carta-Forbici - Matteo Valoriani, Antimo Musone...
Human vs Bot: Giocare a Sasso-Carta-Forbici - Matteo Valoriani, Antimo Musone...Codemotion
 
Understanding the Botnet Phenomenon
Understanding the Botnet PhenomenonUnderstanding the Botnet Phenomenon
Understanding the Botnet PhenomenonDr. Amarjeet Singh
 

Ähnlich wie Evaluating Bot Detection Models on Code Commits (20)

Detection of Botnets using Honeypots and P2P Botnets
Detection of Botnets using Honeypots and P2P BotnetsDetection of Botnets using Honeypots and P2P Botnets
Detection of Botnets using Honeypots and P2P Botnets
 
Build a mobile chatbot with Xamarin
Build a mobile chatbot with XamarinBuild a mobile chatbot with Xamarin
Build a mobile chatbot with Xamarin
 
Tracing Back The Botmaster
Tracing Back The BotmasterTracing Back The Botmaster
Tracing Back The Botmaster
 
Towards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classificationTowards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classification
 
Recognising bot activity in collaborative software development
Recognising bot activity in collaborative software developmentRecognising bot activity in collaborative software development
Recognising bot activity in collaborative software development
 
Paper(edited)
Paper(edited)Paper(edited)
Paper(edited)
 
Bot net detection by using ssl encryption
Bot net detection by using ssl encryptionBot net detection by using ssl encryption
Bot net detection by using ssl encryption
 
Lab3code.c#include stdio.h#include stdlib.h#include.docx
Lab3code.c#include stdio.h#include stdlib.h#include.docxLab3code.c#include stdio.h#include stdlib.h#include.docx
Lab3code.c#include stdio.h#include stdlib.h#include.docx
 
PFHub: Phase Field Community Hub
PFHub: Phase Field Community HubPFHub: Phase Field Community Hub
PFHub: Phase Field Community Hub
 
Botnet Architecture
Botnet ArchitectureBotnet Architecture
Botnet Architecture
 
All you know about Botnet
All you know about BotnetAll you know about Botnet
All you know about Botnet
 
Machine Learning Based Botnet Detection
Machine Learning Based Botnet DetectionMachine Learning Based Botnet Detection
Machine Learning Based Botnet Detection
 
Botnet detection by Imitation method
Botnet detection  by Imitation methodBotnet detection  by Imitation method
Botnet detection by Imitation method
 
Clever data building a chatbot from your database
Clever data building a chatbot from your databaseClever data building a chatbot from your database
Clever data building a chatbot from your database
 
Detecting Victim Systems In Client Networks Using Coarse Grained Botnet Algor...
Detecting Victim Systems In Client Networks Using Coarse Grained Botnet Algor...Detecting Victim Systems In Client Networks Using Coarse Grained Botnet Algor...
Detecting Victim Systems In Client Networks Using Coarse Grained Botnet Algor...
 
How To Protect Your Website From Bot Attacks
How To Protect Your Website From Bot AttacksHow To Protect Your Website From Bot Attacks
How To Protect Your Website From Bot Attacks
 
Botnet
BotnetBotnet
Botnet
 
AI Machine vs Human
AI Machine vs HumanAI Machine vs Human
AI Machine vs Human
 
Human vs Bot: Giocare a Sasso-Carta-Forbici - Matteo Valoriani, Antimo Musone...
Human vs Bot: Giocare a Sasso-Carta-Forbici - Matteo Valoriani, Antimo Musone...Human vs Bot: Giocare a Sasso-Carta-Forbici - Matteo Valoriani, Antimo Musone...
Human vs Bot: Giocare a Sasso-Carta-Forbici - Matteo Valoriani, Antimo Musone...
 
Understanding the Botnet Phenomenon
Understanding the Botnet PhenomenonUnderstanding the Botnet Phenomenon
Understanding the Botnet Phenomenon
 

Mehr von Tom Mens

How to be(come) a successful PhD student
How to be(come) a successful PhD studentHow to be(come) a successful PhD student
How to be(come) a successful PhD studentTom Mens
 
A Dataset of Bot and Human Activities in GitHub
A Dataset of Bot and Human Activities in GitHubA Dataset of Bot and Human Activities in GitHub
A Dataset of Bot and Human Activities in GitHubTom Mens
 
The (r)evolution of CI/CD on GitHub
 The (r)evolution of CI/CD on GitHub The (r)evolution of CI/CD on GitHub
The (r)evolution of CI/CD on GitHubTom Mens
 
Nurturing the Software Ecosystems of the Future
Nurturing the Software Ecosystems of the FutureNurturing the Software Ecosystems of the Future
Nurturing the Software Ecosystems of the FutureTom Mens
 
Comment programmer un robot en 30 minutes?
Comment programmer un robot en 30 minutes?Comment programmer un robot en 30 minutes?
Comment programmer un robot en 30 minutes?Tom Mens
 
On the rise and fall of CI services in GitHub
On the rise and fall of CI services in GitHubOn the rise and fall of CI services in GitHub
On the rise and fall of CI services in GitHubTom Mens
 
On backporting practices in package dependency networks
On backporting practices in package dependency networksOn backporting practices in package dependency networks
On backporting practices in package dependency networksTom Mens
 
Comparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
Comparing semantic versioning practices in Cargo, npm, Packagist and RubygemsComparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
Comparing semantic versioning practices in Cargo, npm, Packagist and RubygemsTom Mens
 
Lost in Zero Space
Lost in Zero SpaceLost in Zero Space
Lost in Zero SpaceTom Mens
 
Is my software ecosystem healthy? It depends!
Is my software ecosystem healthy? It depends!Is my software ecosystem healthy? It depends!
Is my software ecosystem healthy? It depends!Tom Mens
 
Bot or not? Detecting bots in GitHub pull request activity based on comment s...
Bot or not? Detecting bots in GitHub pull request activity based on comment s...Bot or not? Detecting bots in GitHub pull request activity based on comment s...
Bot or not? Detecting bots in GitHub pull request activity based on comment s...Tom Mens
 
On the fragility of open source software packaging ecosystems
On the fragility of open source software packaging ecosystemsOn the fragility of open source software packaging ecosystems
On the fragility of open source software packaging ecosystemsTom Mens
 
How magic is zero? An Empirical Analysis of Initial Development Releases in S...
How magic is zero? An Empirical Analysis of Initial Development Releases in S...How magic is zero? An Empirical Analysis of Initial Development Releases in S...
How magic is zero? An Empirical Analysis of Initial Development Releases in S...Tom Mens
 
Comparing dependency issues across software package distributions (FOSDEM 2020)
Comparing dependency issues across software package distributions (FOSDEM 2020)Comparing dependency issues across software package distributions (FOSDEM 2020)
Comparing dependency issues across software package distributions (FOSDEM 2020)Tom Mens
 
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)Tom Mens
 
SecoHealth 2019 Research Achievements
SecoHealth 2019 Research AchievementsSecoHealth 2019 Research Achievements
SecoHealth 2019 Research AchievementsTom Mens
 
SECO-Assist 2019 research seminar
SECO-Assist 2019 research seminarSECO-Assist 2019 research seminar
SECO-Assist 2019 research seminarTom Mens
 
Empirically Analysing the Socio-Technical Health of Software Package Managers
Empirically Analysing the Socio-Technical Health of Software Package ManagersEmpirically Analysing the Socio-Technical Health of Software Package Managers
Empirically Analysing the Socio-Technical Health of Software Package ManagersTom Mens
 
ConPan: Analysing Packages Installed in Docker Containers
ConPan: Analysing Packages Installed in Docker ContainersConPan: Analysing Packages Installed in Docker Containers
ConPan: Analysing Packages Installed in Docker ContainersTom Mens
 
On the Relation between Outdated Docker Containers, Severity Vulnerabilities,...
On the Relation between Outdated Docker Containers, Severity Vulnerabilities,...On the Relation between Outdated Docker Containers, Severity Vulnerabilities,...
On the Relation between Outdated Docker Containers, Severity Vulnerabilities,...Tom Mens
 

Mehr von Tom Mens (20)

How to be(come) a successful PhD student
How to be(come) a successful PhD studentHow to be(come) a successful PhD student
How to be(come) a successful PhD student
 
A Dataset of Bot and Human Activities in GitHub
A Dataset of Bot and Human Activities in GitHubA Dataset of Bot and Human Activities in GitHub
A Dataset of Bot and Human Activities in GitHub
 
The (r)evolution of CI/CD on GitHub
 The (r)evolution of CI/CD on GitHub The (r)evolution of CI/CD on GitHub
The (r)evolution of CI/CD on GitHub
 
Nurturing the Software Ecosystems of the Future
Nurturing the Software Ecosystems of the FutureNurturing the Software Ecosystems of the Future
Nurturing the Software Ecosystems of the Future
 
Comment programmer un robot en 30 minutes?
Comment programmer un robot en 30 minutes?Comment programmer un robot en 30 minutes?
Comment programmer un robot en 30 minutes?
 
On the rise and fall of CI services in GitHub
On the rise and fall of CI services in GitHubOn the rise and fall of CI services in GitHub
On the rise and fall of CI services in GitHub
 
On backporting practices in package dependency networks
On backporting practices in package dependency networksOn backporting practices in package dependency networks
On backporting practices in package dependency networks
 
Comparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
Comparing semantic versioning practices in Cargo, npm, Packagist and RubygemsComparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
Comparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
 
Lost in Zero Space
Lost in Zero SpaceLost in Zero Space
Lost in Zero Space
 
Is my software ecosystem healthy? It depends!
Is my software ecosystem healthy? It depends!Is my software ecosystem healthy? It depends!
Is my software ecosystem healthy? It depends!
 
Bot or not? Detecting bots in GitHub pull request activity based on comment s...
Bot or not? Detecting bots in GitHub pull request activity based on comment s...Bot or not? Detecting bots in GitHub pull request activity based on comment s...
Bot or not? Detecting bots in GitHub pull request activity based on comment s...
 
On the fragility of open source software packaging ecosystems
On the fragility of open source software packaging ecosystemsOn the fragility of open source software packaging ecosystems
On the fragility of open source software packaging ecosystems
 
How magic is zero? An Empirical Analysis of Initial Development Releases in S...
How magic is zero? An Empirical Analysis of Initial Development Releases in S...How magic is zero? An Empirical Analysis of Initial Development Releases in S...
How magic is zero? An Empirical Analysis of Initial Development Releases in S...
 
Comparing dependency issues across software package distributions (FOSDEM 2020)
Comparing dependency issues across software package distributions (FOSDEM 2020)Comparing dependency issues across software package distributions (FOSDEM 2020)
Comparing dependency issues across software package distributions (FOSDEM 2020)
 
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
 
SecoHealth 2019 Research Achievements
SecoHealth 2019 Research AchievementsSecoHealth 2019 Research Achievements
SecoHealth 2019 Research Achievements
 
SECO-Assist 2019 research seminar
SECO-Assist 2019 research seminarSECO-Assist 2019 research seminar
SECO-Assist 2019 research seminar
 
Empirically Analysing the Socio-Technical Health of Software Package Managers
Empirically Analysing the Socio-Technical Health of Software Package ManagersEmpirically Analysing the Socio-Technical Health of Software Package Managers
Empirically Analysing the Socio-Technical Health of Software Package Managers
 
ConPan: Analysing Packages Installed in Docker Containers
ConPan: Analysing Packages Installed in Docker ContainersConPan: Analysing Packages Installed in Docker Containers
ConPan: Analysing Packages Installed in Docker Containers
 
On the Relation between Outdated Docker Containers, Severity Vulnerabilities,...
On the Relation between Outdated Docker Containers, Severity Vulnerabilities,...On the Relation between Outdated Docker Containers, Severity Vulnerabilities,...
On the Relation between Outdated Docker Containers, Severity Vulnerabilities,...
 

Kürzlich hochgeladen

Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfWildaNurAmalia2
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuinethapagita
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx023NiWayanAnggiSriWa
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 

Kürzlich hochgeladen (20)

Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 

Evaluating Bot Detection Models on Code Commits

  • 1. Evaluating a bot detection model on git commit messages Mehdi Golzadeh Alexandre Decan Tom Mens University of Mons - Belgium BENEVOL 2020 THE 19TH BELGIUM-NETHERLANDS-LUXEMBURG SOFTWARE EVOLUTION WORKSHOP
  • 3. Detecting bots in distributed software development activities is very important identity merging, team productivity, development effort, developer onboarding or abandonment, … Pull requests are more than twice as likely to get accepted if bots are present in their commenting activity 2/14
  • 4. Idea behind the study Mean distance between comments 3/14
  • 5. Idea behind the study c c c c c c c c Comment patterns 4/14
  • 6. Bot identification Available ground-truth datasets Download comments from GitHub 136,529 repositories 10,874,611 issue and PR commentsSelect an account Rate (by 2nd rater)Rate (by 1st rater) Discussion Rate (3rd rater) [disagreement] [difficult case] [disagreement] [agreement][agreement] [include account] [exclude account] Current methods and Tools 5/14
  • 7. Classification of accounts Number of comments Grid-search cross-validation, Random forest classifier – Best score k-nearest neighbors Support vector machine Decision tree classifier Logistic regression Number of empty comments Comment patterns Inequality between comments in patterns Ground truth dataset of 527 bots and 4473 humans An approach based on characteristics of comments 6/14
  • 8. Evaluation of the classifier We identified 4 categories of misclassified humans Classification model for detecting bots Based on issue and PR comments Precision: 0.94 Precision: 0.99 Precision: 0.98 Recall: 0.91 Recall: 0.99 Recall: 0.98 F1-score: 0.92 F1-score: 0.99 F1-score: 0.98 We identified 3 categories of misclassified bots M. Golzadeh, A. Decan, D. Legay, and T. Mens, “A ground-truthdataset and classification model for detecting bots in GitHub issueand PR comments,”Journal of Systems and Software [Submitted for review], 2020. 7/14
  • 10. Detecting and Characterizing Bots that Commit Code[2] 2- T. Dey, S. Mousavi, E. Ponce, T. Fry, B. Vasilescu, A. Filipova, andA. Mockus, “Detecting and characterizing bots that commit code,” inInt’l Conf. Mining Software Repositories, 2020. Ground-truth dataset of 13,150 bots and 13,150 humans BIMAN is validated on a dataset of 67 bots and 67 humans 58 cases were correctly detected (87%) AUC-ROC: 0.89 BIN The presence of the string “bot” at the end of the author name Precision: 0.99 Recall: 0.37 BIM Commit messages AUC-ROC:0.70 Precision:0.57 Recall:0.67 BICA Files changed by each commit The projects that commit is associated with timestamp and time zone of the commits AUC-ROC: 0.89 BIMAN 9/14
  • 11. Evaluationofthemodels Ground-truth dataset Activities of an account in a single repository Only consider contributors that have at least 10 commit messages 3,380 3,542 Compute features for each contributor Model trained on PR and issues comments Model trained on commit messages Precision: 0.76 Precision: 0.78 Precision: 0.77 Recall: 0.78 Recall: 0.76 Recall: 0.77 F1-score: 0.77 F1-score: 0.77 F1-score: 0.77 Precision: 0.82 Precision: 0.78 Precision: 0.80 Recall: 0.75 Recall: 0.84 Recall: 0.80 F1-score: 0.78 F1-score: 0.81 F1-score: 0.80 10/14
  • 12. A bot detector based on git commits Command-line tool in Python. Given a git repository, predicts the type of Authors/Committers, Export to CSV or JSON https://github.com/mehdigolzadeh/BoDeGiC [ …. ] Inputs BoDeGic Output Extract Git commit messages Extracting features Pre-trained classifier Git repository Authors list Parameters Bot prediction [ ] BoDeGiC 11/14
  • 13. Discussions and Threats Better definition of “what a bot is” is required Identify bots not at the level of a contributor but at the level of activities The ground-truth dataset A source of threat to construct validity 19 out of 100 cases wrongly labeled in the original dataset 13 Bots 6 Humans Presence of mixed accounts 12/14
  • 14. • Detecting bots is important to avoid bias in socio-technical studies • An approach and a model to identify bots • To which extent this approach performs on git commit messages • Evaluation of the existing model • Train a new classifier on commit messages and evaluate the approach • A tool to identify bots in git repositories • Substitute BIM by our classification model in the BIMAN approach[2] Sum up 2- T. Dey, S. Mousavi, E. Ponce, T. Fry, B. Vasilescu, A. Filipova, andA. Mockus, “Detecting and characterizing bots that commit code,” inInt’l Conf. Mining Software Repositories, 2020. 13/14
  • 15. The FNRS-FWO Excellence of Science research project SECO-ASSIST Thank you 14/14