PyDriller: Python Framework for
Mining Software Repositories
Davide Spadini, Mauricio Aniche, Alberto Bacchelli
ishepard @DavideSpadini
What?
Framework to analyse Git (and soon Mercurial)
repositories
Why?
• There are already many frameworks for Git
• Generally, one for each programming language
• Java -> JGit
• Python -> GitPython
• JavaScript -> nodegit
• etc.
So, why?
How many commands does Git have?
• > 20?
• > 50?
• > 100?
• > 150?
154!!
PyDriller
• Aim: to ease the extraction of information from Git repositories
• What is supported:
• analysing the history of a project
• retrieving commit information (date, message, authors, etc.)
• retrieving file information (diff, source code)
• What is not supported:
• writing to the repo (git pull, git push, git add, git commit, etc.)
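The read-only workflow listed above can be sketched roughly as follows. This is an illustrative sketch, not the demo from the talk: `summarize_commits` and `mine` are hypothetical helpers, and the code assumes commit objects expose `hash`, `msg`, `author.name` and `author_date`, which are the attribute names PyDriller uses. The slides name the entry point `RepositoryMining`; PyDriller 2.x calls it `Repository`, so check the documentation for the installed version.

```python
from datetime import datetime
from types import SimpleNamespace


def summarize_commits(commits):
    """Yield one summary dict per commit, lazily.

    `commits` is any iterable of objects exposing `hash`, `msg`,
    `author.name` and `author_date`.
    """
    for commit in commits:
        lines = commit.msg.splitlines()
        yield {
            "hash": commit.hash[:7],
            "author": commit.author.name,
            "date": commit.author_date.date().isoformat(),
            "subject": lines[0] if lines else "",
        }


def mine(path_to_repo):
    """Summarize a real repository (requires `pip install pydriller`)."""
    # Imported here so the helper above stays usable without PyDriller.
    # Assumption: PyDriller 2.x; with 1.x the class is RepositoryMining.
    from pydriller import Repository
    return summarize_commits(Repository(path_to_repo).traverse_commits())


# Stand-in commit for demonstration; real ones come from traverse_commits().
fake = SimpleNamespace(
    hash="0123456789abcdef",
    msg="Fix typo\n\nLonger description.",
    author=SimpleNamespace(name="Ada"),
    author_date=datetime(2018, 11, 5, 10, 30),
)
summaries = list(summarize_commits([fake]))
print(summaries)
```

Calling `mine("path/to/repo")` would run the same summary over a real repository; nothing is written back to it.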
Demo
Statistics
• Everything is lazily evaluated, so you only “pay” for what you use.
1. only commit information:
immediate (like git log)
2. commit and file information:
about 60 commits/s (1240 commits in 22 seconds)
3. commit, file and metrics information:
about 4 commits/s (1240 commits in ~5 min)
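One way to picture the lazy evaluation described above is a generator combined with cached properties. The sketch below is not PyDriller's implementation; `LazyCommit`, `traverse` and the `file_loads` counter are hypothetical names used only to show that an expensive field costs nothing until it is first read.

```python
from functools import cached_property


class LazyCommit:
    """Hypothetical commit wrapper: cheap fields eager, costly ones lazy."""

    file_loads = 0  # counts how often the expensive step actually runs

    def __init__(self, sha, message):
        self.sha = sha          # cheap: available immediately
        self.message = message  # cheap: available immediately

    @cached_property
    def files(self):
        # Expensive step (stand-in for computing diffs); runs on first access only.
        LazyCommit.file_loads += 1
        return [f"{self.sha}.txt"]


def traverse(history):
    """Yield commits one at a time instead of building the whole list."""
    for sha, message in history:
        yield LazyCommit(sha, message)


history = [("a1", "first"), ("b2", "second"), ("c3", "third")]

# Reading only commit information never triggers the expensive step.
messages = [c.message for c in traverse(history)]
loads_after_messages = LazyCommit.file_loads

# Reading file information pays for it once per commit, then caches it.
commit = next(traverse(history))
files_first = commit.files
files_again = commit.files
loads_after_files = LazyCommit.file_loads
```

The same split explains the three tiers in the statistics: each extra kind of information adds work only for the commits on which it is actually requested.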
Thank you for your support!
• Some numbers:
1. Downloaded approximately 4000 times
2. 100 times in the last 2 weeks alone
• Community driven
• University of Zurich, TU Delft and University of Catania teach
PyDriller in their MSR courses
• SIG uses PyDriller in their quality assessments
What’s next?
• A company asked me to implement
RepositoryMining().traverse_files()
• Mercurial support
• Ideas? Talk to me or submit a PR :)
PyDriller
• Source code: https://github.com/ishepard/pydriller
• Doc: https://pydriller.readthedocs.io/en/latest/
• Feel free to leave a star! :)
