OpenMRS GSoC 2010
                    First presentation
                     Shivashankar S
                 Project : Bug Analytics
      Mentors : Dr. Diederik van Liere, David Eaves

                       About myself
Education : MS by Research in CSE
Grad school : Indian Institute of Technology Madras,
  Chennai, India
Research Area : Text Mining, Machine Learning
Roadmap
•   Overview
•   Deliverables
•   Duplicate report identification
•   Expert identification
•   Current progress & Challenges
•   Results
Project Overview
• Fixing bugs quickly is crucial to keeping any open
  source project active and alive.
• High-level flow : When a new report arrives, it is
  assigned to an expert, who then resolves the bug and
  fixes it.
• Issue 1 : Organizations rely on a triage team to
  assign reports to experts manually.
• Issue 2 : If a report is resolved as a duplicate, the
  time the expert spent on it is wasted.
• The aim of the “Bug Analytics” project is to address
  both issues using the text in the reports.
• The bug tracking tool of choice is JIRA.
Deliverables
• Plug-in for JIRA that can do the following
   – Identify duplicate tickets
   – Automatically assign reports to experts
   – Classify a report as a bug or not
   – Estimate the likelihood of a bug report being fixed

   Note : Since the literature deals with each task
    individually, the tasks will be addressed in the order
    listed [the last two depend on time availability].
Duplicate report identification
Semi-automated approach
• Predict the top “K” similar reports for each report and leave it to the
  administrator to mark it as a duplicate or not.
• Pros : No false alarms. The similar reports returned can also be used to
  improve the report description, analyze similar bugs, come up with a fix, etc.
• Cons : More human intervention than the automated approach.
• Reference : Lyndon Hiew, Gail C. Murphy. Assisted Detection of Duplicate Bug
  Reports. Submitted to FSE 2006.

Automated approach
• Fix a similarity threshold and mark a report as a duplicate if its similarity
  with any report in the DB exceeds the threshold.
• Pros : Less human intervention.
• Cons : False alarms.
• Reference : Per Runeson, Magnus Alexandersson, Oskar Nyholm. Detection of
  Duplicate Defect Reports Using Natural Language Processing. Proceedings of
  the 29th International Conference on Software Engineering, p.499-510,
  May 20-26, 2007.

We should decide the final approach based on the experimental results.
Expert identification
Semi-automated approach
• Predict the top “K” experts for each report and leave it to the administrator
  or the top “K” experts themselves to assign the report to one of them.
• Pros : Even if one expert is busy, others can take it up.
• Cons : Some protocol or mechanism must be put in place to pick one of the
  “K” experts.

Automated approach
• Assign the report to exactly one expert.
• Pros : If the prediction is good, no manual effort is needed.
• Cons : If the prediction is wrong or the assigned person is overloaded,
  manual triaging is still required.

References :
[1] Anvik, J., Hiew, L., and Murphy, G. C. 2006. Who should fix this bug? In Proc. ICSE
[2] Anvik, J. and Murphy, G. C. 2007. Determining Implementation Expertise from
Bug Reports. In Proc. Fourth International Workshop on Mining Software Repositories
We should decide the final approach based on the experimental results.
Training set
• Duplicate report prediction : This task needs a
  validation set to fix the threshold for the automated
  approach, and for evaluation in both the automated and
  semi-automated cases. We built it using the resolution
  field (resolution = “Duplicate”).
• Expert prediction : Here training set creation is not
  straightforward, since the “assigned-to” field is not an
  exact indicator of the expert for a report.
   – So a set of heuristics is employed, following those given at
     http://www.cs.ubc.ca/labs/spl/projects/bugTriage/assignment/heuristics.html
     for other projects. They are shown in the following slide.
Heuristics for labeling a report with experts (flowchart)
• Report status ?
   – Resolved, Closed : check the resolution
   – Open, Reopened
• Resolution
   – Won’t fix, Incomplete, Cannot reproduce : add the resolver as
     primary expert
   – Duplicate : use the labels of the duplicated report
   – Fixed
• Patch submission OR activity as comments ?
   – Yes : add the person who has submitted the most patches (else
     the person who has commented most times) as primary expert;
     add other patch submitters, commenters, and the owner as
     additional labels
   – No : if the report is assigned, label it with the owner, else
     discard the report
Current Progress
• Working code that does duplicate report
  identification and expert identification, with results
  comparable to state-of-the-art approaches.
• The next step is to improve the results through closer
  analysis and to work on the plug-in for JIRA.

                   Challenges
• Noisy text – short forms, spelling mistakes
• Making proper use of stack traces and log information
• Varying terminology, since not everyone uses the
  same word for the same thing every time.
Duplicate identification
• TF and TF-IDF vectors are constructed from the Summary,
  Description, and Comments text. The two variants are
  referred to as SDC (with comments) and SD (without
  comments).
• In the semi-automated approach, the top “K” similar
  reports are returned for each report.
   – The presence of the actual duplicate report in the top “K” is
      counted as a hit.
   – The results are plotted as “K” vs. hit ratio.
   – TF-IDF on SD gives the best results.
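The top “K” retrieval and hit ratio described above can be sketched as follows. This is a minimal illustration, not the project's code: the whitespace tokenizer, the plain log(N/df) IDF weighting, cosine similarity as the similarity measure, and the four sample reports with their duplicate link are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_similar(vectors, query_index, k):
    """Indices of the k reports most similar to the query report."""
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    # Highest similarity first; ties broken by report index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scores[:k]]


def hit_ratio(vectors, duplicate_of, k):
    """Fraction of known duplicates whose original appears in the top k."""
    hits = sum(
        1 for dup, original in duplicate_of.items()
        if original in top_k_similar(vectors, dup, k)
    )
    return hits / len(duplicate_of)


# Hypothetical summary + description (SD) text for four reports.
reports = [
    "patient dashboard crashes when saving encounter form",
    "login page shows wrong locale after logout",
    "saving encounter form crashes the patient dashboard",
    "concept dictionary search is slow for long names",
]
# Hypothetical ground truth: report 2 was resolved as a duplicate of report 0.
duplicate_of = {2: 0}

vectors = tfidf_vectors(reports)
print(top_k_similar(vectors, 2, k=1))      # [0]
print(hit_ratio(vectors, duplicate_of, k=1))  # 1.0
```

Computing `hit_ratio` for a range of `k` values gives the points of a “K” vs. hit ratio curve.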
Semi-automated approach [plot : “K” vs. hit ratio]
Automated approach
• A report whose similarity with any other report in the
  DB exceeds the threshold is flagged as a duplicate.
• Otherwise it is called unique.
• For reports flagged as duplicates, the top “K” similar
  reports above the threshold are examined.
   – If the actual duplicated report is among the top “K”,
     it is a hit for the duplicate case.
• On the other hand, if a report is correctly classified as
  unique, it is a hit for the unique case.
• Plots are drawn of threshold vs. hit ratio for both the
  duplicate and unique cases.
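The evaluation above can be sketched as follows, under stated assumptions: the similarity matrix and the duplicate link are made-up values, and reports are assumed to be indexed in filing order so that each report is compared only with the earlier ones already in the DB. The function names are hypothetical.

```python
def flag_duplicates(similarity, threshold):
    """Map each report index to the earlier reports above the threshold.

    Assumes reports are indexed in filing order. A report with an empty
    candidate list is treated as unique. Candidates are sorted from most
    to least similar.
    """
    flagged = {}
    for i, row in enumerate(similarity):
        candidates = [
            (row[j], j) for j in range(i) if row[j] > threshold
        ]
        candidates.sort(key=lambda pair: (-pair[0], pair[1]))
        flagged[i] = [j for _, j in candidates]
    return flagged


def hit_ratios(similarity, duplicate_of, threshold, k):
    """Return (duplicate hit ratio, unique hit ratio) for one threshold."""
    flagged = flag_duplicates(similarity, threshold)
    dup_hits = unique_hits = 0
    n_unique = len(similarity) - len(duplicate_of)
    for i, candidates in flagged.items():
        if i in duplicate_of:
            # Hit: the true original is among the top-k candidates.
            if duplicate_of[i] in candidates[:k]:
                dup_hits += 1
        elif not candidates:
            # Hit: a truly unique report was not flagged.
            unique_hits += 1
    return dup_hits / len(duplicate_of), unique_hits / n_unique


# Hypothetical pairwise similarities between four reports.
similarity = [
    [1.0, 0.1, 0.8, 0.2],
    [0.1, 1.0, 0.0, 0.3],
    [0.8, 0.0, 1.0, 0.1],
    [0.2, 0.3, 0.1, 1.0],
]
# Hypothetical ground truth: report 2 is a duplicate of report 0.
duplicate_of = {2: 0}

for threshold in (0.25, 0.5, 0.9):
    print(threshold, hit_ratios(similarity, duplicate_of, threshold, k=1))
```

On this toy data a low threshold produces a false alarm on a unique report, while a high threshold misses the real duplicate, which is the trade-off the threshold vs. hit ratio plots are meant to expose.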
Automated approach – SDC [plot : threshold vs. hit ratio]
Automated approach – SD [plot : threshold vs. hit ratio]
Expert classification
• The following methods are used
   – Maximum likelihood based prediction using BRKNN
     (binary relevance KNN)
   – Maximum a posteriori (MAP) prediction using
     BRKNN
   – Component-wise maximum likelihood based
     prediction using BRKNN
   – Component-wise MAP using BRKNN (best results)
• For smaller “K” and a smaller number of experts
  returned, the precision is high.
• Conversely, for larger “K” and a larger number of experts
  returned, the recall is high.
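The precision and recall behaviour noted above can be illustrated with a small sketch. It uses a plain neighbor vote in the spirit of binary relevance KNN (each expert is scored by how many of the K nearest reports carry that expert as a label); it is not the maximum likelihood, MAP, or component-wise variants listed on the slide. The expert names and label sets are hypothetical.

```python
from collections import Counter


def rank_experts(neighbor_labels):
    """Rank experts by how many of the K nearest reports carry their label."""
    votes = Counter()
    for labels in neighbor_labels:
        votes.update(set(labels))
    # Sort by vote count (descending), then by name for a stable order.
    ordered = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ordered]


def precision_recall(predicted, actual):
    """Precision and recall of a predicted expert set against the true set."""
    predicted, actual = set(predicted), set(actual)
    correct = len(predicted & actual)
    return correct / len(predicted), correct / len(actual)


# Hypothetical expert labels of the K = 5 nearest training reports.
neighbor_labels = [
    ["alice", "bob"],
    ["alice"],
    ["alice", "carol"],
    ["bob"],
    ["dave"],
]
# Hypothetical true experts for the new report.
actual = ["alice", "bob"]

ranking = rank_experts(neighbor_labels)
for n in (1, 2, 3):
    precision, recall = precision_recall(ranking[:n], actual)
    print(n, ranking[:n], round(precision, 2), round(recall, 2))
```

In this toy case, returning one expert gives perfect precision but misses one true expert, while returning three experts reaches full recall at the cost of precision.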
Precision value for returning 1 expert [plot]
Recall value for returning 1 expert [plot]
Precision value for returning 2 experts [plot]
Recall value for returning 2 experts [plot]
Precision value for returning 3 experts [plot]
Recall value for returning 3 experts [plot]
