SlideShare ist ein Scribd-Unternehmen logo
1 von 34
AAAI 2011

                                        CosTriage: A Cost-Aware Algorithm for
Information & Database Systems Lab




                                        Bug Reporting Systems
                                     Jin-woo Park1, Mu-Woong Lee1, Jinhan Kim1, Seung-won Hwang1, Sunghun Kim2
                                                           POSTECH, Korea, Republic of1
                                                                HKUST, Hong Kong2
Bug reporting systems
                                      Bugs!!
                                        More than 300 bug reports per day in Mozilla (a big software
                                         project)
Information & Database Systems Lab




                                      Bug Solving
                                        One of the important issues in a software development process
                                        Bug reports are posted, discussed, and assigned to developers


                                      Open sources projects
                                          Apache
                                          Eclipse
                                          Linux kernel
                                          Mozilla
Bug reporting systems
                                      Bug reports
                                         Has Bug ID, title, description, status, and other meta data
                                         Assigned to developers
                                         Fixed by developers
Information & Database Systems Lab




                                      Challenges
                                         Bug triage
                                         Duplicate bug detection
                                         …
                                         bug ID
                                                                                                        title (summary)
                                         status
                                                                                                   bug fix history (time)

                                      other data



                                      description
Bug reporting systems
                                      Bug Triage
                                        Assigning a new bug report to a suitable developer
                                        Bottleneck of bug fixing process
                                           Labor intensive
Information & Database Systems Lab




                                           Miss-assignment can lead to slow bug fix
                                                                                          Bottleneck of
                                                                                          the bug fixing
                                                                                             process




                                                                                           assign
                                           Open
                                          Source             Bug
                                          Project           Reports
                                                                                Triager

                                                                                                           Developers

                                                                                               Can be
                                                                                            automated!!!
Preliminary (recommendation)
                                      Recommender algorithms
                                        Content-based recommendation (CBR)
                                           Predicting user’s interests based on item features
                                           Machine learning methods
Information & Database Systems Lab




                                           Over-specialization problem


                                        Collaborative filtering recommendation (CF)
                                           Predicting user’s interests based on affinity’s interests for items
                                           User neighborhood
                                           Sparsity problem


                                        Hybrid recommendation
                                           Content-boosted collaborative filtering (CBCF)
                                           Combining an existing CBR with a CF
                                           Better performance than either approach alone
Preliminary (recommendation)
                                      Recommender algorithms
                                         Content-based recommendation (CBR)
                                                Predict user’s interests based on item features
                                                   Title, Status, Description, …
Information & Database Systems Lab




                                                Learn features using machine learning methods
                                                Recommend similar items
                                                Over-specialization problem



                                                                                                    Developers
                                                      Bug
                                                    Report 4                        Title   Bug 1      Bug 2     Bug 3   Bug 4

                                                                                    Dev 1    10          9         1      ?

                                     Feature      word1    word2     word3          Dev 2     6          5        10      ?

                                      Count        1           1      4             Dev 3     8          7         7      ?


                                                                                             Rating score table
Preliminary (recommendation)
                                      Recommender algorithms
                                         Content-based recommendation (CBR)
                                                Predict user’s interests based on item features
                                                   Title, Status, Description, …
Information & Database Systems Lab




                                                Learn features using machine learning methods
                                                Recommend similar items
                                                Over-specialization problem



                                                                                                    Developers
                                                      Bug
                                                    Report 4                        Title   Bug 1      Bug 2     Bug 3   Bug 4

                                                                                    Dev 1    10          9         1       9

                                     Feature      word1    word2     word3          Dev 2     6          5        10       5

                                      Count        1           1      4             Dev 3     8          7         7       7


                                                                                             Rating score table
Preliminary (recommendation)
                                      Recommender algorithms
                                        Collaborative filtering recommendation (CF)
                                           Predicting user’s interests based on affinity’s interests for items
                                           User neighborhood
Information & Database Systems Lab




                                           Sparsity problem



                                                                           Bug
                                                                          Reports


                                                                  Bug 1             Bug 2      Bug 3
                                                Developer 1         10               5           10
                                                Developer 2          5               10           6
                                                Developer 3          7               7            7
                                                Developer 4          9               5            ?
Preliminary (recommendation)
                                      Recommender algorithms
                                        Collaborative filtering recommendation (CF)
                                           Predicting user’s interests based on affinity’s interests for items
                                           User neighborhood
Information & Database Systems Lab




                                           Sparsity problem



                                                                           Bug
                                                                          Reports


                                                                  Bug 1             Bug 2      Bug 3
                                                Developer 1         10               5           10
                                                Developer 2          5               10           6
                                                Developer 3          7               7            7
                                                Developer 4          9               5            ?
Preliminary (recommendation)
                                      Recommender algorithms
                                        Collaborative filtering recommendation (CF)
                                           Predicting user’s interests based on affinity’s interests for items
                                           User neighborhood
Information & Database Systems Lab




                                           Sparsity problem



                                                                           Bug
                                                                          Reports


                                                                  Bug 1             Bug 2      Bug 3
                                                Developer 1         10               5           10
                                                Developer 2          5               10           6
                                                Developer 3          7               7            7
                                                Developer 4          9               5            9
Preliminary (recommendation)
                                      Recommender algorithms
                                        Collaborative filtering recommendation (CF)
                                            Predicting user’s interests based on affinity’s interests for items
                                            User neighborhood
Information & Database Systems Lab




                                            Sparsity problem


                                     Unsuitable for triaging because:
                                                                           Bug
                                      No one solved new bug!!
                                                                         Report 4


                                                          Bug 1          Bug 2          Bug 3          Bug 4
                                        Developer 1         10             5              10             ?
                                        Developer 2          5             10             6              ?
                                        Developer 3          7             7              7              ?
                                        Developer 4          9             5              9              ?
Preliminary (recommendation)
                                      Recommender algorithms
                                        Collaborative filtering recommendation (CF)
                                            Predicting user’s interests based on affinity’s interests for items
                                            User neighborhood
Information & Database Systems Lab




                                            Sparsity problem


                                     Unsuitable for triaging because:
                                                                           Bug
                                      Rating is extremely sparse!!
                                                                         Report 4


                                                          Bug 1          Bug 2          Bug 3          Bug 4
                                        Developer 1         10             5              ?              ?
                                        Developer 2          ?             ?              ?              ?
                                        Developer 3          ?             ?              10             ?
                                        Developer 4          ?             ?              ?              3
Preliminary (recommendation)
                                      Recommender algorithms
                                        Hybrid recommendation
                                          Content-boosted collaborative filtering (CBCF)
                                          Combining an existing CBR with a CF
Information & Database Systems Lab




                                          Better performance than either approach alone


                                           Two Phases
                                                                         Bug
                                           - CBR phase                                Feature     word1   word2   word3
                                                                       Report 5
                                           - CF Phase                                 Count        3        2      4



                                                     Bug 1      Bug 2         Bug 3      Bug 4            Bug 5
                                           Dev 1       10          ?              ?           ?            ?
                                           Dev 2         ?         8              3           ?            ?
                                           Dev 3         ?         ?              ?           7            ?
Preliminary (recommendation)
                                      Recommender algorithms
                                        Hybrid recommendation
                                          Content-boosted collaborative filtering (CBCF)
                                          Combining an existing CBR with a CF
Information & Database Systems Lab




                                          Better performance than either approach alone


                                           CBR phase
                                                                         Bug
                                                                       Report 5



                                                     Bug 1      Bug 2         Bug 3    Bug 4   Bug 5
                                           Dev 1       10         10              10    10      10
                                           Dev 2        3          8              3     8       3
                                           Dev 3        7          7              7     7       7
Preliminary (recommendation)
                                      Recommender algorithms
                                        Hybrid recommendation
                                          Content-boosted collaborative filtering (CBCF)
                                          Combining an existing CBR with a CF
Information & Database Systems Lab




                                          Improving CBR/CF by combining the complementary strength of CBR
                                           and CF

                                           CF phase
                                                                       Bug
                                                                     Report 5



                                                      Bug 1    Bug 2        Bug 3   Bug 4      Bug 5
                                           Dev 1       10        9              7     9          8
                                           Dev 2       5         8              3     8          5
                                           Dev 3       7         8              5     7          6

                                        Existing recommendation approaches are not suitable!
Preliminary (Bug triage)
                                      PureCBR [Anvik06]
                                        Construct multi-class classifier using a SVM classifier
                                        Bug reports B are converted into pair <F(B), D> for training
                                            Bug Report History  Input Data
Information & Database Systems Lab




                                            Developers  Classes

                                         F(B) is the feature vector indicating the counts of the keyword w of description of B
                                         D is the developer who fixed B (class)



                                                            Bug Fix
                                                            History

                                                                                           training
                                                                                                        Assign a new bug to Dev 1

                                      New Bug
                                       Report
                                                                                                            Classifier’s scores
Preliminary (Bug triage)
                                      PureCBR [Anvik06]
                                        Good performances
                                        Problem
                                          This approach only considers accuracy
Information & Database Systems Lab




                                             Many bugs  One super-developer
                                             Over-specialization problem
                                          Are developer happy?
                                            We consider developer’s cost (e.g., interests, bug fix time, and
                                           expertise)
                                            We assume that faster bug fixing time has higher developer cost




                                                                   Bug            Bug
                                                                 Report 1       Report 2


                                                                  50 days         2 days
Goal
                                      Goal
                                        Find efficient bug-developer matching
                                           Optimizing not only accuracy but also cost
                                        Use modified CBCF approach
Information & Database Systems Lab




                                           Constructing developer profiles for cost

                                      Challenge
                                        Enhancing CBCF approach for sparse data
                                        Extreme sparseness of the past bug fix history data
                                           A bug fixed by a developer
                                           Need to reduce sparseness for enhancing quality of CBCF




                                                              Bug fix time from bug fix history
Overview
                                                            Cost

                                                             Developer profiles                   Cost score
Information & Database Systems Lab




                                                                                                                         Recommended
                                        New Bug                                                  Aggregation
                                                                                                                           Developer
                                         Report             Accuracy

                                                               Bug classifier                   Accuracy score
                                                                 <SVM>



                                      Merging classifier’s scores and developer’s cost scores.
                                        The accuracy scores are obtained using PureCBR [Anvik06]
                                        The developer cost scores are obtained from “de-sparsified” bug
                                         fix history.
                                        Two scores are then merged for prediction
                                                                                                                 Assign a new bug to Dev 1

                                                                   +                                       =
                                          Accuracy scores                         Cost scores                           Hybrid scores
CosTriage (Cost Estimation)
                                      CosTriage: A Cost-aware Triage Algorithm for bug reporting
                                       system
                                      Challenge to estimate the developer cost?
                                      How to reduce the sparseness problem?
Information & Database Systems Lab




                                         Using a Topic Modeling




                                                           Bug fix time from bug fix history




                                                     Categorization bugs to reduce the sparseness
CosTriage
                                      Categorizing bugs
                                        Topic Modeling
                                           Latent Dirichlet Allocation (LDA) [BleiNg03]
                                           Each topic is represented as a bug type
Information & Database Systems Lab




                                           The topic distribution of reports determine bug types
                                        We adopt the divergence measure proposed in [Arun, R. PAKDD ‘10]
                                           Finding the natural number of topics (# bug types)


                                            t is the natural number of bug types
CosTriage
                                      Developer profiles modeling
                                        Developer Profiles
                                           N-dimensional feature vector
                                           The element of developer profiles, Pu[i], denotes the developer cost
Information & Database Systems Lab




                                            for ith-type bugs


                                                    T denotes the number of bug types



                                        Developer Cost
                                           The average time to fix ith type bugs
CosTriage
                                      Predicting missing values in profiles
                                        Using CF for developer profiles
                                           Similarity measure:
Information & Database Systems Lab




                                               k=1
CosTriage
                                      Obtaining developer’s cost for a new bug report

                                                 Bug type = 1
Information & Database Systems Lab




                                                 New Bug
                                                  Report




                                                                Developer cost for a new bug
Merging
                                                             Cost

                                                              Developer profiles                  Cost score
                                                                <Bug types>
Information & Database Systems Lab




                                                                                                                     Recommended
                                          Bug                                                    Aggregation
                                        Reports                                                                        Developer
                                                             Accuracy

                                                                Bug classifier
                                                                  <SVM>                         Accuracy score



                                      Merging classifier’s scores and developer’s cost scores.


                                                                    +                                          =

                                           Accuracy scores                       Cost scores (CosTriage)           Hybrid scores
                                              [Anvik06]
Experiments
                                      Subject Systems
                                        97,910 valid bug reports
                                        255 active developers
                                        From four open source projects
Information & Database Systems Lab




                                      Approaches
                                        PureCBR: State of the art CBR-based approach
                                        CBCF: Original CBCF
                                        CosTraige: Our approach
Experiments
                                      Two research questions
                                       Q1. How much can our approach improve cost (bug fix time) without
                                           sacrificing bug assignment accuracy?

                                       Q2. What are the trade-offs between accuracy and cost (bug fix
Information & Database Systems Lab




                                           time)?


                                      Evaluation measures




                                       W is the set of bug reports predicted correctly.
                                       N is the number of bug reports in the test set.

                                        The real fix time is unknown, we only use the fix time for correctly
                                         matched bugs.
Experiments
                                      Relative errors of expected bug fix time
Information & Database Systems Lab




                                      Improvement of bug fix time (Q1)




                                        CosTriage improves the costs efficiently up to 30% without
                                         seriously compromising accuracy
Experiments
                                      Trade-off between accuracy and bug fix time (Q2)
Information & Database Systems Lab
Conclusion
                                      We proposed a new bug triaging technique
                                        Optimize not only accuracy but also cost
                                        Solve data sparseness problem by using topic modeling
Information & Database Systems Lab




                                      Experiments using four real bug report corpora
                                        Improve the cost without heavy losses of accuracy
Q&A
Information & Database Systems Lab




                                                Thank you!


                                           Do you have any questions?
Contact
Information & Database Systems Lab




                                                     Jin-woo Park
                                               jwpark85@postech.ac.kr
Back up
                                      We adopt the divergence measure proposed in [Arun10]
                                        Finding the natural number of topics (# bug types)
Information & Database Systems Lab




                                            t is the natural number of bug types




                                                        Mozilla                         Apache
Back up - Bug features
                                      Bug features
                                        Keywords of title and description
                                        Other meta data
Information & Database Systems Lab




                                                                    Title : Traditional Memory Rendering refactoring request

                                          New Bug                   Description : Request additional refactoring so we can
                                                                                                …
                                           Report                                    Traditional Rendering.



                                                                                                    Remove stopwords



                                                                   Traditional Memory Rendering refactoring request Request
                                                                               refactoring … Traditional Rendering

Weitere ähnliche Inhalte

Ähnlich wie CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)

Predicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine LearningPredicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine LearningGuido A. Ciollaro
 
Webinar issues we_find_slideshare
Webinar issues we_find_slideshareWebinar issues we_find_slideshare
Webinar issues we_find_slideshareSOASTA
 
Bottlenecks exposed web app db servers
Bottlenecks exposed web app db serversBottlenecks exposed web app db servers
Bottlenecks exposed web app db serversUpender Dravidum
 
Cloud Computing for Developers and Architects - QCon 2008 Tutorial
Cloud Computing for Developers and Architects - QCon 2008 TutorialCloud Computing for Developers and Architects - QCon 2008 Tutorial
Cloud Computing for Developers and Architects - QCon 2008 TutorialStuart Charlton
 
The Cloud: A game changer to test, at scale and in production, SOA based web...
The Cloud: A game changer to test, at scale and in production,  SOA based web...The Cloud: A game changer to test, at scale and in production,  SOA based web...
The Cloud: A game changer to test, at scale and in production, SOA based web...Fred Beringer
 
2012 06-15-jazoon12-sub138-eranea-large-apps-migration
2012 06-15-jazoon12-sub138-eranea-large-apps-migration2012 06-15-jazoon12-sub138-eranea-large-apps-migration
2012 06-15-jazoon12-sub138-eranea-large-apps-migrationDidier Durand
 
Testability for developers – Fighting a mess by making it testable
Testability for developers – Fighting a mess by making it testableTestability for developers – Fighting a mess by making it testable
Testability for developers – Fighting a mess by making it testableAlexander Tarlinder
 
App Dynamics & SOASTA Testing & Monitoring Converge, March 2012
App Dynamics & SOASTA Testing & Monitoring Converge, March 2012App Dynamics & SOASTA Testing & Monitoring Converge, March 2012
App Dynamics & SOASTA Testing & Monitoring Converge, March 2012SOASTA
 
Changes and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development ActivitiesChanges and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development ActivitiesThomas Zimmermann
 
Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...
Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...
Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...Chakkrit (Kla) Tantithamthavorn
 
Analysis of Testability of a Flight Software Product Line
Analysis of Testability of a Flight Software Product LineAnalysis of Testability of a Flight Software Product Line
Analysis of Testability of a Flight Software Product LineDharmalingam Ganesan
 
Cross-project defect prediction
Cross-project defect predictionCross-project defect prediction
Cross-project defect predictionThomas Zimmermann
 
A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...
A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...
A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...Shakas Technologies
 
WebLogic Diagnostic Framework Dr. Frank Munz / munz & more WLS11g
WebLogic Diagnostic Framework  Dr. Frank Munz / munz & more WLS11gWebLogic Diagnostic Framework  Dr. Frank Munz / munz & more WLS11g
WebLogic Diagnostic Framework Dr. Frank Munz / munz & more WLS11gInSync Conference
 
Profiling Multicore Systems to Maximize Core Utilization
Profiling Multicore Systems to Maximize Core Utilization Profiling Multicore Systems to Maximize Core Utilization
Profiling Multicore Systems to Maximize Core Utilization mentoresd
 
Microsoft HPC User Group
Microsoft HPC User Group Microsoft HPC User Group
Microsoft HPC User Group sjwoodman
 
Jimwebber soa
Jimwebber soaJimwebber soa
Jimwebber soad0nn9n
 

Ähnlich wie CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011) (20)

Predicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine LearningPredicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine Learning
 
Webinar issues we_find_slideshare
Webinar issues we_find_slideshareWebinar issues we_find_slideshare
Webinar issues we_find_slideshare
 
Bottlenecks exposed web app db servers
Bottlenecks exposed web app db serversBottlenecks exposed web app db servers
Bottlenecks exposed web app db servers
 
Cloud Computing for Developers and Architects - QCon 2008 Tutorial
Cloud Computing for Developers and Architects - QCon 2008 TutorialCloud Computing for Developers and Architects - QCon 2008 Tutorial
Cloud Computing for Developers and Architects - QCon 2008 Tutorial
 
The Cloud: A game changer to test, at scale and in production, SOA based web...
The Cloud: A game changer to test, at scale and in production,  SOA based web...The Cloud: A game changer to test, at scale and in production,  SOA based web...
The Cloud: A game changer to test, at scale and in production, SOA based web...
 
2012 06-15-jazoon12-sub138-eranea-large-apps-migration
2012 06-15-jazoon12-sub138-eranea-large-apps-migration2012 06-15-jazoon12-sub138-eranea-large-apps-migration
2012 06-15-jazoon12-sub138-eranea-large-apps-migration
 
Testability for developers – Fighting a mess by making it testable
Testability for developers – Fighting a mess by making it testableTestability for developers – Fighting a mess by making it testable
Testability for developers – Fighting a mess by making it testable
 
App Dynamics & SOASTA Testing & Monitoring Converge, March 2012
App Dynamics & SOASTA Testing & Monitoring Converge, March 2012App Dynamics & SOASTA Testing & Monitoring Converge, March 2012
App Dynamics & SOASTA Testing & Monitoring Converge, March 2012
 
IJET-V2I6P28
IJET-V2I6P28IJET-V2I6P28
IJET-V2I6P28
 
Changes and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development ActivitiesChanges and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development Activities
 
Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...
Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...
Explainable Artificial Intelligence (XAI) 
to Predict and Explain Future Soft...
 
Analysis of Testability of a Flight Software Product Line
Analysis of Testability of a Flight Software Product LineAnalysis of Testability of a Flight Software Product Line
Analysis of Testability of a Flight Software Product Line
 
Vb
VbVb
Vb
 
Cross-project defect prediction
Cross-project defect predictionCross-project defect prediction
Cross-project defect prediction
 
A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...
A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...
A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...
 
WebLogic Diagnostic Framework Dr. Frank Munz / munz & more WLS11g
WebLogic Diagnostic Framework  Dr. Frank Munz / munz & more WLS11gWebLogic Diagnostic Framework  Dr. Frank Munz / munz & more WLS11g
WebLogic Diagnostic Framework Dr. Frank Munz / munz & more WLS11g
 
Profiling Multicore Systems to Maximize Core Utilization
Profiling Multicore Systems to Maximize Core Utilization Profiling Multicore Systems to Maximize Core Utilization
Profiling Multicore Systems to Maximize Core Utilization
 
Microsoft HPC User Group
Microsoft HPC User Group Microsoft HPC User Group
Microsoft HPC User Group
 
Jimwebber soa
Jimwebber soaJimwebber soa
Jimwebber soa
 
Od2 research framework (1)
Od2 research framework (1)Od2 research framework (1)
Od2 research framework (1)
 

Mehr von Sung Kim

DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence LearningDeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence LearningSung Kim
 
Deep API Learning (FSE 2016)
Deep API Learning (FSE 2016)Deep API Learning (FSE 2016)
Deep API Learning (FSE 2016)Sung Kim
 
Time series classification
Time series classificationTime series classification
Time series classificationSung Kim
 
Tensor board
Tensor boardTensor board
Tensor boardSung Kim
 
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...Sung Kim
 
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Sung Kim
 
A Survey on Automatic Software Evolution Techniques
A Survey on Automatic Software Evolution TechniquesA Survey on Automatic Software Evolution Techniques
A Survey on Automatic Software Evolution TechniquesSung Kim
 
Crowd debugging (FSE 2015)
Crowd debugging (FSE 2015)Crowd debugging (FSE 2015)
Crowd debugging (FSE 2015)Sung Kim
 
Software Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSoftware Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSung Kim
 
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)Sung Kim
 
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)Sung Kim
 
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...Sung Kim
 
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)Sung Kim
 
Source code comprehension on evolving software
Source code comprehension on evolving softwareSource code comprehension on evolving software
Source code comprehension on evolving softwareSung Kim
 
A Survey on Dynamic Symbolic Execution for Automatic Test Generation
A Survey on  Dynamic Symbolic Execution  for Automatic Test GenerationA Survey on  Dynamic Symbolic Execution  for Automatic Test Generation
A Survey on Dynamic Symbolic Execution for Automatic Test GenerationSung Kim
 
Survey on Software Defect Prediction
Survey on Software Defect PredictionSurvey on Software Defect Prediction
Survey on Software Defect PredictionSung Kim
 
MSR2014 opening
MSR2014 openingMSR2014 opening
MSR2014 openingSung Kim
 
Personalized Defect Prediction
Personalized Defect PredictionPersonalized Defect Prediction
Personalized Defect PredictionSung Kim
 
STAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash ReproductionSTAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash ReproductionSung Kim
 
Transfer defect learning
Transfer defect learningTransfer defect learning
Transfer defect learningSung Kim
 

Mehr von Sung Kim (20)

DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence LearningDeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
 
Deep API Learning (FSE 2016)
Deep API Learning (FSE 2016)Deep API Learning (FSE 2016)
Deep API Learning (FSE 2016)
 
Time series classification
Time series classificationTime series classification
Time series classification
 
Tensor board
Tensor boardTensor board
Tensor board
 
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
 
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
 
A Survey on Automatic Software Evolution Techniques
A Survey on Automatic Software Evolution TechniquesA Survey on Automatic Software Evolution Techniques
A Survey on Automatic Software Evolution Techniques
 
Crowd debugging (FSE 2015)
Crowd debugging (FSE 2015)Crowd debugging (FSE 2015)
Crowd debugging (FSE 2015)
 
Software Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSoftware Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled Datasets
 
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)
 
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
 
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
 
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
 
Source code comprehension on evolving software
Source code comprehension on evolving softwareSource code comprehension on evolving software
Source code comprehension on evolving software
 
A Survey on Dynamic Symbolic Execution for Automatic Test Generation
A Survey on  Dynamic Symbolic Execution  for Automatic Test GenerationA Survey on  Dynamic Symbolic Execution  for Automatic Test Generation
A Survey on Dynamic Symbolic Execution for Automatic Test Generation
 
Survey on Software Defect Prediction
Survey on Software Defect PredictionSurvey on Software Defect Prediction
Survey on Software Defect Prediction
 
MSR2014 opening
MSR2014 openingMSR2014 opening
MSR2014 opening
 
Personalized Defect Prediction
Personalized Defect PredictionPersonalized Defect Prediction
Personalized Defect Prediction
 
STAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash ReproductionSTAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash Reproduction
 
Transfer defect learning
Transfer defect learningTransfer defect learning
Transfer defect learning
 

Kürzlich hochgeladen

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Kürzlich hochgeladen (20)

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)

  • 1. AAAI 2011 CosTriage: A Cost-Aware Algorithm for Information & Database Systems Lab Bug Reporting Systems Jin-woo Park1, Mu-Woong Lee1, Jinhan Kim1, Seung-won Hwang1, Sunghun Kim2 POSTECH, Korea, Republic of1 HKUST, Hong Kong2
  • 2. Bug reporting systems  Bugs!!  More than 300 bug reports per day in Mozilla (a big software project) Information & Database Systems Lab  Bug Solving  One of the important issues in a software development process  Bug reports are posted, discussed, and assigned to developers  Open sources projects  Apache  Eclipse  Linux kernel  Mozilla
  • 3. Bug reporting systems  Bug reports  Has Bug ID, title, description, status, and other meta data  Assigned to developers  Fixed by developers Information & Database Systems Lab  Challenges  Bug triage  Duplicate bug detection  … bug ID title (summary) status bug fix history (time) other data description
  • 4. Bug reporting systems  Bug Triage  Assigning a new bug report to a suitable developer  Bottleneck of bug fixing process  Labor intensive Information & Database Systems Lab  Miss-assignment can lead to slow bug fix Bottleneck of the bug fixing process assign Open Source Bug Project Reports Triager Developers Can be automated!!!
  • 5. Preliminary (recommendation)  Recommender algorithms  Content-based recommendation (CBR)  Predicting user’s interests based on item features  Machine learning methods Information & Database Systems Lab  Over-specialization problem  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhood  Sparsity problem  Hybrid recommendation  Content-boosted collaborative filtering (CBCF)  Combining an existing CBR with a CF  Better performance than either approach alone
  • 6. Preliminary (recommendation)  Recommender algorithms  Content-based recommendation (CBR)  Predict user’s interests based on item features  Title, Status, Description, … Information & Database Systems Lab  Learn features using machine learning methods  Recommend similar items  Over-specialization problem Developers Bug Report 4 Title Bug 1 Bug 2 Bug 3 Bug 4 Dev 1 10 9 1 ? Feature word1 word2 word3 Dev 2 6 5 10 ? Count 1 1 4 Dev 3 8 7 7 ? Rating score table
  • 7. Preliminary (recommendation)  Recommender algorithms  Content-based recommendation (CBR)  Predict user’s interests based on item features  Title, Status, Description, … Information & Database Systems Lab  Learn features using machine learning methods  Recommend similar items  Over-specialization problem Developers Bug Report 4 Title Bug 1 Bug 2 Bug 3 Bug 4 Dev 1 10 9 1 9 Feature word1 word2 word3 Dev 2 6 5 10 5 Count 1 1 4 Dev 3 8 7 7 7 Rating score table
  • 8. Preliminary (recommendation)  Recommender algorithms  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhood Information & Database Systems Lab  Sparsity problem Bug Reports Bug 1 Bug 2 Bug 3 Developer 1 10 5 10 Developer 2 5 10 6 Developer 3 7 7 7 Developer 4 9 5 ?
  • 9. Preliminary (recommendation)  Recommender algorithms  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhood Information & Database Systems Lab  Sparsity problem Bug Reports Bug 1 Bug 2 Bug 3 Developer 1 10 5 10 Developer 2 5 10 6 Developer 3 7 7 7 Developer 4 9 5 ?
  • 10. Preliminary (recommendation)  Recommender algorithms  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhood Information & Database Systems Lab  Sparsity problem Bug Reports Bug 1 Bug 2 Bug 3 Developer 1 10 5 10 Developer 2 5 10 6 Developer 3 7 7 7 Developer 4 9 5 9
  • 11. Preliminary (recommendation)  Recommender algorithms  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhood Information & Database Systems Lab  Sparsity problem Unsuitable for triaging because: Bug  No one solved new bug!! Report 4 Bug 1 Bug 2 Bug 3 Bug 4 Developer 1 10 5 10 ? Developer 2 5 10 6 ? Developer 3 7 7 7 ? Developer 4 9 5 9 ?
  • 12. Preliminary (recommendation)  Recommender algorithms  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhood Information & Database Systems Lab  Sparsity problem Unsuitable for triaging because: Bug  Rating is extremely sparse!! Report 4 Bug 1 Bug 2 Bug 3 Bug 4 Developer 1 10 5 ? ? Developer 2 ? ? ? ? Developer 3 ? ? 10 ? Developer 4 ? ? ? 3
  • 13. Preliminary (recommendation)  Recommender algorithms  Hybrid recommendation  Content-boosted collaborative filtering (CBCF)  Combining an existing CBR with a CF Information & Database Systems Lab  Better performance than either approach alone Two Phases Bug - CBR phase Feature word1 word2 word3 Report 5 - CF Phase Count 3 2 4 Bug 1 Bug 2 Bug 3 Bug 4 Bug 5 Dev 1 10 ? ? ? ? Dev 2 ? 8 3 ? ? Dev 3 ? ? ? 7 ?
  • 14. Preliminary (recommendation)  Recommender algorithms  Hybrid recommendation  Content-boosted collaborative filtering (CBCF)  Combining an existing CBR with a CF Information & Database Systems Lab  Better performance than either approach alone CBR phase Bug Report 5 Bug 1 Bug 2 Bug 3 Bug 4 Bug 5 Dev 1 10 10 10 10 10 Dev 2 3 8 3 8 3 Dev 3 7 7 7 7 7
  • 15. Preliminary (recommendation)  Recommender algorithms  Hybrid recommendation  Content-boosted collaborative filtering (CBCF)  Combining an existing CBR with a CF Information & Database Systems Lab  Improving CBR/CF by combining the complementary strength of CBR and CF CF phase Bug Report 5 Bug 1 Bug 2 Bug 3 Bug 4 Bug 5 Dev 1 10 9 7 9 8 Dev 2 5 8 3 8 5 Dev 3 7 8 5 7 6 Existing recommendation approaches are not suitable!
  • 16. Preliminary (Bug triage)  PureCBR [Anvik06]  Construct multi-class classifier using a SVM classifier  Bug reports B are converted into pair <F(B), D> for training  Bug Report History  Input Data Information & Database Systems Lab  Developers  Classes F(B) is the feature vector indicating the counts of the keyword w of description of B D is the developer who fixed B (class) Bug Fix History training Assign a new bug to Dev 1 New Bug Report Classifier’s scores
  • 17. Preliminary (Bug triage)  PureCBR [Anvik06]  Good performances  Problem  This approach only considers accuracy Information & Database Systems Lab  Many bugs  One super-developer  Over-specialization problem  Are developer happy?  We consider developer’s cost (e.g., interests, bug fix time, and expertise)  We assume that faster bug fixing time has higher developer cost Bug Bug Report 1 Report 2 50 days 2 days
  • 18. Goal  Goal  Find efficient bug-developer matching  Optimizing not only accuracy but also cost  Use modified CBCF approach Information & Database Systems Lab  Constructing developer profiles for cost  Challenge  Enhancing CBCF approach for sparse data  Extreme sparseness of the past bug fix history data  A bug fixed by a developer  Need to reduce sparseness for enhancing quality of CBCF Bug fix time from bug fix history
  • 19. Overview Cost Developer profiles Cost score Information & Database Systems Lab Recommended New Bug Aggregation Developer Report Accuracy Bug classifier Accuracy score <SVM>  Merging classifier’s scores and developer’s cost scores.  The accuracy scores are obtained using PureCBR [Anvik06]  The developer cost scores are obtained from “de-sparsified” bug fix history.  Two scores are then merged for prediction Assign a new bug to Dev 1 + = Accuracy scores Cost scores Hybrid scores
  • 20. CosTriage (Cost Estimation)  CosTriage: A Cost-aware Triage Algorithm for bug reporting system  Challenge to estimate the developer cost?  How to reduce the sparseness problem? Information & Database Systems Lab  Using a Topic Modeling Bug fix time from bug fix history Categorization bugs to reduce the sparseness
  • 21. CosTriage  Categorizing bugs  Topic Modeling  Latent Dirichlet Allocation (LDA) [BleiNg03]  Each topic is represented as a bug type Information & Database Systems Lab  The topic distribution of reports determine bug types  We adopt the divergence measure proposed in [Arun, R. PAKDD ‘10]  Finding the natural number of topics (# bug types) t is the natural number of bug types
  • 22. CosTriage  Developer profiles modeling  Developer Profiles  N-dimensional feature vector  The element of developer profiles, Pu[i], denotes the developer cost Information & Database Systems Lab for ith-type bugs T denotes the number of bug types  Developer Cost  The average time to fix ith type bugs
  • 23. CosTriage  Predicting missing values in profiles  Using CF for developer profiles  Similarity measure: Information & Database Systems Lab k=1
  • 24. CosTriage  Obtaining developer’s cost for a new bug report Bug type = 1 Information & Database Systems Lab New Bug Report Developer cost for a new bug
  • 25. Merging Cost Developer profiles Cost score <Bug types> Information & Database Systems Lab Recommended Bug Aggregation Reports Developer Accuracy Bug classifier <SVM> Accuracy score  Merging classifier’s scores and developer’s cost scores. + = Accuracy scores Cost scores (CosTriage) Hybrid scores [Anvik06]
  • 26. Experiments  Subject Systems  97,910 valid bug reports  255 active developers  From four open source projects Information & Database Systems Lab  Approaches  PureCBR: State of the art CBR-based approach  CBCF: Original CBCF  CosTraige: Our approach
  • 27. Experiments  Two research questions Q1. How much can our approach improve cost (bug fix time) without sacrificing bug assignment accuracy? Q2. What are the trade-offs between accuracy and cost (bug fix Information & Database Systems Lab time)?  Evaluation measures W is the set of bug reports predicted correctly. N is the number of bug reports in the test set.  The real fix time is unknown, we only use the fix time for correctly matched bugs.
  • 28. Experiments  Relative errors of expected bug fix time Information & Database Systems Lab  Improvement of bug fix time (Q1)  CosTriage improves the costs efficiently up to 30% without seriously compromising accuracy
  • 29. Experiments  Trade-off between accuracy and bug fix time (Q2) Information & Database Systems Lab
  • 30. Conclusion  We proposed a new bug triaging technique  Optimize not only accuracy but also cost  Solve data sparseness problem by using topic modeling Information & Database Systems Lab  Experiments using four real bug report corpora  Improve the cost without heavy losses of accuracy
  • 31. Q&A Information & Database Systems Lab Thank you! Do you have any questions?
  • 32. Contact Information & Database Systems Lab Jin-woo Park jwpark85@postech.ac.kr
  • 33. Back up  We adopt the divergence measure proposed in [Arun10]  Finding the natural number of topics (# bug types) Information & Database Systems Lab t is the natural number of bug types Mozilla Apache
  • 34. Back up - Bug features  Bug features  Keywords of title and description  Other meta data Information & Database Systems Lab Title : Traditional Memory Rendering refactoring request New Bug Description : Request additional refactoring so we can … Report Traditional Rendering. Remove stopwords Traditional Memory Rendering refactoring request Request refactoring … Traditional Rendering