SlideShare ist ein Scribd-Unternehmen logo
1 von 34
CHANGES and BUGS
Mining and Predicting Software Development Activities




                  Thomas Zimmermann
Software development



         Build
Collaboration




Comm.      Version      Bug
Archive    Archive    Database


  Mining Software Archives
MY THESIS                                                             .
additions analysis architecture archives aspects   bug cached calls
changes collaboration complexities component concerns cross-
cutting cvs data defects design development drawing dynamine
eclipse effort evolves failures fine-grained fix fix-inducing
graphs     hatari   history locate matching method mining
predicting program programmers report repositories
revision software support system taking transactions
version visualizing
Contributions of the thesis

Fine-grained analysis of version archives.              1
Project-specific usage patterns of methods (FSE 2005)
Identification of cross-cutting changes (ASE 2006)



Mining bug databases to predict defects.                2
Dependencies predict defects (ISSRE 2007, ICSE 2008)
Domino effect: depending on defect-prone binaries increases
the chances of having defects (Software Evolution 2008).
Fine-grained analysis

public void createPartControl(Composite parent) {
    ...
    // add listener for editor page activation
    getSite().getPage().addPartListener(partListener);
}

public void dispose() {
    ...
    getSite().getPage().removePartListener(partListener);
}
Fine-grained analysis

public void createPartControl(Composite parent) {
    ...
    // add listener for editor page activation
    getSite().getPage().addPartListener(partListener);
}

public void dispose() {         co-added
    ...
    getSite().getPage().removePartListener(partListener);
}
Fine-grained analysis

public void createPartControl(Composite parent) {
    ...                                                      close
    // add listener for editor page activation      open
    getSite().getPage().addPartListener(partListener);      println
}

public void dispose() {          co-added
    ...
    getSite().getPage().removePartListener(partListener);
}                                                             begin




           Co-added items = patterns
Fine-grained analysis
                  public static ïŹnal native void _XFree(int address);
                  public static ïŹnal void XFree(int /*long*/ address) {
                        lock.lock();
                        try {
                              _XFree(address);
                        } ïŹnally {
                              lock.unlock();
                        }
                  }

                                  D IN
                              N GE I O N S
                          CHA CAT
                         1284 LO


Crosscutting changes = aspect candidates
Contributions of the thesis

Fine-grained analysis of version archives.              1
Project-specific usage patterns of methods (FSE 2005)
Identification of cross-cutting changes (ASE 2006)



Mining bug databases to predict defects.                2
Dependencies predict defects (ISSRE 2007, ICSE 2008)
Domino effect: depending on defect-prone binaries increases
the chances of having defects (Software Evolution 2008).
Bugs! Bugs! Bugs!
Quality assurance is limited...

   ...by time...   ...and by money.
Spent resources on the
components that need it most,
  i.e., are most likely to fail.
Indicators of defects

Code complexity              Code churn
Complex Code is more         Changes are likely to
prone to defects.            introduce new defects.



History                      Dependencies
Code with past defects is    Using compiler packages
more likely to have future   is more difficult than using
defects,                     packages for UI.
2252 Binaries
28.3 MLOC
Hypotheses

Complexity of dependency graphs                             Sub
                                                          system
correlates with the number of post-release defects (H1)    level
can predict the number of post-release defects (H2)



Network measures on dependency graphs                     Binary
correlate with the number of post-release defects (H3)     level

can predict the number of post-release defects (H4)
can indicate critical “escrow” binaries (H5)
DATA.   .
Data collection
                      six months
 Release point for
                       to collect
Windows Server 2003
                        defects



  Dependencies

Network Measures

Complexity Metrics     Defects
Centrality




Degree                         Closeness                           Betweenness
Blue binary has dependencies   Blue binary is close to all other   Blue binary connects the left
to many other binaries         binaries (only two steps)           with the right graph (bridge)
Centrality
‱ Degreethe number dependencies
          centrality
   -
   counts

‱ Closeness centrality binaries into account
   -
   takes distance to all other
   - Closeness: How close are the other binaries?
   - Reach: How many binaries can be reached (weighted)?
   - Eigenvector: similar to Pagerank
‱ Betweenness centrality paths through a binary
   -
   counts the number of shortest
Complexity metrics
Group                  Metrics                                 Aggregation
Module metrics         # functions in B
for a binary B         # global variables in B
                       # executable lines in f()
                       # parameters in f()
Per-function metrics                                              Total
                       # functions calling f()
for a function f()                                                Max
                       # functions called by f()
                       McCabe’s cyclomatic complexity of f()
                       # methods in C
                       # subclasses of C
OO metrics                                                        Total
                       Depth of C in the inheritance tree
for a class C                                                     Max
                       Coupling between classes
                       Cyclic coupling between classes
RESULTS.   .
Prediction


Input metrics and measures   Model        Prediction
                               PCA
                             Regression
  Metrics                                     ClassiïŹcation
                 SNA

 Metrics+SNA                                   Ranking
ClassiïŹcation


Has a binary a defect or not?




            or
Ranking


Which binaries have the most defects?




    or                or ... or
Random splits




4×50×
ClassiïŹcation
 (logistic regression)
ClassiïŹcation
            (logistic regression)




SNA increases the recall by 0.10 (at p=0.01)
  while precision remains comparable.
Ranking
          (linear regression)




SNA+METRICS increases the correlation
    by 0.10 (signiïŹcant at p=0.01)
FUTURE WORK                                                        .
                                         bug cached calls
                          bug changes collaboration
additions analysis architecture archives aspects
analysis archives aspects
changes collaboration complexities component concerns cross-
complexities component concerns cross-cutting cvs data defects
cutting cvs data defects design development drawing dynamine
design development drawing eclipse erose evolves factor
eclipse effort evolvesfix-inducing fine-grained fix fix-inducing
failures fine-grained fix
                          failures
                                   fm graphs guide hatari
graphs hatari history locate matching method mining
history human matching mining networking
predicting program programmers report repositories
predicting program programmers system report repositories
revision software support
                               quality
                                        taking transactions
revision social software support system taking version
version visualizing
"Piled Higher and Deeper" by Jorge Cham. www.phdcomics.com
"Piled Higher and Deeper" by Jorge Cham. www.phdcomics.com
"Piled Higher and Deeper" by Jorge Cham. www.phdcomics.com
Contributions of the thesis

Fine-grained analysis of version archives.              1
Project-specific usage patterns of methods (FSE 2005)
Identification of cross-cutting changes (ASE 2006)



Mining bug databases to predict defects.                2
Dependencies predict defects (ISSRE 2007, ICSE 2008)
Domino effect: depending on defect-prone binaries increases
the chances of having defects (Software Evolution 2008).

Weitere Àhnliche Inhalte

Was ist angesagt?

Enterprise Digital Forensics and Secuiryt with Open Source tools: Automate Au...
Enterprise Digital Forensics and Secuiryt with Open Source tools: Automate Au...Enterprise Digital Forensics and Secuiryt with Open Source tools: Automate Au...
Enterprise Digital Forensics and Secuiryt with Open Source tools: Automate Au...Studio Fiorenzi Security & Forensics
 
Cyber Threat hunting workshop
Cyber Threat hunting workshopCyber Threat hunting workshop
Cyber Threat hunting workshopArpan Raval
 
The Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge GraphThe Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge GraphTrey Grainger
 
ATT&CK Metaverse - Exploring the Limitations of Applying ATT&CK
ATT&CK Metaverse - Exploring the Limitations of Applying ATT&CKATT&CK Metaverse - Exploring the Limitations of Applying ATT&CK
ATT&CK Metaverse - Exploring the Limitations of Applying ATT&CKMITRE ATT&CK
 
ATT&CKing the Red/Blue Divide
ATT&CKing the Red/Blue DivideATT&CKing the Red/Blue Divide
ATT&CKing the Red/Blue DivideMITRE ATT&CK
 
Anomali Detect 19 - Nickels & Pennington - Turning Intelligence into Action w...
Anomali Detect 19 - Nickels & Pennington - Turning Intelligence into Action w...Anomali Detect 19 - Nickels & Pennington - Turning Intelligence into Action w...
Anomali Detect 19 - Nickels & Pennington - Turning Intelligence into Action w...Adam Pennington
 
State of the ATT&CK
State of the ATT&CKState of the ATT&CK
State of the ATT&CKMITRE ATT&CK
 
Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...Sujit Pal
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4jNeo4j
 
[CB20] Pwning OT: Going in Through the Eyes by Ta-Lun Yen
[CB20] Pwning OT: Going in Through the Eyes by Ta-Lun Yen[CB20] Pwning OT: Going in Through the Eyes by Ta-Lun Yen
[CB20] Pwning OT: Going in Through the Eyes by Ta-Lun YenCODE BLUE
 
Converting Relational to Graph Databases
Converting Relational to Graph DatabasesConverting Relational to Graph Databases
Converting Relational to Graph DatabasesAntonio Maccioni
 
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...Neo4j
 
2022 APIsecure_Method for exploiting IDOR on nodejs+mongodb based backend
2022 APIsecure_Method for exploiting IDOR on nodejs+mongodb based backend2022 APIsecure_Method for exploiting IDOR on nodejs+mongodb based backend
2022 APIsecure_Method for exploiting IDOR on nodejs+mongodb based backendAPIsecure_ Official
 
Copy & Pest - A case-study on the clipboard, blind trust and invisible cross-...
Copy & Pest - A case-study on the clipboard, blind trust and invisible cross-...Copy & Pest - A case-study on the clipboard, blind trust and invisible cross-...
Copy & Pest - A case-study on the clipboard, blind trust and invisible cross-...Mario Heiderich
 
ATT&CK Updates- ATT&CK for ICS
ATT&CK Updates- ATT&CK for ICSATT&CK Updates- ATT&CK for ICS
ATT&CK Updates- ATT&CK for ICSMITRE ATT&CK
 
Containerizing your Security Operations Center
Containerizing your Security Operations CenterContainerizing your Security Operations Center
Containerizing your Security Operations CenterJimmy Mesta
 
In the Wake of Kerberoast
In the Wake of KerberoastIn the Wake of Kerberoast
In the Wake of Kerberoastken_kitahara
 
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...Xiaohan Zeng
 
JSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked DataJSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked DataGregg Kellogg
 
2022 Vulnerability Statistics Report.pdf
2022 Vulnerability Statistics Report.pdf2022 Vulnerability Statistics Report.pdf
2022 Vulnerability Statistics Report.pdfssuserc3d7ec1
 

Was ist angesagt? (20)

Enterprise Digital Forensics and Secuiryt with Open Source tools: Automate Au...
Enterprise Digital Forensics and Secuiryt with Open Source tools: Automate Au...Enterprise Digital Forensics and Secuiryt with Open Source tools: Automate Au...
Enterprise Digital Forensics and Secuiryt with Open Source tools: Automate Au...
 
Cyber Threat hunting workshop
Cyber Threat hunting workshopCyber Threat hunting workshop
Cyber Threat hunting workshop
 
The Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge GraphThe Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge Graph
 
ATT&CK Metaverse - Exploring the Limitations of Applying ATT&CK
ATT&CK Metaverse - Exploring the Limitations of Applying ATT&CKATT&CK Metaverse - Exploring the Limitations of Applying ATT&CK
ATT&CK Metaverse - Exploring the Limitations of Applying ATT&CK
 
ATT&CKing the Red/Blue Divide
ATT&CKing the Red/Blue DivideATT&CKing the Red/Blue Divide
ATT&CKing the Red/Blue Divide
 
Anomali Detect 19 - Nickels & Pennington - Turning Intelligence into Action w...
Anomali Detect 19 - Nickels & Pennington - Turning Intelligence into Action w...Anomali Detect 19 - Nickels & Pennington - Turning Intelligence into Action w...
Anomali Detect 19 - Nickels & Pennington - Turning Intelligence into Action w...
 
State of the ATT&CK
State of the ATT&CKState of the ATT&CK
State of the ATT&CK
 
Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
 
[CB20] Pwning OT: Going in Through the Eyes by Ta-Lun Yen
[CB20] Pwning OT: Going in Through the Eyes by Ta-Lun Yen[CB20] Pwning OT: Going in Through the Eyes by Ta-Lun Yen
[CB20] Pwning OT: Going in Through the Eyes by Ta-Lun Yen
 
Converting Relational to Graph Databases
Converting Relational to Graph DatabasesConverting Relational to Graph Databases
Converting Relational to Graph Databases
 
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
 
2022 APIsecure_Method for exploiting IDOR on nodejs+mongodb based backend
2022 APIsecure_Method for exploiting IDOR on nodejs+mongodb based backend2022 APIsecure_Method for exploiting IDOR on nodejs+mongodb based backend
2022 APIsecure_Method for exploiting IDOR on nodejs+mongodb based backend
 
Copy & Pest - A case-study on the clipboard, blind trust and invisible cross-...
Copy & Pest - A case-study on the clipboard, blind trust and invisible cross-...Copy & Pest - A case-study on the clipboard, blind trust and invisible cross-...
Copy & Pest - A case-study on the clipboard, blind trust and invisible cross-...
 
ATT&CK Updates- ATT&CK for ICS
ATT&CK Updates- ATT&CK for ICSATT&CK Updates- ATT&CK for ICS
ATT&CK Updates- ATT&CK for ICS
 
Containerizing your Security Operations Center
Containerizing your Security Operations CenterContainerizing your Security Operations Center
Containerizing your Security Operations Center
 
In the Wake of Kerberoast
In the Wake of KerberoastIn the Wake of Kerberoast
In the Wake of Kerberoast
 
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
 
JSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked DataJSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked Data
 
2022 Vulnerability Statistics Report.pdf
2022 Vulnerability Statistics Report.pdf2022 Vulnerability Statistics Report.pdf
2022 Vulnerability Statistics Report.pdf
 

Ähnlich wie Changes and Bugs: Mining and Predicting Development Activities

Changes and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development ActivitiesChanges and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development ActivitiesThomas Zimmermann
 
Predicting Defects using Network Analysis on Dependency Graphs
Predicting Defects using Network Analysis on Dependency GraphsPredicting Defects using Network Analysis on Dependency Graphs
Predicting Defects using Network Analysis on Dependency GraphsThomas Zimmermann
 
A tale of bug prediction in software development
A tale of bug prediction in software developmentA tale of bug prediction in software development
A tale of bug prediction in software developmentMartin Pinzger
 
VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assis...
VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assis...VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assis...
VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assis...Stefano Dalla Palma
 
Measuring Your Code
Measuring Your CodeMeasuring Your Code
Measuring Your CodeNate Abele
 
Measuring Your Code 2.0
Measuring Your Code 2.0Measuring Your Code 2.0
Measuring Your Code 2.0Nate Abele
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureMasud Rahman
 
Measuring maintainability; software metrics explained
Measuring maintainability; software metrics explainedMeasuring maintainability; software metrics explained
Measuring maintainability; software metrics explainedDennis de Greef
 
Of Bugs and Men (and Plugins too)
Of Bugs and Men (and Plugins too)Of Bugs and Men (and Plugins too)
Of Bugs and Men (and Plugins too)Michel Wermelinger
 
CSMR06a.ppt
CSMR06a.pptCSMR06a.ppt
CSMR06a.pptPtidej Team
 
MSR Asia Summit
MSR Asia SummitMSR Asia Summit
MSR Asia SummitPtidej Team
 
2014 01-ticosa
2014 01-ticosa2014 01-ticosa
2014 01-ticosaPharo
 
Predicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine LearningPredicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine LearningGuido A. Ciollaro
 
Software Architecture - Quiz Questions
Software Architecture - Quiz QuestionsSoftware Architecture - Quiz Questions
Software Architecture - Quiz QuestionsGanesh Samarthyam
 
Software Architecture - Quiz Questions
Software Architecture - Quiz QuestionsSoftware Architecture - Quiz Questions
Software Architecture - Quiz QuestionsCodeOps Technologies LLP
 
Linq To The Enterprise
Linq To The EnterpriseLinq To The Enterprise
Linq To The EnterpriseDaniel Egan
 
Bayesian network based software reliability prediction
Bayesian network based software reliability predictionBayesian network based software reliability prediction
Bayesian network based software reliability predictionJULIO GONZALEZ SANZ
 
Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)Martin Pinzger
 
Dependability Benchmarking by Injecting Software Bugs
Dependability Benchmarking by Injecting Software BugsDependability Benchmarking by Injecting Software Bugs
Dependability Benchmarking by Injecting Software BugsRoberto Natella
 

Ähnlich wie Changes and Bugs: Mining and Predicting Development Activities (20)

Changes and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development ActivitiesChanges and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development Activities
 
Predicting Defects using Network Analysis on Dependency Graphs
Predicting Defects using Network Analysis on Dependency GraphsPredicting Defects using Network Analysis on Dependency Graphs
Predicting Defects using Network Analysis on Dependency Graphs
 
A tale of bug prediction in software development
A tale of bug prediction in software developmentA tale of bug prediction in software development
A tale of bug prediction in software development
 
VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assis...
VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assis...VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assis...
VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assis...
 
Measuring Your Code
Measuring Your CodeMeasuring Your Code
Measuring Your Code
 
Measuring Your Code 2.0
Measuring Your Code 2.0Measuring Your Code 2.0
Measuring Your Code 2.0
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lecture
 
Measuring maintainability; software metrics explained
Measuring maintainability; software metrics explainedMeasuring maintainability; software metrics explained
Measuring maintainability; software metrics explained
 
Of Bugs and Men
Of Bugs and MenOf Bugs and Men
Of Bugs and Men
 
Of Bugs and Men (and Plugins too)
Of Bugs and Men (and Plugins too)Of Bugs and Men (and Plugins too)
Of Bugs and Men (and Plugins too)
 
CSMR06a.ppt
CSMR06a.pptCSMR06a.ppt
CSMR06a.ppt
 
MSR Asia Summit
MSR Asia SummitMSR Asia Summit
MSR Asia Summit
 
2014 01-ticosa
2014 01-ticosa2014 01-ticosa
2014 01-ticosa
 
Predicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine LearningPredicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine Learning
 
Software Architecture - Quiz Questions
Software Architecture - Quiz QuestionsSoftware Architecture - Quiz Questions
Software Architecture - Quiz Questions
 
Software Architecture - Quiz Questions
Software Architecture - Quiz QuestionsSoftware Architecture - Quiz Questions
Software Architecture - Quiz Questions
 
Linq To The Enterprise
Linq To The EnterpriseLinq To The Enterprise
Linq To The Enterprise
 
Bayesian network based software reliability prediction
Bayesian network based software reliability predictionBayesian network based software reliability prediction
Bayesian network based software reliability prediction
 
Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)
 
Dependability Benchmarking by Injecting Software Bugs
Dependability Benchmarking by Injecting Software BugsDependability Benchmarking by Injecting Software Bugs
Dependability Benchmarking by Injecting Software Bugs
 

Mehr von Thomas Zimmermann

Software Analytics = Sharing Information
Software Analytics = Sharing InformationSoftware Analytics = Sharing Information
Software Analytics = Sharing InformationThomas Zimmermann
 
Predicting Method Crashes with Bytecode Operations
Predicting Method Crashes with Bytecode OperationsPredicting Method Crashes with Bytecode Operations
Predicting Method Crashes with Bytecode OperationsThomas Zimmermann
 
Analytics for smarter software development
Analytics for smarter software development Analytics for smarter software development
Analytics for smarter software development Thomas Zimmermann
 
Characterizing and Predicting Which Bugs Get Reopened
Characterizing and Predicting Which Bugs Get ReopenedCharacterizing and Predicting Which Bugs Get Reopened
Characterizing and Predicting Which Bugs Get ReopenedThomas Zimmermann
 
Klingon Countdown Timer
Klingon Countdown TimerKlingon Countdown Timer
Klingon Countdown TimerThomas Zimmermann
 
Data driven games user research
Data driven games user researchData driven games user research
Data driven games user researchThomas Zimmermann
 
Not my bug! Reasons for software bug report reassignments
Not my bug! Reasons for software bug report reassignmentsNot my bug! Reasons for software bug report reassignments
Not my bug! Reasons for software bug report reassignmentsThomas Zimmermann
 
Empirical Software Engineering at Microsoft Research
Empirical Software Engineering at Microsoft ResearchEmpirical Software Engineering at Microsoft Research
Empirical Software Engineering at Microsoft ResearchThomas Zimmermann
 
Security trend analysis with CVE topic models
Security trend analysis with CVE topic modelsSecurity trend analysis with CVE topic models
Security trend analysis with CVE topic modelsThomas Zimmermann
 
Analytics for software development
Analytics for software developmentAnalytics for software development
Analytics for software developmentThomas Zimmermann
 
Characterizing and predicting which bugs get fixed
Characterizing and predicting which bugs get fixedCharacterizing and predicting which bugs get fixed
Characterizing and predicting which bugs get fixedThomas Zimmermann
 
Cross-project defect prediction
Cross-project defect predictionCross-project defect prediction
Cross-project defect predictionThomas Zimmermann
 
Quality of Bug Reports in Open Source
Quality of Bug Reports in Open SourceQuality of Bug Reports in Open Source
Quality of Bug Reports in Open SourceThomas Zimmermann
 
Predicting Subsystem Defects using Dependency Graph Complexities
Predicting Subsystem Defects using Dependency Graph Complexities Predicting Subsystem Defects using Dependency Graph Complexities
Predicting Subsystem Defects using Dependency Graph Complexities Thomas Zimmermann
 
Got Myth? Myths in Software Engineering
Got Myth? Myths in Software EngineeringGot Myth? Myths in Software Engineering
Got Myth? Myths in Software EngineeringThomas Zimmermann
 
Mining Workspace Updates in CVS
Mining Workspace Updates in CVSMining Workspace Updates in CVS
Mining Workspace Updates in CVSThomas Zimmermann
 
Mining Software Archives to Support Software Development
Mining Software Archives to Support Software DevelopmentMining Software Archives to Support Software Development
Mining Software Archives to Support Software DevelopmentThomas Zimmermann
 
Unit testing with JUnit
Unit testing with JUnitUnit testing with JUnit
Unit testing with JUnitThomas Zimmermann
 

Mehr von Thomas Zimmermann (20)

Software Analytics = Sharing Information
Software Analytics = Sharing InformationSoftware Analytics = Sharing Information
Software Analytics = Sharing Information
 
MSR 2013 Preview
MSR 2013 PreviewMSR 2013 Preview
MSR 2013 Preview
 
Predicting Method Crashes with Bytecode Operations
Predicting Method Crashes with Bytecode OperationsPredicting Method Crashes with Bytecode Operations
Predicting Method Crashes with Bytecode Operations
 
Analytics for smarter software development
Analytics for smarter software development Analytics for smarter software development
Analytics for smarter software development
 
Characterizing and Predicting Which Bugs Get Reopened
Characterizing and Predicting Which Bugs Get ReopenedCharacterizing and Predicting Which Bugs Get Reopened
Characterizing and Predicting Which Bugs Get Reopened
 
Klingon Countdown Timer
Klingon Countdown TimerKlingon Countdown Timer
Klingon Countdown Timer
 
Data driven games user research
Data driven games user researchData driven games user research
Data driven games user research
 
Not my bug! Reasons for software bug report reassignments
Not my bug! Reasons for software bug report reassignmentsNot my bug! Reasons for software bug report reassignments
Not my bug! Reasons for software bug report reassignments
 
Empirical Software Engineering at Microsoft Research
Empirical Software Engineering at Microsoft ResearchEmpirical Software Engineering at Microsoft Research
Empirical Software Engineering at Microsoft Research
 
Security trend analysis with CVE topic models
Security trend analysis with CVE topic modelsSecurity trend analysis with CVE topic models
Security trend analysis with CVE topic models
 
Analytics for software development
Analytics for software developmentAnalytics for software development
Analytics for software development
 
Characterizing and predicting which bugs get fixed
Characterizing and predicting which bugs get fixedCharacterizing and predicting which bugs get fixed
Characterizing and predicting which bugs get fixed
 
Cross-project defect prediction
Cross-project defect predictionCross-project defect prediction
Cross-project defect prediction
 
Quality of Bug Reports in Open Source
Quality of Bug Reports in Open SourceQuality of Bug Reports in Open Source
Quality of Bug Reports in Open Source
 
Meet Tom and his Fish
Meet Tom and his FishMeet Tom and his Fish
Meet Tom and his Fish
 
Predicting Subsystem Defects using Dependency Graph Complexities
Predicting Subsystem Defects using Dependency Graph Complexities Predicting Subsystem Defects using Dependency Graph Complexities
Predicting Subsystem Defects using Dependency Graph Complexities
 
Got Myth? Myths in Software Engineering
Got Myth? Myths in Software EngineeringGot Myth? Myths in Software Engineering
Got Myth? Myths in Software Engineering
 
Mining Workspace Updates in CVS
Mining Workspace Updates in CVSMining Workspace Updates in CVS
Mining Workspace Updates in CVS
 
Mining Software Archives to Support Software Development
Mining Software Archives to Support Software DevelopmentMining Software Archives to Support Software Development
Mining Software Archives to Support Software Development
 
Unit testing with JUnit
Unit testing with JUnitUnit testing with JUnit
Unit testing with JUnit
 

KĂŒrzlich hochgeladen

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel AraĂșjo
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

KĂŒrzlich hochgeladen (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Changes and Bugs: Mining and Predicting Development Activities

  • 1. CHANGES and BUGS Mining and Predicting Software Development Activities Thomas Zimmermann
  • 3. Collaboration Comm. Version Bug Archive Archive Database Mining Software Archives
  • 4. MY THESIS . additions analysis architecture archives aspects bug cached calls changes collaboration complexities component concerns cross- cutting cvs data defects design development drawing dynamine eclipse effort evolves failures fine-grained fix fix-inducing graphs hatari history locate matching method mining predicting program programmers report repositories revision software support system taking transactions version visualizing
  • 5. Contributions of the thesis Fine-grained analysis of version archives. 1 Project-specific usage patterns of methods (FSE 2005) Identification of cross-cutting changes (ASE 2006) Mining bug databases to predict defects. 2 Dependencies predict defects (ISSRE 2007, ICSE 2008) Domino effect: depending on defect-prone binaries increases the chances of having defects (Software Evolution 2008).
  • 6. Fine-grained analysis public void createPartControl(Composite parent) { ... // add listener for editor page activation getSite().getPage().addPartListener(partListener); } public void dispose() { ... getSite().getPage().removePartListener(partListener); }
  • 7. Fine-grained analysis public void createPartControl(Composite parent) { ... // add listener for editor page activation getSite().getPage().addPartListener(partListener); } public void dispose() { co-added ... getSite().getPage().removePartListener(partListener); }
  • 8. Fine-grained analysis public void createPartControl(Composite parent) { ... close // add listener for editor page activation open getSite().getPage().addPartListener(partListener); println } public void dispose() { co-added ... getSite().getPage().removePartListener(partListener); } begin Co-added items = patterns
  • 9. Fine-grained analysis public static ïŹnal native void _XFree(int address); public static ïŹnal void XFree(int /*long*/ address) { lock.lock(); try { _XFree(address); } ïŹnally { lock.unlock(); } } D IN N GE I O N S CHA CAT 1284 LO Crosscutting changes = aspect candidates
  • 10. Contributions of the thesis Fine-grained analysis of version archives. 1 Project-specific usage patterns of methods (FSE 2005) Identification of cross-cutting changes (ASE 2006) Mining bug databases to predict defects. 2 Dependencies predict defects (ISSRE 2007, ICSE 2008) Domino effect: depending on defect-prone binaries increases the chances of having defects (Software Evolution 2008).
  • 12. Quality assurance is limited... ...by time... ...and by money.
  • 13. Spent resources on the components that need it most, i.e., are most likely to fail.
  • 14. Indicators of defects Code complexity Code churn Complex Code is more Changes are likely to prone to defects. introduce new defects. History Dependencies Code with past defects is Using compiler packages more likely to have future is more difficult than using defects, packages for UI.
  • 16. Hypotheses Complexity of dependency graphs Sub system correlates with the number of post-release defects (H1) level can predict the number of post-release defects (H2) Network measures on dependency graphs Binary correlate with the number of post-release defects (H3) level can predict the number of post-release defects (H4) can indicate critical “escrow” binaries (H5)
  • 17. DATA. .
  • 18. Data collection six months Release point for to collect Windows Server 2003 defects Dependencies Network Measures Complexity Metrics Defects
  • 19. Centrality Degree Closeness Betweenness Blue binary has dependencies Blue binary is close to all other Blue binary connects the left to many other binaries binaries (only two steps) with the right graph (bridge)
  • 20. Centrality ‱ Degreethe number dependencies centrality - counts ‱ Closeness centrality binaries into account - takes distance to all other - Closeness: How close are the other binaries? - Reach: How many binaries can be reached (weighted)? - Eigenvector: similar to Pagerank ‱ Betweenness centrality paths through a binary - counts the number of shortest
  • 21. Complexity metrics Group Metrics Aggregation Module metrics # functions in B for a binary B # global variables in B # executable lines in f() # parameters in f() Per-function metrics Total # functions calling f() for a function f() Max # functions called by f() McCabe’s cyclomatic complexity of f() # methods in C # subclasses of C OO metrics Total Depth of C in the inheritance tree for a class C Max Coupling between classes Cyclic coupling between classes
  • 22. RESULTS. .
  • 23. Prediction Input metrics and measures Model Prediction PCA Regression Metrics ClassiïŹcation SNA Metrics+SNA Ranking
  • 24. ClassiïŹcation Has a binary a defect or not? or
  • 25. Ranking Which binaries have the most defects? or or ... or
  • 28. ClassiïŹcation (logistic regression) SNA increases the recall by 0.10 (at p=0.01) while precision remains comparable.
  • 29. Ranking (linear regression) SNA+METRICS increases the correlation by 0.10 (signiïŹcant at p=0.01)
  • 30. FUTURE WORK . bug cached calls bug changes collaboration additions analysis architecture archives aspects analysis archives aspects changes collaboration complexities component concerns cross- complexities component concerns cross-cutting cvs data defects cutting cvs data defects design development drawing dynamine design development drawing eclipse erose evolves factor eclipse effort evolvesfix-inducing fine-grained fix fix-inducing failures fine-grained fix failures fm graphs guide hatari graphs hatari history locate matching method mining history human matching mining networking predicting program programmers report repositories predicting program programmers system report repositories revision software support quality taking transactions revision social software support system taking version version visualizing
  • 31. "Piled Higher and Deeper" by Jorge Cham. www.phdcomics.com
  • 32. "Piled Higher and Deeper" by Jorge Cham. www.phdcomics.com
  • 33. "Piled Higher and Deeper" by Jorge Cham. www.phdcomics.com
  • 34. Contributions of the thesis Fine-grained analysis of version archives. 1 Project-specific usage patterns of methods (FSE 2005) Identification of cross-cutting changes (ASE 2006) Mining bug databases to predict defects. 2 Dependencies predict defects (ISSRE 2007, ICSE 2008) Domino effect: depending on defect-prone binaries increases the chances of having defects (Software Evolution 2008).