Mining Software Archives
to Support Software Development




         Tom Zimmermann
         Saarland University
Software Development

                 Hello
                 Calgary!

Build
Collaboration

Comm. Archive       Version Archive       Bug Database

  Mining Software Archives




eROSE    BugCache   Vulture
eROSE
              Related Changes
                    (ICSE 2004, TSE 2005)




Tom Zimmermann • Saarland University
  Peter Weißgerber • University of Trier
     Stephan Diehl • University of Trier
   Andreas Zeller • Saarland University
Developers who changed this function
also changed...
eROSE: Guiding Developers

Purchase History:   Customers who bought this item also bought...
Version Archive:    Developers who changed this function also changed...

eROSE suggests further locations.
eROSE prevents incomplete changes.
Processing CVS data




  1. Comparing files
  2. Building transactions
Comparing Files

Old revision   New revision
A()            A()
B()            F()
C()            B()
D()            D()
E()            E()
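
The slide only shows the two entity lists, not how they are compared. As a minimal sketch, assuming entities are matched by name alone (a real comparison would presumably also look at changed bodies), the added and removed entities could be found like this:

```python
def diff_entities(old, new):
    """Return (added, removed) entity names between two revisions of a file.

    Entities are matched by name only; this is an assumption of the sketch.
    """
    old_set, new_set = set(old), set(new)
    added = [e for e in new if e not in old_set]
    removed = [e for e in old if e not in new_set]
    return added, removed


# The two revisions shown on the slide.
old_revision = ["A()", "B()", "C()", "D()", "E()"]
new_revision = ["A()", "F()", "B()", "D()", "E()"]

added, removed = diff_entities(old_revision, new_revision)
print("added:", added)
print("removed:", removed)
```

For the slide's example this reports `F()` as added and `C()` as removed.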
Building Transactions

CVS   150,000

Grouping rule: same author + message + time

2003-02-19 (aweinand): fixed #13332
    createGeneralPage()
    createTextComparePage()
    fKeys[]
    initDefaults()
    buildnotes_compare.html
    PatchMessages.properties
    plugin.properties
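
CVS records each file revision separately, so transactions have to be rebuilt from the "same author + message + time" rule. The sketch below is one way to do that. The 200-second window, the tuple layout, the time of day, and the second author are assumptions made for illustration, not details taken from the slides.

```python
from datetime import datetime, timedelta


def build_transactions(revisions, window=timedelta(seconds=200)):
    """Group per-entity revisions into transactions.

    revisions: iterable of (author, message, timestamp, entity).
    Two revisions join the same transaction when author and message match
    and the gap to the previous revision of that transaction is <= window.
    """
    transactions = []
    latest = {}  # (author, message) -> most recent transaction with that key
    for author, message, ts, entity in sorted(revisions, key=lambda r: r[2]):
        key = (author, message)
        tx = latest.get(key)
        if tx is not None and ts - tx["last"] <= window:
            tx["entities"].append(entity)
            tx["last"] = ts
        else:
            tx = {"author": author, "message": message,
                  "last": ts, "entities": [entity]}
            latest[key] = tx
            transactions.append(tx)
    return transactions


t0 = datetime(2003, 2, 19, 10, 0, 0)  # time of day is made up
revisions = [
    ("aweinand", "fixed #13332", t0, "fKeys[]"),
    ("aweinand", "fixed #13332", t0 + timedelta(seconds=5), "initDefaults()"),
    ("aweinand", "fixed #13332", t0 + timedelta(seconds=9), "plugin.properties"),
    ("someone", "unrelated change", t0 + timedelta(seconds=7), "Other.java"),  # hypothetical
]
transactions = build_transactions(revisions)
for tx in transactions:
    print(tx["author"], tx["message"], tx["entities"])
```

Using a sliding window (each revision extends the deadline) rather than a fixed one keeps long commits together; whether that matches the original tool is not stated here.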
Mining Associations

User changes fKeys[] and initDefaults()

eROSE finds past transactions:

#756                #6721               #21078
fKeys[]             fKeys[]             fKeys[]
initDefaults()      initDefaults()      initDefaults()
...                 ...                 ...
plugin.properties   plugin.properties   plugin.properties

#42432              #51345              #59998              #71003
fKeys[]             fKeys[]             fKeys[]             fKeys[]
initDefaults()      initDefaults()      initDefaults()      initDefaults()
...                 ...                 ...                 ...
plugin.properties   plugin.properties   plugin.properties   plugin.properties

#87264              #91220              #101823             #104223
fKeys[]             fKeys[]             fKeys[]             fKeys[]
initDefaults()      initDefaults()      initDefaults()      initDefaults()
...                 ...                 ...                 ...
                    plugin.properties   plugin.properties   plugin.properties

{fKeys[], initDefaults()} ⇒ {plugin.properties}
Support 10, Confidence 10/11 = 0.909
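
The support and confidence figures follow directly from counting transactions. A minimal sketch, with each transaction reduced to the three entities named on the slide (the elided "..." entries are left out, and the function name is made up for this example):

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent.

    support    = number of transactions containing antecedent and consequent
    confidence = support / number of transactions containing the antecedent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    support = sum(1 for t in with_antecedent if consequent <= set(t))
    confidence = support / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


changed = {"fKeys[]", "initDefaults()"}
# Eleven past transactions; ten of them also changed plugin.properties.
transactions = [changed | {"plugin.properties"} for _ in range(10)] + [set(changed)]

support, confidence = rule_stats(transactions, changed, {"plugin.properties"})
print(support, round(confidence, 3))  # 10 0.909
```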
Evaluation

GIMP     PostgreSQL     KOffice     jEdit

eROSE predicts 33% of all changed entities (files: 44%).

In 70% of all transactions, eROSE’s topmost three suggestions contain a changed entity (files: 72%).

eROSE learns quickly (within 30 days).
eROSE
        Related Changes
            (ICSE 2004, TSE 2005)



guides developers

 non-program elements
   (documentation)

           learns quickly
BugCache
            Predicting Defects
                    (ASE 2006, ICSE 2007)





                        Sung Kim • MIT
Tom Zimmermann • Saarland University
  Jim Whitehead • Univ. of California SC
    Andreas Zeller • Saarland University
The Problem

     How should we
 allocate our resources
 for quality assurance?
One Solution

    List with elements that
       (will) have defects

        Cache
         List is adaptive, i.e.,
        it changes over time
The BugCache Model

What is loaded in the cache?
Cache size: 2

Hypothesis: Temporal locality between defects

Miss   Hit   Miss

Hit rate = #Hits / #Defects = 33.3%

Miss   Hit   Miss   Miss
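
The cache walk-through can be replayed in a few lines. This is a sketch under assumptions: the slides do not name a replacement policy, so least-recently-used eviction is assumed here, and the defect locations `a`, `b`, `c` are hypothetical.

```python
from collections import OrderedDict


def simulate(defects, cache_size):
    """Replay defect locations against a fixed-size cache.

    Returns one "Hit" or "Miss" per defect. On a miss the faulty entity is
    loaded; LRU eviction is an assumption of this sketch.
    """
    cache = OrderedDict()  # ordered from least to most recently used
    outcomes = []
    for entity in defects:
        if entity in cache:
            outcomes.append("Hit")
            cache.move_to_end(entity)
        else:
            outcomes.append("Miss")
            cache[entity] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used entry
    return outcomes


def hit_rate(outcomes):
    """Hit rate = #Hits / #Defects."""
    return outcomes.count("Hit") / len(outcomes)


outcomes = simulate(["a", "a", "b"], cache_size=2)
print(outcomes, f"{hit_rate(outcomes):.1%}")

longer = simulate(["a", "a", "b", "c"], cache_size=2)
print(longer, f"{hit_rate(longer):.1%}")
```

The first run reproduces the Miss, Hit, Miss sequence with a hit rate of one third; a fourth defect in a new location adds another miss.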
Loading Elements

Temporal locality – as shown before
Spatial locality – load “nearby” elements
(i.e., co-changed before)
Changed-entity locality – load changed elements
New-entity locality – load new elements
Initial pre-fetch – start with a loaded cache
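
Spatial locality can be illustrated with co-change counts: on a miss, the faulty entity is loaded together with the entities it was most often changed with. The function names and the `block_size` parameter below are made up for this sketch and are not taken from the slides.

```python
from collections import Counter


def co_change_counts(transactions):
    """Count how often each ordered pair of entities shares a transaction."""
    counts = {}
    for tx in transactions:
        for a in tx:
            for b in tx:
                if a != b:
                    counts.setdefault(a, Counter())[b] += 1
    return counts


def entities_to_load(faulty, counts, block_size):
    """Return the faulty entity plus its most frequently co-changed neighbours."""
    neighbours = counts.get(faulty, Counter()).most_common(block_size - 1)
    return [faulty] + [entity for entity, _ in neighbours]


# Hypothetical history: "a" changed twice with "b" and once with "c".
history = [{"a", "b"}, {"a", "b"}, {"a", "c"}]
counts = co_change_counts(history)
to_load = entities_to_load("a", counts, block_size=2)
print(to_load)
```

Changed-entity and new-entity loading would add further entries to the same cache whenever an entity is modified or created.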
Evaluation



                        Mozilla
jEdit
        PostgreSQL   Columba
Hit Rates (cache size = 10%)

               Methods                Files
Project        BugCache   FixCache    BugCache   FixCache
Apache 1.3     59.6%      61.5%       83.9%      81.5%
Columba        58.9%      67.6%       83.5%      83.0%
Eclipse        64.5%      71.6%       95.1%      95.0%
jEdit          50.5%      48.9%       85.7%      85.4%
Mozilla        49.3%      55.0%       93.3%      88.0%
PostgreSQL     61.9%      59.2%       73.9%      71.0%
Subversion     68.3%      43.8%       82.0%      81.3%
Reasons for Hits

Temporal locality    60%
Spatial locality     18%
Initial pre-fetch    18%

[Pie chart; legend: Initial pre-fetch, Temporal locality, Spatial locality, Changed-entity locality, New-entity locality]
Warning Developers

 “Safe” Location
(not in FixCache)



  Risky Location
(red, in FixCache)
BugCache
       Predicting Defects
             (ASE 2006, ICSE 2007)




temporal locality

       adaptive
    hit rates of 71% to 95%
Vulture
                   Predicting
      Security Vulnerabilities
                      (Work in Progress)




 Stephan Neuhaus • Saarland University
Tom Zimmermann • Saarland University
   Andreas Zeller • Saarland University
Firefox/Mozilla

>700 developers           228,365 commits
14,368 C/C++ files        1,012,512 revisions
(10,452 components)
Vulnerabilities
      Security Advisory 2005-12
    Title: Livefeed bookmarks can steal cookies
    Impact: High
    Products: Firefox
    Description: Earlier versions of Firefox allowed
    javascript: and data: URLs as Livefeed bookmarks.
    When they updated the URL would be run in the
    context of the current page and could be used to
    steal cookies or data displayed on the page. If the
    user were on a page with elevated privileges (for
    example, about:config) when the Livefeed was
    updated, the feed URL could potentially run
    arbitrary code on the user's machine.




Vulnerabilities
      Security Advisory 2005-13
    Title: Window Injection Spoofing
    Severity: Low
    Products: Firefox, Mozilla Suite
    Description: A website can inject content into a
    popup opened by another site if the target name
    of the popup window is known. An attacker who
    knows you are going to visit that other site could
    spoof the contents of the popup.




Vulnerabilities
      Security Advisories 2005-14, 2005-15, 2005-16, 2005-41, 2006-76
    [Five advisories stacked on top of each other; their descriptions are not recoverable. Titles, in no particular order:]
    Title: Heap overflow possible in UTF8 to Unicode conversion
    Title: Spoofing download and security dialogs with overlapping windows
    Title: XSS using outer window's Function object
    Title: Privilege escalation via DOM property overrides
    Title: SSL "secure site" indicator spoofing

Vulnerabilities
10,452 components

    424 vulnerable

     4.05%
Vulnerabilities

             What other
           components are
             vulnerable?




Vulnerabilities
             Is this new
          component likely
          to be vulnerable?




Vulture

Vulnerability Database       Version Archive       Code

                   Vulture                         Predictor       Code

Component          Component          Component
Correlations

Programmer     Code Complexity     Language     Problem Domain
Imports




GUI   Database   Certificates   OS
Example (1)

import    nsIContent.h
          nsIContentUtils.h
          nsIScriptSecurityManager.h

(21 ✘, 1 ✔)    95.5%
Example (2)

import    nsIPrivateDOMEvent.h
          nsReadableUtils.h

(19 ✘)    100%
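
One way to read the two examples is as a share: among the components that import all of the listed headers, how many have had a vulnerability. That reading is an assumption, as are the function name and the made-up component names below; only the header names come from the slides.

```python
def vulnerable_share(components, headers):
    """Share of components importing all `headers` that are marked vulnerable.

    components: dict mapping name -> (set of imported headers, vulnerable flag).
    Returns None when no component imports every header.
    """
    headers = set(headers)
    flags = [vulnerable for imports, vulnerable in components.values()
             if headers <= imports]
    return sum(flags) / len(flags) if flags else None


example_1 = {"nsIContent.h", "nsIContentUtils.h", "nsIScriptSecurityManager.h"}

# 22 hypothetical components importing all three headers, 21 of them vulnerable.
components = {f"component_{i}": (set(example_1), i != 0) for i in range(22)}
# One more hypothetical component that imports only one of the headers.
components["other"] = ({"nsIContent.h"}, False)

share = vulnerable_share(components, example_1)
print(f"{share:.1%}")
```

With 21 of 22 matching components marked vulnerable, the share rounds to 95.5%.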
Research Questions


• How well do imports predict vulnerabilities?
• Can imports be used for
  − classification (vulnerable or not) and for
  − regression (number of vulnerabilities)?
Input Data

Component             Vulnerability-related    Imports (one 0/1 column per header;
                      bug reports              header names not recoverable)
nsCOMArray            0                        1   0   0   0   1   0   0
nsIDocument.h         1                        0   0   1   0   0   1   0
nspr_md.h             0                        0   1   1   0   0   1   0
nsDOMClassInfo        10                       0   0   1   0   1   0   0
EmbedGTKTools         0                        0   0   0   0   1   0   0
MozillaControl.cpp    0                        0   1   0   1   0   0   0

nsDOMClassInfo has had 10 vulnerability-related bug reports.
nsDOMClassInfo imports “nsIXPConnect.h”.
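
The 0/1 import columns of the Input Data slide can be built from per-component import lists. A minimal sketch: apart from `nsDOMClassInfo`, `nsCOMArray`, and `nsIXPConnect.h`, the header names are placeholders, and the import sets are invented for illustration.

```python
def import_matrix(imports_by_component):
    """Turn per-component import sets into rows of 0/1 features.

    Returns (headers, rows): headers is the sorted list of all headers seen,
    rows maps each component to a list with 1 where it imports that header.
    """
    headers = sorted({h for hs in imports_by_component.values() for h in hs})
    rows = {
        name: [1 if h in hs else 0 for h in headers]
        for name, hs in imports_by_component.items()
    }
    return headers, rows


# Import sets are made up; "example_a.h" and "example_b.h" are placeholders.
imports_by_component = {
    "nsDOMClassInfo": {"nsIXPConnect.h", "example_a.h"},
    "nsCOMArray": {"example_a.h", "example_b.h"},
}
headers, rows = import_matrix(imports_by_component)
print(headers)
for name, row in rows.items():
    print(name, row)
```

Each row, paired with the component's count of vulnerability-related bug reports, is then one training example.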
Distribution

[Two histograms: "Distribution of MFSAs" (x axis: Number of MFSAs) and "Distribution of Bug Reports" (x axis: Number of Bug Reports); y axis: Number of Components.]
Experiments

• 40 random splits
  6,968 rows in training set, 3,484 rows in validation set

• Classification
  Train SVM, compute recall and precision

• Regression
  Train SVM, compute rank correlation on top 1%

• SVM: linear kernel with default parameters
  R implementation (up to 10GB of main memory)
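
The split sizes and the classification metrics can be sketched without the SVM itself. The block below only shows a two-thirds / one-third random split of the 10,452 components and a precision/recall computation; the seed, the function names, and the toy predictions are assumptions, and no model is trained.

```python
import random


def random_split(rows, rng):
    """Shuffle rows and split them two thirds (training) / one third (validation)."""
    rows = list(rows)
    rng.shuffle(rows)
    cut = (2 * len(rows)) // 3
    return rows[:cut], rows[cut:]


def precision_recall(predicted, actual):
    """Precision and recall for binary labels (truthy = vulnerable)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


training, validation = random_split(range(10452), random.Random(0))
print(len(training), len(validation))  # 6968 3484

# Toy labels standing in for a classifier's output on a validation set.
precision, recall = precision_recall([1, 1, 0, 0], [1, 0, 1, 0])
print(precision, recall)  # 0.5 0.5
```

Repeating the split 40 times with different seeds would give one precision/recall pair per split.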
Results

                          (a) Precision and Recall                                                                        (b) Rank Correlation
            0.55




                                                                                                      1.0
                                                                                                                                                                                                      ●
                                                                                                                                                                                                  ●
                                                                                                                                                                                              ●
                                                                                                                                                                                          ●
                                           ●                                                                                                                                             ●
                                                                                                                                                                                     ●




                                                                            Cumulative Distribution
                                                                                                                                                                                 ●




                                                                                                      0.8
                                                                                                                                                                                ●
                                    ●
            0.50




                                                                                                                                                                               ●
                                                        ●                                                                                                                      ●
                                                                                                                                                                           ●
                                               ●                                                                                                                       ●
                               ●●                                                                                                                                      ●
                                                                                                                                                                      ●
                                     ●●●           ●●                                                                                                             ●
                           ●                            ●




                                                                                                      0.6
Precision




                                                                                                                                                                  ●
                                                                                                                                                                 ●
                                                                                                                                                                 ●
                    ●                                            ●
            0.45




                                                                                                                                                               ●
                                          ●●                                                                                                                   ●
                                                                                                                                                           ●
                                      ●       ●             ●                                                                                              ●
                                          ●                                                                                                                ●




                                                                                                      0.4
                                                                                                                                                          ●
                                  ●                          ●                                                                                      ●
                                    ●●●      ●              ●                                                                                      ●
                                  ●               ●                                                                                                ●
                                                                                                                                                   ●
                                        ●
            0.40




                                                                                                                                               ●
                                     ●                                                                                                         ●
                                                                                                                                           ●
                               ●    ●




                                                                                                      0.2
                                                                                                                                          ●
                                 ●●             ●                                                                                     ●
                                                                                                                                     ●
                     ●          ●                                                                                                ●
                                                                                                                                ●
                                                                                                                                ●
                                                                                                                            ●
            0.35




                                                                                                                      ●




                                                                                                      0.0
                                                                                                                  ●




                   0.55     0.60        0.65          0.70           0.75                                   0.2           0.3             0.4           0.5            0.6                        0.7

                                        Recall                                                                                       Rank Correlation
Results

[Figure: (a) Precision and Recall: scatter plot of precision (y-axis, 0.35 to 0.55) against recall (x-axis, 0.55 to 0.75). (b) Rank Correlation: cumulative distribution (y-axis, 0.0 to 1.0) of the rank correlation (x-axis, 0.2 to 0.7).]

45% (about 1/2) of predictions correct
                2/3 of all vulnerable components detected
moderately strong correlation (mostly significant at p < 0.01)
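
The figures on this slide are precision ("predictions correct") and recall ("vulnerable components detected"). As a minimal sketch of how the two measures are computed, assuming predictions and known vulnerabilities are both available as sets of component names (the function name and the toy data below are hypothetical and not taken from the study):

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for predicted vs. actually vulnerable components."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: share of predictions that turned out to be correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of actually vulnerable components that were predicted.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
predicted = {"comp_a", "comp_b", "comp_c", "comp_d"}
actual = {"comp_a", "comp_b", "comp_e"}

p, r = precision_recall(predicted, actual)
print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.50 recall=0.67"
```

In this toy case two of four predictions are correct (precision 0.50) and two of three vulnerable components are found (recall 0.67).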
Ranking
Predicted Rank   Component              Actual Rank
 1               nsDOMClassInfo              3
 2               SGridRowLayout             95
 3               xpcprivate                  6
 4               jsxml                       2
 5               nsGenericHTMLElement        8
 6               jsgc                        3
 7               nsISEnvironment            12
 8               jsfun                       1
 9               nsHTMLLabelElement         18
 10              nsHttpTransaction          35
 ...             (3,474 components)
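
How well a predicted ranking agrees with the actual one can be summarized by a rank correlation. The slides do not name the coefficient, so the sketch below assumes Spearman's rank correlation with average ranks for ties; the function names are hypothetical. It uses only the ten rows shown above, which are an excerpt, so its output is not the study's result.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# The ten rows from the table above (predicted rank, actual rank).
predicted_rank = list(range(1, 11))
actual_rank = [3, 95, 6, 2, 8, 3, 12, 1, 18, 35]
print(round(spearman(predicted_rank, actual_rank), 3))
```

A value of 1 means both rankings order the components identically, and -1 means they are exactly reversed.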
Similar Results for Bugs

        Packages + Import relationships
        (ISESE 2006)


        Precision: 66.7% Recall: 69.4%


        Binaries + Dependencies
        (Internship @ Microsoft Research, 2006)


        Precision: 64.4% Recall: 75.3%
Vulture
                 Predicting
    Security Vulnerabilities
                (Work in Progress)




locates past + predicts new
       vulnerabilities


  problem domain
Future
 Work


    ?
#1: Mining across Projects


            • Complement source
              code search engines
              with mining techniques.
            • Large-scale mining
              (144,000 SF projects)
#2: Developer Buddy




               MOCKUP
automatic
                           large-scale




  eROSE         BugCache        Vulture


    tool-oriented
automatic
                     large-scale


       Empirical Software
        Engineering 2.0


    tool-oriented

Thanks! Questions?

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 

Mining Software Archives to Support Software Development

  • 1. Mining Software Archives to Support Software Development Tom Zimmermann Saarland University
  • 2. Software Development: Build. Hello Calgary!
  • 7. Collaboration: Comm. Archive, Version Archive
  • 8. Collaboration: Comm. Archive, Version Archive, Bug Database
  • 9. Collaboration: Comm. Archive, Version Archive, Bug Database. Mining Software Archives
  • 12. eROSE Related Changes (ICSE 2004, TSE 2005) Tom Zimmermann • Saarland University Peter Weißgerber • University of Trier Stephan Diehl • University of Trier Andreas Zeller • Saarland University
  • 16. Developers who changed this function also changed...
  • 17. eROSE: Guiding Developers Customers who bought this item also bought... Purchase History
  • 18. eROSE: Guiding Developers. "Customers who bought this item also bought..." (Purchase History) compared with "Developers who changed this function also changed..." (Version Archive)
  • 27. Processing CVS data 1. Comparing files 2. Building transactions
  • 30. Comparing Files. Left: A(), B(), C(), D(), E(). Right: A(), F(), B(), D(), E()
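The comparison on the slide can be illustrated with a minimal Python sketch. It assumes each revision has already been parsed into an ordered list of entity names and compares by name only; the slides do not say how eROSE detects changed bodies, so `compare_entities` is a hypothetical helper, not the tool's actual algorithm.

```python
def compare_entities(old, new):
    """Return (added, removed) entity names between two revisions of a file.

    Assumption: entities are identified by name only; body changes are ignored.
    """
    old_set, new_set = set(old), set(new)
    added = [entity for entity in new if entity not in old_set]
    removed = [entity for entity in old if entity not in new_set]
    return added, removed


# The two columns shown on the "Comparing Files" slide.
old_revision = ["A()", "B()", "C()", "D()", "E()"]
new_revision = ["A()", "F()", "B()", "D()", "E()"]

added, removed = compare_entities(old_revision, new_revision)
print("added:", added)      # added: ['F()']
print("removed:", removed)  # removed: ['C()']
```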
  • 32. Building Transactions: CVS, 150,000
  • 33. Building Transactions: CVS, 150,000. 2003-02-19 (aweinand): fixed #13332. Changed: createGeneralPage(), createTextComparePage(), fKeys[], initDefaults(), buildnotes_compare.html, PatchMessages.properties, plugin.properties
  • 34. Building Transactions: same author + message + time. 2003-02-19 (aweinand): fixed #13332. Changed: createGeneralPage(), createTextComparePage(), fKeys[], initDefaults(), buildnotes_compare.html, PatchMessages.properties, plugin.properties
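CVS records changes per file, so transactions have to be rebuilt from the "same author + message + time" rule. The sketch below is one way to do that in Python. The `Change` record, the 200-second `window` default, and the timestamps in the usage example are assumptions for illustration; the slide gives only the date, author, message, and changed entities.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    author: str
    message: str
    timestamp: datetime
    entity: str


def build_transactions(changes, window=timedelta(seconds=200)):
    """Group changes with the same author and message that lie close in time.

    Assumption: a change joins the latest transaction with the same
    (author, message) if it follows that transaction's last change
    within `window`; otherwise it starts a new transaction.
    """
    transactions = []
    latest = {}  # (author, message) -> most recent transaction for that key
    for change in sorted(changes, key=lambda c: c.timestamp):
        key = (change.author, change.message)
        current = latest.get(key)
        if current is not None and change.timestamp - current[-1].timestamp <= window:
            current.append(change)
        else:
            current = [change]
            latest[key] = current
            transactions.append(current)
    return transactions


# Entities are taken from the slide; the times of day are hypothetical.
start = datetime(2003, 2, 19, 12, 0, 0)
entities = ["createGeneralPage()", "createTextComparePage()", "fKeys[]",
            "initDefaults()", "plugin.properties"]
changes = [Change("aweinand", "fixed #13332", start + timedelta(seconds=i), name)
           for i, name in enumerate(entities)]
changes.append(Change("someone_else", "unrelated change", start, "other.c"))

transactions = build_transactions(changes)
for tx in transactions:
    print(tx[0].author, [c.entity for c in tx])
```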
  • 35. Mining Associations User changes fKeys[] and initDefaults()
  • 38. Mining Associations: EROSE finds past transactions #756, #6721, #21078, #42432, #51345, #59998, #71003, #87264, #91220, #101823, #104223. Each contains fKeys[] and initDefaults(); ten of the eleven also contain plugin.properties.
  • 39. Mining Associations: rule {fKeys[], initDefaults()} ⇒ {plugin.properties}, Support 10, Confidence 10/11 = 0.909
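The support and confidence figures can be reproduced with a short Python sketch. It assumes each past transaction is available as a set of entity names; the `history` list below is a stand-in that mirrors the counts on the slide (eleven transactions containing both changed entities, ten of which also touch plugin.properties), not the real Eclipse data.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Support = number of transactions containing antecedent and consequent;
    confidence = support / number of transactions containing the antecedent."""
    antecedent, consequent = set(antecedent), set(consequent)
    matching = [t for t in transactions if antecedent <= set(t)]
    support = sum(1 for t in matching if consequent <= set(t))
    confidence = support / len(matching) if matching else 0.0
    return support, confidence


# Stand-in history mirroring the slide's counts.
history = [{"fKeys[]", "initDefaults()", "plugin.properties"} for _ in range(10)]
history.append({"fKeys[]", "initDefaults()"})

support, confidence = support_and_confidence(
    history, {"fKeys[]", "initDefaults()"}, {"plugin.properties"})
print(support, round(confidence, 3))  # 10 0.909
```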
  • 40. Evaluation: GIMP, PostgreSQL, KOffice, jEdit
  • 41. Evaluation: EROSE predicts 33% of all changed entities (files: 44%).
  • 42. Evaluation: EROSE predicts 33% of all changed entities (files: 44%). In 70% of all transactions, EROSE’s topmost three suggestions contain a changed entity (files: 72%).
  • 43. Evaluation: EROSE predicts 33% of all changed entities (files: 44%). In 70% of all transactions, EROSE’s topmost three suggestions contain a changed entity (files: 72%). EROSE learns quickly (within 30 days).
  • 44. eROSE: Related Changes (ICSE 2004, TSE 2005). Guides developers; non-program elements (documentation); learns quickly
  • 45. BugCache: Predicting Defects (ASE 2006, ICSE 2007) Sung Kim • MIT Tom Zimmermann • Saarland University Jim Whitehead • University of California, Santa Cruz Andreas Zeller • Saarland University
  • 46. The Problem How should we allocate our resources for quality assurance?
  • 47. One Solution: a list with elements that (will) have defects. The list is adaptive, i.e., it changes over time
  • 48. One Solution: Cache. A list with elements that (will) have defects. The list is adaptive, i.e., it changes over time
  • 49. The BugCache Model: What is loaded in the cache? Cache size: 2. Hypothesis: Temporal locality between defects
  • 54. The BugCache Model: What is loaded in the cache? Cache size: 2. Miss. Hypothesis: Temporal locality between defects
  • 56. The BugCache Model: Cache size: 2. Miss
  • 58. The BugCache Model: Cache size: 2. Miss, Hit
  • 60. The BugCache Model: Cache size: 2. Miss, Hit, Miss
  • 62. The BugCache Model: Cache size: 2. Miss, Hit, Miss. Hit rate = #Hits / #Defects = 33.3%
  • 65. The BugCache Model: Cache size: 2. Miss, Hit, Miss, Miss
  • 68. Loading Elements: Temporal locality – as shown before; Spatial locality – load “nearby” elements (i.e., co-changed before); Changed-entity locality – load changed elements; New-entity locality – load new elements; Initial pre-fetch – start with a loaded cache
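The cache model and its hit rate can be sketched in a few lines of Python. This sketch covers temporal locality only (an element is loaded when a defect is found in it) and assumes least-recently-used replacement; the slides do not state the replacement policy, and the file names in the usage example are hypothetical.

```python
from collections import OrderedDict


def simulate_bug_cache(defect_sequence, cache_size=2):
    """Replay defects in order and return hit rate = #hits / #defects.

    Assumptions: temporal locality only, LRU eviction, empty initial cache.
    """
    cache = OrderedDict()  # keys are cached elements, oldest first
    hits = 0
    for element in defect_sequence:
        if element in cache:
            hits += 1
            cache.move_to_end(element)      # mark as most recently used
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)   # evict least recently used
            cache[element] = True           # load the element on a miss
    return hits / len(defect_sequence) if defect_sequence else 0.0


# Miss, Hit, Miss, as in the slide sequence: one hit out of three defects.
rate = simulate_bug_cache(["a.c", "a.c", "b.c"], cache_size=2)
print(f"{rate:.1%}")  # 33.3%
```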
  • 69. Evaluation Mozilla jEdit PostgreSQL Columba
  • 70. Hit Rates (cache size = 10%)
        Project       Methods             Files
                      BugCache  FixCache  BugCache  FixCache
        Apache 1.3    59.6%     61.5%     83.9%     81.5%
        Columba       58.9%     67.6%     83.5%     83.0%
        Eclipse       64.5%     71.6%     95.1%     95.0%
        JEdit         50.5%     48.9%     85.7%     85.4%
        Mozilla       49.3%     55.0%     93.3%     88.0%
        PostgreSQL    61.9%     59.2%     73.9%     71.0%
        Subversion    68.3%     43.8%     82.0%     81.3%
  • 72. Reasons for Hits [Pie chart with slices for initial pre-fetch, temporal locality, spatial locality, changed-entity locality, and new-entity locality; labeled shares: 60%, 18%, 18%]
  • 73. Warning Developers “Safe” Location (not in FixCache) Risky Location (red, in FixCache)
  • 74. BugCache: Predicting Defects (ASE 2006, ICSE 2007). Temporal locality; adaptive; hit rates of 71% to 95%
  • 75. Vulture Predicting Security Vulnerabilities (Work in Progress) Stephan Neuhaus • Saarland University Tom Zimmermann • Saarland University Andreas Zeller • Saarland University
  • 76. Firefox/Mozilla >700 developers 228,365 commits 14,368 C/C++ files 1,012,512 revisions (10,452 components)
  • 80. Vulnerabilities 0 Vulnerabilities
  • 81. Vulnerabilities Security Advisory 2005-12 Title: Livefeed bookmarks can steal cookies Impact: High Products: Firefox Description: Earlier versions of Firefox allowed javascript: and data: URLs as Livefeed bookmarks. When they updated the URL would be run in the context of the current page and could be used to steal cookies or data displayed on the page. If the user were on a page with elevated privileges (for example, about:config) when the Livefeed was updated, the feed URL could potentially run arbitrary code on the user's machine. 0 Vulnerabilities
  • 82. Vulnerabilities 0 Vulnerabilities
  • 83. Vulnerabilities Security Advisory 2005-13 Title: Window Injection Spoofing Severity: Low Products: Firefox, Mozilla Suite Description: A website can inject content into a popup opened by another site if the target name of the popup window is known. An attacker who knows you are going to visit that other site could spoof the contents of the popup. 0 Vulnerabilities
  • 84. Vulnerabilities [Security Advisories 2005-14, 2005-15, 2005-16, 2005-41, and 2006-76 shown overlapping on one slide; their titles and descriptions are overlaid and not legible] 0 Vulnerabilities
  • 85. Vulnerabilities 0 Vulnerabilities
  • 86. Vulnerabilities 10,452 components 424 vulnerable 4.05% 0 Vulnerabilities
  • 87. Vulnerabilities What other components are vulnerable? 0 Vulnerabilities
  • 88. Vulnerabilities 0 Vulnerabilities
  • 89. Vulnerabilities 0 Vulnerabilities ?
  • 90. Vulnerabilities Is this new component likely to be vulnerable? 0 Vulnerabilities ?
  • 91. Vulture [Diagram: Vulnerability Database, Version Archive, Code; slide note: "Redo diagram"]
  • 92. Vulture [Diagram: Vulnerability Database, Version Archive, Code, feeding Vulture]
  • 93. Vulture [Diagram: Vulnerability Database, Version Archive, Code, feeding Vulture; Components]
  • 94. Vulture [Diagram: Vulnerability Database, Version Archive, Code, feeding Vulture; Predictor; Components]
  • 97. Correlations Programmer Code Complexity Language
  • 98. Correlations Code Complexity Language
  • 100. Correlations Language Problem Domain
  • 102. Imports GUI Database Certificates OS
  • 105. Example (1): import nsIContent.h, nsIContentUtils.h, nsIScriptSecurityManager.h
  • 107. Example (1): import nsIContent.h, nsIContentUtils.h, nsIScriptSecurityManager.h. Marks shown: 21 ✘, 1 ✔ (95.5%)
  • 108. Example (2): import nsIPrivateDOMEvent.h, nsReadableUtils.h
  • 110. Example (2): import nsIPrivateDOMEvent.h, nsReadableUtils.h. Marks shown: 19 ✘, 0 ✔ (100%)
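The percentages in the two examples read as "of the components that import all of these headers, what share has had a vulnerability". A minimal Python sketch of that calculation follows. It assumes each ✘ mark stands for a vulnerable component and the ✔ for a non-vulnerable one (21 of 22 gives 95.5%); the component records are synthetic stand-ins, and `vulnerable_fraction` is a hypothetical helper.

```python
def vulnerable_fraction(components, required_imports):
    """Share of components importing all required headers that are vulnerable.

    Returns None when no component imports all of them.
    """
    required = set(required_imports)
    matching = [c for c in components if required <= c["imports"]]
    if not matching:
        return None
    return sum(1 for c in matching if c["vulnerable"]) / len(matching)


headers = {"nsIContent.h", "nsIContentUtils.h", "nsIScriptSecurityManager.h"}

# Synthetic data mirroring the marks on the Example (1) slide: 21 vulnerable, 1 not.
components = [{"imports": set(headers), "vulnerable": True} for _ in range(21)]
components.append({"imports": set(headers), "vulnerable": False})
# A component importing only one of the headers does not count.
components.append({"imports": {"nsIContent.h"}, "vulnerable": False})

fraction = vulnerable_fraction(components, headers)
print(f"{fraction:.1%}")  # 95.5%
```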
  • 111. Research Questions • How well do imports predict vulnerabilities? • Can imports be used for − classification (vulnerable or not) and for − regression (number of vulnerabilities)?
  • 112. Input Data (component, number of vulnerability-related bug reports): nsCOMArray 0; nsIDocument.h 1; nspr_md.h 0; nsDOMClassInfo 10; EmbedGTKTools 0; MozillaControl.cpp 0. nsDOMClassInfo has had 10 vulnerability-related bug reports
  • 113. Input Data (component, number of vulnerability-related bug reports, then one 0/1 column per imported header; the rotated header names are not legible): nsCOMArray 0 | 1 0 0 0 1 0 0; nsIDocument.h 1 | 0 0 1 0 0 1 0; nspr_md.h 0 | 0 1 1 0 0 1 0; nsDOMClassInfo 10 | 0 0 1 0 1 0 0; EmbedGTKTools 0 | 0 0 0 0 1 0 0; MozillaControl.cpp 0 | 0 1 0 1 0 0 0. nsDOMClassInfo has had 10 vulnerability-related bug reports; nsDOMClassInfo imports “nsIXPConnect.h”
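The input table pairs each component's bug-report count with a 0/1 vector of its imports. The Python sketch below shows one way such rows could be built. Only the nsDOMClassInfo count and its nsIXPConnect.h import come from the slide; `example_a.h` and the helper name `build_feature_rows` are hypothetical.

```python
def build_feature_rows(imports_by_component, bug_counts):
    """Return (columns, rows): rows map component -> (bug count, 0/1 import vector)."""
    columns = sorted({header
                      for headers in imports_by_component.values()
                      for header in headers})
    rows = {}
    for component, headers in imports_by_component.items():
        vector = [1 if column in headers else 0 for column in columns]
        rows[component] = (bug_counts.get(component, 0), vector)
    return columns, rows


imports_by_component = {
    "nsDOMClassInfo": {"nsIXPConnect.h", "example_a.h"},  # example_a.h is made up
    "nsCOMArray": {"example_a.h"},
}
bug_counts = {"nsDOMClassInfo": 10, "nsCOMArray": 0}

columns, rows = build_feature_rows(imports_by_component, bug_counts)
print(columns)
for component, (count, vector) in rows.items():
    print(component, count, vector)
```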
  • 114. Distribution [Figure: two histograms, Distribution of MFSAs (Number of Components vs. Number of MFSAs) and Distribution of Bug Reports (Number of Components vs. Number of Bug Reports)]
  • 115. Experiments • 40 random splits: 6,968 rows in training set, 3,484 rows in validation set • Classification: train SVM, compute recall and precision • Regression: train SVM, compute rank correlation on top 1% • SVM: linear kernel with default parameters, R implementation (up to 10GB of main memory)
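The split sizes add up to the 10,452 components reported earlier, i.e. two thirds for training and one third for validation. The Python sketch below shows such a random split and the precision and recall computation. It leaves out the SVM itself (the slides use an R implementation); `split_rows`, `precision_recall`, the seed, and the toy labels are assumptions for illustration.

```python
import random


def split_rows(rows, train_fraction=2 / 3, seed=0):
    """Shuffle rows and cut them into a training and a validation set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def precision_recall(predicted, actual):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


train, validation = split_rows(range(10452))
print(len(train), len(validation))  # 6968 3484

# Toy labels: 1 = vulnerable, 0 = not vulnerable.
precision, recall = precision_recall([1, 1, 0, 0], [1, 0, 1, 0])
print(precision, recall)  # 0.5 0.5
```

Repeating the split with 40 different seeds would give the 40 random splits named on the slide.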
  • 116. Results [Figure: (a) Precision and Recall, scatter plot of Precision against Recall; (b) Rank Correlation, Cumulative Distribution against Rank Correlation]
  • 117. Results [same figure]: 45% (about 1/2) of predictions correct
  • 118. Results [same figure]: 2/3 of all vulnerable components detected; 45% (about 1/2) of predictions correct
  • 120. Results [same figure]: moderately strong correlation (mostly significant at p < 0.01); 2/3 of all vulnerable components detected; 45% (about 1/2) of predictions correct
  • 122. Ranking (predicted rank, component, actual rank): 1 nsDOMClassInfo 3; 2 SGridRowLayout 95; 3 xpcprivate 6; 4 jsxml 2; 5 nsGenericHTMLElement 8; 6 jsgc 3; 7 nsISEnvironment 12; 8 jsfun 1; 9 nsHTMLLabelElement 18; 10 nsHttpTransaction 35; ... (3,474 components)
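A rank correlation compares the predicted order with the actual order. The Python sketch below computes Spearman's coefficient (Pearson correlation of the ranks, with tied values given their average rank) for the ten rows shown. Spearman is an assumption here: the slides say "rank correlation" without naming the coefficient, and ten rows are only an illustration of the top 1% evaluation.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation; returns 0.0 if either side is constant."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


predicted_rank = list(range(1, 11))
actual_rank = [3, 95, 6, 2, 8, 3, 12, 1, 18, 35]  # from the Ranking slide
rho = spearman(predicted_rank, actual_rank)
print(round(rho, 3))
```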
  • 126. Similar Results for Bugs. Packages + Import relationships (ISESE 2006): Precision 66.7%, Recall 69.4%. Binaries + Dependencies (Internship @ Microsoft Research, 2006): Precision 64.4%, Recall 75.3%
  • 127. Vulture Predicting Security Vulnerabilities (Work in Progress) locates past + predicts new vulnerabilities problem domain
  • 129. #1: Mining across Projects • Complement source code search engines with mining techniques. • Large-scale mining (144,000 SF projects)
  • 131. eROSE BugCache Vulture
  • 132. automatic eROSE BugCache Vulture
  • 133. automatic large-scale eROSE BugCache Vulture
  • 134. automatic large-scale eROSE BugCache Vulture tool-oriented
  • 135. automatic large-scale Empirical Software Engineering 2.0 tool-oriented
  • 136. automatic large-scale Empirical Software Engineering 2.0 tool-oriented Thanks! Questions?