Statistical distributions of software metrics: do they matter?

                                     Israel Herraiz

                          Technical University of Madrid


                         israel.herraiz@upm.es


                               Grab these slides from
     http://slideshare.net/herraiz/statistical-distributions-of-metrics




Israel Herraiz, UPM       Statistical distributions of software metrics: do they matter?   1/17
Outline



1    Some background


2    Statistical properties of software metrics


3    Evidence of impact on quality


4    Summary of findings and further work




1    Some background

A (not so) long time ago...



Statistical distribution of software metrics
Software size follows a double Pareto distribution
Towards a theoretical model for software growth MSR 2007

More recently
Not only size, but some OO metrics too (and some complexity metrics)
On the Statistical Distribution of Object-Oriented System Properties WETSoM 2012




OK, but what is that double Pareto thing?
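[Figure: P[X > x] against SLOC on logarithmic axes (SLOC from 1 to 10000, P[X > x] from 1e+00 down to 1e−04), showing the data together with the double Pareto and lognormal curves]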
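
The plot compares the empirical P[X > x] of file sizes with fitted curves. A minimal sketch of how the lognormal part of such a comparison could be computed with the Python standard library is given here. The sample sizes are invented for illustration, the helper names are hypothetical and not taken from the talk, and the double Pareto fit is not covered.

```python
import math
from bisect import bisect_right


def empirical_ccdf(values, x):
    """Fraction of observations strictly greater than x, i.e. P[X > x]."""
    ordered = sorted(values)
    return (len(ordered) - bisect_right(ordered, x)) / len(ordered)


def fit_lognormal(values):
    """Maximum-likelihood mu and sigma of ln(X) for strictly positive values."""
    logs = [math.log(v) for v in values]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / len(logs))
    return mu, sigma


def lognormal_ccdf(x, mu, sigma):
    """P[X > x] for a lognormal distribution with parameters mu and sigma."""
    return 0.5 * math.erfc((math.log(x) - mu) / (sigma * math.sqrt(2)))


# Hypothetical file sizes in SLOC (made up for illustration, not from the slides).
sloc = [12, 30, 45, 60, 80, 120, 150, 200, 400, 9000]
mu, sigma = fit_lognormal(sloc)

# Compare the observed tail with the tail predicted by the fitted lognormal.
for x in (100, 1000):
    print(x, empirical_ccdf(sloc, x), round(lognormal_ccdf(x, mu, sigma), 4))
```

If the observed values of P[X > x] stay clearly above the lognormal curve for large x, that is the kind of heavy tail the double Pareto curve is meant to capture.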
But does it matter?

Most of the files are on the lognormal side, but the power law minority matters a lot.

[Figure: two bar charts by language (C, C++, Java, Python, Lisp), one with % Files on the vertical axis (0 to 35) and one with % SLOC on the vertical axis (0 to 40)]

Large files have a large impact

Size estimation models
Some software size estimation models are based on the log-normality of size
metrics. These models systematically underestimate the size of software.

[Figure: four panels (C, C++, Java, Python) plotting RE against SLOC; RE axis from −100 to 50, SLOC axis from 2000 to 50000 for C and C++ and from 1000 to 10000 for Java and Python]



On the distribution of source code file sizes ICSOFT 2011
2    Statistical properties of software metrics

Parameters of the statistical distribution

Power law parameters: λ and xmin
Transition from lognormal to power law
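[Figure: P[X > x] against SLOC on logarithmic axes (SLOC from 1 to 10000, P[X > x] from 1e+00 down to 1e−04), showing the data together with the double Pareto and lognormal curves]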
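
As an illustration of how the power law exponent could be estimated once xmin is known, the sketch below uses the standard continuous maximum-likelihood estimator, 1 + n / sum(ln(x_i / xmin)), applied to the observations at or above xmin. Several points are assumptions: the slides do not say which estimator was used, xmin is taken as given rather than estimated, the estimate is the exponent of the density (the exponent of P[X > x] would be one less), and the function name is hypothetical.

```python
import math


def powerlaw_exponent(values, xmin):
    """Continuous maximum-likelihood estimate of the power law density exponent.

    Only observations at or above xmin are used. SLOC counts are integers,
    so the continuous formula is an approximation.
    """
    tail = [v for v in values if v >= xmin]
    if len(tail) < 2:
        raise ValueError("need at least two observations at or above xmin")
    log_sum = sum(math.log(v / xmin) for v in tail)
    if log_sum == 0:
        raise ValueError("all tail observations equal xmin; exponent undefined")
    return 1.0 + len(tail) / log_sum


# Hypothetical file sizes in SLOC (made up for illustration, not from the slides).
sloc = [40, 75, 120, 900, 1500, 2300, 4100, 8000, 20000]
print(round(powerlaw_exponent(sloc, xmin=1000), 3))
```

Choosing xmin itself is a separate step; a common approach is to try several candidate thresholds and keep the one whose fitted tail is closest to the data.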

3    Evidence of impact on quality

Probability of finding defects


Probability of finding defects
We have seen that files above xmin account for 40% of total size, while being
only about 1% of the files.
What about defects? Probability of finding defects in three software
projects (using CYCLO as metric)

                      Project             Below xmin               Above xmin
                      Apache                   .4178                   .7708
                      OpenIntents              .2500                   .7500
                      Zxing                    .2143                   .4161

* Data extracted from “ReLink: Recovering Links between Bugs and Changes” FSE
2011.
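
To make the table easier to read, the short sketch below computes, for each project, how many times more likely a defect is above xmin than below it. The probabilities are copied from the table; the ratio itself and the name risk_ratio are additions for illustration and do not appear in the slides.

```python
# Probability of finding defects (below xmin, above xmin), copied from the table.
defect_probability = {
    "Apache": (0.4178, 0.7708),
    "OpenIntents": (0.2500, 0.7500),
    "Zxing": (0.2143, 0.4161),
}


def risk_ratio(below, above):
    """How many times more likely a defect is above xmin than below it."""
    return above / below


for project, (below, above) in defect_probability.items():
    print(f"{project}: {risk_ratio(below, above):.2f}x")
```

In all three projects the ratio is well above 1, which is the pattern the table is meant to show.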



Probability of finding defects




Probability of finding defects (normalized metrics)
Using CYCLO / WMC as metric (cyclomatic complexity per LOC)

                      Project             Below xmin               Above xmin
                      Apache                   .4159                   .6296
                      OpenIntents              .2813                   .5417
                      Zxing                    .3181                   .2389




Probability of finding defects

Defects density (only pre-release defects)
Using Number of Methods and number of pre-release defects per LOC

[Figure: two panels, one for files below xmin (vertical axis 0 to 12000, horizontal axis 0 to 10) and one for files above xmin (vertical axis 0 to 300, horizontal axis 0 to 0.35)]
Below xmin: Avg. Dens. = .2685
Above xmin: Avg. Dens. = .4565
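
A rough sketch of how an average defect density on each side of xmin could be computed from file-level data follows. The slides do not state how the average is taken, so the sketch assumes the mean of per-file defects / LOC, and it treats files with a metric value equal to xmin as being above it. The records, the threshold and the function name are hypothetical.

```python
from statistics import mean


def average_density_by_side(files, xmin):
    """Mean defect density (defects per LOC) below and at or above xmin.

    files: iterable of (metric_value, defects, loc) tuples, one per file.
    Raises statistics.StatisticsError if either side has no files.
    """
    below = [defects / loc for metric, defects, loc in files if metric < xmin]
    above = [defects / loc for metric, defects, loc in files if metric >= xmin]
    return mean(below), mean(above)


# Hypothetical records (number of methods, defects, LOC), not from the slides.
files = [
    (3, 0, 100),
    (5, 1, 200),
    (8, 0, 50),
    (40, 4, 400),
    (65, 6, 300),
]
below, above = average_density_by_side(files, xmin=30)
print(f"below xmin: {below:.4f}  above xmin: {above:.4f}")
```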

* Data obtained from “Predicting Defects for Eclipse” PROMISE 2007

Probability of finding defects

Defects density (only post-release defects)
Using Number of Methods and number of post-release defects per LOC

[Figure: two panels, one for files below xmin (vertical axis 0 to 12000, horizontal axis 0 to 10) and one for files above xmin (vertical axis 0 to 300, horizontal axis 0 to 0.35)]
Below xmin: Avg. Dens. = .1437
Above xmin: Avg. Dens. = .2690

Probability of finding defects
Defects density (pre + post-release defects)
Using CYCLO/SLOC and number of total defects per LOC

[Figure: two panels on logarithmic axes, vertical axis labelled Pr(X ≥ x) and horizontal axis x, one for files below xmin and one for files above xmin]
Below xmin: Avg. Dens. = .3335 (>9000 files)
Above xmin: Avg. Dens. = .7747 (364 files)
4    Summary of findings and further work

Summary and further work

Summary of preliminary findings
        Some metrics have a transition from lognormal to power law
        Clear relation between normalized metrics and defect density
        Although the threshold might not be perfect (e.g., you might find a
        high defect density in a file on the lower side), it greatly reduces
        the search space for potentially problematic files
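
The reduction of the search space can be put in numbers using the file counts reported with the pre + post-release densities (364 files above xmin, more than 9000 below). The sketch below is only this arithmetic; because the slides give the lower count as ">9000", the result is an upper bound on the share of files above xmin, and the variable names are hypothetical.

```python
# File counts reported with the pre + post-release defect densities.
files_above = 364
files_below_at_least = 9000  # the slides say ">9000", so this is a lower bound

# Share of files above xmin; the true share is slightly smaller, since
# the real number of files below xmin exceeds 9000.
share_above_upper_bound = files_above / (files_above + files_below_at_least)
print(f"At most {share_above_upper_bound:.1%} of the files lie above xmin")
```

Under these counts, inspecting only the files above xmin means looking at fewer than 4% of the files.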

Further work
        Verify in more projects
                Do you have defects data at the file level?
        Find an explanation for the transition and its influence on quality
        How do the statistical parameters change over time? Do defects
        evolve accordingly?

Israel Herraiz, UPM           Statistical distributions of software metrics: do they matter?   17/17

Weitere ähnliche Inhalte

Ähnlich wie Statistical Distribution of Metrics

(ATS3-PLAT01) Recent developments in Pipeline Pilot
(ATS3-PLAT01) Recent developments in Pipeline Pilot(ATS3-PLAT01) Recent developments in Pipeline Pilot
(ATS3-PLAT01) Recent developments in Pipeline PilotBIOVIA
 
2011/2012 CAST report on Application Software Quality (CRASH)
2011/2012 CAST report on Application Software Quality (CRASH)2011/2012 CAST report on Application Software Quality (CRASH)
2011/2012 CAST report on Application Software Quality (CRASH)CAST
 
Software Cost Contingency Development
Software Cost Contingency DevelopmentSoftware Cost Contingency Development
Software Cost Contingency Developmentskillern
 
The Explosion of Petascale in the Race to Exascale
The Explosion of Petascale in the Race to ExascaleThe Explosion of Petascale in the Race to Exascale
The Explosion of Petascale in the Race to ExascaleIntel IT Center
 
Hedge Fund IT Challenges Financial Survey
Hedge Fund IT Challenges Financial SurveyHedge Fund IT Challenges Financial Survey
Hedge Fund IT Challenges Financial SurveyAvere Systems
 
Revolution R Enterprise - 100% R and More Webinar Presentation
Revolution R Enterprise - 100% R and More Webinar PresentationRevolution R Enterprise - 100% R and More Webinar Presentation
Revolution R Enterprise - 100% R and More Webinar PresentationRevolution Analytics
 
Introduction to Performance Testing Part 1
Introduction to Performance Testing Part 1Introduction to Performance Testing Part 1
Introduction to Performance Testing Part 1C.T.Co
 
Data visualization short v1.1
Data visualization short v1.1Data visualization short v1.1
Data visualization short v1.1Adam Winkler
 
C3 Citrix Cloud Center
C3 Citrix Cloud CenterC3 Citrix Cloud Center
C3 Citrix Cloud CenterRui Lopes
 
Aggregating API Services with an API Gateway (BFF)
Aggregating API Services with an API Gateway (BFF)Aggregating API Services with an API Gateway (BFF)
Aggregating API Services with an API Gateway (BFF)José Roberto Araújo
 
BPMN Usage Survey: Results
BPMN Usage Survey: ResultsBPMN Usage Survey: Results
BPMN Usage Survey: ResultsMichele Chinosi
 
5 APM and Capacity Planning Imperatives for a Virtualized World
5 APM and Capacity Planning Imperatives for a Virtualized World5 APM and Capacity Planning Imperatives for a Virtualized World
5 APM and Capacity Planning Imperatives for a Virtualized WorldCorrelsense
 
Xen.org: The past, the present and exciting Future
Xen.org: The past, the present and exciting FutureXen.org: The past, the present and exciting Future
Xen.org: The past, the present and exciting FutureThe Linux Foundation
 
Introduction to MATLAB
Introduction to MATLABIntroduction to MATLAB
Introduction to MATLABAshish Meshram
 
201103 cuore forms2_adf v0.2
201103 cuore forms2_adf v0.2201103 cuore forms2_adf v0.2
201103 cuore forms2_adf v0.2Pedro Gallardo
 
Simple is Not Necessarily Better: Why Software Productivity Factors Can Lead...
Simple is Not Necessarily Better:  Why Software Productivity Factors Can Lead...Simple is Not Necessarily Better:  Why Software Productivity Factors Can Lead...
Simple is Not Necessarily Better: Why Software Productivity Factors Can Lead...Michael Gallo
 
Empirical evaluation in 2020: how big, how beautiful?
Empirical evaluation in 2020: how big, how beautiful?Empirical evaluation in 2020: how big, how beautiful?
Empirical evaluation in 2020: how big, how beautiful?Massimiliano Di Penta
 

Ähnlich wie Statistical Distribution of Metrics (20)

(ATS3-PLAT01) Recent developments in Pipeline Pilot
(ATS3-PLAT01) Recent developments in Pipeline Pilot(ATS3-PLAT01) Recent developments in Pipeline Pilot
(ATS3-PLAT01) Recent developments in Pipeline Pilot
 
2011/2012 CAST report on Application Software Quality (CRASH)
2011/2012 CAST report on Application Software Quality (CRASH)2011/2012 CAST report on Application Software Quality (CRASH)
2011/2012 CAST report on Application Software Quality (CRASH)
 
Software Cost Contingency Development
Software Cost Contingency DevelopmentSoftware Cost Contingency Development
Software Cost Contingency Development
 
The Explosion of Petascale in the Race to Exascale
The Explosion of Petascale in the Race to ExascaleThe Explosion of Petascale in the Race to Exascale
The Explosion of Petascale in the Race to Exascale
 
Hedge Fund IT Challenges Financial Survey
Hedge Fund IT Challenges Financial SurveyHedge Fund IT Challenges Financial Survey
Hedge Fund IT Challenges Financial Survey
 
Dallas Meloon BI
Dallas Meloon   BIDallas Meloon   BI
Dallas Meloon BI
 
WETSoM 2011
WETSoM 2011WETSoM 2011
WETSoM 2011
 
Itn no 06 06 application vendor evaluation matrix
Itn no 06 06 application vendor evaluation matrixItn no 06 06 application vendor evaluation matrix
Itn no 06 06 application vendor evaluation matrix
 
Revolution R Enterprise - 100% R and More Webinar Presentation
Revolution R Enterprise - 100% R and More Webinar PresentationRevolution R Enterprise - 100% R and More Webinar Presentation
Revolution R Enterprise - 100% R and More Webinar Presentation
 
Introduction to Performance Testing Part 1
Introduction to Performance Testing Part 1Introduction to Performance Testing Part 1
Introduction to Performance Testing Part 1
 
Data visualization short v1.1
Data visualization short v1.1Data visualization short v1.1
Data visualization short v1.1
 
C3 Citrix Cloud Center
C3 Citrix Cloud CenterC3 Citrix Cloud Center
C3 Citrix Cloud Center
 
Aggregating API Services with an API Gateway (BFF)
Aggregating API Services with an API Gateway (BFF)Aggregating API Services with an API Gateway (BFF)
Aggregating API Services with an API Gateway (BFF)
 
BPMN Usage Survey: Results
BPMN Usage Survey: ResultsBPMN Usage Survey: Results
BPMN Usage Survey: Results
 
5 APM and Capacity Planning Imperatives for a Virtualized World
5 APM and Capacity Planning Imperatives for a Virtualized World5 APM and Capacity Planning Imperatives for a Virtualized World
5 APM and Capacity Planning Imperatives for a Virtualized World
 
Xen.org: The past, the present and exciting Future
Xen.org: The past, the present and exciting FutureXen.org: The past, the present and exciting Future
Xen.org: The past, the present and exciting Future
 
Introduction to MATLAB
Introduction to MATLABIntroduction to MATLAB
Introduction to MATLAB
 
201103 cuore forms2_adf v0.2
201103 cuore forms2_adf v0.2201103 cuore forms2_adf v0.2
201103 cuore forms2_adf v0.2
 
Simple is Not Necessarily Better: Why Software Productivity Factors Can Lead...
Simple is Not Necessarily Better:  Why Software Productivity Factors Can Lead...Simple is Not Necessarily Better:  Why Software Productivity Factors Can Lead...
Simple is Not Necessarily Better: Why Software Productivity Factors Can Lead...
 
Empirical evaluation in 2020: how big, how beautiful?
Empirical evaluation in 2020: how big, how beautiful?Empirical evaluation in 2020: how big, how beautiful?
Empirical evaluation in 2020: how big, how beautiful?
 

Mehr von Israel Herraiz

intensive metrics software evolution
intensive metrics software evolutionintensive metrics software evolution
intensive metrics software evolutionIsrael Herraiz
 
Public Key Cryptography
Public Key CryptographyPublic Key Cryptography
Public Key CryptographyIsrael Herraiz
 
¿MATLAB? Yo uso Octave UPM
¿MATLAB? Yo uso Octave UPM¿MATLAB? Yo uso Octave UPM
¿MATLAB? Yo uso Octave UPMIsrael Herraiz
 
The Ultimate Debian Database
The Ultimate Debian DatabaseThe Ultimate Debian Database
The Ultimate Debian DatabaseIsrael Herraiz
 
Evaluating the presence and impact of bias in bug-fix datasets
Evaluating the presence and impact of bias in bug-fix datasetsEvaluating the presence and impact of bias in bug-fix datasets
Evaluating the presence and impact of bias in bug-fix datasetsIsrael Herraiz
 
Software size distribution - Why we always underestimate software cost
Software size distribution - Why we always underestimate software costSoftware size distribution - Why we always underestimate software cost
Software size distribution - Why we always underestimate software costIsrael Herraiz
 
The dynamics of software evolution - EVOLUMONS 2011
The dynamics of software evolution - EVOLUMONS 2011The dynamics of software evolution - EVOLUMONS 2011
The dynamics of software evolution - EVOLUMONS 2011Israel Herraiz
 
Public key cryptography
Public key cryptographyPublic key cryptography
Public key cryptographyIsrael Herraiz
 
Mining Software Repositories
Mining Software RepositoriesMining Software Repositories
Mining Software RepositoriesIsrael Herraiz
 

Mehr von Israel Herraiz (9)

intensive metrics software evolution
intensive metrics software evolutionintensive metrics software evolution
intensive metrics software evolution
 
Public Key Cryptography
Public Key CryptographyPublic Key Cryptography
Public Key Cryptography
 
¿MATLAB? Yo uso Octave UPM
¿MATLAB? Yo uso Octave UPM¿MATLAB? Yo uso Octave UPM
¿MATLAB? Yo uso Octave UPM
 
The Ultimate Debian Database
The Ultimate Debian DatabaseThe Ultimate Debian Database
The Ultimate Debian Database
 
Evaluating the presence and impact of bias in bug-fix datasets
Evaluating the presence and impact of bias in bug-fix datasetsEvaluating the presence and impact of bias in bug-fix datasets
Evaluating the presence and impact of bias in bug-fix datasets
 
Software size distribution - Why we always underestimate software cost
Software size distribution - Why we always underestimate software costSoftware size distribution - Why we always underestimate software cost
Software size distribution - Why we always underestimate software cost
 
The dynamics of software evolution - EVOLUMONS 2011
The dynamics of software evolution - EVOLUMONS 2011The dynamics of software evolution - EVOLUMONS 2011
The dynamics of software evolution - EVOLUMONS 2011
 
Public key cryptography
Public key cryptographyPublic key cryptography
Public key cryptography
 
Mining Software Repositories
Mining Software RepositoriesMining Software Repositories
Mining Software Repositories
 

Kürzlich hochgeladen

ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 

Kürzlich hochgeladen (20)

ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 

Statistical Distribution of Metrics

  • 1. Statistical distributions of software metrics: do they matter? Israel Herraiz Technical University of Madrid israel.herraiz@upm.es Grab these slides from http://slideshare.net/herraiz/statistical-distributions-of-metrics Israel Herraiz, UPM Statistical distributions of software metrics: do they matter? 1/17
  • 2. Outline: (1) Some background; (2) Statistical properties of software metrics; (3) Evidence of impact on quality; (4) Summary of findings and further work.
  • 3. Section 1: Some background
  • 4. A (not so) long time ago... Statistical distribution of software metrics: software size follows a double Pareto distribution ("Towards a theoretical model for software growth", MSR 2007). More recently: not only size, but some OO metrics too, and some complexity metrics ("On the Statistical Distribution of Object-Oriented System Properties", WETSoM 2012).
  • 5. OK, but what is that double Pareto thing? [Figure: P[X > x] against SLOC on logarithmic axes, showing the data together with the double Pareto and lognormal fits.]
  • 6. But does it matter? Most of the files are on the lognormal side. [Figure: bar chart of % files for C, C++, Java, Python and Lisp.]
  • 7. But does it matter? Most of the files are on the lognormal side, but the power law minority matters a lot. [Figure: two bar charts for C, C++, Java, Python and Lisp; left panel: % files, right panel: % SLOC.]
  • 8. Large files have a large impact. Size estimation models: some software size estimation models are based on the log-normality of size metrics. These models systematically underestimate the size of software. [Figure: four panels (C, C++, Java, Python) plotting RE against SLOC.] Source: "On the distribution of source code file sizes", ICSOFT 2011.
  • 9. Section 2: Statistical properties of software metrics
  • 10. Parameters of the statistical distribution. Power law parameters: λ and xmin. Transition from lognormal to power law. [Figure: P[X > x] against SLOC on logarithmic axes, showing the data together with the double Pareto and lognormal fits.]
  • 11. Section 3: Evidence of impact on quality
  • 12. Probability of finding defects. We have seen that files above xmin account for 40% of total size while being only about 1% of the files. What about defects? Probability of finding defects in three software projects (using CYCLO as metric):

      Project       Below xmin   Above xmin
      Apache        .4178        .7708
      OpenIntents   .2500        .7500
      Zxing         .2143        .4161

      * Data extracted from "ReLink: Recovering Links between Bugs and Changes", FSE 2011.
  • 13. Probability of finding defects (normalized metrics). Using CYCLO / WMC as metric (cyclomatic complexity per LOC):

      Project       Below xmin   Above xmin
      Apache        .4159        .6296
      OpenIntents   .2813        .5417
      Zxing         .3181        .2389
  • 14. Defects density (only pre-release defects). Using Number of Methods and number of pre-release defects per LOC. [Figure: two panels, below xmin and above xmin.] Average density: .2685 below xmin, .4565 above xmin.

      * Data obtained from "Predicting Defects for Eclipse", PROMISE 2007.
  • 15. Defects density (only post-release defects). Using Number of Methods and number of post-release defects per LOC. [Figure: two panels, below xmin and above xmin.] Average density: .1437 below xmin, .2690 above xmin.
  • 16. Defects density (pre + post-release defects). Using CYCLO/SLOC and number of total defects per LOC. [Figure: plots on logarithmic axes, including Pr(X ≥ x) against x.] Average density: .3335 below xmin (>9000 files), .7747 above xmin (364 files).
  • 17. Section 4: Summary of findings and further work
  • 18. Summary and further work.
      Summary of preliminary findings: some metrics have a transition from lognormal to power law. There is a clear relation between normalized metrics and defects density. Although the threshold might not be perfect (e.g., you might find a high defects density in a file on the lower side), it greatly reduces the search space for potentially problematic files.
      Further work: verify in more projects (do you have defects data at the file level?). Find an explanation for the transition and its influence on quality. How do the statistical parameters change over time? Do defects evolve accordingly?
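
The below/above xmin comparison reported in the slides can be illustrated with a minimal Python sketch. It is not the author's tooling: the names `SideStats` and `split_by_xmin`, the record layout (metric value, defect flag), the choice to count a file with value equal to xmin as "above", and the toy data are all assumptions made for illustration. The threshold xmin is taken as given here; estimating it from the data is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class SideStats:
    """Aggregates for the files on one side of the threshold (hypothetical helper)."""
    files: int = 0
    metric_sum: float = 0.0
    defective: int = 0

    def defect_probability(self) -> float:
        # Fraction of files on this side that have at least one defect.
        return self.defective / self.files if self.files else float("nan")


def split_by_xmin(
    records: Iterable[Tuple[float, bool]], xmin: float
) -> Tuple[SideStats, SideStats]:
    """Split (metric value, has_defect) records at xmin.

    Assumption: a value equal to xmin counts as "above".
    """
    below, above = SideStats(), SideStats()
    for value, has_defect in records:
        side = above if value >= xmin else below
        side.files += 1
        side.metric_sum += value
        side.defective += int(has_defect)
    return below, above


# Toy data, invented for illustration only (not taken from the slides).
toy_records = [
    (10, False), (20, True), (30, False), (40, False),
    (5000, True), (8000, True),
]
below, above = split_by_xmin(toy_records, xmin=1000)

total_files = below.files + above.files
total_metric = below.metric_sum + above.metric_sum
print(f"share of files above xmin:  {above.files / total_files:.2%}")
print(f"share of metric above xmin: {above.metric_sum / total_metric:.2%}")
print(f"P(defect | below xmin):     {below.defect_probability():.4f}")
print(f"P(defect | above xmin):     {above.defect_probability():.4f}")
```

With real data, `records` would hold one entry per file, for example a CYCLO or SLOC value paired with whether any defect was linked to that file. The two probabilities then correspond to the "Below xmin" and "Above xmin" columns in the tables above.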