A Principled Methodology
A Dozen Principles of Software Effort Estimation


 Ekrem Kocaguneli, 11/07/2012



Agenda
• Introduction
• Publications
• What to Know
  • 8 Questions
• Answers
  • 12 Principles
• Validity Issues
• Future Work



Introduction

Software effort estimation (SEE) is the process of estimating the total
effort required to complete a software project (Keung2008 [1]).

Successful estimation is critical for an organization:
• Over-estimation: killing promising projects.
• Under-estimation: wasting the entire effort! E.g. NASA's
  launch-control system was cancelled after its initial estimate of
  $200M was overrun by another $200M [22].

Among IT projects developed in 2009, only 32% were successfully
completed on time and with full functionality [23].



Introduction (cntd.)

We will discuss algorithms, but it would be irresponsible to say that
SEE is merely an algorithmic problem. Organizational factors are just
as important, e.g. the common experiences of data collection and user
interaction in organizations operating in different domains.



Introduction (cntd.)

This presentation is not about a single algorithm/answer targeting a
single problem, because there is not just one question.

It is (unfortunately) not everything about SEE, either.

Rather, it brings together critical questions and the solutions
related to them.



What to know?

1. When do I have perfect data?
2. What is the best effort estimation method?
3. Can I use multiple methods?
4. ABE methods are easy to use. How can I improve them?
5. What if I lack resources for local data?
6. I don't believe in size attributes. What can I do?
7. Are all attributes and all instances necessary?
8. How to experiment, which sampling method to use?



                                 Publications
Journals
•   E. Kocaguneli, T. Menzies, J. Keung, “On the Value of Ensemble Effort Estimation”, IEEE Transactions on
    Software Engineering, 2011.
•   E. Kocaguneli, T. Menzies, A. Bener, J. Keung, “Exploiting the Essential Assumptions of Analogy-based
    Effort Estimation”, IEEE Transactions on Software Engineering, 2011.
•   E. Kocaguneli, T. Menzies, J. Keung, “Kernel Methods for Software Effort Estimation”, Empirical
    Software Engineering Journal, 2011.
•   J. Keung, E. Kocaguneli, T. Menzies, “A Ranking Stability Indicator for Selecting the Best Effort Estimator
    in Software Cost Estimation”, Journal of Automated Software Engineering, 2012.
Under review Journals
•   E. Kocaguneli, T. Menzies, J. Keung, “Active Learning for Effort Estimation”, third round review at IEEE
    Transactions on Software Engineering.
•   E. Kocaguneli, T. Menzies, E. Mendes, “Transfer Learning in Effort Estimation”, submitted to ACM
    Transactions on Software Engineering.
•   E. Kocaguneli, T. Menzies, “Software Effort Models Should be Assessed Via Leave-One-Out Validation”,
    under second round review at Journal of Systems and Software.
•   E. Kocaguneli, T. Menzies, E. Mendes, “Towards Theoretical Maximum Prediction Accuracy Using D-ABE”,
    submitted to IEEE Transactions on Software Engineering.
Conference
•   E. Kocaguneli, T. Menzies, J. Hihn, Byeong Ho Kang, “Size Doesn't Matter? On the Value of Software Size
    Features for Effort Estimation”, Predictive Models in Software Engineering (PROMISE) 2012.
•   E. Kocaguneli, T. Menzies, “How to Find Relevant Data for Effort Estimation”, International Symposium
    on Empirical Software Engineering and Measurement (ESEM) 2011.
•   E. Kocaguneli, G. Gay, Y. Yang, T. Menzies, “When to Use Data from Other Projects for Effort Estimation”,
    International Conference on Automated Software Engineering (ASE) 2010, Short-paper.



1  When do I have the perfect data?

Principle #1: Know your domain
Domain knowledge is important in every step (Fayyad1996 [2]).
Yet this knowledge takes time and effort to gain,
e.g. percentage commit information.

Principle #2: Let the experts talk
Initial results may be off according to domain experts.
Success means creating discussion, interest and suggestions.

Principle #3: Suspect your data
“Curiosity” to question is a key characteristic (Rauser2011 [3]),
e.g. in an SEE project: 200+ test cases, 0 bugs.

Principle #4: Data collection is cyclic
Any step, from mining to presentation, may be repeated.


2  What is the best effort estimation method?

There is no agreed-upon best estimation method (Shepperd2001 [4]).
Methods change ranking w.r.t. conditions such as data sets and error
measures (Myrtveit2005 [5]).

Experimenting with 90 solo-methods, 20 public data sets and 7 error
measures, the top 13 methods are CART & ABE methods (1NN, 5NN).


3  How to use a superior subset of methods?

We have a set of superior methods to recommend. Assembling
solo-methods may be a good idea, e.g. the fusion of 3 biometric
modalities (Ross2003 [20]).

But the previous evidence of assembling multiple methods in SEE is
discouraging: Baker2007 [7], Kocaguneli2009 [8] and Khoshgoftaar2009
[9] failed to outperform solo-methods.

Combine the top 2, 4, 8 and 13 solo-methods via mean, median and
IRWM (see the sketch below).
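To make the combination schemes concrete, here is a minimal sketch (not the thesis code) of combining the estimates of ranked solo-methods. The IRWM weighting shown, where the method ranked i out of M gets weight M - i + 1, is my reading of the inverse ranked weighted mean; the toy numbers are made up.

```python
import numpy as np

def combine_estimates(estimates, how="irwm"):
    """Combine the per-method estimates for one test project.
    `estimates` must be ordered best-ranked solo-method first."""
    e = np.asarray(estimates, dtype=float)
    if how == "mean":
        return e.mean()
    if how == "median":
        return float(np.median(e))
    # IRWM: the method ranked i (best = 1) out of M gets weight M - i + 1
    weights = np.arange(len(e), 0, -1)
    return np.average(e, weights=weights)

# e.g. combine the top-4 solo-methods' estimates (in person-months)
print(combine_estimates([120.0, 150.0, 90.0, 200.0], "irwm"))
```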

2  What is the best effort estimation method?
3  How to use a superior subset of methods?

Principle #5: Use a ranking stability indicator

Principle #6: Assemble superior solo-methods

• A method to identify successful methods using their rank changes.
• A novel scheme for assembling solo-methods.
• Multi-methods that outperform all solo-methods.

This research was published at:
• E. Kocaguneli, T. Menzies, J. Keung, “On the Value of Ensemble Effort Estimation”, IEEE Transactions on
  Software Engineering, 2011.
• J. Keung, E. Kocaguneli, T. Menzies, “A Ranking Stability Indicator for Selecting the Best Effort Estimator in
  Software Cost Estimation”, Journal of Automated Software Engineering, 2012.


4  How can we improve ABE methods?

Analogy-based methods make use of similar past projects for
estimation (see the sketch below). They are very widely used
(Walkerden1999 [10]) as they:
• Need no model calibration to local data
• Can better handle outliers
• Can work with 1 or more attributes
• Are easy to explain

Two promising research areas:
• weighting the selected analogies (Mendes2003 [11], Mosley2002 [12])
• improving design options (Keung2008 [1])
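As background, a minimal baseline ABE sketch; Euclidean distance, k nearest analogies and mean adaptation are common defaults rather than the exact configuration used in the thesis.

```python
import numpy as np

def abe_estimate(train_X, train_y, test_x, k=3):
    """Baseline analogy-based estimation: find the k most similar
    past projects by Euclidean distance over (normalized) features
    and return the mean of their known efforts."""
    d = np.linalg.norm(train_X - test_x, axis=1)   # similarity = distance
    nearest = np.argsort(d)[:k]                    # the k closest analogies
    return train_y[nearest].mean()                 # adapt: mean of efforts

# toy data: rows are past projects, columns are normalized features
X = np.array([[0.2, 0.4], [0.3, 0.5], [0.9, 0.8], [0.1, 0.3]])
y = np.array([100.0, 120.0, 400.0, 90.0])          # effort, e.g. person-months
print(abe_estimate(X, y, np.array([0.25, 0.45]), k=2))
```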

How can we improve ABE methods? (cntd.)

Building on the previous research (Mendes2003 [11], Mosley2002 [12],
Keung2008 [1]), we adopted two different strategies.

a) Weighting analogies

We used kernel weighting (sketched below) to weigh the selected
analogies, then compared the performance of each k-value with and
without weighting.

In none of the scenarios did we see a significant improvement.
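A sketch of the kernel-weighting idea under test: instead of a plain mean over the k analogies, weight each analogy by a kernel of its distance to the test project. The Gaussian kernel and the bandwidth h below stand in for the 4 kernels and 5 bandwidths actually tried.

```python
import numpy as np

def kernel_abe_estimate(train_X, train_y, test_x, k=3, h=0.5):
    """ABE with kernel-weighted analogies: closer analogies get
    larger weights via a Gaussian kernel with bandwidth h."""
    d = np.linalg.norm(train_X - test_x, axis=1)
    nearest = np.argsort(d)[:k]
    w = np.exp(-(d[nearest] ** 2) / (2 * h ** 2))  # Gaussian kernel weights
    return np.average(train_y[nearest], weights=w)

X = np.array([[0.2, 0.4], [0.3, 0.5], [0.9, 0.8]])
y = np.array([100.0, 120.0, 400.0])
print(kernel_abe_estimate(X, y, np.array([0.25, 0.45]), k=2))
```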

How can we improve ABE methods? (cntd.)

b) Designing ABE methods

Easy-path: remove training instances that violate assumptions.
TEAK (discussed later) and D-ABE both follow this design.

D-ABE is built on theoretical maximum prediction accuracy (TMPA)
(Keung2008 [1]); its steps (sketched below):
• Get the best estimates of all training instances.
• Remove all the training instances within half of the worst MRE
  (acc. to TMPA).
• Return the closest neighbor's estimate for the test instance.

(The slide's figure shows a test instance t and training instances
a-f: the instances close to the worst MRE are removed, and the
closest surviving neighbor's estimate is returned.)
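A sketch of the D-ABE steps as read from this slide, where MRE, the magnitude of relative error, is |actual - predicted| / actual. The pruning rule (drop instances whose MRE reaches half of the worst MRE) is my interpretation of "within half of the worst MRE".

```python
import numpy as np

def dabe_estimate(train_X, train_y, test_x):
    """D-ABE sketch: (1) estimate each training instance from its
    nearest other instance and record its MRE; (2) remove instances
    whose MRE falls within half of the worst MRE (the hard-to-estimate
    ones); (3) answer the test instance with its closest survivor."""
    n = len(train_y)
    mre = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(train_X - train_X[i], axis=1)
        d[i] = np.inf                       # exclude self
        est = train_y[np.argmin(d)]         # closest neighbor's estimate
        mre[i] = abs(est - train_y[i]) / train_y[i]
    keep = mre < mre.max() / 2              # prune the worst half-band
    if not keep.any():                      # fallback: keep everything
        keep[:] = True
    kept_X, kept_y = train_X[keep], train_y[keep]
    d = np.linalg.norm(kept_X - test_x, axis=1)
    return kept_y[np.argmin(d)]

X = np.array([[0.2, 0.4], [0.3, 0.5], [0.9, 0.8], [0.1, 0.3]])
y = np.array([100.0, 120.0, 400.0, 90.0])
print(dabe_estimate(X, y, np.array([0.25, 0.45])))
```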

How can we improve ABE methods? (cntd.)

D-ABE compared to static-k ABE w.r.t. MMRE, and w.r.t. win, tie and
loss counts.

How can we improve ABE methods? (cntd.)

Principle #7: Weighting analogies is over-elaboration

Principle #8: Use easy-path design

• Investigation of an unexplored and promising ABE option, kernel
  weighting; a negative result published in the ESE Journal.
• An ABE design option that can be applied to different ABE methods
  (D-ABE, TEAK).

This research was published at:
• E. Kocaguneli, T. Menzies, A. Bener, J. Keung, “Exploiting the Essential Assumptions of Analogy-based Effort
  Estimation”, IEEE Transactions on Software Engineering, 2011.
• E. Kocaguneli, T. Menzies, J. Keung, “Kernel Methods for Software Effort Estimation”, Empirical Software
  Engineering Journal, 2011.



5  How to handle lack of local data?

Finding enough local training data is a fundamental problem of SEE
(Turhan2009 [13]), and the merits of using cross-data from another
company are questionable (Kitchenham2007 [14]).

We use a relevancy filtering method called TEAK (sketched below) on
public and proprietary data sets. TEAK prunes regions of similar
projects with dissimilar effort values (hence high variance) and
estimates from regions of similar projects with similar effort
values (hence low variance).

Cross-data works as well as within-data for 6 out of 8 proprietary
data sets and 19 out of 21 public data sets after TEAK's relevancy
filtering.
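TEAK itself prunes high-variance regions of a cluster tree, as the slide's figures suggest; the sketch below keeps only the spirit of that relevancy filtering by scoring each training instance with the effort variance of its local neighborhood and keeping the low-variance half. The knobs k and keep_ratio are made up.

```python
import numpy as np

def teak_like_filter(train_X, train_y, k=3, keep_ratio=0.5):
    """Spirit-of-TEAK sketch (not the published algorithm): keep the
    training instances whose local neighborhoods have the lowest
    effort variance, i.e. the regions safe to estimate from."""
    n = len(train_y)
    var = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(train_X - train_X[i], axis=1)
        hood = np.argsort(d)[:k + 1]        # the instance + its k neighbors
        var[i] = train_y[hood].var()
    keep = np.argsort(var)[: max(1, int(keep_ratio * n))]
    return train_X[keep], train_y[keep]

# cross-company data could be filtered this way before a 1NN estimate
X = np.array([[0.2, 0.4], [0.3, 0.5], [0.9, 0.8], [0.1, 0.3], [0.85, 0.75]])
y = np.array([100.0, 120.0, 400.0, 90.0, 150.0])
Xf, yf = teak_like_filter(X, y)
print(len(yf), "instances kept after relevancy filtering")
```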

How to handle lack of local data? (cntd.)

Principle #9: Use relevancy filtering

• A novel method to handle lack of local data.
• Successful application on public as well as proprietary data.

This research was published at:
• E. Kocaguneli, T. Menzies, “How to Find Relevant Data for Effort Estimation”, International Symposium on
  Empirical Software Engineering and Measurement (ESEM) 2011.
• E. Kocaguneli, G. Gay, Y. Yang, T. Menzies, “When to Use Data from Other Projects for Effort Estimation”,
  International Conference on Automated Software Engineering (ASE) 2010, Short-paper.



E(k) matrices & Popularity

This concept helps with the next 2 problems, size features and the
essential content, addressed by the pop1NN and QUICK algorithms,
respectively (a sketch follows).

A similar concept is the reverse nearest neighbor (RNN) in ML, used
to find the instances whose k-NNs include a specific query
(Achtert2006 [26]).
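A sketch of how popularity could be computed from an E(k) matrix; the function name and toy data are mine.

```python
import numpy as np

def popularity(X, k=1):
    """Build an E(k) matrix: cell (i, j) is 1 if instance j is among
    instance i's k nearest neighbors. Popularity = column sums, i.e.
    how many instances point at j (cf. reverse nearest neighbors)."""
    n = len(X)
    E = np.zeros((n, n), dtype=int)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # an instance ignores itself
        E[i, np.argsort(d)[:k]] = 1
    return E.sum(axis=0)

X = np.array([[0.2, 0.4], [0.3, 0.5], [0.9, 0.8], [0.1, 0.3]])
print(popularity(X, k=1))                  # higher count = more popular
```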



E(k) matrices & Popularity (cntd.)

Outlier pruning, sample steps:
1. Calculate the “popularity” of instances.
2. Sort by popularity.
3. Label one instance at a time.
4. Find the stopping point.
5. Return estimates from the labeled training data.

Finding the stopping point (see the sketch below):
1. If all popular instances are exhausted;
2. or if there is no MRE improvement for n consecutive times;
3. or if the ∆ between the best and the worst error of the last n
   instances is very small (∆ = 0.1, n = 3).
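A sketch of the labeling loop with the three stopping rules; evaluating MRE by leave-one-out 1NN inside the labeled pool is an assumption about how "MRE improvement" is tracked.

```python
import numpy as np

def loo_mre(X, y):
    """Median MRE of 1NN leave-one-out inside the labeled pool."""
    mres = []
    for i in range(len(y)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf
        est = y[np.argmin(d)]
        mres.append(abs(est - y[i]) / y[i])
    return float(np.median(mres))

def pop1nn_select(X, y, pop, n=3, delta=0.1):
    """pop1NN-style active sampling sketch: label instances in
    descending popularity; stop when (a) popular instances run out,
    (b) MRE stops improving n times in a row, or (c) the spread of
    the last n errors is below delta."""
    order = np.argsort(-pop)
    order = order[pop[order] > 0]           # rule (a): popular ones only
    labeled, errors, no_improve = [], [], 0
    for idx in order:
        labeled.append(idx)
        if len(labeled) < 2:
            continue
        err = loo_mre(X[labeled], y[labeled])
        if errors and err >= errors[-1]:
            no_improve += 1
        else:
            no_improve = 0
        errors.append(err)
        if no_improve >= n:                                  # rule (b)
            break
        if len(errors) >= n and max(errors[-n:]) - min(errors[-n:]) < delta:
            break                                            # rule (c)
    return labeled          # estimate test cases via 1NN on these instances

X = np.array([[0.2, 0.4], [0.3, 0.5], [0.9, 0.8], [0.1, 0.3]])
y = np.array([100.0, 120.0, 400.0, 90.0])
pop = np.array([2, 2, 0, 0])    # popularity from the earlier sketch
print(pop1nn_select(X, y, pop))
```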



E(k) matrices & Popularity (cntd.)

Picking random training instances is not a good idea. Adding more
popular instances to the active pool decreases error, until one of
the stopping-point conditions fires.



6  Do I have to use size attributes?

At the heart of widely accepted SEE methods lie the software size
attributes: COCOMO uses LOC (Boehm1981 [15]), whereas FP
(Albrecht1983 [16]) uses logical transactions.

Size attributes are beneficial if used properly (Lum2002 [17]);
e.g. DoD and NASA use them successfully.

Yet the size attributes may not be trusted, or may not be estimable
at the early stages. That disrupts the adoption of SEE methods.

“Measuring software productivity by lines of code is like measuring
progress on an airplane by how much it weighs.” – B. Gates
“This is a very costly measuring unit because it encourages the
writing of insipid code.” – E. Dijkstra



Do I have to use size attributes? (cntd.)

pop1NN (w/o size) vs. CART and 1NN (w/ size):

Given enough resources for correct collection and estimation, size
features are helpful. If not, then outlier pruning helps.



Do I have to use size attributes? (cntd.)

Principle #10: Use outlier pruning

• Promotion of SEE methods that can compensate for the lack of
  software size features.
• A method called pop1NN that shows that size features are not a
  “must”.

This research was published at:
• E. Kocaguneli, T. Menzies, J. Hihn, Byeong Ho Kang, “Size Doesn't Matter? On the Value of Software Size
  Features for Effort Estimation”, Predictive Models in Software Engineering (PROMISE) 2012.


7  What is the essential content of SEE data?

SEE is populated with overly complex methods for marginal
performance increase (Jorgensen2007 [18]). In a matrix of N
instances and F features, the essential content is N′ * F′.

QUICK is an active learning method that combines outlier removal
with synonym pruning.

Synonym pruning (a sketch follows):
1. Transpose the normalized matrix and calculate the popularity of
   features.
2. Select the non-popular features.

Until now, removal of features based on distance seemed to be
reserved for instances: similar tasks remove cells in the hypercube
of all cases times all columns (Lipowezky1998 [24]), or treat an ABE
method as a two-dimensional reduction (Ahn2007 [25]); in our lab, a
variance-based feature selector is used as a row selector.
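A sketch of synonym pruning under the reading above: transpose so features become rows, let each feature vote for its nearest neighbor, and keep the features nobody votes for. Note that in this toy, mutual synonyms vote for each other and are both dropped; the exact selection rule in QUICK may be subtler.

```python
import numpy as np

def synonym_prune(X, k=1):
    """QUICK-style synonym pruning sketch: compute feature
    'popularity' on the transposed, normalized data and keep only
    the non-popular features (popular features are near-duplicates
    of other features)."""
    F = X.T                                 # features as rows
    n = len(F)
    pop = np.zeros(n, dtype=int)
    for i in range(n):
        d = np.linalg.norm(F - F[i], axis=1)
        d[i] = np.inf
        pop[np.argsort(d)[:k]] += 1         # my nearest feature gets a vote
    keep = pop == 0                         # non-popular features survive
    return X[:, keep], np.flatnonzero(keep)

# three features; the third is a near-copy (synonym) of the first
X = np.array([[0.1, 0.9, 0.11],
              [0.4, 0.2, 0.41],
              [0.8, 0.5, 0.79]])
Xr, kept = synonym_prune(X)
print("kept feature indices:", kept)
```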

What is the essential content of SEE data? (cntd.)

The essential content is at most 31% of all the cells; 10% on
median.

“There is a consensus in the high-dimensional data analysis
community that the only reason any methods work in very high
dimensions is that, in fact, the data are not truly
high-dimensional.” (Levina & Bickel 2005)

Performance?

What is the essential content of SEE data? (cntd.)

QUICK vs. passiveNN (1NN): only one data set where QUICK is
significantly worse than passiveNN.

QUICK vs. CART: 4 such data sets when QUICK is compared to CART.

What is the essential content of SEE data? (cntd.)

Principle #11: Combine outlier and synonym pruning

• An unsupervised method to find the essential content of SEE data
  sets and reduce the data needs.
• Promoting research that elaborates on the data, not on the
  algorithm.

This research is under 3rd round review:
• E. Kocaguneli, T. Menzies, J. Keung, “Active Learning for Effort Estimation”, third round review at IEEE
  Transactions on Software Engineering.



8  How should I choose the right SM?

Expectation (Kitchenham2007 [14]) vs. observed:

• No significant difference in bias & variance (B&V) values among
  the 90 methods.
• Only minutes of run-time difference (<15).
• LOO is not probabilistic, so results can be easily shared.

How should I choose the right SM? (cntd.)

Principle #12: Be aware of sampling method trade-off

• The first investigation of the B&V trade-off in the SEE domain.
• A recommendation based on experimental concerns.

This research is under 2nd round review:
• E. Kocaguneli, T. Menzies, “Software Effort Models Should be Assessed Via Leave-One-Out Validation”,
  under second round review at Journal of Systems and Software.




What to know?

When do I have perfect data?
  1. Know your domain
  2. Let the experts talk
  3. Suspect your data
  4. Data collection is cyclic

What is the best effort estimation method?
  5. Use a ranking stability indicator

Can I use multiple methods?
  6. Assemble superior solo-methods

ABE methods are easy to use. How can I improve them?
  7. Weighting analogies is over-elaboration
  8. Use easy-path design

What if I lack resources for local data?
  9. Use relevancy filtering

I don't believe in size attributes. What can I do?
  10. Use outlier pruning

Are all attributes and all instances necessary?
  11. Combine outlier and synonym pruning

How to experiment, which sampling method to use?
  12. Be aware of sampling method trade-off



Validity Issues

Construct validity, i.e. do we measure what we intend to measure?
• We use previously recommended estimation methods, error measures
  and data sets.

External validity, i.e. can we generalize results outside the
current specifications?
• It is difficult to assert that results will definitely hold, yet
  we use almost all the publicly available SEE data sets.
• The median number of projects used by the studies reviewed in
  (Kitchenham2007 [14]) is 186; our experimentation uses 1000+
  projects.



Future Work

Application on publicly accessible big data sets, e.g. repositories
of 300K projects and 2M users, or 250K open source projects.

Smarter, larger-scale algorithms for general conclusions: current
methods may face scalability issues, so common ideas need improving
for scalability, e.g. linear-time NN methods.

Application to different domains, e.g. defect prediction.

Combining intrinsic dimensionality techniques from ML to lower-bound
the dimensions of SEE data sets (Levina2004 [27]).

What have we covered?



                                   References
[1] J. W. Keung, “Theoretical Maximum Prediction Accuracy for Analogy-Based Software Cost
Estimation,” 15th Asia-Pacific Software Engineering Conference, pp. 495–502, 2008.
[2] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “The kdd process for extracting useful knowledge
from volumes of data,” Commun. ACM, vol. 39, no. 11, pp. 27–34, Nov. 1996.
[3] J. Rauser, “What is a career in big data?” 2011. [Online]. Available:
http://strataconf.com/stratany2011/public/schedule/speaker/10070
[4] M. Shepperd and G. Kadoda, “Comparing Software Prediction Techniques Using Simulation,” IEEE
Trans. Softw. Eng., vol. 27, no. 11, pp. 1014–1022, 2001.
[5] I. Myrtveit, E. Stensrud, and M. Shepperd, “Reliability and validity in comparative studies of
software prediction models,” IEEE Trans. Softw. Eng., vol. 31, no. 5, pp. 380–391, May 2005.
[6] E. Alpaydin, “Techniques for combining multiple learners,” Proceedings of Engineering of Intelligent
Systems, vol. 2, pp. 6–12, 1998.
[7] D. Baker, “A hybrid approach to expert and model-based effort estimation,” Master’s thesis, Lane
Department of Computer Science and Electrical Engineering, West Virginia University, 2007.
[8] E. Kocaguneli, Y. Kultur, and A. Bener, “Combining multiple learners induced on multiple datasets
for software effort prediction,” in International Symposium on Software Reliability Engineering (ISSRE),
2009, student Paper.
[9] T. M. Khoshgoftaar, P. Rebours, and N. Seliya, “Software quality analysis by combining multiple
projects and learners,” Software Quality Control, vol. 17, no. 1, pp. 25–49, 2009.
[10] F. Walkerden and R. Jeffery, “An empirical study of analogy-based software effort estimation,”
Empirical Software Engineering, vol. 4, no. 2, pp. 135–158, 1999.
[11] E. Mendes, I. D. Watson, C. Triggs, N. Mosley, and S. Counsell, “A comparative study of cost
estimation models for web hypermedia applications,” Empirical Software Engineering, vol. 8, no. 2, pp.
163–196, 2003.
[12] E. Mendes and N. Mosley, “Further investigation into the use of cbr and stepwise regression to
predict development effort for web hypermedia applications,” in International Symposium on Empirical
Software Engineering, 2002.
[13] B. Turhan, T. Menzies, A. Bener, and J. Di Stefano, “On the relative value of cross-company and
within-company data for defect prediction,” Empirical Software Engineering, vol. 14, no. 5, pp. 540–
578, 2009.
[14] B. A. Kitchenham, E. Mendes, and G. H. Travassos, “Cross versus within-company cost
estimation studies: A systematic review,” IEEE Trans. Softw. Eng., vol. 33, no. 5, pp. 316–329, 2007.
[15] B. W. Boehm, C. Abts, A. W. Brown, S. Chulani, B. K. Clark, E. Horowitz, R. Madachy, D. J.
Reifer, and B. Steece, Software Cost Estimation with Cocomo II. Upper Saddle River, NJ, USA:
Prentice Hall PTR, 2000.
[16] A. Albrecht and J. Gaffney, “Software function, source lines of code and development effort
prediction: A software science validation,” IEEE Trans. Softw. Eng., vol. 9, pp. 639–648, 1983.
[17] K. Lum, J. Powell, and J. Hihn, “Validation of spacecraft cost estimation models for flight and
ground systems,” in ISPA’02: Conference Proceedings, Software Modeling Track, 2002.
[18] M. Jorgensen and M. Shepperd, “A systematic review of software development cost estimation
studies,” IEEE Trans. Softw. Eng., vol. 33, no. 1, pp. 33–53, 2007.
[19] B. A. Kitchenham, E. Mendes, and G. H. Travassos, “Cross versus within-company cost
estimation studies: A systematic review,” IEEE Trans. Softw. Eng., vol. 33, no. 5, pp. 316–329, 2007.
[20] A. Ross, “Information fusion in biometrics,” Pattern Recognition Letters, vol. 24, no. 13, pp. 2115–
2125, Sep. 2003.
[21] Raymond P. L. Buse, Thomas Zimmermann: Information needs for software development
analytics. ICSE 2012: 987-996
[22] Spaceref.com. NASA to shut down checkout & launch control system, August 26, 2002.
http://www.spaceref.com/news/viewnews.html?id=475.
[23] Standish Group (2004). CHAOS Report (Report). West Yarmouth, Massachusetts: Standish Group.
[24] U. Lipowezky, “Selection of the optimal prototype subset for 1-NN classification,” Pattern
Recognition Letters, vol. 19, pp. 907–918, 1998.
[25] Hyunchul Ahn, Kyoung-jae Kim, Ingoo Han, “A case-based reasoning system with the
two-dimensional reduction technique for customer classification,” Expert Systems with Applications,
Volume 32, Issue 4, May 2007, Pages 1011–1019, ISSN 0957-4174, 10.1016/j.eswa.2006.02.021.
[26] Elke Achtert, Christian Böhm, Peer Kröger, Peter Kunath, Alexey Pryakhin, and Matthias Renz.
2006. Efficient reverse k-nearest neighbor search in arbitrary metric spaces. In Proceedings of the
2006 ACM SIGMOD international conference on Management of data (SIGMOD '06)
[27] E. Levina and P.J. Bickel. Maximum likelihood estimation of intrinsic dimension. In Advances in
Neural Information Processing Systems, volume 17, Cambridge, MA, USA, 2004. The MIT Press.




Detail Slides




Pre-processors and learners

What is the best effort estimation method? (cntd.)

1. Rank methods acc. to their win, loss and win-loss values.
2. δr is the maximum rank change.
3. Sort methods acc. to loss and observe the δr values (a sketch
   follows).
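A sketch of steps 1-2 under my reading: rank the methods by wins, by losses and by win minus loss, then take each method's maximum rank change as δr. The toy win/loss counts are made up.

```python
import numpy as np

def delta_r(win, loss):
    """Ranking-stability sketch: rank methods three ways and report
    each method's maximum rank change (delta_r). Stable winners keep
    a small delta_r across the criteria."""
    win, loss = np.asarray(win), np.asarray(loss)
    def ranks(score):                      # rank 0 = best
        order = np.argsort(-score)
        r = np.empty_like(order)
        r[order] = np.arange(len(order))
        return r
    all_r = np.vstack([ranks(win), ranks(-loss), ranks(win - loss)])
    return all_r.max(axis=0) - all_r.min(axis=0)

# toy win/loss counts for four methods over many comparisons
print(delta_r(win=[40, 35, 10, 5], loss=[20, 2, 30, 42]))  # -> [1 1 0 0]
```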

What is the best effort estimation method? (cntd.)

What about aggregate results reflecting on specific scenarios?
(question of a reviewer)

Sort methods according to increasing MdMRE, then group the MRE
values that are statistically the same.

Highlighted are the cases where superior methods do not occur in the
top group. Note how the superior solo-methods correspond to the best
(lowest MRE) groups.

How can we improve ABE methods? (cntd.)

We used kernel weighting with 4 kernels and 5 bandwidth values, plus
IRWM, to weigh the selected analogies (5 different k values).

A total of 2090 settings:
• 19 data sets * 5 k-values = 95
• 19 data sets * 5 k-values * 4 kernels * 5 bandwidths = 1900
• IRWM: 19 data sets * 5 k-values = 95

How can we improve ABE methods? (cntd.)

We used kernel weighting to weigh the selected analogies, and
compared the performance of each k-value with and without weighting:
• o = tie for 3 or more k values
• - = loss for 3 or more k values
• + = win for 3 or more k values

In none of the scenarios did we see a significant improvement.

How to handle lack of local data? (cntd.)

TEAK on proprietary data; TEAK on public data.



Do I have to use size attributes? (cntd.)

Can standard methods tolerate the lack of size attributes?
CART w/o size vs. CART w/ size; CART and 1NN.



8  How should I choose the right SM?

Only one work (Kitchenham2007 [14]) discusses the implications of
the sampling method (SM) on bias and variance. The expectation is:
• LOO (leave-one-out): high variance, low bias
• 3Way: low variance, high bias
• 10Way: in between

Does the expectation hold? What about run time and ease of
replication? (See the sketch below.)
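A sketch of the three sampling methods being compared, reading "3Way"/"10Way" as shuffled 3-fold/10-fold cross-validation. Note that LOO is deterministic (hence easy to replicate and share), while the k-way splits depend on a random seed.

```python
import numpy as np

def splits(n, sm="loo", seed=1):
    """Yield (train, test) index arrays for leave-one-out ('loo') or
    shuffled k-way cross-validation ('3way', '10way')."""
    idx = np.arange(n)
    if sm == "loo":
        for i in idx:
            yield np.delete(idx, i), idx[i:i + 1]
    else:
        k = int(sm.replace("way", ""))
        rng = np.random.default_rng(seed)   # source of irreproducibility
        shuffled = rng.permutation(idx)
        for fold in np.array_split(shuffled, k):
            yield np.setdiff1d(idx, fold), fold

for train, test in splits(6, "3way"):
    print(len(train), "train /", len(test), "test")
```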

Weitere ähnliche Inhalte

Was ist angesagt?

Cost estimation using cocomo model
Cost estimation using cocomo modelCost estimation using cocomo model
Cost estimation using cocomo modelNitesh Bichwani
 
software effort estimation
 software effort estimation software effort estimation
software effort estimationBesharam Dil
 
Software cost estimation techniques presentation
Software cost estimation techniques presentationSoftware cost estimation techniques presentation
Software cost estimation techniques presentationKudzai Rerayi
 
Software Estimation
Software EstimationSoftware Estimation
Software EstimationDinesh Singh
 
Spm software effort estimation
Spm software effort estimationSpm software effort estimation
Spm software effort estimationKanchana Devi
 
Software estimation techniques
Software estimation techniquesSoftware estimation techniques
Software estimation techniquesTan Tran
 
Software cost estimation project
Software  cost estimation projectSoftware  cost estimation project
Software cost estimation projectShashank Puppala
 
Software Project Managment
Software Project ManagmentSoftware Project Managment
Software Project ManagmentSaqib Naveed
 
Software Measurement: Lecture 1. Measures and Metrics
Software Measurement: Lecture 1. Measures and MetricsSoftware Measurement: Lecture 1. Measures and Metrics
Software Measurement: Lecture 1. Measures and MetricsProgrameter
 
Software cost estimation
Software cost estimationSoftware cost estimation
Software cost estimationHaitham Ahmed
 
Estimation
EstimationEstimation
Estimationweebill
 
Software Size Estimation
Software Size EstimationSoftware Size Estimation
Software Size EstimationMuhammad Asim
 

Was ist angesagt? (20)

Cost estimation using cocomo model
Cost estimation using cocomo modelCost estimation using cocomo model
Cost estimation using cocomo model
 
software effort estimation
 software effort estimation software effort estimation
software effort estimation
 
Software cost estimation techniques presentation
Software cost estimation techniques presentationSoftware cost estimation techniques presentation
Software cost estimation techniques presentation
 
Software Estimation
Software EstimationSoftware Estimation
Software Estimation
 
Spm software effort estimation
Spm software effort estimationSpm software effort estimation
Spm software effort estimation
 
Software Engineering
Software EngineeringSoftware Engineering
Software Engineering
 
Software Sizing
Software SizingSoftware Sizing
Software Sizing
 
Complexity metrics and models
Complexity metrics and modelsComplexity metrics and models
Complexity metrics and models
 
Software Cost Estimation
Software Cost EstimationSoftware Cost Estimation
Software Cost Estimation
 
Software estimation techniques
Software estimation techniquesSoftware estimation techniques
Software estimation techniques
 
Cocomo
CocomoCocomo
Cocomo
 
Software cost estimation project
Software  cost estimation projectSoftware  cost estimation project
Software cost estimation project
 
Software Project Managment
Software Project ManagmentSoftware Project Managment
Software Project Managment
 
Software Measurement: Lecture 1. Measures and Metrics
Software Measurement: Lecture 1. Measures and MetricsSoftware Measurement: Lecture 1. Measures and Metrics
Software Measurement: Lecture 1. Measures and Metrics
 
Cocomo models
Cocomo modelsCocomo models
Cocomo models
 
Software cost estimation
Software cost estimationSoftware cost estimation
Software cost estimation
 
COCOMO MODEL
COCOMO MODELCOCOMO MODEL
COCOMO MODEL
 
Estimation
EstimationEstimation
Estimation
 
Metrics
MetricsMetrics
Metrics
 
Software Size Estimation
Software Size EstimationSoftware Size Estimation
Software Size Estimation
 

Ähnlich wie Principles of effort estimation

Ekrem Kocaguneli PhD Defense Presentation
Ekrem Kocaguneli PhD Defense PresentationEkrem Kocaguneli PhD Defense Presentation
Ekrem Kocaguneli PhD Defense PresentationEkrem Kocagüneli
 
2cee Master Cocomo20071
2cee Master Cocomo200712cee Master Cocomo20071
2cee Master Cocomo20071CS, NcState
 
What Metrics Matter?
What Metrics Matter? What Metrics Matter?
What Metrics Matter? CS, NcState
 
Pareto-Optimal Search-Based Software Engineering (POSBSE): A Literature Survey
Pareto-Optimal Search-Based Software Engineering (POSBSE): A Literature SurveyPareto-Optimal Search-Based Software Engineering (POSBSE): A Literature Survey
Pareto-Optimal Search-Based Software Engineering (POSBSE): A Literature SurveyAbdel Salam Sayyad
 
Testing survey by_directions
Testing survey by_directionsTesting survey by_directions
Testing survey by_directionsTao He
 
Transcription Factor DNA Binding Prediction
Transcription Factor DNA Binding PredictionTranscription Factor DNA Binding Prediction
Transcription Factor DNA Binding PredictionUT, San Antonio
 
Ghotra icse
Ghotra icseGhotra icse
Ghotra icseSAIL_QU
 
Parameter tuning or default values
Parameter tuning or default valuesParameter tuning or default values
Parameter tuning or default valuesVivek Nair
 
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...Abdel Salam Sayyad
 
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...ijsc
 
A Software Measurement Using Artificial Neural Network and Support Vector Mac...
A Software Measurement Using Artificial Neural Network and Support Vector Mac...A Software Measurement Using Artificial Neural Network and Support Vector Mac...
A Software Measurement Using Artificial Neural Network and Support Vector Mac...ijseajournal
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10Roger Barga
 
Darius Silingas - From Model Driven Testing to Test Driven Modelling
Darius Silingas - From Model Driven Testing to Test Driven ModellingDarius Silingas - From Model Driven Testing to Test Driven Modelling
Darius Silingas - From Model Driven Testing to Test Driven ModellingTEST Huddle
 
OO Development 1 - Introduction to Object-Oriented Development
OO Development 1 - Introduction to Object-Oriented DevelopmentOO Development 1 - Introduction to Object-Oriented Development
OO Development 1 - Introduction to Object-Oriented DevelopmentRandy Connolly
 
the application of machine lerning algorithm for SEE
the application of machine lerning algorithm for SEEthe application of machine lerning algorithm for SEE
the application of machine lerning algorithm for SEEKiranKumar671235
 
Strategies oled optimization jmp 2016 09-19
Strategies oled optimization jmp 2016 09-19Strategies oled optimization jmp 2016 09-19
Strategies oled optimization jmp 2016 09-19David Lee
 
Strategies for Optimization of an OLED Device
Strategies for Optimization of an OLED DeviceStrategies for Optimization of an OLED Device
Strategies for Optimization of an OLED DeviceDavid Lee
 

Ähnlich wie Principles of effort estimation (20)

Ekrem Kocaguneli PhD Defense Presentation
Ekrem Kocaguneli PhD Defense PresentationEkrem Kocaguneli PhD Defense Presentation
Ekrem Kocaguneli PhD Defense Presentation
 
2cee Master Cocomo20071
2cee Master Cocomo200712cee Master Cocomo20071
2cee Master Cocomo20071
 
Test for AI model
Test for AI modelTest for AI model
Test for AI model
 
What Metrics Matter?
What Metrics Matter? What Metrics Matter?
What Metrics Matter?
 
Pareto-Optimal Search-Based Software Engineering (POSBSE): A Literature Survey
Pareto-Optimal Search-Based Software Engineering (POSBSE): A Literature SurveyPareto-Optimal Search-Based Software Engineering (POSBSE): A Literature Survey
Pareto-Optimal Search-Based Software Engineering (POSBSE): A Literature Survey
 
Testing survey by_directions
Testing survey by_directionsTesting survey by_directions
Testing survey by_directions
 
Transcription Factor DNA Binding Prediction
Transcription Factor DNA Binding PredictionTranscription Factor DNA Binding Prediction
Transcription Factor DNA Binding Prediction
 
Ghotra icse
Ghotra icseGhotra icse
Ghotra icse
 
Parameter tuning or default values
Parameter tuning or default valuesParameter tuning or default values
Parameter tuning or default values
 
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...
On Parameter Tuning in Search-Based Software Engineering: A Replicated Empiri...
 
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
 
Research proposal
Research proposalResearch proposal
Research proposal
 
A Software Measurement Using Artificial Neural Network and Support Vector Mac...
A Software Measurement Using Artificial Neural Network and Support Vector Mac...A Software Measurement Using Artificial Neural Network and Support Vector Mac...
A Software Measurement Using Artificial Neural Network and Support Vector Mac...
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
Darius Silingas - From Model Driven Testing to Test Driven Modelling
Darius Silingas - From Model Driven Testing to Test Driven ModellingDarius Silingas - From Model Driven Testing to Test Driven Modelling
Darius Silingas - From Model Driven Testing to Test Driven Modelling
 
OO Development 1 - Introduction to Object-Oriented Development
OO Development 1 - Introduction to Object-Oriented DevelopmentOO Development 1 - Introduction to Object-Oriented Development
OO Development 1 - Introduction to Object-Oriented Development
 
10 best practices in operational analytics
10 best practices in operational analytics 10 best practices in operational analytics
10 best practices in operational analytics
 
the application of machine lerning algorithm for SEE
the application of machine lerning algorithm for SEEthe application of machine lerning algorithm for SEE
the application of machine lerning algorithm for SEE
 
Strategies oled optimization jmp 2016 09-19
Strategies oled optimization jmp 2016 09-19Strategies oled optimization jmp 2016 09-19
Strategies oled optimization jmp 2016 09-19
 
Strategies for Optimization of an OLED Device
Strategies for Optimization of an OLED DeviceStrategies for Optimization of an OLED Device
Strategies for Optimization of an OLED Device
 

Mehr von CS, NcState

Talks2015 novdec
Talks2015 novdecTalks2015 novdec
Talks2015 novdecCS, NcState
 
GALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software EngineeringGALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software EngineeringCS, NcState
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest linkCS, NcState
 
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...CS, NcState
 
Lexisnexis june9
Lexisnexis june9Lexisnexis june9
Lexisnexis june9CS, NcState
 
Welcome to ICSE NIER’15 (new ideas and emerging results).
Welcome to ICSE NIER’15 (new ideas and emerging results).Welcome to ICSE NIER’15 (new ideas and emerging results).
Welcome to ICSE NIER’15 (new ideas and emerging results).CS, NcState
 
Icse15 Tech-briefing Data Science
Icse15 Tech-briefing Data ScienceIcse15 Tech-briefing Data Science
Icse15 Tech-briefing Data ScienceCS, NcState
 
Kits to Find the Bits that Fits
Kits to Find  the Bits that Fits Kits to Find  the Bits that Fits
Kits to Find the Bits that Fits CS, NcState
 
Ai4se lab template
Ai4se lab templateAi4se lab template
Ai4se lab templateCS, NcState
 
Automated Software Enging, Fall 2015, NCSU
Automated Software Enging, Fall 2015, NCSUAutomated Software Enging, Fall 2015, NCSU
Automated Software Enging, Fall 2015, NCSUCS, NcState
 
Requirements Engineering
Requirements EngineeringRequirements Engineering
Requirements EngineeringCS, NcState
 
172529main ken and_tim_software_assurance_research_at_west_virginia
172529main ken and_tim_software_assurance_research_at_west_virginia172529main ken and_tim_software_assurance_research_at_west_virginia
172529main ken and_tim_software_assurance_research_at_west_virginiaCS, NcState
 
Automated Software Engineering
Automated Software EngineeringAutomated Software Engineering
Automated Software EngineeringCS, NcState
 
Next Generation “Treatment Learning” (finding the diamonds in the dust)
Next Generation “Treatment Learning” (finding the diamonds in the dust)Next Generation “Treatment Learning” (finding the diamonds in the dust)
Next Generation “Treatment Learning” (finding the diamonds in the dust)CS, NcState
 
Tim Menzies, directions in Data Science
Tim Menzies, directions in Data ScienceTim Menzies, directions in Data Science
Tim Menzies, directions in Data ScienceCS, NcState
 
Dagstuhl14 intro-v1
Dagstuhl14 intro-v1Dagstuhl14 intro-v1
Dagstuhl14 intro-v1CS, NcState
 
The Art and Science of Analyzing Software Data
The Art and Science of Analyzing Software DataThe Art and Science of Analyzing Software Data
The Art and Science of Analyzing Software DataCS, NcState
 

Mehr von CS, NcState (20)

Talks2015 novdec
Talks2015 novdecTalks2015 novdec
Talks2015 novdec
 
Future se oct15
Future se oct15Future se oct15
Future se oct15
 
GALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software EngineeringGALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software Engineering
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest link
 
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...
 
Lexisnexis june9
Lexisnexis june9Lexisnexis june9
Lexisnexis june9
 
Welcome to ICSE NIER’15 (new ideas and emerging results).
Welcome to ICSE NIER’15 (new ideas and emerging results).Welcome to ICSE NIER’15 (new ideas and emerging results).
Welcome to ICSE NIER’15 (new ideas and emerging results).
 
Icse15 Tech-briefing Data Science
Icse15 Tech-briefing Data ScienceIcse15 Tech-briefing Data Science
Icse15 Tech-briefing Data Science
 
Kits to Find the Bits that Fits
Kits to Find  the Bits that Fits Kits to Find  the Bits that Fits
Kits to Find the Bits that Fits
 
Ai4se lab template
Ai4se lab templateAi4se lab template
Ai4se lab template
 
Automated Software Enging, Fall 2015, NCSU
Automated Software Enging, Fall 2015, NCSUAutomated Software Enging, Fall 2015, NCSU
Automated Software Enging, Fall 2015, NCSU
 
Requirements Engineering
Requirements EngineeringRequirements Engineering
Requirements Engineering
 
172529main ken and_tim_software_assurance_research_at_west_virginia
172529main ken and_tim_software_assurance_research_at_west_virginia172529main ken and_tim_software_assurance_research_at_west_virginia
172529main ken and_tim_software_assurance_research_at_west_virginia
 
Automated Software Engineering
Automated Software EngineeringAutomated Software Engineering
Automated Software Engineering
 
Next Generation “Treatment Learning” (finding the diamonds in the dust)
Next Generation “Treatment Learning” (finding the diamonds in the dust)Next Generation “Treatment Learning” (finding the diamonds in the dust)
Next Generation “Treatment Learning” (finding the diamonds in the dust)
 
Tim Menzies, directions in Data Science
Tim Menzies, directions in Data ScienceTim Menzies, directions in Data Science
Tim Menzies, directions in Data Science
 
Goldrush
GoldrushGoldrush
Goldrush
 
Dagstuhl14 intro-v1
Dagstuhl14 intro-v1Dagstuhl14 intro-v1
Dagstuhl14 intro-v1
 
Know thy tools
Know thy toolsKnow thy tools
Know thy tools
 
The Art and Science of Analyzing Software Data
The Art and Science of Analyzing Software DataThe Art and Science of Analyzing Software Data
The Art and Science of Analyzing Software Data
 

Kürzlich hochgeladen

ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 

Kürzlich hochgeladen (20)

YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 

Principles of effort estimation

  • 1. A Principled Methodology A Dozen Principles of Software Effort Estimation Ekrem Kocaguneli, 11/07/2012
  • 2. 2 Agenda • Introduction • Publications • What to Know • 8 Questions • Answers • 12 Principles • Validity Issues • Future Work
  • 3. 3 Introduction Software effort estimation (SEE) is the process of estimating the total effort required to complete a software project (Keung2008 [1]). Successful estimation is critical for an organizations Over-estimation: Killing promising projects Under-estimation: Wasting entire effort! E.g. NASA’s launch-control system cancelled after initial estimate of $200M was overrun by another $200M [22] Among IT projects developed in 2009, only 32% were successfully completed within time with full functionality [23]
  • 4. 4 Introduction (cntd.) We will discuss algorithms, but it would be irresponsible to say that SEE is merely an algorithmic problem. Organizational factors are just as important E.g. common experiences of data collection and user interaction in organizations operating in different domains
  • 5. 5 Introduction (cntd.) This presentation is not about a single algorithm/answer targeting a single problem. Because there is not just one question. It is (unfortunately) not everything about SEE. It brings together critical questions and related solutions.
  • 6. 6 What to know? 1 When do I have perfect data? What is the best effort 2 estimation method? 3 Can I use multiple methods? 4 ABE methods are easy to use. 5 What if I lack resources How can I improve them? for local data? 7 Are all attributes and all 6 I don’t believe in size instances necessary? attributes. What can I do? 8 How to experiment, which sampling method to use?
  • 7. 7 Publications Journals • E. Kocaguneli, T. Menzies, J. Keung, “On the Value of Ensemble Effort Estimation”, IEEE Transactions on Software Engineering, 2011. • E. Kocaguneli, T. Menzies, A. Bener, J. Keung, “Exploiting the Essential Assumptions of Analogy-based Effort Estimation”, IEEE Transactions on Software Engineering, 2011. • E. Kocaguneli, T. Menzies, J. Keung, “Kernel Methods for Software Effort Estimation”, Empirical Software Engineering Journal, 2011. • J. Keung, E. Kocaguneli, T. Menzies, “A Ranking Stability Indicator for Selecting the Best Effort Estimator in Software Cost Estimation”, Journal of Automated Software Engineering, 2012. Under review Journals • E. Kocaguneli, T. Menzies, J. Keung, “Active Learning for Effort Estimation”, third round review at IEEE Transactions on Software Engineering. • E. Kocaguneli, T. Menzies, E. Mendes, “Transfer Learning in Effort Estimation”, submitted to ACM Transactions on Software Engineering. • E. Kocaguneli, T. Menzies, “Software Effort Models Should be Assessed Via Leave-One-Out Validation”, under second round review at Journal of Systems and Software. • E. Kocaguneli, T. Menzies, E. Mendes, “Towards Theoretical Maximum Prediction Accuracy Using D- ABE”, submitted to IEEE Transactions on Software Engineering. Conference • E. Kocaguneli, T. Menzies, J. Hihn, Byeong Ho Kang, “Size Doesn‘t Matter? On the Value of Software Size Features for Effort Estimation”, Predictive Models in Software Engineering (PROMISE) 2012. • E. Kocaguneli, T. Menzies, “How to Find Relevant Data for Effort Estimation”, International Symposium on Empirical Software Engineering and Measurement (ESEM) 2011 • E. Kocaguneli, G. Gay, Y. Yang, T. Menzies, “When to Use Data from Other Projects for Effort Estimation”, International Conference on Automated Software Engineering (ASE) 2010, Short-paper.
• 8. 8 Q1: When do I have the perfect data? Principle #1: Know your domain. Domain knowledge is important in every step (Fayyad1996 [2]), yet this knowledge takes time and effort to gain, e.g. percentage-commit information. Principle #2: Let the experts talk. Initial results may be off according to the domain experts; success is to create discussion, interest and suggestions. Principle #3: Suspect your data. The “curiosity” to question is a key characteristic (Rauser2011 [3]); e.g. in one SEE project, 200+ test cases reported 0 bugs. Principle #4: Data collection is cyclic. Any step from mining to presentation may be repeated.
• 9. 9 Q2: What is the best effort estimation method? There is no agreed-upon best estimation method (Shepperd2001 [4]); methods change ranking w.r.t. conditions such as data sets and error measures (Myrtveit2005 [5]). Experimenting with 90 solo-methods, 20 public data sets and 7 error measures, the top 13 methods are CART and ABE methods (1NN, 5NN).
• 10. 10 Q3: How to use a superior subset of methods? We have a set of superior methods to recommend. Assembling solo-methods may be a good idea, e.g. the fusion of 3 biometric modalities (Ross2003 [20]), but the previous evidence of assembling multiple methods in SEE is discouraging: Baker2007 [7], Kocaguneli2009 [8] and Khoshgoftaar2009 [9] failed to outperform solo-methods. We combine the top 2, 4, 8 and 13 solo-methods via mean, median and IRWM (a sketch of this combination step follows).
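To make the combination step concrete, below is a minimal sketch of assembling the estimates of the top k solo-methods via mean, median, or the inverse-ranked weighted mean (IRWM), which weights the i-th best of k methods by k − i + 1. Function and variable names are illustrative assumptions, not the study's code.

```python
import numpy as np

def combine_estimates(estimates, ranks, scheme="median"):
    """Combine solo-method estimates for one test project.

    estimates : one effort estimate per solo-method
    ranks     : rank of each method on historical data (1 = best)
    scheme    : "mean", "median", or "irwm"
    """
    est = np.asarray(estimates, dtype=float)
    if scheme == "mean":
        return float(est.mean())
    if scheme == "median":
        return float(np.median(est))
    if scheme == "irwm":
        # inverse-ranked weighted mean: the best of k methods gets
        # weight k, the worst gets weight 1
        k = len(est)
        weights = np.array([k - r + 1 for r in ranks], dtype=float)
        return float((weights * est).sum() / weights.sum())
    raise ValueError("unknown scheme: " + scheme)

# e.g. combining the top-4 solo-methods' estimates (person-hours)
print(combine_estimates([120.0, 95.0, 150.0, 110.0], [1, 2, 3, 4], "irwm"))
```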
• 11. 11 Q2: What is the best effort estimation method? Q3: How to use a superior subset of methods? Principle #5: Use a ranking stability indicator. A method to identify successful methods using their rank changes. Principle #6: Assemble superior solo-methods. A novel scheme for assembling solo-methods yields multi-methods that outperform all solo-methods. This research published at: • E. Kocaguneli, T. Menzies, J. Keung, “On the Value of Ensemble Effort Estimation”, IEEE Transactions on Software Engineering, 2011. • J. Keung, E. Kocaguneli, T. Menzies, “A Ranking Stability Indicator for Selecting the Best Effort Estimator in Software Cost Estimation”, Journal of Automated Software Engineering, 2012.
• 12. 12 Q4: How can we improve ABE methods? Analogy-based estimation (ABE) methods make use of similar past projects for estimation. They are very widely used (Walkerden1999 [10]) because they: • need no model calibration to local data • can better handle outliers • can work with 1 or more attributes • are easy to explain. Two promising research areas: • weighting the selected analogies (Mendes2003 [11], Mosley2002 [12]) • improving design options (Keung2008 [1]).
• 13. 13 How can we improve ABE methods? (cntd.) Building on the previous research (Mendes2003 [11], Mosley2002 [12], Keung2008 [1]), we adopted two different strategies. a) Weighting analogies: we used kernel weighting to weigh the selected analogies and compared the performance of each k-value with and without weighting. In none of the scenarios did we see a significant improvement.
• 14. 14 How can we improve ABE methods? (cntd.) b) Designing ABE methods. Easy-path: remove the training instances that violate the method’s assumptions. Two instantiations: TEAK (discussed later) and D-ABE, which builds on the theoretical maximum prediction accuracy (TMPA) of Keung2008 [1]. D-ABE: • get the best estimate of every training instance • remove all the training instances within half of the worst MRE (according to TMPA) • return the closest neighbor’s estimate for the test instance. [Figure: training instances a–f and test instance t; the instances close to the worst MRE are removed and the closest remaining neighbor’s estimate is returned.] (A sketch of this procedure follows.)
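The bullets above translate into a short procedure. The following is only a sketch of the easy-path/D-ABE idea as summarized on this slide (MRE = |actual − predicted| / actual); the Euclidean distance, helper names, and pruning details are illustrative assumptions rather than the published implementation.

```python
import numpy as np

def mre(actual, predicted):
    # magnitude of relative error: |actual - predicted| / actual
    return abs(actual - predicted) / actual

def dabe_estimate(train_X, train_y, test_x):
    """Sketch of D-ABE's easy-path pruning.

    train_X : (n x f) numpy array of project features
    train_y : length-n numpy array of actual efforts
    """
    n = len(train_y)
    scores = []
    for i in range(n):
        # best (closest-neighbor) estimate of each training instance
        others = [j for j in range(n) if j != i]
        dists = [np.linalg.norm(train_X[i] - train_X[j]) for j in others]
        nearest = others[int(np.argmin(dists))]
        scores.append(mre(train_y[i], train_y[nearest]))
    worst = max(scores)
    # drop instances whose MRE lies within half of the worst MRE
    keep = [i for i in range(n) if scores[i] < worst / 2]
    if not keep:            # degenerate case: keep everything
        keep = list(range(n))
    dists = [np.linalg.norm(test_x - train_X[i]) for i in keep]
    return train_y[keep[int(np.argmin(dists))]]
```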
• 15. 15 How can we improve ABE methods? (cntd.) [Figures: D-ABE compared to static-k ABE w.r.t. MMRE, and w.r.t. win/tie/loss counts.]
• 16. 16 How can we improve ABE methods? (cntd.) Principle #7: Weighting analogies is over-elaboration. An investigation of an unexplored and promising ABE option (kernel weighting); a negative result published at the Empirical Software Engineering Journal. Principle #8: Use easy-path design. An ABE design option that can be applied to different ABE methods (D-ABE, TEAK). This research published at: • E. Kocaguneli, T. Menzies, A. Bener, J. Keung, “Exploiting the Essential Assumptions of Analogy-based Effort Estimation”, IEEE Transactions on Software Engineering, 2011. • E. Kocaguneli, T. Menzies, J. Keung, “Kernel Methods for Software Effort Estimation”, Empirical Software Engineering Journal, 2011.
• 17. 17 Q5: How to handle lack of local data? Finding enough local training data is a fundamental problem of SEE (Turhan2009 [13]), and the merits of using cross-data from another company are questionable (Kitchenham2007 [14]). We use a relevancy filtering method called TEAK on public and proprietary data sets. TEAK keeps regions of similar projects with similar effort values (low variance) and prunes regions of similar projects with dissimilar effort values (high variance). After TEAK’s relevancy filtering, cross data works as well as within data for 6 out of 8 proprietary data sets and 19 out of 21 public data sets. (A sketch of the variance-based pruning idea follows.)
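TEAK itself grows trees of clustered analogies and prunes subtrees whose effort values have high variance; the snippet below is only a flat sketch of that variance-based relevancy idea, with the neighborhood size and threshold being illustrative assumptions.

```python
import numpy as np

def relevancy_filter(X, y, k=3, max_var_ratio=0.5):
    """Keep instances whose k nearest neighbors have low effort
    variance (similar projects with similar effort values)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n, total_var = len(y), np.var(y)
    keep = []
    for i in range(n):
        d = np.array([np.linalg.norm(X[i] - X[j]) for j in range(n)])
        d[i] = np.inf                    # exclude the instance itself
        nbrs = np.argsort(d)[:k]         # its k nearest neighbors
        if np.var(y[nbrs]) <= max_var_ratio * total_var:
            keep.append(i)
    return keep

# cross-company rows surviving the filter can then feed, e.g., a 1-NN
# estimator for the local test project
```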
• 18. 18 How to handle lack of local data? (cntd.) Principle #9: Use relevancy filtering. A novel method to handle the lack of local data, with successful applications on public as well as proprietary data. This research published at: • E. Kocaguneli, T. Menzies, “How to Find Relevant Data for Effort Estimation”, International Symposium on Empirical Software Engineering and Measurement (ESEM) 2011. • E. Kocaguneli, G. Gay, Y. Yang, T. Menzies, “When to Use Data from Other Projects for Effort Estimation”, International Conference on Automated Software Engineering (ASE) 2010, short paper.
• 19. 19 E(k) matrices & popularity. This concept helps with the next two problems, size features and the essential content, i.e. the pop1NN and QUICK algorithms, respectively. A similar concept is the reverse nearest neighbor (RNN) in ML, used to find the instances whose k-NN sets contain a given query (Achtert2006 [26]).
• 20. 20 E(k) matrices & popularity (cntd.) Outlier pruning, sample steps: 1. Calculate the “popularity” of the instances 2. Sort by popularity 3. Label one instance at a time 4. Find the stopping point 5. Return the estimate from the labeled training data. Finding the stopping point: 1. all popular instances are exhausted; 2. or there is no MRE improvement for n consecutive instances; 3. or the Δ between the best and the worst error of the last n instances is very small (Δ = 0.1, n = 3). (A sketch of the popularity count and the stopping rules follows.)
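A minimal sketch of the popularity count and the stopping rules follows (illustrative names; the published pop1NN builds popularity from E(k) matrices, here simplified to E(1)):

```python
import numpy as np

def popularity(X):
    """Count how often each instance is another instance's 1-NN
    (equivalently, a column sum of the E(1) matrix)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    pop = np.zeros(n, dtype=int)
    for i in range(n):
        d = np.array([np.linalg.norm(X[i] - X[j]) for j in range(n)])
        d[i] = np.inf
        pop[int(np.argmin(d))] += 1      # vote for i's nearest neighbor
    return pop

def should_stop(mres, n=3, delta=0.1):
    """Stop if the last n MREs show no improvement, or if the spread
    between their best and worst values is below delta."""
    if len(mres) < n:
        return False
    last = mres[-n:]
    no_improvement = min(last) >= min(mres[:-n], default=float("inf"))
    return no_improvement or (max(last) - min(last)) < delta
```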
• 21. 21 E(k) matrices & popularity (cntd.) [Figure: picking a random training instance is not a good idea; the error decreases as more popular instances enter the active pool, until one of the stopping-point conditions fires.]
• 22. 22 Q6: Do I have to use size attributes? At the heart of widely accepted SEE methods lie the software size attributes: COCOMO uses LOC (Boehm1981 [15]), whereas FP (Albrecht1983 [16]) uses logical transactions. Size attributes are beneficial if used properly (Lum2002 [17]); e.g. the DoD and NASA use them successfully. Yet the size attributes may not be trusted, or may not be estimable at the early stages, and that disrupts the adoption of SEE methods. “Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs.” (B. Gates) “This is a very costly measuring unit because it encourages the writing of insipid code.” (E. Dijkstra)
• 23. 23 Do I have to use size attributes? (cntd.) [Figure: pop1NN (w/o size) vs. CART and 1NN (w/ size).] Given enough resources for correct collection and estimation, size features are helpful; if not, then outlier pruning helps.
• 24. 24 Do I have to use size attributes? (cntd.) Principle #10: Use outlier pruning. Promotion of SEE methods that can compensate for the lack of the software size features; a method called pop1NN shows that size features are not a “must”. This research published at: • E. Kocaguneli, T. Menzies, J. Hihn, Byeong Ho Kang, “Size Doesn‘t Matter? On the Value of Software Size Features for Effort Estimation”, Predictive Models in Software Engineering (PROMISE) 2012.
• 25. 25 Q7: What is the essential content of SEE data? SEE is populated with overly complex methods for marginal performance increases (Jorgensen2007 [18]). In a matrix of N instances and F features, the essential content is N′ ∗ F′. QUICK is an active learning method that combines outlier removal and synonym pruning. Synonym pruning: 1. Transpose the normalized matrix and calculate the popularity of the features 2. Select the non-popular features. Related work: removal of features based on distance seemed to be reserved for instances; similar tasks remove cells in the hypercube of all cases times all columns (Lipowezky1998 [24]); an ABE method as a two-dimensional reduction (Ahn2007 [25]); in our lab a variance-based feature selector is used as a row selector. (A sketch of the synonym-pruning step follows.)
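The synonym-pruning step can be sketched as below (illustrative code, assuming the matrix is already normalized): transpose so features become rows, count each feature's nearest-neighbor popularity, and keep only the non-popular features, since popular features have close "synonyms" and are redundant.

```python
import numpy as np

def synonym_prune(data):
    """Return indices of non-popular (non-redundant) features.

    data : (N instances x F features) normalized numpy array
    """
    feats = np.asarray(data, dtype=float).T   # features as rows
    f = len(feats)
    pop = np.zeros(f, dtype=int)
    for i in range(f):
        d = np.array([np.linalg.norm(feats[i] - feats[j]) for j in range(f)])
        d[i] = np.inf
        pop[int(np.argmin(d))] += 1           # vote for i's nearest feature
    # a popular feature has a near-duplicate elsewhere; keep the rest
    return [i for i in range(f) if pop[i] == 0]
```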
• 26. 26 What is the essential content of SEE data? (cntd.) At most 31% of all the cells are essential; on median, 10%. “There is a consensus in the high-dimensional data analysis community that the only reason any methods work in very high dimensions is that, in fact, the data are not truly high-dimensional.” (Levina & Bickel 2005) Performance?
• 27. 27 What is the essential content of SEE data? (cntd.) [Figures: QUICK vs. passiveNN (1NN); QUICK vs. CART.] There is only one data set where QUICK is significantly worse than passiveNN, and 4 such data sets when QUICK is compared to CART.
• 28. 28 What is the essential content of SEE data? (cntd.) Principle #11: Combine outlier and synonym pruning. An unsupervised method to find the essential content of SEE data sets and reduce the data needs; promoting research that elaborates on the data, not on the algorithm. This research is under 3rd round review: • E. Kocaguneli, T. Menzies, J. Keung, “Active Learning for Effort Estimation”, third round review at IEEE Transactions on Software Engineering.
• 29. 29 Q8: How should I choose the right sampling method (SM)? Expectation (Kitchenham2007 [14]) vs. observed: no significant difference in bias & variance (B&V) values among the 90 methods; only minutes of run-time difference (<15); and LOO is not probabilistic, so results can be easily shared.
• 30. 30 How should I choose the right SM? (cntd.) Principle #12: Be aware of the sampling method trade-off. The first investigation of the B&V trade-off in the SEE domain; a recommendation based on experimental concerns. This research is under 2nd round review: • E. Kocaguneli, T. Menzies, “Software Effort Models Should be Assessed Via Leave-One-Out Validation”, under second round review at Journal of Systems and Software.
• 31. 31 Questions and principles:
1. When do I have perfect data? → Know your domain (1); Let the experts talk (2); Suspect your data (3); Data collection is cyclic (4).
2. What is the best effort estimation method? → Use a ranking stability indicator (5).
3. Can I use multiple methods? → Assemble superior solo-methods (6).
4. ABE methods are easy to use. How can I improve them? → Weighting analogies is over-elaboration (7); Use easy-path design (8).
5. What if I lack resources for local data? → Use relevancy filtering (9).
6. I don’t believe in size attributes. What can I do? → Use outlier pruning (10).
7. Are all attributes and all instances necessary? → Combine outlier and synonym pruning (11).
8. How to experiment, which sampling method to use? → Be aware of the sampling method trade-off (12).
• 32. 32 Validity Issues. Construct validity, i.e. do we measure what we intend to measure? We use previously recommended estimation methods, error measures and data sets. External validity, i.e. can we generalize the results outside the current specifications? It is difficult to assert that the results will definitely hold, yet we use almost all the publicly available SEE data sets: the median number of projects used by the studies reviewed in Kitchenham2007 [14] is 186, whereas our experimentation uses 1000+ projects.
• 33. 33 Future Work. Application on publicly accessible big data sets (e.g. 300K projects and 2M users; 250K open source projects). Smarter, larger-scale algorithms for general conclusions; current methods may face scalability issues, so improve common ideas for scalability, e.g. linear-time NN methods. Application to different domains, e.g. defect prediction. Combining intrinsic dimensionality techniques from ML to lower-bound the dimensions of SEE data sets (Levina2004 [27]).
• 34. 34 What have we covered?
• 36. 36 References
[1] J. W. Keung, “Theoretical Maximum Prediction Accuracy for Analogy-Based Software Cost Estimation,” 15th Asia-Pacific Software Engineering Conference, pp. 495–502, 2008.
[2] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “The KDD process for extracting useful knowledge from volumes of data,” Commun. ACM, vol. 39, no. 11, pp. 27–34, Nov. 1996.
[3] J. Rauser, “What is a career in big data?” 2011. [Online]. Available: http://strataconf.com/stratany2011/public/schedule/speaker/10070
[4] M. Shepperd and G. Kadoda, “Comparing Software Prediction Techniques Using Simulation,” IEEE Trans. Softw. Eng., vol. 27, no. 11, pp. 1014–1022, 2001.
[5] I. Myrtveit, E. Stensrud, and M. Shepperd, “Reliability and validity in comparative studies of software prediction models,” IEEE Trans. Softw. Eng., vol. 31, no. 5, pp. 380–391, May 2005.
[6] E. Alpaydin, “Techniques for combining multiple learners,” Proceedings of Engineering of Intelligent Systems, vol. 2, pp. 6–12, 1998.
[7] D. Baker, “A hybrid approach to expert and model-based effort estimation,” Master’s thesis, Lane Department of Computer Science and Electrical Engineering, West Virginia University, 2007.
[8] E. Kocaguneli, Y. Kultur, and A. Bener, “Combining multiple learners induced on multiple datasets for software effort prediction,” International Symposium on Software Reliability Engineering (ISSRE), 2009, student paper.
[9] T. M. Khoshgoftaar, P. Rebours, and N. Seliya, “Software quality analysis by combining multiple projects and learners,” Software Quality Control, vol. 17, no. 1, pp. 25–49, 2009.
[10] F. Walkerden and R. Jeffery, “An empirical study of analogy-based software effort estimation,” Empirical Software Engineering, vol. 4, no. 2, pp. 135–158, 1999.
[11] E. Mendes, I. D. Watson, C. Triggs, N. Mosley, and S. Counsell, “A comparative study of cost estimation models for web hypermedia applications,” Empirical Software Engineering, vol. 8, no. 2, pp. 163–196, 2003.
• 37. 37 References (cntd.)
[12] E. Mendes and N. Mosley, “Further investigation into the use of CBR and stepwise regression to predict development effort for web hypermedia applications,” International Symposium on Empirical Software Engineering, 2002.
[13] B. Turhan, T. Menzies, A. Bener, and J. Di Stefano, “On the relative value of cross-company and within-company data for defect prediction,” Empirical Software Engineering, vol. 14, no. 5, pp. 540–578, 2009.
[14] B. A. Kitchenham, E. Mendes, and G. H. Travassos, “Cross versus within-company cost estimation studies: A systematic review,” IEEE Trans. Softw. Eng., vol. 33, no. 5, pp. 316–329, 2007.
[15] B. W. Boehm, C. Abts, A. W. Brown, S. Chulani, B. K. Clark, E. Horowitz, R. Madachy, D. J. Reifer, and B. Steece, Software Cost Estimation with COCOMO II. Upper Saddle River, NJ, USA: Prentice Hall PTR, 2000.
[16] A. Albrecht and J. Gaffney, “Software function, source lines of code and development effort prediction: A software science validation,” IEEE Trans. Softw. Eng., vol. 9, pp. 639–648, 1983.
[17] K. Lum, J. Powell, and J. Hihn, “Validation of spacecraft cost estimation models for flight and ground systems,” ISPA’02: Conference Proceedings, Software Modeling Track, 2002.
[18] M. Jorgensen and M. Shepperd, “A systematic review of software development cost estimation studies,” IEEE Trans. Softw. Eng., vol. 33, no. 1, pp. 33–53, 2007.
[19] B. A. Kitchenham, E. Mendes, and G. H. Travassos, “Cross versus within-company cost estimation studies: A systematic review,” IEEE Trans. Softw. Eng., vol. 33, no. 5, pp. 316–329, 2007.
[20] A. Ross, “Information fusion in biometrics,” Pattern Recognition Letters, vol. 24, no. 13, pp. 2115–2125, Sep. 2003.
[21] R. P. L. Buse and T. Zimmermann, “Information needs for software development analytics,” ICSE 2012, pp. 987–996.
[22] SpaceRef.com, “NASA to shut down checkout & launch control system,” August 26, 2002. http://www.spaceref.com/news/viewnews.html?id=475
[23] Standish Group, CHAOS Report. West Yarmouth, Massachusetts: Standish Group, 2004.
[24] U. Lipowezky, “Selection of the optimal prototype subset for 1-NN classification,” Pattern Recognition Letters, vol. 19, pp. 907–918, 1998.
[25] H. Ahn, K. Kim, and I. Han, “A case-based reasoning system with the two-dimensional reduction technique for customer classification,” Expert Systems with Applications, vol. 32, no. 4, pp. 1011–1019, May 2007, doi:10.1016/j.eswa.2006.02.021.
[26] E. Achtert, C. Böhm, P. Kröger, P. Kunath, A. Pryakhin, and M. Renz, “Efficient reverse k-nearest neighbor search in arbitrary metric spaces,” Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD ’06).
[27] E. Levina and P. J. Bickel, “Maximum likelihood estimation of intrinsic dimension,” Advances in Neural Information Processing Systems, vol. 17, Cambridge, MA, USA: The MIT Press, 2004.
• 40. 40 What is the best effort estimation method? (cntd.) 1. Rank methods according to their win, loss and win−loss values. 2. δr is the maximum rank change. 3. Sort methods according to loss and observe the δr values. (A sketch of the rank-change computation follows.)
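A minimal sketch of the rank-change computation (names and counts are illustrative assumptions): rank methods once by wins and once by losses, then take each method's rank change δr between the two orderings; methods with small δr are the stable, superior ones.

```python
def rank_changes(wins, losses):
    """wins, losses: dicts mapping method name -> count.
    Returns each method's rank change between the two sorts."""
    by_win = sorted(wins, key=wins.get, reverse=True)    # most wins first
    by_loss = sorted(losses, key=losses.get)             # fewest losses first
    return {m: abs(by_win.index(m) - by_loss.index(m)) for m in wins}

deltas = rank_changes({"CART": 40, "1NN": 35, "SWR": 10},
                      {"CART": 5, "1NN": 9, "SWR": 30})
print(sorted(deltas.items(), key=lambda kv: kv[1]))      # most stable first
```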
• 41. 41 What is the best effort estimation method? (cntd.) What about aggregate results reflecting on specific scenarios? (a reviewer’s question) Sort methods according to increasing MdMRE and group the MRE values that are statistically the same. Highlighted are the cases where superior methods do not occur in the top group; note how the superior solo-methods correspond to the best (lowest-MRE) groups.
• 42. 42 How can we improve ABE methods? (cntd.) We used kernel weighting with 4 kernels and 5 bandwidth values, plus IRWM, to weigh the selected analogies (over 5 different k values). A total of 2090 settings: • 19 data sets * 5 k-values = 95 • 19 data sets * 5 k-values * 4 kernels * 5 bandwidths = 1900 • IRWM: 19 data sets * 5 k-values = 95. (A sketch of kernel weighting follows.)
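For reference, kernel weighting of the k selected analogies looks like this sketch, using a Gaussian kernel with bandwidth h as one representative of the four kernels tried (the exact kernels and bandwidths in the study differ):

```python
import numpy as np

def kernel_weighted_estimate(dists, efforts, h=1.0):
    """Weigh k selected analogies by a Gaussian kernel of their
    distance to the test project; return the weighted mean effort."""
    d = np.asarray(dists, dtype=float)
    w = np.exp(-(d ** 2) / (2 * h ** 2))   # closer analogies weigh more
    return float((w * np.asarray(efforts, dtype=float)).sum() / w.sum())

# e.g. k=3 analogies at distances 0.2, 0.5, 0.9 from the test project
print(kernel_weighted_estimate([0.2, 0.5, 0.9], [100.0, 140.0, 80.0]))
```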
• 43. 43 How can we improve ABE methods? (cntd.) We used kernel weighting to weigh the selected analogies and compared the performance of each k-value with and without weighting: • o = tie for 3 or more k values • − = loss for 3 or more k values • + = win for 3 or more k values. In none of the scenarios did we see a significant improvement.
• 44. 44 How to handle lack of local data? (cntd.) [Figures: TEAK on proprietary data; TEAK on public data.]
• 45. 45 Do I have to use size attributes? (cntd.) Can standard methods tolerate the lack of size attributes? [Figures: CART w/o size vs. CART w/ size; CART vs. 1NN.]
• 46. 46 Q8: How should I choose the right sampling method (SM)? Only one work (Kitchenham2007 [14]) discusses the implications of the sampling method on bias and variance. The expectation: LOO has high variance and low bias; 3-way has low variance and high bias; 10-way is in between. Does the expectation hold? What about run time and ease of replication? (A sketch of the three samplers follows.)
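The three sampling methods can be reproduced with scikit-learn's standard splitters (a sketch assuming scikit-learn is available; LeaveOneOut and KFold are its actual classes):

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

def splits(n_projects, scheme="loo", seed=0):
    """Yield (train, test) index arrays for LOO, 3-way, or 10-way CV."""
    X = np.zeros((n_projects, 1))          # placeholder; only indices matter
    if scheme == "loo":
        cv = LeaveOneOut()                 # deterministic: trivially replicable
    elif scheme == "3way":
        cv = KFold(n_splits=3, shuffle=True, random_state=seed)
    else:
        cv = KFold(n_splits=10, shuffle=True, random_state=seed)
    yield from cv.split(X)

for train_idx, test_idx in splits(10, "loo"):
    pass  # fit an estimator on train_idx, score it on test_idx
```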