Data Collection

Process and Integrity of Data Collection for
Later Software Cost Estimation Calibrations

              Gerrit Klaschke
What is covered?
   Data Collection
       What is it?
       Process
       Best Practices
   Data Integrity
       Checklist
       Additional Tips
Data Collection Process
   A data collection process should cover several important parts:
        How to ensure high-quality data (see Data Integrity)
        How to collect data from the sources
        How to store the data for later retrieval (analyses and calibration)

   The process itself must be refined to the point where the data received
    can be trusted. Do not just take what someone wrote on the form at
    face value!

   The reasons for collecting data, and the data that needs to be
    collected, are manifold.
        The Goal-Question-Metric (GQM) approach can help define which metrics
         you need to answer certain questions and reach goals. Goals can range
         from quality improvement to schedule reduction.
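The Goal-Question-Metric idea can be sketched as a small data structure: each goal owns questions, and each question names the metrics needed to answer it. This is only an illustrative sketch; the class names, the example goal, and the metric names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    metrics: list[str] = field(default_factory=list)


@dataclass
class Goal:
    name: str
    questions: list[Question] = field(default_factory=list)

    def required_metrics(self) -> list[str]:
        """Return the de-duplicated metrics needed to answer all questions."""
        seen: dict[str, None] = {}  # dict keeps first-seen order
        for question in self.questions:
            for metric in question.metrics:
                seen.setdefault(metric, None)
        return list(seen)


# Hypothetical example goal; questions and metric names are illustrative only.
schedule_goal = Goal(
    name="Reduce schedule",
    questions=[
        Question("How long do comparable projects take?",
                 ["schedule_months", "size_esloc"]),
        Question("Where is effort spent?",
                 ["effort_hours_by_phase", "size_esloc"]),
    ],
)
metrics_to_collect = schedule_goal.required_metrics()
```

Deriving the collection list from the goals keeps the form short: a field that answers no question does not need to be collected.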
Data Collection Process
   When to Collect Data
       When scoping a new project
       During development for management and to
        identify issues and progress
       Post Mortem to improve corporate history
        repository (database of completed projects)
       During maintenance to continue improving
Data Collection Process
   Suggested Central Repository
    Requirements
       Database must be extensible so new fields can be
        added easily
       Must be open, not a proprietary database
       The approach allows hosting on a standalone laptop for
        traveling users, etc.
       Additional speed over browser-based versions
       Data can be read into Excel or accessed via ODBC, so users
        are not limited to the provided functionality, as with many
        browser-based applications.
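A minimal sketch of an extensible, open repository, assuming SQLite as the database (the slides do not name a product). The table and column names are hypothetical. A file path instead of `:memory:` would give a single portable file that can live on a laptop.

```python
import sqlite3

# In-memory database for the example; pass a file path for a portable repository.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " size_sloc INTEGER,"
    " effort_hours REAL)"
)
conn.execute(
    "INSERT INTO projects (name, size_sloc, effort_hours) VALUES (?, ?, ?)",
    ("Project A", 12000, 3400.0),
)

# Extending the schema later: add a new field without touching existing rows.
conn.execute("ALTER TABLE projects ADD COLUMN schedule_months REAL")
conn.execute(
    "UPDATE projects SET schedule_months = ? WHERE name = ?",
    (9.5, "Project A"),
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(projects)")]
row = conn.execute(
    "SELECT name, size_sloc, schedule_months FROM projects"
).fetchone()
```

Rows inserted before the schema change simply hold NULL in the new column until it is filled in.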
Data Collection Process
   Basic Flow
       Individuals or organizations send their data on completed
        projects to the metrics analyst or the person responsible for
        collection/analysis.
       All incoming data must be stored (which should include
        versioning in case updates come in from the same source) and
        then reviewed for integrity and completeness. If there are
        uncertainties, the metrics analyst has to clarify them.
        Having an RDBMS makes tracking and updates very easy.
       If a normalization process is required, save both versions of the
        data.
       Once a completed project passes QA, it becomes available in
        the database for retrieval. This includes retrieval for
        ‘estimate by analogy’, further analysis (GQM or finding new
        correlations) and calibration of estimation models.
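The flow above (store every submission with a version, review it, release it only after QA) can be sketched as follows. This is an in-memory illustration under assumed names: `Submission`, `Repository`, the status strings, and the required fields are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    source: str
    version: int
    data: dict
    status: str = "received"  # becomes "approved" or "needs_clarification"


@dataclass
class Repository:
    submissions: list[Submission] = field(default_factory=list)

    def store(self, source: str, data: dict) -> Submission:
        """Store every incoming record; updates from a source get a new version."""
        version = 1 + sum(1 for s in self.submissions if s.source == source)
        submission = Submission(source, version, dict(data))
        self.submissions.append(submission)
        return submission

    def review(self, submission: Submission, required: list[str]) -> list[str]:
        """Approve the submission, or return the fields that need clarification."""
        missing = [key for key in required if submission.data.get(key) is None]
        submission.status = "needs_clarification" if missing else "approved"
        return missing

    def available(self) -> list[Submission]:
        """Latest approved version per source, ready for analysis or calibration."""
        latest: dict[str, Submission] = {}
        for s in self.submissions:
            if s.status == "approved":
                latest[s.source] = s
        return list(latest.values())


REQUIRED = ["size_sloc", "effort_hours"]  # hypothetical minimum field set
repo = Repository()
first = repo.store("team-x", {"size_sloc": 12000, "effort_hours": None})
missing = repo.review(first, REQUIRED)
second = repo.store("team-x", {"size_sloc": 12000, "effort_hours": 3400})
repo.review(second, REQUIRED)
```

Nothing is overwritten: the first submission stays in the store with its status, and only the approved second version is offered for retrieval.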
Data Collection Process – Lessons Learned
   Identify business goals. Use GQM. Setting goals enables a metrics
    program to enhance business results, reduce cost by keeping the
    program well-defined and focused, and provide a basis for improving
    the business’s return on investment for IT.
   Clear definitions are essential, but people will not always follow
    them. Talk to them personally and interview them to capture data. Do
    not just take a form at face value. An interview improves the quality
    of the data because the interviewer can ask clarifying questions.
   People don’t read instructions. They might provide ‘just a
    number’ off the top of their head. Some people might misrepresent
    the data, either on purpose to make themselves look better or by
    mistake. Talk to them personally.
   Sensitive data: if people/departments/companies don’t want to
    share sensitive data or have concerns, try to sanitize the data.
Data Collection Process – Lessons Learned
   Cost of data collection: some will claim that data collection costs
    too much. Go through the list of benefits and back it up with data
    showing that estimation/project success increases when using a
    historical database/calibrated models.
        E.g. tell your manager “software metrics will help us reduce
         the number of faults reported in newly developed software by 25%
         without increasing project schedules. The resulting savings in support
         costs should drive a 150% ROI in the first year”.
   Cost of data collection 2: some developers will claim they are not
    paid to collect data. Determine their claimed CMM/CMMI rating. If it
    is 3 or higher, collecting data is required. Ask for that data in their
    format and offer to fill in the forms yourself.
Data Collection Process – Lessons Learned
   Use a good code counter. See the list of code
    counters on the QSM.com site. The ‘Understand’ code
    counter is also used quite often in companies.
   Be sure to distinguish auto-generated code from
    hand-generated code. Auto-generated code does not
    have the same correlation to effort as hand-generated code.
   Collect completed project actuals first: start by
    collecting data from completed projects and THEN
    collect from projects that are still underway.
Data Collection Process – Lessons Learned
   Qualify the data quality: some data collected will be
    nonsensical. There are two approaches to handle this:
       Eliminate this data altogether (not really recommended, as data
        is lost).
       Include a qualifier on the data, rating it ‘a’ to ‘f’. The ISBSG
        database has a rating similar to this.
   Capture both total size and amount of reuse:
    Reuse is an essential part of software size. Just
    collecting total size will skew the size/effort correlation.
   Don’t eliminate data points just because of the
    programming language: size can be converted from one
    language to another!
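The quality-qualifier approach can be sketched like this: every record stays in the database, and the rating only decides which records feed a given calibration. The field names, the sample records, and the assumption that ‘a’ is the best rating are illustrative.

```python
VALID_RATINGS = set("abcdef")  # assumed ordering: 'a' is best, 'f' is worst


def select_for_calibration(projects, worst_allowed="b"):
    """Return the records rated at or better than the cutoff; nothing is deleted."""
    for project in projects:
        if project["quality"] not in VALID_RATINGS:
            raise ValueError(f"unknown quality rating: {project['quality']!r}")
    return [p for p in projects if p["quality"] <= worst_allowed]


# Hypothetical records for illustration.
projects = [
    {"name": "P1", "size_sloc": 12000, "effort_hours": 3400, "quality": "a"},
    {"name": "P2", "size_sloc": 500000, "effort_hours": 40, "quality": "f"},
    {"name": "P3", "size_sloc": 30000, "effort_hours": 9000, "quality": "b"},
]
trusted = select_for_calibration(projects)
```

Because the nonsensical record is rated rather than removed, it can still be revisited if the source later corrects it.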
Data Collection Process – Lessons Learned
   Have a normalization process and keep the data
    both in raw and normalized forms.
       Data will be collected with varying phases, labor categories, size
        definitions etc. Keep the raw data, and have a standard, well
        documented normalization process that is rigorously followed to
        normalize to a standard set of activities, phases etc.
   Have a structure for data storage: an Excel sheet
    can be used but will become unworkable as the
    database grows. Get the data into an open database
    as soon as possible.
   Offer data providers something in return: this could be a
    sanitized copy of the database or at least a benchmark
    showing how their data fits with the rest of the database!
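Keeping data in both raw and normalized form might look like the sketch below, which maps locally used phase names onto a standard phase set while leaving the submitted record untouched. The phase mapping and field names are hypothetical; a real process would document its own mapping.

```python
import copy

# Hypothetical mapping from locally used phase names to a standard phase set.
PHASE_MAP = {
    "reqs": "requirements",
    "requirements": "requirements",
    "coding": "implementation",
    "build": "implementation",
    "test": "test",
    "qa": "test",
}


def normalize(raw: dict) -> dict:
    """Return a record holding the untouched raw data next to its normalized form."""
    effort_by_phase: dict[str, float] = {}
    for phase, hours in raw["effort_hours_by_phase"].items():
        standard = PHASE_MAP[phase.lower()]  # a KeyError flags an unmapped phase
        effort_by_phase[standard] = effort_by_phase.get(standard, 0.0) + hours
    return {
        "raw": copy.deepcopy(raw),
        "normalized": {"effort_hours_by_phase": effort_by_phase},
    }


raw_record = {
    "effort_hours_by_phase": {"Reqs": 200, "Coding": 900, "Build": 100, "QA": 300}
}
record = normalize(raw_record)
```

Failing loudly on an unmapped phase is a deliberate choice here: a silent default would hide exactly the definition mismatches the normalization process is meant to expose.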
Data Integrity
   Good quality data is paramount to ensure
    good calibration results.
Data Integrity - Checklist
   Review the goal of the data collection
       What is the data being used for? E.g. project type
        calibration, later use for estimation by analogy etc.
        This drives the data being collected.
   Ensure the integrity of the data collection
    process
       Have the groups providing data been trained with
        regard to the required data?
   Definitions
       Are different projects providing data using the same
        data definitions?
Data Integrity - Checklist
   Approval of Inputs
       Has at least one designated individual approved the
        inputs for each project?
   Missing Data
       Has any missing data been identified?
   Estimates/Actuals
       Are estimates of data items used in place of missing
        actual data?
   Rationale
       Provide written rationale for any estimates used in
        the calibration
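The missing-data, estimates, and rationale checks above lend themselves to a simple automated pass. The sketch below assumes a hypothetical record layout in which `estimated_fields` maps each estimated field to its written rationale; the required field list is also an assumption.

```python
REQUIRED_FIELDS = ["size_sloc", "effort_hours", "schedule_months"]  # assumed set


def check_record(record: dict) -> dict:
    """Report missing fields, estimated fields, and estimates lacking a rationale."""
    missing = [f for f in REQUIRED_FIELDS if record.get(f) is None]
    estimated = record.get("estimated_fields", {})  # field name -> rationale text
    without_rationale = sorted(
        name for name, rationale in estimated.items() if not rationale.strip()
    )
    return {
        "missing": missing,
        "estimated": sorted(estimated),
        "estimates_without_rationale": without_rationale,
    }


# Hypothetical submission: schedule is missing, effort is an undocumented estimate.
report = check_record(
    {
        "size_sloc": 12000,
        "effort_hours": 3400,
        "schedule_months": None,
        "estimated_fields": {"effort_hours": ""},
    }
)
```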
Data Integrity - Checklist
   Sensitivity Analysis
       If estimates are used in lieu of actuals, has a
        sensitivity analysis been done to evaluate the impact
        on the calibration of varying assumptions with respect
        to the estimates?
   Extra Data
       Has any extra data or different definitions been used?
   Changes
       Describe any changes made and the rationale for
        them.
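A sensitivity analysis on an estimated input can be sketched by recalibrating while the estimate is varied. The calibration here is deliberately simplistic (total effort divided by total size, giving hours per SLOC) and stands in for whatever estimation model is actually being calibrated; the data and the ±20% variation are illustrative assumptions.

```python
def calibrate(projects):
    """Calibrate hours per SLOC as total effort divided by total size."""
    total_effort = sum(p["effort_hours"] for p in projects)
    total_size = sum(p["size_sloc"] for p in projects)
    return total_effort / total_size


def sensitivity(projects, index, field, factors=(0.8, 1.0, 1.2)):
    """Recalibrate while scaling one estimated field of one project."""
    results = {}
    for factor in factors:
        varied = [dict(p) for p in projects]  # copy so the inputs stay unchanged
        varied[index][field] = projects[index][field] * factor
        results[factor] = calibrate(varied)
    return results


# Hypothetical data set; the third project's effort is an estimate, not an actual.
projects = [
    {"size_sloc": 10000, "effort_hours": 2000},
    {"size_sloc": 20000, "effort_hours": 5000},
    {"size_sloc": 10000, "effort_hours": 3000},
]
impact = sensitivity(projects, index=2, field="effort_hours")
```

If the calibrated value moves only slightly across the range, the estimate is tolerable; a large swing is a reason to chase the actual figure or to exclude the project.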
Data Integrity - Checklist
   Additional Data
       Has any additional data been collected that
        can be used for later purposes?
       Identify the extra data and how it might be
        used. Examples include effort and schedule
        portions for detailed phases and activities.
   Size Conversion
       Have all size measures been converted to
        eSLOC or another base unit?
Data Integrity - Checklist
   Counting Conventions
       What SLOC counting conventions were followed
        (logical SLOC, physical SLOC etc)?
       If SLOC is not used, what definitions were followed
        (such as the IFPUG 4.2 standard or use cases 2.0)?
   Reuse
       Are all reuse parameters provided for reused,
        modified and COTS software portions?
       Has all reuse and modification been accounted for
        and converted into equivalent SLOC?
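Converting reuse into equivalent size can be sketched as a weighted sum over size categories. The weights below are placeholders chosen for illustration only; real values come from the reuse parameters of the estimation model in use.

```python
# Hypothetical weights: the share of "new code" effort each category is assumed to cost.
REUSE_WEIGHTS = {"new": 1.0, "modified": 0.5, "reused": 0.25, "cots": 0.125}


def equivalent_sloc(sizes: dict, weights: dict = REUSE_WEIGHTS) -> float:
    """Sum each category's SLOC scaled by its weight."""
    unknown = set(sizes) - set(weights)
    if unknown:
        raise ValueError(f"no weight defined for: {sorted(unknown)}")
    return sum(sloc * weights[category] for category, sloc in sizes.items())


# Hypothetical project: 10,000 new, 4,000 modified, 20,000 reused SLOC.
esloc = equivalent_sloc({"new": 10000, "modified": 4000, "reused": 20000})
```

The total of 34,000 raw SLOC shrinks to 17,000 equivalent SLOC under these weights, which is why collecting total size alone distorts the size/effort correlation.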
Data Integrity - Checklist
   Reused/Modified
       Does the total equivalent size include all new software and the
        equivalent sizes of reused and modified software?
   Evolution
       Has Requirements Evolution been reported?
   Input Ranges
       Make sure that there are no ranges in the volume input, as that
        would indicate previously estimated values.
   Factors
       Have the environment and scaling factors been updated?
   Hours per Month
       Has the correct HPM been applied?
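Two of the checks above are easy to automate: spotting ranges in a size input, and applying an explicit hours-per-month figure when converting effort. The function names and the range pattern are hypothetical, and the 152 hours per month in the example is only a sample value; use the figure your organization or model defines.

```python
import re

# Matches inputs such as "10000-12000" or "10,000 to 12,000".
RANGE_PATTERN = re.compile(r"^\s*\d[\d,]*\s*(-|to)\s*\d[\d,]*\s*$", re.IGNORECASE)


def looks_like_range(value) -> bool:
    """True when a size input is a range rather than a single actual value."""
    return isinstance(value, str) and bool(RANGE_PATTERN.match(value))


def person_months(effort_hours: float, hours_per_month: float) -> float:
    """Convert effort in hours to person-months with an explicit HPM value."""
    if hours_per_month <= 0:
        raise ValueError("hours_per_month must be positive")
    return effort_hours / hours_per_month


flagged = [v for v in ["10000-12000", "10,000 to 12,000", "12000", 12000]
           if looks_like_range(v)]
months = person_months(3040, hours_per_month=152)  # 152 is an example value
```

Passing the HPM value explicitly, rather than hiding it in a default, forces whoever normalizes the data to state which convention was applied.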
Data Integrity – Additional Tips
   Actual Phase Information: all activities may NOT be included. E.g.
    system concept and integration may be excluded.
   Actual Labor Information: all activities may NOT be included. E.g.
    configuration and quality assurance may be excluded.
   Was the schedule ‘stop and start’?
   Resources: were there hard-hitting resource constraints?
   Volatility: did requirements undergo extraordinary evolution?
   Manager’s objectives: was the project meant to complete in ‘minimum
    time’ or at ‘least cost’?
   Effort: are effort figures actually derived from cost figures?
   Always run sanity checks on data. E.g. one million lines of
    code cannot be developed in 3 months.
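A sanity check like the one-million-lines example can be written as a few plausibility rules. The limits below are hypothetical placeholders, not published thresholds; they should be tuned against your own project history.

```python
# Hypothetical plausibility limits; tune them against your own history.
MAX_SLOC_PER_CALENDAR_MONTH = 50_000
MAX_SLOC_PER_PERSON_HOUR = 20


def sanity_warnings(size_sloc, effort_hours, schedule_months):
    """Return a list of human-readable warnings; an empty list means no flags."""
    if min(size_sloc, effort_hours, schedule_months) <= 0:
        return ["size, effort and schedule must all be positive"]
    warnings = []
    if size_sloc / schedule_months > MAX_SLOC_PER_CALENDAR_MONTH:
        warnings.append("implausible delivery rate")
    if size_sloc / effort_hours > MAX_SLOC_PER_PERSON_HOUR:
        warnings.append("implausible productivity")
    return warnings


# One million lines in 3 months is flagged; a modest project passes.
suspicious = sanity_warnings(1_000_000, 100_000, 3)
plausible = sanity_warnings(12_000, 3_400, 9.5)
```

A warning is a prompt to go back to the source and ask, not an automatic rejection of the data point.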
