Data Quality
Issues and “Fixes”
Two Definitions of Quality
• Conformance to Requirements
• (Traditional Producer-Oriented
  Definition)
• Fitness for Use
• (Modern Client-Oriented
  Definition)
Definition of Process Quality
• Process Improvements Focus
• (Do It Right the First Time)
• Can be Reduced to Slogans
• Can also lead to Continuous
  Improvements
• Kaizen
Be Real: Four Quality Costs
• Costs of Reputation and Loss of
  Business from Inaction
• Cost of Prevention to Avoid Errors
• Cost of Detection to Find Errors
• Cost of Repairing Errors Found
Quality and Cost: Two Worlds
Repair Methods
• Goal is “Fixing” to Fit Use
• Data Editing
• Data Imputation
• Data Fabrication
• Raking at NSS
Data Editing
• Honest Differences of Opinion or
  Real Errors?
• Need for Redundancy in System for
  Can’t Fail Items
• Achieving Measurability to Frame
  Expectations and Improvements
Data Editing Techniques
• Minimizing Processing Errors
• Definitional (e.g., Range) Tests
• Deterministic Tests
• Probabilistic Tests
    – Outlier Tests
    – Ratio Tests
Types of Edits Illustrated
• Range Test
    Age Negative
• Deterministic Tests
    If Age ≤ 14, then code as Child
• Probabilistic Tests
    If Income > $1,000,000, take a look
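The three kinds of edit can be sketched in a few lines of Python. This is only an illustration, not the author's procedure: the field names `age`, `income` and `age_group`, the function name `apply_edits`, and the reading of the thresholds as "age at most 14" and "income above $1,000,000" are all assumptions.

```python
def apply_edits(record, child_max_age=14, income_review_threshold=1_000_000):
    """Apply one range, one deterministic and one probabilistic edit.

    Returns an edited copy of the record and a list of flags for review.
    The thresholds are illustrative assumptions.
    """
    edited = dict(record)  # never change the caller's record in place
    flags = []
    age = edited.get("age")
    income = edited.get("income")

    # Range (definitional) test: a negative age is impossible, so flag it.
    if age is not None and age < 0:
        flags.append("range: age is negative")
    # Deterministic test: the rule fully determines the coded value.
    elif age is not None and age <= child_max_age:
        edited["age_group"] = "Child"

    # Probabilistic test: a very high income is unusual, not impossible,
    # so it is only flagged for a human to take a look.
    if income is not None and income > income_review_threshold:
        flags.append("probabilistic: income is unusually high, review")

    return edited, flags


edited, flags = apply_edits({"age": 10, "income": 2_000_000})
print(edited["age_group"], flags)
```

Note that only the deterministic rule changes a value; the range and probabilistic tests just raise flags, which keeps the decision to correct with a person.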
Practical Editing Tips
• Edit for Diagnosis, not just
  Correction
• Don’t Edit Outside Your Confidence
  Interval
• Preserve the Original Dataset as
  Backup to Avoid Irreversible
  Changes
• Keep Tallies of all Errors Found
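A minimal sketch of two of these tips, preserving the original dataset and keeping tallies, might look like the following. The function name `edit_with_tallies`, the `flags` field and the two example checks are hypothetical, chosen only for illustration.

```python
import copy
from collections import Counter


def edit_with_tallies(records, checks):
    """Flag errors on a deep copy of the data and count them by check name.

    `checks` maps a check name to a function that returns True when a
    record fails that check. The input records are left untouched.
    """
    edited = copy.deepcopy(records)  # the original stays as the backup
    tallies = Counter()
    for row in edited:
        for name, is_error in checks.items():
            if is_error(row):
                tallies[name] += 1
                row.setdefault("flags", []).append(name)
    return edited, tallies


raw = [{"age": 34}, {"age": -2}, {"age": 230}, {"age": -7}]
checks = {
    "negative_age": lambda r: r["age"] < 0,
    "age_over_120": lambda r: r["age"] > 120,
}
edited, tallies = edit_with_tallies(raw, checks)
print(tallies["negative_age"], tallies["age_over_120"])  # prints: 2 1
```

The tallies serve diagnosis: a check that fires often points to a problem in the collection or keying system, which is usually worth more than fixing the individual records.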
Not all errors need to be corrected
  Resist your Perfectionist Tendencies
More Practical Edit Tips
• Use your skilled staff to
  improve system rather than
  just edit data
• Never just depend on Intuition
  but still use it too!
• Employ Redundancy, Frugally!
Capture Recapture Methods
    (Double Keying Example)
• Two-by-Two Table with Cells
                A   B
                C   D
• Comparing Data Keyed the Same each
  time (A) with Errors Detected (B and C)
• How to Estimate D?
• One Model D = BC/A?
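The model D = BC/A can be worked through numerically. The sketch below reads the table in the usual capture-recapture way, which is an assumption since the slide does not spell out the cells: A counts errors caught by both passes, B and C count errors caught by only one pass, and D counts errors missed by both. The model also assumes the two passes catch errors independently. The function name and the example counts are made up.

```python
def estimate_missed_errors(a, b, c):
    """Estimate D, the errors missed by both passes, as B*C/A.

    a: errors caught by both passes (assumed meaning of cell A)
    b, c: errors caught by only the first or only the second pass
    Returns (estimated D, estimated total number of errors).
    """
    if a <= 0:
        raise ValueError("cell A must be positive for the estimate to exist")
    d = b * c / a
    total = a + b + c + d
    return d, total


d, total = estimate_missed_errors(a=40, b=10, c=8)
print(d, total)  # prints: 2.0 60.0
```

The total agrees with the Lincoln-Petersen form (A + B)(A + C)/A = 50 × 48 / 40 = 60. If the two passes tend to miss the same hard items, independence fails and D = BC/A will understate the errors that remain.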
Bottom Line Take-Away
• Use Data Checking to
  Understand Data’s Fitness for
  Use
• Edit but Don’t Over-Edit
• Use Edit Checks to Prevent
  Future Errors
Data Editing and Data Imputation
• Joint Role of Imputation and
  Editing No Clear Line?
• Editing “fixes” Often are
  Model-Based Hunches
• Data Quality (editing)
• Information Quality
  (imputation)

Imputation Versus Editing
• What is Imputation?
• Handles Missing and Misreported Data
• Imputation Goal is roughly right! Information Quality
• Editing Goal often “correction” Exactly right? Data Quality
Data Imputation Techniques
• Imputation Needs More Justification when Data Quality is the Goal
• Must be no more than Cosmetic in Nature, if done at all
• Can only be Aggressively applied for Information Quality Goal
Fellegi-Holt Example
• Identify Errors with Automated Edit Detection Software
• Hot Deck acceptable values from Records that Pass Edits
• Can be worth doing if errors are minor or cosmetic (e.g., Rounding)
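The hot-deck step in the Fellegi-Holt example can be sketched as below. This is a plain random hot deck, not the Fellegi-Holt error-localization method itself, and the names (`hot_deck_impute`, `passes_edit`, the `imputed` marker) and the age edit are assumptions made for illustration.

```python
import random


def hot_deck_impute(records, field, passes_edit, rng=None):
    """Replace missing or edit-failing values with values from donor records.

    Donors are the records whose value is present and passes the edit.
    Each imputed record is marked so the change stays documented.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    donors = [r[field] for r in records
              if r.get(field) is not None and passes_edit(r)]
    if not donors:
        raise ValueError("no record passes the edit, so there are no donors")

    result = []
    for r in records:
        row = dict(r)  # leave the input records untouched
        needs_value = row.get(field) is None or not passes_edit(row)
        if needs_value:
            row[field] = rng.choice(donors)
        row["imputed"] = needs_value
        result.append(row)
    return result


survey = [{"age": 30}, {"age": -5}, {"age": None}, {"age": 45}]
clean = hot_deck_impute(survey, "age", lambda r: 0 <= r["age"] <= 120)
print([(r["age"], r["imputed"]) for r in clean])
```

Marking imputed rows is a small piece of paradata: it lets later analysis measure how much of an estimate rests on guesses. A production hot deck would usually draw donors from within similar groups rather than from the whole file.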
More on Imputation
• Treat Influential Errors Individually not just Automatically
• That Said, Software Fixes can lead to Better Documentation (Paradata Matters)
• Need to Measure Variance Impacts
• Provide a natural break to Overediting but seldom used for this.
Edit/Imputation Summary
• Most Editing Mainly Eliminates the Bad
• Replacing it with a (Good?) Guess of some Sort
• Imputation emphasizes Guessing even more
More Editing/Imputation
• Best Imputation Practice tries to quantify Guessing impact on Information Quality
• Editing has not improved as much as Imputation
• Editing/Imputation needs more Joint Theory, especially to Measure and Use Mean Square Error Impacts
First Illustrative Example
• Fabrication/Falsification
• Illustrate the General Points about Editing and Imputation
• Emphasize Importance of Fabrication threat to Quality
Fabrication/Falsification
• Respondent/Interviewer Make up Data
• How Common?
• How to Reduce?
• How to Detect?
Right Structure Right Resources
• Examine Practice Elsewhere?
• www.amstat.org Website
• Key is right incentives
• Good staff/training
• But Eternal Vigilance
Second Illustration
• Raking Application at NSS
• To link up to Next Talk
• To illustrate Information Quality that is fit for use despite Data Quality
Raking Quality “Fix”
• What is Raking?
• How does it improve quality? Not Data Quality But Information Quality
• Sometimes both -- Better Point Estimates More Stable (smaller variances)
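As a rough answer to "What is Raking?": raking (iterative proportional fitting) rescales a table of survey weights, alternately by rows and by columns, until its margins match known control totals. The sketch below is a generic illustration with invented numbers, not the NSS application. It assumes every row and column has a positive sum and that the row and column targets add up to the same grand total.

```python
def rake(table, row_targets, col_targets, max_iter=1000, tol=1e-9):
    """Rake a two-way table of weights to the given row and column totals."""
    w = [row[:] for row in table]  # work on a copy of the weights
    for _ in range(max_iter):
        # Scale each row so that it sums to its control total.
        for i, target in enumerate(row_targets):
            s = sum(w[i])
            w[i] = [v * target / s for v in w[i]]
        # Scale each column so that it sums to its control total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in w)
            for row in w:
                row[j] *= target / s
        # Columns now match exactly; stop once the rows match as well.
        if all(abs(sum(w[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return w


sample = [[30.0, 20.0], [10.0, 40.0]]  # weighted counts from the survey
raked = rake(sample, row_targets=[60, 40], col_targets=[45, 55])
print([round(sum(row), 6) for row in raked])
```

Raking does not change any reported value, which is why the slide files it under information quality rather than data quality: the individual records stay as they were, while the estimates built from them line up with trusted totals.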
Quality Summary
• Editing Data Quality
• Imputation Information Quality
• Raking Information Quality
• Fabrication Can Harm Both
• Must be guarded against always
Almost Done Now
• Tried to Stay Practical, with a Frank Discussion of Key Weaknesses in Current Practice
• Deeper Understanding of Data Quality
• But at an Applied Level
Thank You
  Fritz Scheuren
  Scheuren@aol.com